[Haskell-cafe] haskell wiki indexing

Jason Dagit dagit at codersbase.com
Thu Jun 7 22:04:28 EDT 2007


On 5/22/07, Robin Green <greenrd at greenrd.org> wrote:
> On Tue, 22 May 2007 15:05:48 +0100
> Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
>
> > On Tue, 2007-05-22 at 14:40 +0100, Claus Reinke wrote:
> >
> > > so the situation for mailing lists and online docs seems to have
> > > improved, but there is still the wiki indexing/rogue bot issue,
> > > and lots of fine tuning (together with watching the logs to spot
> > > any issues arising out of relaxing those restrictions). perhaps
> > > someone on this list would be willing to volunteer to look into
> > > those robots/indexing issues on haskell.org?-)
> >
> > The main problem, and the reason for the original (temporary!) measure
> > was bots indexing all possible diffs between old versions of wiki
> > pages. URLs like:
> >
> > http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607
> >
> > For pages with long histories this O(n^2) number of requests starts to
> > get quite large and the wiki engine does not seem well optimised for
> > getting arbitrary diffs. So we ended up with bots holding open many
> > http server connections. They were not actually causing much server
> > cpu load or generating much traffic but once the number of nearly hung
> > connections got up to the http child process limit then we are
> > effectively in a DOS situation.
> >
> > So if we can ban bots from the page histories or turn them off for the
> > bot user agents or something then we might have a cure. Perhaps we
> > just need to upgrade our MediaWiki software or find out how other
> > sites using this software deal with the same issue of bots reading
> > page histories.
>
> http://en.wikipedia.org/robots.txt
>
> Wikipedia uses URLs starting with /w/ for "dynamic" pages (well, all
> pages are dynamic in a sense, but you know what I mean I hope.) And
> then puts /w/ in robots.txt.
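
A minimal sketch of the /w/ workaround Robin describes. It uses Python's standard urllib.robotparser as a stand-in for a well-behaved crawler. Duncan's O(n^2) figure is just pair counting: a page with n revisions has n(n-1)/2 possible (oldid, diff) pairs, so 100 revisions already give 4,950 distinct diff URLs. The idea is to serve all of those dynamic URLs from one prefix that a single Disallow line covers, while article pages stay crawlable. The /w/index.php layout for haskell.org is hypothetical. Today the wiki serves diffs as http://haskell.org/haskellwiki/?title=Quicksort&diff=9608&oldid=9607. The helper names build_parser and crawlable are also made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the Wikipedia style: article pages live under
# /haskellwiki/, while dynamic pages (diffs, histories, edit forms) would be
# served from a separate /w/ prefix. Neither the /w/ prefix nor this rule
# set exists on haskell.org; both are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    article = "http://haskell.org/haskellwiki/Quicksort"
    diff = "http://haskell.org/w/index.php?title=Quicksort&diff=9608&oldid=9607"
    print(article, crawlable(rp, article))  # article page stays indexable
    print(diff, crawlable(rp, diff))        # diff page is excluded
```

urllib.robotparser does only literal prefix matching: a `*` inside a path is not treated as a wildcard. Giving dynamic pages their own prefix, as Wikipedia does, therefore keeps the rule simple enough for any crawler to honour.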

Does anyone know the status of applying a workaround such as this?  I
really miss being able to find things on the Haskell wiki via Google
search.  I don't like the MediaWiki search at all.

I did a Google search earlier tonight but I didn't get wiki pages, so I
assume nothing has been done yet.  Please make the wiki indexed again
as soon as possible (if at all possible).  Otherwise, I feel like
it's a waste of time to keep contributing to wiki pages.

Thanks,
Jason

