[Haskell] Spam on HaskellWiki

Sat Dec 20 21:02:20 EST 2008

 > This is beginning to annoy people. Actually, someone registered several 
 > thousand accounts (of the form XxxxxXxxxx), though almost all of them 
 > have not been used. The others have been used to add spam.

For almost 2 years I have been working with Fidelis Assis to adapt his
email spam filter OSBF-Lua to broader purposes.  We would love to see
if it is possible to detect Wiki spam.  I am sorry to say that none of
the code is written in Haskell :-)

OSBF-Lua uses machine learning and probably requires on the order of
100 samples each of ham and spam before it starts to be useful (on
email).  If you have samples, especially if they are tagged with
username and IP address, please send them and I will run an experiment
and let you know if we can help.

Our tool divides messages into three classes:

  Confidently ham
  Confidently spam
  Low confidence
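
As a rough illustration of the three-way split (a sketch only, not
OSBF-Lua's actual interface): assume the classifier produces a signed
score where positive favors ham and negative favors spam. The names
`Verdict`, `classify`, and `margin`, and the default margin of 10.0, are
all hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "confidently ham"
    SPAM = "confidently spam"
    UNSURE = "low confidence"


def classify(score: float, margin: float = 10.0) -> Verdict:
    """Map a signed score to one of three classes.

    Assumption: positive scores favor ham, negative scores favor spam,
    and anything strictly within `margin` of zero is too close to call.
    """
    if score >= margin:
        return Verdict.HAM
    if score <= -margin:
        return Verdict.SPAM
    return Verdict.UNSURE
```

Only the messages that come back as `Verdict.UNSURE` would need to be
shown to a person.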

For the tool to work, a significant fraction of the low-confidence
messages need to be labeled by a person and fed back to the tool as
training.  A major engineering question is who gets training
privileges: there need to be enough people that training is not
burdensome, yet few enough that one doesn't grant training privileges
to spammers.

Fidelis and I can think about adding some sort of audit trail that
would make it possible to undo all trainings done by a given user, for
example.
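
One way such an audit trail might look, as a minimal sketch: it assumes
a toy model that just keeps per-class token counts, which is much
simpler than what OSBF-Lua really stores. `AuditedTrainer`, `train`, and
`undo_user` are hypothetical names.

```python
from collections import Counter


class AuditedTrainer:
    """Toy token-count model that logs who trained what."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.log = []  # one (user, label, tokens) entry per training

    def train(self, user, label, text):
        tokens = text.lower().split()
        self.counts[label].update(tokens)
        self.log.append((user, label, tokens))

    def undo_user(self, user):
        """Reverse every training by `user`; return how many were undone."""
        kept, undone = [], 0
        for who, label, tokens in self.log:
            if who == user:
                self.counts[label].subtract(tokens)
                undone += 1
            else:
                kept.append((who, label, tokens))
        self.log = kept
        for label in self.counts:
            # Unary plus drops tokens whose count fell to zero.
            self.counts[label] = +self.counts[label]
        return undone
```

If a user with training privileges turns out to be a spammer, a single
`undo_user` call would remove that user's influence while leaving
everyone else's trainings intact.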

Norman