[Haskell-cafe] NLP libraries and tools?

Rogan Creswick creswick at gmail.com
Fri Jul 1 19:34:10 CEST 2011


On Fri, Jul 1, 2011 at 3:31 AM, Dmitri O.Kondratiev <dokondr at gmail.com> wrote:
> Hi,
> Please advise on NLP libraries similar to Natural Language Toolkit

There is a (slowly?) growing NLP community for Haskell over at:

http://projects.haskell.org/nlp/

The nlp mailing list may be a better place to ask for details.  To the
best of my knowledge, most of the NLTK / OpenNLP capabilities have yet
to be implemented/ported to Haskell, but there are some packages to
take a look at on Hackage.

> First of all I need:
> - tools to construct 'bag of words'
> (http://en.wikipedia.org/wiki/Bag_of_words_model), which is a list of words
> in the article.

This is trivially implemented if you have a natural language tokenizer
you're happy with.
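For example, here is a minimal sketch that assumes a deliberately naive
tokenizer in place of a real one: it lower-cases everything and splits
on any non-letter, so contractions like "don't" come out as two tokens.
`tokenize` and `bagOfWords` are hypothetical names, and the only
dependency is `containers`, which ships with GHC.

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical naive tokenizer: lower-case the text and treat every
-- non-letter character as a separator.
tokenize :: String -> [String]
tokenize = words . map normalise
  where
    normalise c
      | isAlpha c = toLower c
      | otherwise = ' '

-- Bag of words: each distinct token mapped to its number of occurrences.
-- Word order is discarded.
bagOfWords :: [String] -> Map.Map String Int
bagOfWords = Map.fromListWith (+) . map (\w -> (w, 1))
```

In GHCi, `bagOfWords (tokenize "The cat saw the dog.")` gives a map in
which "the" is counted twice and every other word once. Swapping in a
better tokenizer only means replacing `tokenize`.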

Toktok might be worth looking at:
http://hackage.haskell.org/package/toktok, but I *think* it takes a
pretty simple view of tokens (assuming it is the tokenizer I've been
using within GF).

Eric Kow (?) has a tokenizer implementation, which I can't seem to
find at the moment - if I recall correctly, it is also very simple,
but it would be a great place to implement a more complex tokenizer :)

> - tools to prune common words, such as prepositions and conjunctions, as
> well as extremely rare words, such as the ones with typos.

I'm not sure what you mean by 'prune'.  Are you looking for a stopword
list to remove irrelevant / confusing words from something like a
search query? (that's not hard to do with a stemmer and a set)
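As a rough sketch of the set-based approach, assume word counts are
already held in a `Map String Int` bag of words. The stopword list
below is a tiny hypothetical placeholder, `pruneBag` is a made-up name,
and the stemming step is left out (you would stem each token before
counting so that inflected forms collapse together). The same filter
also drops words below a minimum count, which covers the "extremely
rare words" case.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stopword list, kept tiny for illustration; a real one
-- would be much longer (and would usually be loaded from a file).
stopwords :: Set.Set String
stopwords = Set.fromList ["a", "an", "and", "in", "of", "on", "or", "the", "to"]

-- Keep a word only if it is not a stopword and it occurs at least
-- minCount times (very rare words are often typos).
pruneBag :: Int -> Map.Map String Int -> Map.Map String Int
pruneBag minCount = Map.filterWithKey keep
  where
    keep w n = n >= minCount && not (Set.member w stopwords)
```

Choosing `minCount` is a judgment call: on a small corpus, a threshold
of 2 will throw away legitimate rare words along with the typos.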

> - stemming tools

There is an implementation of the porter stemmer on Hackage:

 - http://hackage.haskell.org/package/porter

> - Naive Bayes classifier

I'm not aware of a general-purpose Bayesian classifier library for
Haskell, but it *would* be great to have :)  There are probably some
general-purpose statistical packages I'm unaware of that offer a
larger set of capabilities...
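In the meantime, a multinomial Naive Bayes classifier with add-one
(Laplace) smoothing is small enough to sketch directly. This is a toy
illustration, not an existing library: `train`, `classify` and the
record types are hypothetical names, documents are assumed to be
already tokenized, labels are plain `String`s, and `classify` assumes
at least one training document (otherwise `maximumBy` fails on an
empty list). Scores are sums of log-probabilities, to avoid
floating-point underflow on long documents.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Per-class statistics gathered from the training documents.
data ClassStats = ClassStats
  { docCount   :: !Int                   -- training documents with this label
  , wordCounts :: !(Map.Map String Int)  -- token counts over those documents
  , totalWords :: !Int                   -- total tokens over those documents
  }

data Model = Model
  { classes :: Map.Map String ClassStats  -- statistics per label
  , vocab   :: Set.Set String             -- every token seen in training
  , nDocs   :: Int                        -- number of training documents
  }

-- Train from (label, tokens) pairs by summing per-label statistics.
train :: [(String, [String])] -> Model
train docs = Model stats vocabulary (length docs)
  where
    stats = Map.fromListWith merge [ (lbl, single toks) | (lbl, toks) <- docs ]
    single toks =
      ClassStats 1 (Map.fromListWith (+) [ (t, 1) | t <- toks ]) (length toks)
    merge (ClassStats d1 w1 n1) (ClassStats d2 w2 n2) =
      ClassStats (d1 + d2) (Map.unionWith (+) w1 w2) (n1 + n2)
    vocabulary = Set.fromList (concatMap snd docs)

-- Log prior plus the sum of smoothed log likelihoods of the known tokens:
-- P(w | c) = (count(w, c) + 1) / (total tokens in c + |V|).
score :: Model -> ClassStats -> [String] -> Double
score m cs toks = logPrior + sum (map logLik known)
  where
    known    = filter (`Set.member` vocab m) toks
    v        = fromIntegral (Set.size (vocab m))
    logPrior = log (fromIntegral (docCount cs) / fromIntegral (nDocs m))
    logLik t = log ((fromIntegral (Map.findWithDefault 0 t (wordCounts cs)) + 1)
                    / (fromIntegral (totalWords cs) + v))

-- Pick the label with the highest score.
classify :: Model -> [String] -> String
classify m toks =
  fst (maximumBy (comparing snd)
         [ (lbl, score m cs toks) | (lbl, cs) <- Map.toList (classes m) ])
```

Tokens never seen in training are ignored at classification time,
which is one common convention; the smoothing then only has to cover
words that are in the vocabulary but absent from a given class.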

> - SVM classifier

There are a few of these.  Take a look at the AI category on Hackage:

 - http://hackage.haskell.org/packages/archive/pkg-list.html#cat:ai

--Rogan

> -  k-means clustering


