[Haskell-cafe] Optimizing a title matcher

Brandon Moore brandonm at yahoo-inc.com
Wed Sep 27 14:10:25 EDT 2006


Ketil Malde wrote:
> Lyle Kopnicky <lists at qseep.net> writes:
>
>   
>>> If you have some other metric other than prefix in mind for partial
>>> matches, then things probably get a lot more complicated.  You're
>>> probably looking at calculating minimum distances in some
>>> feature-space, which calls for pretty sophisticated algorithms if
>>> you need good performance.
>>>       
>
>   
>> Yes, that's the kind of thing I'm looking at doing. Looking at edit
>> distance.
>>     
>
> Do you really need that to search for movie titles?  At any rate, an
> exact-match finite-map implementation is a good start - to get good
> performance, you probably will need to use some kind of index to
> reduce the amount of data to search exhaustively (all-against-all).
>
> For text searching I think it is effective to use an index that
> maps from words (so that looking up a word gives you all the movies
> with that word in the title). 
>
> -k
>   
If you really need to do edit distance, perhaps something like
"Low Distortion Embeddings for Edit Distance"
http://www.cs.ucla.edu/~rafail/PUBLIC/68.pdf
would help? This is just one of the first result from a search,
probably there are plenty of things out there.
Make up from any constant factors from your language choice
with advanced algorithms! (and any constant is getting
pretty small these days)

Brandon


More information about the Haskell-Cafe mailing list