[Haskell-cafe] hamming distance allocation

Arnoldo Muller arnoldomuller at gmail.com
Mon Apr 19 11:14:58 EDT 2010


Hello John:

Well I could use a packed type. The only letters that will be found in the
string are "ATCG" so yeah I don't need unicode and those things.

Will try out with vector or ByteString. Thanks! :)

On Mon, Apr 19, 2010 at 2:37 PM, John Lato <jwlato at gmail.com> wrote:

> > Subject: Re: [Haskell-cafe] hamming distance allocation
> >
> > Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
> >> Hello all:
> >>
> >> I want to generate some hamming distance statistics about a set of
> >> strings. As explained in another e-mail in this list, I used the
> >> following code to call the
> >> functions:
> >> (exampl holds the list of strings of size w)
> >> filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
> >> ys <- exampl]
> >>
> >> I have two hamming functions:
> >> -- hamming distance for variable length strings
> >> hamming :: String -> String -> Int
> >> hamming x y = hamming' x y 0
> >>     where
> >>       hamming' [] _ !c = c
> >>       hamming' _ [] !c = c
> >>       hamming' (x:xs) (y:ys) !c
> >>           | x == y = hamming' xs ys c
> >>           | otherwise = hamming' xs ys (c + 1)
> >>
> >> -- function posted in this mailing list
> >> hamming2 :: String -> String -> Int
> >> hamming2 xs ys = length (filter not (zipWith (==) xs ys))
> >>
> >> I am executing these functions millions of times and the bottleneck of
> >> my program is in them as explained by running in profiling mode with
> >> +RTS -K400M -p -RTS
> >>
> >> The costlier function is the hamming distance
> >> COST CENTRE                    MODULE               %time %alloc
> >>
> >> hamming                        Distances             66.6   41.9
> >>
> >> It says that it is performing 41% of the allocations. In the case of
> >> hamming2 the allocations go as far as 52%.
> >
> > Allocations are cheap, so that's not necessarily a problem. More
> important
> > is, what's the maximum residency and how much is copied during GC?
> > Are you compiling with -O2 ?
> >
> >> I could understand that
> >> there are allocations in "hamming2" because we are creating pairs, but
> >> in the case of "hamming" there should be no allocation.
> >
> > Why not? I don't know how GHC counts allocations, but everytime you go
> from
> > (x:xs) to xs, you need a new pointer to the tail. If that counts as
> > allocation, hamming must allocate a lot, too.
>
> Is it really necessary to use Strings?  I think a packed type, e.g.
> Vector or ByteString, would be much more efficient here.  Of course
> this is only likely to be a benefit if you can move away from String
> entirely.
>
> I suspect that "hamming2" would perform better then.
>
> John
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100419/7907eb97/attachment.html


More information about the Haskell-Cafe mailing list