[Haskell-cafe] attoparsec and vectors

Roman Leshchinskiy rl at cse.unsw.edu.au
Wed Jun 29 10:16:26 CEST 2011


Gregory Collins wrote:
> On Tue, Jun 28, 2011 at 6:20 PM, Eric Rasmussen <ericrasmussen at gmail.com>
> wrote:
>
>> It runs quickly, but I naively thought I could outperform it by
>> reworking "many" to build a vector directly, instead of having to build
>> a list first and then convert it to a vector:
>>
>> manyVec :: Alternative f => f a -> f (V.Vector a) manyVec v = many_v  
>> where many_v = some_v <|> pure V.empty         some_v = V.cons
>> <$> v <*> many_v
>>
>
> That's an O(n^2) loop, and a thunk leak to boot. If you don't know the
> size of the vector ahead of time, the only way I can think of to beat
> Vector.fromList is to use a mutable vector with a "highwater" mark,
> and double the size if you fill it. At the end, you'd use "unsafeFreeze" to
> turn the mutable vector into a pure one, and "unsafeTake" to truncate the
> vector into the correct size.

That's basically what fromList does. You could do this at a higher
abstraction level by generating a Stream rather than a list and then using
unstream to create a vector. I don't know if it's possible to do that with
attoparsec. But you'd only save allocating and deallocating a lazily
consumed list anyway. I'm not sure if it will be even noticable compared
to how much parsing costs.

> For an example of a similar technique (minus the freezing part), I did
> a similar thing in the hashtables library:

You might be interested in 'grow' :-)

http://hackage.haskell.org/packages/archive/vector/0.7.1/doc/html/Data-Vector-Generic-Mutable.html#g:8

Roman






More information about the Haskell-Cafe mailing list