[Haskell-cafe] Ideas on a fast and tidy CSV library

Ben Gamari bgamari.foss at gmail.com
Tue Jul 23 17:45:18 CEST 2013


Justin Paston-Cooper <paston.cooper at gmail.com> writes:

> Dear All,
>
> Recently I have been doing a lot of CSV processing. I initially tried to
> use the Data.Csv (cassava) library provided on Hackage, but I found this to
> still be too slow for my needs. In the meantime I have reverted to hacking
> something together in C, but I have been left wondering whether a tidy
> solution might be possible to implement in Haskell.
>
Have you tried profiling your cassava implementation? In my experience
it's quite quick. If you have an example of a slow path, I'm sure Johan
(cc'd) would like to know about it.

> I would like to build a library that satisfies the following:
>
> 1) Run a function <<f :: a_1 -> ... -> a_n -> m (Maybe (b_1, ..., b_n))>>,
> with <<m>> some monad and the <<a>>s and <<b>>s being input and output.
>
> 2) Be able to specify a maximum record string length and output record
> string length, so that the string buffers used for reading and outputting
> lines can be reused, preventing the need for allocating new strings for
> each record.
>
> 3) Allocate the memory that holds the parsed input values and the output
> values only once.
>
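Ultimately this could be rather tricky to enforce. Haskell code
generally allocates a great deal, and GHC's runtime system (a
bump-pointer allocator feeding a generational garbage collector) is well
optimized for exactly this kind of short-lived allocation.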

I've often found that trying to shoehorn a non-idiomatic "optimal"
imperative approach into Haskell produces worse performance than the
more readable, idiomatic approach.
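For concreteness, here is one way requirement 1 might look when written
idiomatically on top of cassava's streaming interface
(Data.Csv.Streaming), with ordinary garbage-collected allocation and no
buffer reuse. It is only a sketch, assuming the cassava package:
`mapRowsM` and `keepBig` are hypothetical names, tuples stand in for the
a_1 ... a_n and b_1 ... b_n, and the output rows are accumulated in
memory before being encoded, which a production version might avoid.

```haskell
-- Sketch only: assumes the cassava package; mapRowsM and keepBig are
-- hypothetical names, not part of cassava's API.
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (FromRecord, HasHeader (..), ToRecord, encode)
import qualified Data.Csv.Streaming as S
import Data.Functor.Identity (runIdentity)

-- Run a per-row function over every record of a header-less CSV input,
-- keeping only the rows for which it returns Just. Records are decoded
-- lazily; output rows are collected (in reverse) and encoded at the end.
-- The first parse or conversion error aborts with Left.
mapRowsM :: (Monad m, FromRecord a, ToRecord b)
         => (a -> m (Maybe b)) -> BL.ByteString -> m (Either String BL.ByteString)
mapRowsM f input = go (S.decode NoHeader input) []
  where
    go (S.Cons (Right row) rest) acc = do
      r <- f row
      go rest (maybe acc (: acc) r)
    go (S.Cons (Left err) _) _ = return (Left err)
    go (S.Nil (Just err) _) _  = return (Left err)
    go (S.Nil Nothing _) acc   = return (Right (encode (reverse acc)))

-- Hypothetical row function with two inputs and two outputs: drop rows
-- whose first column is <= 1 and double the second column of the rest.
keepBig :: Monad m => (Int, Int) -> m (Maybe (Int, Int))
keepBig (x, y) = return (if x > 1 then Just (x, 2 * y) else Nothing)

main :: IO ()
main = do
  let input = BL.pack "1,2\n3,4\n"
  -- Pure use: the row function runs in the Identity monad.
  print (runIdentity (mapRowsM keepBig input))
  -- Effectful use: the same function runs in IO.
  out <- mapRowsM keepBig input
  either putStrLn BL.putStr out
```

Because the row function is polymorphic in its monad, the same
`mapRowsM` runs purely under Identity or with effects in IO, which
covers the "some monad m" part of requirement 1 without any manual
buffer management.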

I understand this leaves many of your questions unanswered, but I'd give
the idiomatic approach a bit more time before trying to force C-style
code into Haskell. Profile, find the hotspots, and optimize those. If
the profile has you flummoxed, the mailing lists and #haskell are
always willing to help given the time.
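A minimal profiling harness along these lines might look as follows,
assuming the cassava package (installed with profiling libraries) and
GHC's standard profiling flags; `sumSecond` and the file name
Profile.hs are hypothetical. It folds strictly over a stream of records
so that the cost-centre report reflects parsing and field conversion
rather than a large retained result.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Hypothetical file Profile.hs. Build and run with profiling:
--   ghc -O2 -prof -fprof-auto -rtsopts Profile.hs
--   ./Profile +RTS -p -s -RTS < input.csv
-- -p writes a cost-centre report to Profile.prof; -s prints allocation
-- and GC statistics to stderr.
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (..))
import qualified Data.Csv.Streaming as S

-- Strictly sum the second column of a header-less (Int, Double) CSV
-- input, stopping at the first parse or conversion error.
sumSecond :: BL.ByteString -> Either String Double
sumSecond input = go 0 ({-# SCC "decode" #-} S.decode NoHeader input)
  where
    go :: Double -> S.Records (Int, Double) -> Either String Double
    go !acc (S.Cons (Right (_, x)) rest) = go (acc + x) rest
    go _    (S.Cons (Left err) _)        = Left err
    go _    (S.Nil (Just err) _)         = Left err
    go acc  (S.Nil Nothing _)            = Right acc

main :: IO ()
main = BL.getContents >>= either putStrLn print . sumSecond
```

The manual SCC pragma is redundant with -fprof-auto for top-level
bindings, but it gives the decoding step its own named line in the
report.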

Cheers,

- Ben


More information about the Haskell-Cafe mailing list