[Haskell-cafe] Why is this so inefficient?

Wed Feb 6 16:23:43 EST 2008

Jefferson Heard wrote:
> I thought this was fairly straightforward, but where the marked line
> finishes in 0.31 seconds on my machine, the actual transpose takes
> more than 5 minutes.  I know it must be possible to read data in
[snip]

> dataFromFile :: String -> IO (M.Map String [S.ByteString])
> dataFromFile filename = do
>     f <- S.readFile filename
>     print . length . map (S.split ',' $!) . S.lines $ f
>  -- finishes in 0.31 seconds

The S.split applications will never be evaluated: the list that you produce
is full of thunks of the form (S.split ',' $! <some bytestring>). The $! will
only take effect if those thunks are forced, and length doesn't do that,
because it only walks the spine of the list and never looks at the elements.
Try

    print . sum . map (length . S.split ',') . S.lines $ f

instead, to force S.split to produce a result. (In fact, S.split is strict
in its second argument, so the $! shouldn't have any effect on the running
time at all. I didn't measure that though.)
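
To make the difference concrete, here is a small self-contained sketch. The
names countLines and countFields and the sample input are made up for
illustration and are not from the original program. It shows that length
leaves the list elements unevaluated, while summing the inner lengths forces
every S.split.

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical helper: counts lines only. length walks the list spine and
-- never evaluates the elements, so no S.split is ever performed here.
countLines :: S.ByteString -> Int
countLines = length . map (S.split ',' $!) . S.lines

-- Hypothetical helper: counts fields. Taking the length of each inner list
-- forces S.split to run for every line.
countFields :: S.ByteString -> Int
countFields = sum . map (length . S.split ',') . S.lines

main :: IO ()
main = do
    let sample = S.pack "a,b,c\nd,e\n"   -- made-up two-line input
    print (countLines sample)            -- 2
    print (countFields sample)           -- 5
    -- length is happy even if every element is undefined, which shows
    -- that it never forces the elements:
    print (length (map (const (undefined :: [S.ByteString])) (S.lines sample)))  -- 2
```

Timing the first variant therefore measures little more than S.lines, which
would explain a very short running time for the marked line.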

>     return . transposeCSV . map (S.split ',' $!) . S.lines $ f
>  -- this takes 5 minutes and 6 seconds

HTH,

Bertram