[Haskell-beginners] Data.Binary.Get for large files

Daniel Fischer daniel.is.fischer at web.de
Fri Apr 30 18:10:48 EDT 2010


Am Freitag 30 April 2010 23:06:07 schrieb Philip Scott:
> Hi Daniel
>
> > Replace getFloat64le with e.g. getWord64le to confirm.
> > The reading of IEEE754 floating point numbers seems rather
> > complicated. Maybe doing it differently could speed it up, maybe not.
>
> That speeds things up by a factor of about 100 :)

Yes, I too.

>
> I think there must be some efficiency to be extracted from there
> somewhere.. Either the IEEE module

Look at the code. It does a lot of hard work. That is probably necessary to 
ensure correctness, but it's sloow.

If you feel like playing with fire,

{-# LANGUAGE BangPatterns, MagicHash #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Get
import GHC.Prim
import Data.Word

getFloat64le :: Get Double
getFloat64le = fmap unsafeCoerce# getWord64le

myGetter !acc = do
     e <- isEmpty
     if e then
             return acc
         else do
             !t <- getFloat64le
             myGetter ((t+acc)/2)

may work on your system (no warranties, you know what 'unsafe' means, don't 
you?).

> or the Data.Binary.Get.

Considering that it's quick enough getting Word64, you won't get much 
improvement there.

>
> Is it possible to get the profiler to look deeper than the top level
> module?

Lots of {-# SCC "foo" #-} pragmas. Or create a local copy of the module and 
import that, then -prof -auto-all should give more info.

> With all the options I could find, it only ever tells me about
> things in the file I am dealing with..Hm, 200MB file => ~25 million
> Doubles, such a list needs at least 400MB.
>
> > Still a long way to 2GB. I suspect you construct a list of thunks, not
> > Doubles.
>
> I think you are almost certainly right. Is there an easy way to see
> if/how/where this is happening?

Read the core, profile with all -hx flags and look at the profiles, show 
the code to more experienced Haskellers.

>
> Thanks once again,
>
> Philip



More information about the Beginners mailing list