[Haskell-cafe] Histogram creation

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Mon Nov 10 12:05:34 EST 2008


Alexey Khudyakov wrote:
> Hello!
> 
> I'm tryig to write efficient code for creating histograms. I have following
> requirements for it:
> 
> 1. O(1) element insertion
> 2. No reallocations. Thus in place updates are needed.
> 
> accumArray won't go because I need to fill a lot of histograms (hundrends)
> from vely long list of data (possibly millions of elements) and it will
> traverse input data for each histogram.

That's just not true, for GHC's implementation of accumArray at least,
which goes via the ST monad. It creates a mutable array, fills it,
traversing the input list exactly once, and finally freezes the array
and returns it. This is just what you suggest below.

If you still run into performance problems, try out unboxed arrays.

If that isn't enough, unsafeAccumArray from Data.Base may help.

I'd try both before using the ST monad directly.

> It seems that I need to use mutable array and/or ST monad or something else.
> Sadly both of them are tricky and difficult to understand. So good examples
> or any other ideas greatly appreciated.

http://www.haskell.org/haskellwiki/Shootout/Nsieve_Bits

perhaps. There must be better examples out there.

I can think of two common problems with mutable arrays and ST:

1) You need to specify a type signature for the array being created,
   because the compiler can't guess the MArray instance that you want.

   For example, from the shootout entry:

       arr <- newArray (0,m) False :: IO (IOUArray Int Bool)

   In ST, this is slightly trickier, because the phantom 's' type
   parameter has to be mirrord in the ST*Array type constructor. You
   can use scoped type variables, which allow you to write

       {-# LANGUAGE ScopedTypeVariables #-}
       import Control.Monad.ST
       import Data.Array.ST

       foo :: forall s . ST s ()
       foo = do
           arr <- newArray (0,42) False :: ST s (STUArray s Int Bool)
	   ...

   Alternatively you can define helper functions to specify just the
   part of the type signature that you care about.

       stuArray :: ST s (STUArray s i e) -> ST s (STUArray s i e)
       stuArray = id

       foo :: ST s ()
       foo = do
           arr <- stuArray $ newArray (0,42 :: Int) False
           ...

2) runST $ foo bar   doesn't work. You have to write  runST (foo bar)

But in the end it's just imperative array programming with a rather
verbose syntax -- you can do only one array access per statement, and
'readArray' and 'writeArray' are rather long names.

HTH,

Bertram


More information about the Haskell-Cafe mailing list