[Haskell-cafe] serializing large data structures, stack overflow

friggin friggin frigginfriggins at gmail.com
Sat Mar 7 14:57:04 EST 2009


I'm playing around with the Netflix Prize data, implementing a simple KNN
algorithm; I will later try SVD, which seems to be the most successful
approach.

Using a database like PostgreSQL is too slow, so I want to serialize a data
structure containing the ratings. I'm not sure about the representation I
will use just yet, whether I should use multiple arrays or a Map/IntMap.
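
For example, if I go with IntMap I could pack each (user, movie) pair into
a single Int key. A sketch of what I mean (the 100000 multiplier is just an
assumption that movie ids stay below it, and that Int is 64 bits so the
products can't overflow):

import qualified Data.IntMap as IM

-- Pack (user, movie) into one Int key; assumes movie ids < 100000 and a
-- 64-bit Int so user * 100000 + movie cannot overflow or collide.
key :: Int -> Int -> Int
key user movie = user * 100000 + movie

-- Build the ratings table from (user, movie, grade) triples.
ratings :: [(Int, Int, Int)] -> IM.IntMap Int
ratings rs = IM.fromList [ (key u m, g) | (u, m, g) <- rs ]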

However, I tried Data.Binary, and already at small sizes I get a stack
overflow when deserializing. Serializing works fine, but reading the data
back overflows the stack. How can I solve this? This is just 2 MB; I will
eventually need something like 2-500 MB to store everything, depending on
which representation I choose.

module Serialize where
import qualified Data.Binary as B
import qualified Data.Map as M

-- Build a test map with n entries: key (i, 1) mapped to grade 4.
genTest :: Int -> M.Map (Int,Int) Int
genTest n = let movies = replicate n 1
                grades = replicate n 4 in
            M.fromList $ ([1..n] `zip` movies) `zip` grades

main :: IO ()
main = do
  let a = genTest 50000
  B.encodeFile "C:/users/saftarn/desktop/bintest.txt" a
  print "Success"

-- Reading the map back is where the stack overflow happens.
dec :: IO (M.Map (Int,Int) Int)
dec = B.decodeFile "C:/users/saftarn/desktop/bintest.txt" >>= \a ->
      return (a :: M.Map (Int,Int) Int)
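
One thing I will try is forcing the whole map right after decoding, so the
result isn't handed back as one huge chain of thunks. A sketch using the
deepseq package (decodeFileStrict is just my own helper name, untested):

import qualified Data.Binary as B
import qualified Data.Map as M
import Control.DeepSeq (deepseq)

-- Decode the file and fully evaluate the map inside IO before returning it.
decodeFileStrict :: FilePath -> IO (M.Map (Int, Int) Int)
decodeFileStrict path = do
  m <- B.decodeFile path
  m `deepseq` return m

Failing that, I suppose I can bump the stack with +RTS -K64m to at least
work around the symptom.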

