[Haskell-cafe] Re: Data.Binary stack overflow with Data.Sequence String

ChrisK haskell at list.mightyreason.com
Wed Mar 4 05:08:19 EST 2009


I have collected some of the backing code to "decode".  This is all pasted below 
so we can look at it.  I will not improperly guess at the cause of the problem, 
and be totally wrong.

I observe Get is a lazy monad:

Prelude Data.Binary Data.Binary.Get Data.Monoid> "World" == runGet ((return $! 
undefined) >> return "World") mempty
True

Prelude Data.Binary Data.Binary.Get Data.Monoid> 'W' == head (runGet ((return $! 
undefined) >>= \t -> return $! ('W':t))  mempty)
True

Prelude Data.Binary Data.Binary.Get Data.Monoid> "orld" == tail (runGet ((return 
$! undefined) >>= \h -> return $! (h:"orld"))  mempty)
True

This may have implication for building the "String" from "Char".  The get for 
"Char" uses "return $! char" but this is no good unless the Char is being 
forced, as the (return $! undefined) above shows.

The instance Get [a] inherits the laziness of replicateM which is sequence.

The instance (Seq.Seq e) does not force the "x :: String" value.  And even if it 
did it would only force the leading (:) cons cell and not the characters 
themselves.  The instance is strict in what passes for the spine of the Seq, not 
the contents, and certainly not the deep contents.

You might try using "newtype" when deserializing ArticleDB and make a much 
stricter version of the code.

All the relevant code (yi & binary & ghc):

> type Article = String
> type ArticleDB = Seq Article
> 
> -- | Read in database from 'dbLocation' and then parse it into an 'ArticleDB'.
> readDB :: YiM ArticleDB
> readDB = io $ (dbLocation >>= r) `catch` (\_ -> return empty)
>           where r x = fmap (decode . BL.fromChunks . return) $ B.readFile x
>                 -- We read in with strict bytestrings to guarantee the file is closed,
>                 -- and then we convert it to the lazy bytestring data.binary expects.
>                 -- This is inefficient, but alas...
> 
> decode :: Binary a => ByteString -> a
> decode = runGet get
> 
> instance (Binary e) => Binary (Seq.Seq e) where
>     put s = put (Seq.length s) >> Fold.mapM_ put s
>     get = do n <- get :: Get Int
>              rep Seq.empty n get
>       where rep xs 0 _ = return $! xs
>             rep xs n g = xs `seq` n `seq` do
>                            x <- g
>                            rep (xs Seq.|> x) (n-1) g
> 
> instance Binary Int where
>     put i   = put (fromIntegral i :: Int64)
>     get     = liftM fromIntegral (get :: Get Int64)
> 
> instance Binary Int64 where
>     put i   = put (fromIntegral i :: Word64)
>     get     = liftM fromIntegral (get :: Get Word64)
> 
> instance Binary Word64 where
>     put     = putWord64be
>     get     = getWord64be
> 
> instance Binary a => Binary [a] where
>     put l  = put (length l) >> mapM_ put l
>     get    = do n <- get :: Get Int
>                 replicateM n get
> 
> -- Char is serialised as UTF-8
> instance Binary Char where
>     put a | c <= 0x7f     = put (fromIntegral c :: Word8)
>           | c <= 0x7ff    = do put (0xc0 .|. y)
>                                put (0x80 .|. z)
>           | c <= 0xffff   = do put (0xe0 .|. x)
>                                put (0x80 .|. y)
>                                put (0x80 .|. z)
>           | c <= 0x10ffff = do put (0xf0 .|. w)
>                                put (0x80 .|. x)
>                                put (0x80 .|. y)
>                                put (0x80 .|. z)
>           | otherwise     = error "Not a valid Unicode code point"
>      where
>         c = ord a
>         z, y, x, w :: Word8
>         z = fromIntegral (c           .&. 0x3f)
>         y = fromIntegral (shiftR c 6  .&. 0x3f)
>         x = fromIntegral (shiftR c 12 .&. 0x3f)
>         w = fromIntegral (shiftR c 18 .&. 0x7)
> 
>     get = do
>         let getByte = liftM (fromIntegral :: Word8 -> Int) get
>             shiftL6 = flip shiftL 6 :: Int -> Int
>         w <- getByte
>         r <- case () of
>                 _ | w < 0x80  -> return w
>                   | w < 0xe0  -> do
>                                     x <- liftM (xor 0x80) getByte
>                                     return (x .|. shiftL6 (xor 0xc0 w))
>                   | w < 0xf0  -> do
>                                     x <- liftM (xor 0x80) getByte
>                                     y <- liftM (xor 0x80) getByte
>                                     return (y .|. shiftL6 (x .|. shiftL6
>                                             (xor 0xe0 w)))
>                   | otherwise -> do
>                                 x <- liftM (xor 0x80) getByte
>                                 y <- liftM (xor 0x80) getByte
>                                 z <- liftM (xor 0x80) getByte
>                                 return (z .|. shiftL6 (y .|. shiftL6
>                                         (x .|. shiftL6 (xor 0xf0 w))))
>         return $! chr r
> 
> 
> replicateM        :: (Monad m) => Int -> m a -> m [a]
> replicateM n x    = sequence (replicate n x)
> 
> sequence       :: Monad m => [m a] -> m [a] 
> {-# INLINE sequence #-}
> sequence ms = foldr k (return []) ms
>             where
>               k m m' = do { x <- m; xs <- m'; return (x:xs) }

-- 
Chris



More information about the Haskell-Cafe mailing list