Proposal: Add {to,from}Strict conversion functions between strict and lazy ByteStrings

Duncan Coutts duncan.coutts at googlemail.com
Mon Nov 7 13:55:01 CET 2011


On Sat, 2011-10-29 at 12:05 +0200, Herbert Valerio Riedel wrote:
> I propose to add optimized {to,from}Strict conversion functions between
> strict and lazy ByteStrings to the Data.ByteString.Lazy API.
> 
> Discussion deadline: 2 weeks from now (12 November)

I see we're still before the deadline, but it seems like unanimous
support. I've added the functions. They'll be included in
bytestring-0.10.x.

Thanks Herbert and others who chimed in.

While I was at it, I also exported foldrChunks and foldlChunks so we now
match the Text API in this area.

As Don pointed out, we deliberately didn't include {to,from}Strict
functions to discourage people from converting back and forth, since
it's expensive. Since that has not proved popular I've just documented
it instead:

-- |/O(n)/ Convert a lazy 'ByteString' into a strict 'ByteString'.
--
-- Note that this is an /expensive/ operation that forces the whole lazy
-- ByteString into memory and then copies all the data. If possible, try to
-- avoid converting back and forth between strict and lazy bytestrings.
--
toStrict :: ByteString -> S.ByteString

BTW, I'm slightly sceptical of the benchmarks, it's using tiny chunk
sizes, in practice I don't expect the performance of toStrict to be much
different from B.concat . BL.toChunks.

Duncan

> = Current State =
> 
> The current Data.ByteString.Lazy API doesn't provide direct conversion
> functions to/from single strict ByteStrings
> 
> Currently, there are only `fromChunks` and `toChunks`, by which convert
> to/from a list of strict ByteStrings.
> 
> A possible reference implementation of the missing conversion functions
> is:
> 
> fromStrict = BL.fromChunks . (:[])
> 
> and
> 
> toStrict = B.concat . BL.toChunks
> 
> 
> == The Issues ==
> 
> The lack of `fromStrict`/`toStrict` in the Data.ByteString.Lazy API has
> the following issues:
> 
>  - Convenience: If the single-strict-bytestring conversion is often
> needed, one tends to define module- or package-local helper functions
> for convenience/readability to perform the desired conversion. This
> violates the DRY principle.
> 
>  - Principle of least suprise: Might be confusing to users new to
> `Data.ByteString.Lazy` why there is no direct conversion.
> 
>  - Symmetry with `Data.Text.Lazy` API which does provide such
> single-strict-text conversion functions (`fromStrict`/`toStrict`)
> 
>  - Performance: The above provided "naive" `toStrict` definition has a
> roughly 2 to 4 times higher overhead than a manually fused version
> (which was kindly provided by Bas van Dijk -- whom I'd like to thank for
> providing me with the optimized versions of toStrict and fromStrict) --
> see end of this mail for criterion benchmark code and results
> 
> 
> = Proposed Enhancement =
> 
> Enhance the Data.ByteString.Lazy API by adding the following conversion
> functions (suggestions for improvements are highly welcome):
> 
> -- see benchmark code at end of mail for the qualified imports
> 
> -- |/O(n)/ Convert a strict ByteString into a lazy ByteString.
> fromStrict :: B.ByteString -> BL.ByteString
> fromStrict = flip BLI.chunk BLI.Empty
> 
> -- |/O(n)/ Convert a lazy ByteString into a strict ByteString.
> toStrict :: BL.ByteString -> B.ByteString
> toStrict lb = BI.unsafeCreate len $ go lb
>   where
>     len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb
> 
>     go  BLI.Empty                   _   = return ()
>     go (BLI.Chunk (BI.PS fp s l) r) ptr =
>         withForeignPtr fp $ \p -> do
>             BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
>             go r (ptr `plusPtr` l)
> 
> 
> 
> == Benchmark Code & Results ==
> 
> ------------------------------------------------------------------------
> {-# LANGUAGE OverloadedStrings #-}
> 
> import           Criterion
> import           Criterion.Main
> import qualified Data.ByteString               as B
> import qualified Data.ByteString.Internal      as BI
> import qualified Data.ByteString.Lazy          as BL
> import qualified Data.ByteString.Lazy.Internal as BLI
> import           Foreign.ForeignPtr
> import           Foreign.Ptr
> 
> toStrict1 :: BL.ByteString -> B.ByteString
> toStrict1 = B.concat . BL.toChunks
> 
> toStrict2 :: BL.ByteString -> B.ByteString
> toStrict2 lb = BI.unsafeCreate len $ go lb
>   where
>     len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb
> 
>     go  BLI.Empty                   _   = return ()
>     go (BLI.Chunk (BI.PS fp s l) r) ptr =
>         withForeignPtr fp $ \p -> do
>             BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
>             go r (ptr `plusPtr` l)
> 
> 
> main :: IO ()
> main = do
>     let lbs1 = "abcdefghij"
>         lbs2 = BL.fromChunks (replicate 10 "abcdefghij")
>         lbs3 = BL.fromChunks (replicate 1000 "abcdefghij")
> 
>     -- force evaluation of lbs{1,2,3} and verify validity
>     print $ toStrict1 lbs1 == toStrict2 lbs1
>     print $ toStrict1 lbs2 == toStrict2 lbs2
>     print $ toStrict1 lbs3 == toStrict2 lbs3
> 
>     defaultMain
>         [ bgroup "toStrict"
>           [ bench "simple #1" $ whnf toStrict1 lbs1
>           , bench "simple #2" $ whnf toStrict1 lbs2
>           , bench "simple #3" $ whnf toStrict1 lbs3
> 
>           , bench "optimized #1" $ whnf toStrict2 lbs1
>           , bench "optimized #2" $ whnf toStrict2 lbs2
>           , bench "optimized #3" $ whnf toStrict2 lbs3
>           ]
>         ]
> 
> 
> {-
> 
> True
> True
> True
> warming up
> estimating clock resolution...
> mean is 2.302557 us (320001 iterations)
> found 2039 outliers among 319999 samples (0.6%)
>   1658 (0.5%) high severe
> estimating cost of a clock call...
> mean is 54.99870 ns (14 iterations)
> found 1 outliers among 14 samples (7.1%)
>   1 (7.1%) low mild
> 
> benchmarking toStrict/simple #1
> mean: 28.96077 ns, lb 28.89527 ns, ub 29.01562 ns, ci 0.950
> std dev: 305.8466 ps, lb 262.1008 ps, ub 345.6136 ps, ci 0.950
> 
> benchmarking toStrict/simple #2
> mean: 487.0739 ns, lb 486.7939 ns, ub 487.4713 ns, ci 0.950
> std dev: 1.699232 ns, lb 1.262363 ns, ub 2.457099 ns, ci 0.950
> 
> benchmarking toStrict/simple #3
> mean: 55.06322 us, lb 54.91370 us, ub 55.20236 us, ci 0.950
> std dev: 741.6239 ns, lb 656.3273 ns, ub 846.6403 ns, ci 0.950
> 
> 
> 
> benchmarking toStrict/optimized #1
> mean: 48.67522 ns, lb 48.65188 ns, ub 48.70237 ns, ci 0.950
> std dev: 129.3192 ps, lb 111.3761 ps, ub 165.4819 ps, ci 0.950
> 
> benchmarking toStrict/optimized #2
> mean: 178.6342 ns, lb 178.5480 ns, ub 178.7276 ns, ci 0.950
> std dev: 457.4436 ps, lb 409.2746 ps, ub 519.8267 ps, ci 0.950
> 
> benchmarking toStrict/optimized #3
> mean: 13.01866 us, lb 13.00734 us, ub 13.03549 us, ci 0.950
> std dev: 70.09916 ns, lb 52.18012 ns, ub 97.77226 ns, ci 0.950
> 
> -}
> 
> 
> 
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries





More information about the Libraries mailing list