[web-devel] Data.Word8 (word8 library)

Michael Snoyman michael at snoyman.com
Thu Sep 20 17:01:03 CEST 2012


Well... let's test it out:

benchmarking Char8
mean: 333.0050 us, lb 329.2846 us, ub 336.2362 us, ci 0.950
std dev: 17.73400 us, lb 15.69876 us, ub 19.45947 us, ci 0.950
variance introduced by outliers: 51.452%
variance is severely inflated by outliers

benchmarking Char8 toLowerC
mean: 117.1571 us, lb 116.8739 us, ub 117.4219 us, ci 0.950
std dev: 1.394150 us, lb 1.189928 us, ub 1.649276 us, ci 0.950

benchmarking Word8
mean: 41.01667 us, lb 40.94708 us, ub 41.09468 us, ci 0.950
std dev: 378.4175 ns, lb 335.4655 ns, ub 462.6281 ns, ci 0.950

benchmarking bsToLower
mean: 37.37589 us, lb 37.24453 us, ub 37.48697 us, ci 0.950
std dev: 616.5653 ns, lb 513.7510 ns, ub 752.8996 ns, ci 0.950
found 9 outliers among 100 samples (9.0%)
  3 (3.0%) low severe
  4 (4.0%) low mild
  2 (2.0%) high mild
variance introduced by outliers: 9.426%
variance is slightly inflated by outliers

So a specialized `Char -> Char` function helps, but doesn't completely
close the performance gap. (Updates at the same gist[1].)
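
For illustration only, here is a minimal sketch of what an ASCII-restricted
toLower could look like on both Char and Word8. All names below are
hypothetical; this is not the code from the gist[1] and not the word8
library's implementation. It only assumes that bytes outside 'A'..'Z' should
be left untouched.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.Word (Word8)

-- Hypothetical ASCII-only lowercasing on Char: only 'A'..'Z' are changed,
-- so no Unicode case tables are consulted.
toLowerAsciiChar :: Char -> Char
toLowerAsciiChar c
    | c >= 'A' && c <= 'Z' = toEnum (fromEnum c + 32)
    | otherwise            = c

-- The same rule expressed directly on Word8 (65..90 are 'A'..'Z').
toLowerAsciiWord8 :: Word8 -> Word8
toLowerAsciiWord8 w
    | w >= 65 && w <= 90 = w + 32
    | otherwise          = w

-- Lowercase a ByteString through the Char8 interface.
lowerViaChar8 :: S.ByteString -> S.ByteString
lowerViaChar8 = S8.map toLowerAsciiChar

-- Lowercase a ByteString through the Word8 interface.
lowerViaWord8 :: S.ByteString -> S.ByteString
lowerViaWord8 = S.map toLowerAsciiWord8
```

Both functions should agree on any input, so the only difference left to
measure is the Char/Word8 conversion done by the Char8 interface.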

I disagree that an extra package is a problem: this is such a
low-level detail that average users don't really need to be aware that
the package exists, and I think the marginal increase in
compile times shouldn't cause any issues. I used to worry much more
about adding extra packages to the mix, but with the more recent
versions of cabal-install and the community's general improvement in
handling dependency hell, I see less of a reason to do so.

That said, I think having specialized toLower/toUpper in a central
place (perhaps even bytestring itself) would be a good thing.

Michael

[1] https://gist.github.com/3756212

On Thu, Sep 20, 2012 at 5:47 PM, Gregory Collins
<greg at gregorycollins.net> wrote:
> This is, of course, not an apples-to-apples test:
>
> Prelude Data.Char> toUpper 'χ'
> '\935'
> Prelude Data.Char> putStrLn ('\935':[])
> Χ
>
>
> ...which I suppose is the point. I wonder whether a version of
> toUpper/toLower on Char restricted to ASCII values would have the same
> performance here.
>
> We only call toLower explicitly in one place in snap-server, but where this
> would be nice to fix is for HTTP headers, where I think we are all using
> case-insensitive (which just calls "map toLower"). Probably we should send
> Bas a patch to optimize the FoldCase instance for ByteString.
>
> Personally I would prefer not to have yet another tiny package here, as the
> package zoo has enough creatures in it as it is. Do we think we have a real
> problem here beyond the toUpper/toLower case? I suspect that for most other
> uses of Data.ByteString.Char8 the conversion is a no-op.
>
> G
>
> On Thu, Sep 20, 2012 at 4:17 PM, Michael Snoyman <michael at snoyman.com>
> wrote:
>>
>> On Thu, Sep 20, 2012 at 2:10 PM, Michael Snoyman <michael at snoyman.com>
>> wrote:
>> > On Thu, Sep 20, 2012 at 11:41 AM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
>> >> Hello,
>> >>
>> >> ByteString is an array of Word8 but it seems to me that people tend to
>> >> use the Char interface with Data.ByteString.Char8 instead of the Word8
>> >> interface with Data.ByteString. Since the functions defined in
>> >> Data.ByteString.Char8 convert Word8 to Char and Char to Word8, they have
>> >> unnecessary overhead. Yes, the overhead is negligible in many cases,
>> >> but I would like to remove it for high-performance servers.
>> >>
>> >> Why do people use Data.ByteString.Char8? I guess that there are two
>> >> reasons:
>> >>
>> >> - There are no standard utility functions for Word8 such as "isUpper"
>> >> - Numeric literals (e.g. 72 for 'H') are not readable
>> >>
>> >> To fix these problems, I implemented the Data.Word8 module and
>> >> uploaded the word8 library to Hackage:
>> >>
>> >>
>> >> http://hackage.haskell.org/packages/archive/word8/0.0.0/doc/html/Data-Word8.html
>> >>
>> >> If Michael and Bas like this, I would like to modify warp and
>> >> case-insensitive to use the word8 library. What do people think of this?
>> >>
>> >> My concern is that character names start with "_". Some people would
>> >> dislike this convention. But I do not have a better idea at this moment.
>> >> Suggestions are welcome.
>> >>
>> >> --Kazu
>> >>
>> >> _______________________________________________
>> >> web-devel mailing list
>> >> web-devel at haskell.org
>> >> http://www.haskell.org/mailman/listinfo/web-devel
>> >
>> > Sounds good to me. I put together a simple benchmark to compare the
>> > performance of toLower, and the results are encouraging:
>> >
>> > benchmarking Char8
>> > mean: 38.04527 us, lb 37.94080 us, ub 38.12774 us, ci 0.950
>> > std dev: 470.9770 ns, lb 364.8254 ns, ub 748.3015 ns, ci 0.950
>> >
>> > benchmarking Word8
>> > mean: 4.807265 us, lb 4.798199 us, ub 4.816563 us, ci 0.950
>> > std dev: 47.20958 ns, lb 41.51181 ns, ub 55.07049 ns, ci 0.950
>> >
>> > I want to try throwing one more idea into the mix; I'll post
>> > updates when I have them.
>> >
>> > So to answer your question: I'd be happy to include word8 in warp :).
>> >
>> > Michael
>> >
>> >
>> > {-# LANGUAGE OverloadedStrings #-}
>> > import Criterion.Main
>> > import qualified Data.ByteString as S
>> > import qualified Data.ByteString.Char8 as S8
>> > import qualified Data.Char
>> > import qualified Data.Word8
>> >
>> > main :: IO ()
>> > main = do
>> >     input <- S.readFile "bench.hs"
>> >     defaultMain
>> >         [ bench "Char8" $ whnf (S.length . S8.map Data.Char.toLower) input
>> >         , bench "Word8" $ whnf (S.length . S.map Data.Word8.toLower) input
>> >         ]
>>
>> I tried implementing a more low-level approach to try and avoid the
>> Word8 boxing. The results improved a bit, but not significantly:
>>
>>
>> benchmarking Char8
>> mean: 318.2341 us, lb 314.5367 us, ub 320.4834 us, ci 0.950
>> std dev: 14.48230 us, lb 10.00946 us, ub 21.22126 us, ci 0.950
>> found 9 outliers among 100 samples (9.0%)
>>   8 (8.0%) low severe
>> variance introduced by outliers: 43.472%
>> variance is moderately inflated by outliers
>>
>> benchmarking Word8
>> mean: 35.79037 us, lb 35.66547 us, ub 35.92601 us, ci 0.950
>> std dev: 665.5299 ns, lb 599.3413 ns, ub 741.6474 ns, ci 0.950
>> variance introduced by outliers: 11.349%
>> variance is moderately inflated by outliers
>>
>> benchmarking bsToLower
>> mean: 31.49299 us, lb 31.32314 us, ub 31.65027 us, ci 0.950
>> std dev: 835.2251 ns, lb 744.4337 ns, ub 946.1789 ns, ci 0.950
>> variance introduced by outliers: 20.925%
>> variance is moderately inflated by outliers
>>
>> Perhaps someone with more experience with this level of optimization
>> would be able to improve the algorithm:
>>
>> https://gist.github.com/3756212
>>
>> Michael
>>
>
>
>
>
> --
> Gregory Collins <greg at gregorycollins.net>


