[Haskell-cafe] Bytestrings vs String?

wren ng thornton wren at freegeek.org
Mon Feb 2 22:41:57 EST 2009


Marc Weber wrote:
> A lot of people are suggesting using Bytestrings for performance,
> strictness, or whatever other reasons.
> 
> However how well do they talk to other libraries?

I'm not sure what you mean.

For passing them around: if someone combines your library (the version 
using ByteStrings) with another Haskell library that uses ByteStrings, 
everything works fine, assuming both libraries are compiled against the 
same version of the bytestring package. As I recall, ByteStrings are 
also designed to be easy to pass to C code across the FFI, in case 
someone wants to use your library alongside FFI C code. If someone 
combines your library with another library that uses String, they'll 
need to add conversions. (All of this applies symmetrically to a String 
version of your library combined with a library using ByteStrings.)
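
To make "add conversions" concrete, here is a minimal sketch of the glue 
code a user would write at the boundary. The functions shout and 
byteLength are hypothetical stand-ins for a String-based and a 
ByteString-based library; only pack and unpack come from the bytestring 
package.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical function from a String-based library.
shout :: String -> String
shout s = s ++ "!"

-- Hypothetical function from a ByteString-based library.
byteLength :: B.ByteString -> Int
byteLength = B.length

-- Glue code: convert at the boundary with unpack/pack.
-- Each conversion copies the whole string, which is the cost
-- of mixing the two representations.
shoutBytes :: B.ByteString -> B.ByteString
shoutBytes = B.pack . shout . B.unpack

main :: IO ()
main = do
  let result = shoutBytes (B.pack "hello")
  B.putStrLn result
  print (byteLength result)
```

The conversions are cheap to write but not free to run, so they are best 
kept at the edges of a program rather than scattered through it.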

The big compatibility issue I can see is the question of what a given 
ByteString *means*. In particular, the Data.ByteString.Char8 module 
treats each byte as one 8-bit character (code points 0-255, i.e. 
Latin-1), not all of Unicode like [Char] does; packing a Char above 255 
silently truncates it. There are libraries for lossless encoding of 
[Char] into ByteStrings, but in general there can be encoding mismatch 
problems if, say, your library uses UTF8-encoded ByteStrings but the 
other library treats them as Char8-encoded (or UTF16BE, UTF16LE, 
FooBar,...), potentially mangling or inventing multi-byte characters.
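
A small sketch of the mismatch, assuming the text package for the UTF-8 
encoding step (one possible choice of encoding library, not one named 
above). The helpers toUtf8 and fromUtf8 are names made up for this 
example.

```haskell
import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text             as T
import qualified Data.Text.Encoding    as TE

-- Encode a String as UTF-8 bytes (hypothetical helper name).
toUtf8 :: String -> BS.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

-- Decode UTF-8 bytes back to a String (hypothetical helper name).
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
fromUtf8 :: BS.ByteString -> String
fromUtf8 = T.unpack . TE.decodeUtf8

main :: IO ()
main = do
  let s    = "caf\233"             -- "cafe" with e-acute: 4 characters
      utf8 = toUtf8 s
  print (BS.length utf8)           -- 5: the accented letter takes 2 bytes
  print (fromUtf8 utf8 == s)       -- True: the round trip is lossless
  -- Reading the same bytes as Char8 turns one character into two.
  print (B8.unpack utf8)           -- "caf\195\169"
  -- Packing a Char above 255 with Char8 truncates it to 8 bits.
  print (B8.unpack (B8.pack "\955"))  -- "\187" (U+03BB keeps only 0xBB)
```

Neither library in such a pairing is wrong on its own terms; the bytes 
are simply being read under a different convention than they were 
written with, which is why the encoding belongs in the documentation.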

In general, if you're concerned about performance (or believe your users 
will be) then ByteStrings are a good bet. Just make it clear in the 
documentation what sort of encoding you use (or whether your library is 
encoding agnostic).

For hslogger specifically, it looks like most of the Strings are 
arguments which will typically be written as literals. Thus, to minimize 
boilerplate, if you do switch to ByteStrings then you may want to 
provide a module that does all the String->ByteString conversions for 
the user. Before committing to the change, and if you have a good 
program for testing real world use of hslogger, I'd suggest benchmarking 
(in time and in space) the differences between the current String 
implementation and a proposed ByteString implementation.
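
As a rough sketch of such a conversion layer: the names formatLogBS, 
logBS and logS below are hypothetical and are not hslogger's actual API. 
The ByteString functions stand for the core, and logS is the kind of 
String wrapper the convenience module would export.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (stderr)

-- Hypothetical ByteString-based core (not hslogger's real API).
formatLogBS :: B.ByteString -> B.ByteString -> B.ByteString
formatLogBS logger msg = B.concat [B.pack "[", logger, B.pack "] ", msg]

logBS :: B.ByteString -> B.ByteString -> IO ()
logBS logger msg = B.hPutStrLn stderr (formatLogBS logger msg)

-- Hypothetical convenience wrapper: same operation, String arguments.
-- B.pack truncates characters above 255, so this is only safe for
-- 8-bit text such as ASCII literals.
logS :: String -> String -> IO ()
logS logger msg = logBS (B.pack logger) (B.pack msg)

main :: IO ()
main = logS "MyApp.Database" "connection opened"
```

Users who already hold ByteStrings call the core directly; users writing 
literals call the wrapper. With GHC's OverloadedStrings extension, 
literals can also be given type ByteString directly, subject to the same 
8-bit truncation.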


> Should there be two versions?
> 
> hslogger-bytestring and hslogger-string?

I'd just stick with one (with a module for hiding the conversions, as 
desired). Duplicating the code introduces too much room for maintenance 
and compatibility issues.

> Or would it be better to implement one String class which can cope
> with everything (performance will drop, won't it?)

It'd be a very large class if you do it in full generality[1], and large 
classes like that tend to be frowned on (for good or ill). If you only 
need a small subset of string operations then it may be more feasible to 
have a smaller class with only those operations.

[1] See everything hidden from the Prelude in 
http://hackage.haskell.org/packages/archive/list-extras/0.2.2.1/doc/html/src/Prelude-Listless.html 
or compare what Data.ByteString offers with what the Prelude offers.
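
For illustration, a minimal sketch of what a small class might look 
like. The class StringLike, its three methods, and the function tagged 
are all hypothetical; a real library would pick whichever operations it 
actually needs.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString.Char8 as B

-- Hypothetical minimal class: only a handful of operations,
-- rather than the whole list/ByteString API.
class StringLike s where
  fromStr :: String -> s
  toStr   :: s -> String
  append  :: s -> s -> s

-- String is a synonym for [Char], hence FlexibleInstances.
instance StringLike String where
  fromStr = id
  toStr   = id
  append  = (++)

-- Char8 semantics: characters above 255 are truncated by pack.
instance StringLike B.ByteString where
  fromStr = B.pack
  toStr   = B.unpack
  append  = B.append

-- Code written once against the class works for both types.
tagged :: StringLike s => String -> s -> s
tagged tag msg = fromStr ("[" ++ tag ++ "] ") `append` msg

main :: IO ()
main = do
  putStrLn (toStr (tagged "info" ("started" :: String)))
  putStrLn (toStr (tagged "info" (B.pack "started")))
```

Three methods are manageable; the trouble starts when the class has to 
grow towards everything the Prelude and Data.ByteString both offer.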


> In the future I'd like to explore using haskell for web development.
> So speed does matter. And I don't want my server to convert from
> Bytestrings to Strings and back multiple times..

That's the big thing. The more people that use ByteStrings the less need 
there is to convert when combining libraries. That said, ByteStrings 
aren't a panacea; lists and laziness are very useful.

-- 
Live well,
~wren

