ByteString I/O Performance

Peter Simons simons at cryp.to
Wed Sep 5 15:30:02 EDT 2007


Hey Apfelmus,

I have to apologize for being overly sensitive. I had a couple of rough
days and am easily frustrated at the moment. That is not your fault. I
am sorry.

Your illustrative example is very good. It helped me to see more clearly
the point I've been trying to make but couldn't quite articulate.

 >    peek :: Ptr Word8 -> Word8           -- :(

The ByteString package offers a number of pure functions to manipulate
the underlying buffer. Some of them -- like 'take' and 'drop' -- are by
all means supposed to be pure, because they manipulate merely the base
pointer, the offset, or the size. Those functions depend only on the
value of ByteString, not on the memory it references.
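
To illustrate why such functions can be pure, here is a minimal sketch
that models a ByteString as a base pointer, an offset, and a length. The
type 'Slice' and the functions 'newSlice', 'takeS', 'dropS', and 'clamp'
are hypothetical names made up for this illustration; they are not the
library's actual definitions.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)

-- Hypothetical stand-in for the ByteString representation:
-- base pointer, offset into the buffer, and length.
data Slice = Slice
  { sliceBase :: ForeignPtr Word8
  , sliceOff  :: Int
  , sliceLen  :: Int
  }

-- Allocate an uninitialised buffer of n bytes and view all of it.
newSlice :: Int -> IO Slice
newSlice n = do
  fp <- mallocForeignPtrBytes n
  return (Slice fp 0 n)

-- Keep the first n bytes: only the length changes.
takeS :: Int -> Slice -> Slice
takeS n s = s { sliceLen = clamp n (sliceLen s) }

-- Skip the first n bytes: only the offset and the length change.
dropS :: Int -> Slice -> Slice
dropS n s = s { sliceOff = sliceOff s + k, sliceLen = sliceLen s - k }
  where k = clamp n (sliceLen s)

-- Restrict n to the range [0, len].
clamp :: Int -> Int -> Int
clamp n len = max 0 (min n len)
```

Neither 'takeS' nor 'dropS' ever reads the memory behind 'sliceBase', so
their results cannot change when that memory is overwritten.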

Then there is this function:

  index :: ByteString -> Int -> Word8

This function does earn one of those inverse smilies. Personally, I
would not have provided a dereferencing operation outside of the IO
monad. My personal opinion is that a monadic 'index' would have been
ever so slightly less convenient, but it would be far more robust than
the function above.
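
A monadic 'index' along these lines might look roughly like the
following sketch. The name 'indexIO' is hypothetical and not part of the
ByteString package; the sketch assumes that 'unsafeUseAsCStringLen' is
exported from Data.ByteString.Unsafe, as it is in current versions of
the bytestring package.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Storable (peekByteOff)

-- Hypothetical monadic counterpart of 'index': the dereference happens
-- in IO, so it is sequenced relative to any writes to the buffer.
indexIO :: B.ByteString -> Int -> IO Word8
indexIO bs i = unsafeUseAsCStringLen bs $ \(ptr, len) ->
  if i < 0 || i >= len
    then ioError (userError "indexIO: index out of range")
    else peekByteOff ptr i
```

A caller would write 'w <- indexIO bs 3' where it would otherwise write
'let w = index bs 3'.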

As far as I can tell, the only reason why a function like
'unsafeUseAsCStringLen' has to be dubbed unsafe is because 'index' makes
it unsafe. The limitation that ByteString has to be immutable is a
consequence of the choice to provide 'index' as a pure function.

Personally, I won't use 'index' in my code. I'll happily dereference the
pointer in the IO monad, because I've found that to be no effort
whatsoever. I love monads. For my purposes, 'unsafeUseAsCStringLen' is a
perfectly safe function. The efficient variant of 'hGet' I posted can be
implemented on top of it, so that 'hGet' is by all means a safe function
in my code. There really is no risk at all, unless one uses 'index' or
something that's based on it.
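
For readers who have not followed the thread, a buffer-reusing read on
top of 'unsafeUseAsCStringLen' and 'hGetBuf' could be sketched as
follows. This is an illustration under assumptions, not necessarily the
variant posted earlier: 'hGetInto' and 'countBytes' are hypothetical
names, the 4096-byte buffer size is arbitrary, and the buffer is
allocated with 'create' from Data.ByteString.Internal.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (create)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import System.IO (Handle, IOMode (ReadMode), hGetBuf, withBinaryFile)

-- Hypothetical helper: refill the memory behind 'buf' from the handle
-- and return the prefix that was actually read. The buffer is
-- overwritten in place, so every ByteString sharing that memory
-- changes as well.
hGetInto :: Handle -> B.ByteString -> IO B.ByteString
hGetInto h buf = unsafeUseAsCStringLen buf $ \(ptr, len) -> do
  n <- hGetBuf h ptr len
  return (B.take n buf)

-- Count the bytes in a file while reusing one buffer for every read.
countBytes :: FilePath -> IO Int
countBytes path = withBinaryFile path ReadMode $ \h -> do
  buf <- create 4096 (\_ -> return ())  -- arbitrary size, contents unspecified
  let loop acc = do
        chunk <- hGetInto h buf
        if B.null chunk
          then return acc
          else loop $! acc + B.length chunk
  loop 0
```

Each chunk is only valid until the next call to 'hGetInto'. Code that
keeps an old chunk around and later inspects it with a pure function
such as 'index' would observe the new contents, which is exactly the
risk described above.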

The way I see it, there will be other people who'll find the performance
limitations of standard 'hGet' a decisive factor in their design
decisions. Chances are, those people will wonder about using the base
pointer for hGetBuf and then they'll end up re-inventing the wheel we
just came up with.

Maybe I'll find the time to submit a patch to the documentation, so that
fine points like an optimal buffer size etc. are explained in more
detail than they are right now. It would be nice if some kind of result
came out of this discussion.

Anyway, thank you. I appreciate everyone's efforts in helping me figure
out why I/O with ByteString is more than two times slower than it could
be.

Take care,
Peter


