Proposal: ByteString based datagram communication (Ticket #1238)

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Thu Apr 5 19:52:51 EDT 2007


On Fri, 2007-04-06 at 00:26 +0900, Robert Marlow wrote: 
> Hi Bulat
> 
> On Thu, 2007-04-05 at 15:08 +0400, Bulat Ziganshin wrote:
> > but why you provide ByteString-only API?? i think that more common
> > idiom is to provide String functions here and use somewhat like
> > Network.ByteString, Network.ByteString.Lazy modules to provide
> > ByteString/ByteStringLazy equivalents of String function from
> > Network.hs
> 
> Mostly because I wanted ByteStrings so that's what I implemented :)
> 
> Good point though. I've uploaded a replacement patch changing the
> Network functions to use String and adding Network.ByteString and
> Network.ByteString.Lazy. Thanks for the suggestion.

I'm not sure this really makes sense.

In most situations there is an obvious candidate amongst String, strict
ByteString and lazy ByteString. In this case, for datagram communication
the obvious choice is indeed strict ByteString. Correct me if I'm wrong
but datagrams are relatively small contiguous chunks and they arrive in
our memory space all in one go. So they are not at all like a continuous
stream of data, which is what a lazy ByteString models. So there would
never be any advantage to using a lazy ByteString in this case: it would
always have just one chunk. Similarly, for String, one has to go via a
strict contiguous chunk representation in the first place so any String
interface would be a trivial wrapper on a ByteString representation.
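
To make the one-chunk point concrete, here is a minimal sketch using only
the bytestring package. The names asLazy and chunkCount are hypothetical
helpers for illustration and are not part of the proposed patch.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- A datagram arrives as one contiguous buffer, so wrapping it as a lazy
-- ByteString can never produce more than one chunk (an empty datagram
-- yields no chunks at all).
asLazy :: B.ByteString -> BL.ByteString
asLazy datagram = BL.fromChunks [datagram]

-- Number of strict chunks inside a lazy ByteString.
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks
```

Loaded in GHCi, chunkCount (asLazy d) is 1 for any non-empty d, so the
lazy type adds a layer without adding any laziness.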

Remember that the types are trivially inter-convertible with a single
function call[1]. I'm not sure that we need two whole extra modules to
replace a single pack/unpack call in a calling module.
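
As a rough sketch of what converting in the calling module looks like:
recvString and sendString below are hypothetical caller-side helpers, and
the recv and send arguments stand in for whatever strict ByteString
actions the patch ends up exposing. Only pack, unpack, toChunks and concat
are real library calls here.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Hypothetical caller that wants a String: run the strict receive action
-- (passed in as 'recv', an assumption) and convert at the boundary.
recvString :: IO B.ByteString -> IO String
recvString recv = fmap BC.unpack recv

-- Hypothetical caller that has a String to send: a single 'pack' call.
-- Char8 'pack' truncates each Char to 8 bits, so this is only faithful
-- for Latin-1 text.
sendString :: (B.ByteString -> IO ()) -> String -> IO ()
sendString send = send . BC.pack

-- Lazy to strict goes via the chunk list: two calls, 'toChunks' then
-- 'concat'.
lazyToStrict :: BL.ByteString -> B.ByteString
lazyToStrict = B.concat . BL.toChunks
```

Each helper is a one-liner, which is the argument for leaving the
conversion to the caller rather than shipping separate modules for it.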

It's exactly this kind of thing that makes me worry about people
creating a Stringlike class. By passing the operations in via a class
rather than converting representations on the boundary we are in danger
of losing all the performance benefits we were after in the first
place.

I'm sure it makes more sense to provide a class to give us a string
equivalent of fromIntegral. That way operations that want to provide an
API that works on any string can choose the best internal representation
and just use the conversion on the boundary. That way we only need to
inline the conversion into the calling program to make it fast. As with
fromIntegral, that conversion can often be optimised or turned into a
no-op.
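
One possible shape for such a conversion-only class is sketched below.
Everything named here (StringConv, toBytes, fromBytes, convertString,
countSpaces) is hypothetical and chosen for illustration; no such class
exists in the libraries under discussion. The sketch also assumes strict
ByteString as the pivot representation and Latin-1 for String.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Hypothetical conversion-only class: no string operations, just a way
-- in and out of one chosen representation (strict ByteString here).
class StringConv s where
  toBytes   :: s -> B.ByteString
  fromBytes :: B.ByteString -> s

instance StringConv B.ByteString where
  toBytes   = id                      -- the no-op case
  fromBytes = id

instance StringConv BL.ByteString where
  toBytes     = B.concat . BL.toChunks
  fromBytes s = BL.fromChunks [s]

instance StringConv [Char] where
  toBytes   = BC.pack                 -- assumes Latin-1 text
  fromBytes = BC.unpack

-- The string analogue of fromIntegral: convert between any two
-- instances by going through the pivot representation.
convertString :: (StringConv a, StringConv b) => a -> b
convertString = fromBytes . toBytes

-- A function offering an "any string" API: the class is used once, at
-- the boundary, and the real work runs on a strict ByteString.
countSpaces :: StringConv s => s -> Int
countSpaces = BC.count ' ' . toBytes
```

The dictionary is only consulted in toBytes, so a caller that already has
a strict ByteString pays for nothing beyond an identity function.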

For performance, class dictionary use should be kept as near to the
'surface' as possible. For example, consider this standard List module
function:

elemIndex       :: Eq a => a -> [a] -> Maybe Int
elemIndex x     = findIndex (x==)

This is not a naive definition. It is very cunning.

If we wrote a full version of elemIndex in the style of findIndex but
using == at the appropriate point then to optimise uses of elemIndex
where we know the particular Eq class instance we'd have to inline the
whole of elemIndex. This isn't a tiny amount of code and GHC is normally
disinclined to do that. So we'd end up passing an Eq dictionary.
Disaster!

Instead, with the above definition we've lifted the use of the class
right to the surface. Now elemIndex looks tiny and GHC will inline it in
the calling context where we know the Eq instance. So now we just build
a little specialised (x==) function and make a call to the findIndex
function. So we get minimal code duplication and pretty fast results. 

And all this happens without having to bludgeon the compiler with INLINE
or SPECIALISE pragmas. In other words it works just fine on ordinary
user code.
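
For comparison, here are the two styles side by side. The names
elemIndexThin and elemIndexFull are made up for this sketch;
elemIndexThin has the same body as the library definition quoted above,
while elemIndexFull is one way the "full version" might be written.

```haskell
import Data.List (findIndex)

-- Class use lifted to the surface: the only place Eq appears is the
-- small (x ==) section handed to findIndex.
elemIndexThin :: Eq a => a -> [a] -> Maybe Int
elemIndexThin x = findIndex (x ==)

-- The "full" alternative: the Eq comparison sits inside the loop, so
-- specialising to a known instance means inlining the whole function.
elemIndexFull :: Eq a => a -> [a] -> Maybe Int
elemIndexFull x = go 0
  where
    go _ []     = Nothing
    go n (y:ys)
      | x == y    = Just n
      | otherwise = (go $! n + 1) ys  -- force the counter as we go
```

Both return the same results. Whether GHC actually inlines either one
depends on the compiler version and flags, so comparing the -ddump-simpl
output for a call site is the way to check.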

Ok, enough ranting.

Duncan


[1] Well, two to get between strict and lazy ByteStrings, but that's kind
of deliberate, to encourage people to think twice about doing that.


