getting a Binary module into the standard libs

Simon Marlow simonmar@microsoft.com
Thu, 14 Nov 2002 10:54:12 -0000


> > 3) I think we can all agree that we should buffer BinIOs.  There are
> > a few questions, given this:
>
> >   a) Should multiple threads be allowed to write the same BinHandle
> > simultaneously?  If not, is an error thrown or is the behaviour just
> > left "unspecified"?
> >   b) Should multiple threads be allowed to read from the same
> > BinHandle simultaneously?  If not, ...
> >   c) Should one thread be allowed to write and another to read from
> > the same BH simultaneously?  If not, ...
>
> I believe GHC has a reader-writer lock on Handles so the answer is
> that one thread blocks if another is already using it in a conflicting
> way.
>
> Basically, I suggest doing whatever normal file Handles do.

This is a tricky one.  Doing whatever normal Handles do is the "right"
way to approach this, but I fear it might be expensive.

Handles have a single file pointer (if they have a file pointer at all),
a buffer, and some other state.  The Handle itself is protected by a
lock, so that only one thread can access the state at a time.
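
To illustrate the locking described above, here is a minimal sketch that
models a handle's state as a record held in an MVar.  The names
HandleState, LockedHandle, advance and currentPos are hypothetical and
exist only for this example; this is not GHC's actual Handle
implementation, just one way the "one thread at a time" rule can be
expressed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newMVar, newEmptyMVar, modifyMVar_, readMVar, putMVar, takeMVar)
import Control.Monad (replicateM_)

-- Hypothetical stand-in for a handle's mutable state (assumption:
-- only the file pointer is modelled; buffer and other state omitted).
data HandleState = HandleState { hsFilePos :: !Int }

-- The MVar acts as the lock: a thread must take the state out before
-- it can touch it, so conflicting accesses block instead of racing.
newtype LockedHandle = LockedHandle (MVar HandleState)

newLockedHandle :: IO LockedHandle
newLockedHandle = LockedHandle <$> newMVar (HandleState 0)

-- Advance the file pointer by n while holding the lock.
advance :: LockedHandle -> Int -> IO ()
advance (LockedHandle mv) n =
  modifyMVar_ mv $ \st -> return st { hsFilePos = hsFilePos st + n }

-- Read the current file pointer.
currentPos :: LockedHandle -> IO Int
currentPos (LockedHandle mv) = hsFilePos <$> readMVar mv

main :: IO ()
main = do
  h    <- newLockedHandle
  done <- newEmptyMVar
  -- Two threads each advance the shared pointer 1000 times.
  let worker = replicateM_ 1000 (advance h 1) >> putMVar done ()
  _ <- forkIO worker
  _ <- forkIO worker
  takeMVar done
  takeMVar done
  currentPos h >>= print
```

Every operation pays for taking and releasing the MVar, which is the
kind of per-operation cost that makes this approach potentially
expensive for a fine-grained binary I/O layer.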

Currently, a BinIO handle caches the file pointer for speed, and doesn't
protect this with a lock.  BinIO handles might also need a cache.  The
"right" thing to do is to push this inside the Handle - use the Handle's
buffer as the cache.  Provide something like

  hOpenBin :: FilePath -> OpenMode -> IO Handle
  hPutBits :: Handle -> Int -> Word8 -> IO ()
  hGetBits :: Handle -> Int -> IO Word8
  hSeekBits :: Handle -> Integer -> IO ()

I don't know whether this would be acceptably fast or not.  (I'll try to
do some perf measurements on BinIO vs. BinMem later today; that should
give us a rough idea.)

What about BinMem?  Currently a BinMem is basically a flat array and a
pointer.  It has no lock; if you write or read from two threads
simultaneously you can get race conditions.  However, even with a lock,
reading from two threads simultaneously isn't likely to be a good idea
because of the shared file pointer.  This is why I suggested having
dupBin:

  dupBin :: BinHandle -> IO BinHandle

which essentially gives you another file pointer to work with, so that
two threads can safely read the same BinHandle at different points.
(writing is still problematic - use BinIO if you want multithreaded
writing).
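
As a rough sketch of the idea, assuming a BinMem is modelled as a flat
byte buffer plus a mutable offset, dupBin only needs to allocate a fresh
offset and share the buffer.  The record fields and the helpers
newBinMem, putByte, getByte and seekBin are hypothetical names made up
for this example, and dupBin is specialised to this toy BinMem rather
than the real BinHandle type.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical model of BinMem: a flat array and a pointer into it.
data BinMem = BinMem
  { bmBuf  :: ForeignPtr Word8  -- storage, shared between duplicates
  , bmSize :: Int               -- capacity in bytes
  , bmPos  :: IORef Int         -- this handle's own "file pointer"
  }

newBinMem :: Int -> IO BinMem
newBinMem n = do
  buf <- mallocForeignPtrBytes n
  pos <- newIORef 0
  return (BinMem buf n pos)

-- Write one byte at the current offset and advance (no locking).
putByte :: BinMem -> Word8 -> IO ()
putByte (BinMem buf size pos) w = do
  i <- readIORef pos
  when (i >= size) $ ioError (userError "putByte: buffer full")
  withForeignPtr buf $ \p -> pokeByteOff p i w
  writeIORef pos (i + 1)

-- Read one byte at the current offset and advance (no locking).
getByte :: BinMem -> IO Word8
getByte (BinMem buf size pos) = do
  i <- readIORef pos
  when (i >= size) $ ioError (userError "getByte: end of buffer")
  w <- withForeignPtr buf $ \p -> peekByteOff p i
  writeIORef pos (i + 1)
  return w

seekBin :: BinMem -> Int -> IO ()
seekBin bm = writeIORef (bmPos bm)

-- Same underlying buffer, independent offset.
dupBin :: BinMem -> IO BinMem
dupBin bm = do
  i    <- readIORef (bmPos bm)
  pos' <- newIORef i
  return bm { bmPos = pos' }

main :: IO ()
main = do
  bh <- newBinMem 4
  mapM_ (putByte bh) [10, 20, 30, 40]
  seekBin bh 0
  bh2 <- dupBin bh
  seekBin bh2 2
  a <- getByte bh   -- reads offset 0
  b <- getByte bh2  -- reads offset 2, without disturbing bh
  c <- getByte bh   -- reads offset 1
  print (a, b, c)   -- prints (10,30,20)
```

Because each duplicate owns its offset, two readers never move each
other's position.  Writes still go to the shared buffer, so concurrent
writers remain unsafe in this sketch.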

dupBin can be implemented for Handles, and hence BinIO too.  It's fairly
straightforward and seems useful anyway.

Summary:

  - reading/writing the same BinHandle from two threads isn't useful
    unless the threads can have their own file pointers.  ==> need
    dupBin

  - caching of the data in a BinIO should be done in the Handle,
    unless that's too expensive.

(Hal: for now, just continue with what you had planned, if we decide to
make some of these changes we can refactor later).

Cheers,
	Simon