String type in Socket I/O

Simon Marlow simonmar@microsoft.com
Mon, 8 Apr 2002 12:26:16 +0100


> I am writing an HTTP client-side library, using the SocketPrim library.
> During the implementation of Base64 encode/decode I began to have some
> doubts over the use of the Char type for socket I/O.
>
> As far as I can tell, "sendTo" and "recvFrom" are simply handles on the
> underlying OS calls.  My winsock2.h file tells me the data passed into and
> received from these functions are C style chars, 8 bits each.  In unix these
> functions (sys/sockets.h) appear to use a C void pointer.  Finally I notice
> that the Haskell98 report defines Haskell Char as a Unicode char (which I
> figure isn't guaranteed 8 bits).
>
> So I am curious, what happens when I send these unicode Haskell chars to the
> SocketPrim.sendTo function?  My current guess is that the low 8 bits of each
> Char become a C style char.

That's what happens in GHC, but as you correctly point out this is
wrong.  The real problem is twofold: GHC doesn't have proper support for
Unicode encoding/decoding at the I/O interface, and we don't have an
"approved" way to write other kinds of data into a file.
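
To make the truncation concrete, here is a small pure model of it.  This
is only a sketch of the behaviour described above, not the code GHC's
socket library actually runs; the names truncateChar, widenByte and
survivesRoundTrip are made up for illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Keep only the low 8 bits of a Char's code point.
-- fromIntegral from Int to Word8 wraps modulo 256.
truncateChar :: Char -> Word8
truncateChar = fromIntegral . ord

-- Widen a byte back to a Char, reading it as a code point below 256.
widenByte :: Word8 -> Char
widenByte = chr . fromIntegral

-- True when a Char comes back unchanged from an 8-bit channel.
survivesRoundTrip :: Char -> Bool
survivesRoundTrip c = widenByte (truncateChar c) == c
```

Under this model any Char with a code point below 256 survives the trip,
while something like '\x20AC' comes back as a different character, so
text outside that range is silently corrupted unless it is encoded first.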

In GHC we currently have this interface (not in a released version yet)
in the module Data.Array.IO:

   hGetArray :: Handle -> IOUArray Int Word8 -> Int -> IO Int
   hPutArray :: Handle -> IOUArray Int Word8 -> Int -> IO ()

and there's also a Binary I/O library I've been working on that can be
used for reading/writing Word8s and other low-level kinds of data in a
predictable way (I guess we should discuss what the interface to
Data.Binary should look like at some point).
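
As a rough illustration of how the two array functions above might be
used, the following sketch writes some bytes to a Handle and reads them
back.  A file opened in binary mode stands in for the socket here, and
writeBytes and readBytes are hypothetical helper names, not part of any
library.

```haskell
import Data.Array.IO (IOUArray, getElems, hGetArray, hPutArray,
                      newArray, newListArray)
import Data.Word (Word8)
import System.IO (IOMode (ReadMode, WriteMode), withBinaryFile)

-- Write a list of bytes through an unboxed Word8 array.
writeBytes :: FilePath -> [Word8] -> IO ()
writeBytes path bytes =
  withBinaryFile path WriteMode $ \h -> do
    let n = length bytes
    arr <- newListArray (0, n - 1) bytes :: IO (IOUArray Int Word8)
    hPutArray h arr n

-- Read at most n bytes; hGetArray returns how many were actually read,
-- so only that prefix of the array is kept.
readBytes :: FilePath -> Int -> IO [Word8]
readBytes path n =
  withBinaryFile path ReadMode $ \h -> do
    arr <- newArray (0, n - 1) 0 :: IO (IOUArray Int Word8)
    got <- hGetArray h arr n
    take got <$> getElems arr
```

Since the element type is Word8 throughout, no Char ever gets involved
and values such as 128 or 255 pass through unchanged.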

I'd also be happy to add

  hPutWord8 :: Handle -> Word8 -> IO ()
  hGetWord8 :: Handle -> IO Word8

to System.IO if people think this is the right thing to do.
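
For discussion, one way these two functions could be written on top of
the existing hPutBuf and hGetBuf is sketched below.  This is an
assumption about a possible implementation, not what System.IO provides;
the end-of-file handling in particular is just one plausible choice.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Utils (with)
import Foreign.Storable (peek)
import System.IO (Handle, hGetBuf, hPutBuf)
import System.IO.Error (eofErrorType, mkIOError)

-- Write a single byte by passing a one-byte buffer to hPutBuf.
hPutWord8 :: Handle -> Word8 -> IO ()
hPutWord8 h w = with w $ \p -> hPutBuf h p 1

-- Read a single byte; raise an EOF error if nothing could be read.
hGetWord8 :: Handle -> IO Word8
hGetWord8 h = alloca $ \p -> do
  n <- hGetBuf h p 1
  if n == 1
    then peek p
    else ioError (mkIOError eofErrorType "hGetWord8" (Just h) Nothing)
```

The Handle should be in binary mode (for example opened with
openBinaryFile) so that no newline translation or text encoding gets
applied to the bytes.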

Cheers,
	Simon