Network.HTTP module, using simpleHTTP

Bjorn Bringert d00bring at dtek.chalmers.se
Tue Jun 8 18:12:51 EDT 2004


Graham Klyne wrote:
> I'm trying to add an HTTP entity retrieval capability to HaXml using 
> simpleHTTP as the basis of a new function, readHTTP [1], that works very 
> similarly to Prelude.readFile (except that its argument is a Network.URI 
> value).  Function simpleHTTP still leaves a fair amount of result analysis 
> to be done by the calling program.  I'm thinking that it might be 
> convenient to provide a simple function, say:
> 
>      hasResponseData :: Response -> Bool
> 
> that can be used to drive a simple binary decision along the lines of:
> 
>      return $ if hasResponseData response
>          then (Right $ rspBody response)
>          else (Left  $ show response)

Unfortunately you cannot decide from the Response structure alone 
whether the response has no body or just a zero-length body. RFC 2616 
[1] says:

    "For response messages, whether or not a message-body is included with
    a message is dependent on both the request method and the response
    status code (section 6.1.1). All responses to the HEAD request method
    MUST NOT include a message-body, even though the presence of entity-
    header fields might lead one to believe they do. All 1xx
    (informational), 204 (no content), and 304 (not modified) responses
    MUST NOT include a message-body. All other responses do include a
    message-body, although it MAY be of zero length."

However, if we accept that hasResponseData is not accurate for HEAD 
requests, it should be doable.


> Do you think this would be a reasonable addition to the HTTP module, to 
> make it very easy for a program to issue a simple HTTP GET to retrieve a 
> resource?

That sounds very reasonable.


> Another thought I wanted to raise with you concerns the URI authority 
> parser that is currently part of the HTTP module.
> 
> My revised version of URI already does most of what this micro-parser does 
> (apart from not separating the username and password in userinfo).  When 
> the revised URI specification (successor to RFC2396) looks stable, I'm 
> planning to update the Network.URI module in the hierarchical 
> libraries.  It occurs to me that the added functionality here could mean 
> that module HTTP might be simplified.

I agree that the HTTP module should use Network.URI to do URI parsing. 
As soon as the new URI module is in the hierarchical libraries, the HTTP 
module should start using it. Hmm, actually we may have to wait until 
GHC and Hugs come with the new URI module. This seems to be a general 
problem with having things in the standard libraries: you are tied to 
the release cycles of the Haskell implementations.


> [1] here's a simple readHTTP function I've cooked up... does it look workable?:

This looks like a fine addition to the HTTP module, with some minor 
tweaks mentioned below.

> readHTTP :: URI -> IO String
> readHTTP uri = withSocketsDo $ do
>      { res <- simpleHTTP (defaultGETRequest uri)
>      ; case res of
>          Left  err ->
>              return ("\nError!  failed to read "++show uri++": "++show err++"\n")
>          Right rsp -> return $ if hasResponseData rsp
>              then rspBody rsp
>              else show rsp
>      }

I think it might be cleaner if the Left case and the case where there is 
no response data called fail instead of returning a string. The user 
could then catch those errors.
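To make the suggestion concrete, here is a minimal sketch using only base. The Response record is a cut-down stand-in for the HTTP module's type, the error type is simplified to String, and the names bodyOrFail and bodyOrDefault are made up for illustration. It keeps the 2xx test from the quoted hasResponseData.

```haskell
import Control.Exception (IOException, try)

-- Stand-in for the HTTP module's Response, with only the fields used
-- here (an assumption, not the real definition).
data Response = Response
    { rspCode :: (Int, Int, Int)
    , rspBody :: String
    } deriving Show

-- | Return the body of a successful response, or call fail.
--   The Either mirrors the result of simpleHTTP in the quoted code,
--   with the error type simplified to String.
bodyOrFail :: String -> Either String Response -> IO String
bodyOrFail what (Left err) =
    fail ("failed to read " ++ what ++ ": " ++ err)
bodyOrFail what (Right rsp) = case rspCode rsp of
    (2, _, _) -> return (rspBody rsp)
    _         -> fail ("no response data for " ++ what ++ ": " ++ show rsp)

-- | A caller that wants a fallback can catch the failure with try.
--   In IO, fail raises an IOException (a user error).
bodyOrDefault :: String -> Either String Response -> IO String
bodyOrDefault what res = do
    r <- try (bodyOrFail what res) :: IO (Either IOException String)
    return (either (const "") id r)
```

With this shape readHTTP itself stays a one-liner around simpleHTTP, and the caller decides whether an error is fatal.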

Concerning the use of withSocketsDo, the original homepage [2] of the 
HTTP module says:

"It is quite safe to call withSocketsDo multiple times, but that 
technique has earnt a place on the Winsock Programmers FAQ Lame List 
[3], since winsock initialisation has performance overhead."

So ideally, the programmer using readHTTP should call withSocketsDo at 
the top level of the program. But that requires extra work to use the 
library. I'm not sure what is the right thing to do here. Any thoughts?

> hasResponseData :: Response -> Bool
> hasResponseData rsp = case rspCode rsp of
>      (2,_,_)   -> True
>      _         -> False

Just needs some simple modifications as per the RFC 2616 quote above.
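Roughly, the RFC rules quoted earlier could be written as below. This is only a sketch: the Method type is a stand-in for the HTTP module's request methods, and the name mayHaveBody is made up. The request method has to be passed in separately, since the Response alone does not tell you whether it answers a HEAD request.

```haskell
-- Stand-in for the request methods of the HTTP module (an assumption,
-- for illustration only).
data Method = GET | HEAD | POST | PUT | DELETE
    deriving (Eq, Show)

-- | Decide from the request method and the status code whether the
--   response includes a message-body (possibly of zero length),
--   following the RFC 2616 passage quoted above.
mayHaveBody :: Method -> (Int, Int, Int) -> Bool
mayHaveBody HEAD _         = False  -- responses to HEAD never include a body
mayHaveBody _    (1, _, _) = False  -- 1xx (informational)
mayHaveBody _    (2, 0, 4) = False  -- 204 (no content)
mayHaveBody _    (3, 0, 4) = False  -- 304 (not modified)
mayHaveBody _    _         = True   -- all others, though maybe zero length
```

Note that this answers "is there a body?", not "did the request succeed?": a 404 response has a body too. For readHTTP the 2xx test in the quoted version may still be what you want for the success case.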

> defaultGETRequest uri =
>      Request { rqURI=uri
>              , rqBody=""
>              , rqHeaders=[ Header HdrContentLength "0"
>                          , Header HdrUserAgent "haskell-haxml/0.1"
>                          ]
>              , rqMethod=GET
>              }

If I understand RFC 2616 correctly, a Content-Length header may only be 
present if the request contains a body, which, as far as I understand 
(though I can't find that in the RFC), a GET request can't.
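If that reading is right, the request would simply drop the header. A sketch, with stand-in types so that it compiles on its own (the real Request, Header and URI types come from the HTTP and Network.URI modules; here the URI is just a String):

```haskell
-- Stand-ins for the HTTP module's types (assumptions, for illustration).
data HeaderName = HdrContentLength | HdrUserAgent
    deriving (Eq, Show)

data Header = Header HeaderName String
    deriving (Eq, Show)

data RequestMethod = GET | HEAD
    deriving (Eq, Show)

data Request = Request
    { rqURI     :: String      -- Network.URI's URI in the real module
    , rqMethod  :: RequestMethod
    , rqHeaders :: [Header]
    , rqBody    :: String
    } deriving (Eq, Show)

-- | GET request with an empty body and no Content-Length header.
defaultGETRequest :: String -> Request
defaultGETRequest uri =
    Request { rqURI     = uri
            , rqBody    = ""
            , rqHeaders = [ Header HdrUserAgent "haskell-haxml/0.1" ]
            , rqMethod  = GET
            }
```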

/Bjorn


[1] http://www.ietf.org/rfc/rfc2616.txt
   RFC 2616

[2] http://homepages.paradise.net.nz/warrickg/haskell/http/
   The HTTP and Browser Modules

[3] http://tangentsoft.net/wskfaq/articles/lame-list.html
   Winsock Programmer's FAQ, Articles: The Lame List

