Network.HTTP module, using simpleHTTP

Graham Klyne gk at ninebynine.org
Wed Jun 9 05:30:06 EDT 2004


At 00:12 09/06/04 +0200, Bjorn Bringert wrote:
>Graham Klyne wrote:
>>I'm trying to add an HTTP entity retrieval capability to HaXml using 
>>simpleHTTP as the basis of a new function, readHTTP [1], that works very 
>>similarly to Prelude.readFile (except that its argument is a Network.URI 
>>value).  Function simpleHTTP still leaves a fair amount of result 
>>analysis to be done by the calling program.  I'm thinking that it might 
>>be convenient to provide a simple function, say:
>>      hasResponseData :: Response -> Bool
>>that can be used to drive a simple binary decision along the lines of:
>>      return $ if hasResponseData response
>>          then (Right $ rspBody response)
>>          else (Left  $ show response)
>
>Unfortunately you cannot decide from the Response structure alone whether 
>the response has no body or just a zero-length body. RFC 2616 [1] says:
>
>    "For response messages, whether or not a message-body is included with
>    a message is dependent on both the request method and the response
>    status code (section 6.1.1). All responses to the HEAD request method
>    MUST NOT include a message-body, even though the presence of entity-
>    header fields might lead one to believe they do. All 1xx
>    (informational), 204 (no content), and 304 (not modified) responses
>    MUST NOT include a message-body. All other responses do include a
>    message-body, although it MAY be of zero length."
>
>However, if we accept that hasResponseData is not accurate for HEAD 
>requests, it should be doable.

Ah, of course.  I was glossing over the non-GET case, trying to make it 
easy to do the "readFile" equivalent.
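
The RFC 2616 rule quoted above can be written down quite directly.  A 
minimal sketch, assuming stand-in types: Method and the status triple below 
are local definitions for illustration (the triple mirrors the shape of 
rspCode used later in this message), and mayHaveBody is a hypothetical 
name, not something in the HTTP module:

```haskell
-- Stand-in for the HTTP module's request method type (assumption).
data Method = GET | HEAD | POST | PUT | DELETE
    deriving (Eq, Show)

-- Status code as a digit triple, e.g. (2,0,4) for 204.
type Code = (Int, Int, Int)

-- RFC 2616 section 4.3: can a response carry a message-body at all?
-- True still allows a zero-length body.
mayHaveBody :: Method -> Code -> Bool
mayHaveBody HEAD _         = False   -- responses to HEAD never have a body
mayHaveBody _    (1, _, _) = False   -- 1xx informational
mayHaveBody _    (2, 0, 4) = False   -- 204 No Content
mayHaveBody _    (3, 0, 4) = False   -- 304 Not Modified
mayHaveBody _    _         = True
```

Note that the request method is needed as well as the response, which is 
why the Response structure alone is not enough.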

>>Do you think this would be a reasonable addition to the HTTP module, to 
>>make it very easy for a program to issue a simple HTTP GET to retrieve a 
>>resource?
>
>That sounds very reasonable.
>
>
>>Another thought I wanted to raise with you concerns the URI authority 
>>parser that is currently part of the HTTP module.
>>My revised version of URI already does most of what this micro-parser 
>>does (apart from not separating the username and password in 
>>userinfo).  When the revised URI specification (successor to RFC2396) 
>>looks stable, I'm planning to update the Network.URI module in the 
>>hierarchical libraries.  It occurs to me that the added functionality 
>>here could mean that module HTTP might be simplified.
>
>I agree that the HTTP module should use Network.URI to do URI parsing. As 
>soon as the new URI module is in the hierarchical libraries, the HTTP 
>module should start using it. Hmm, actually we may have to wait until GHC 
>and Hugs come with the new URI module. This seems to be a general problem 
>with having things in the standard libraries; you are tied to the release 
>cycles of the Haskell implementations.

Yes... although I have permission and access, I'm still wary about updating 
the library CVS because I don't want to create any problems for existing 
code.  (Though, of course, CVS exists to help in these situations, n'est 
pas?)  E.g. I'm very aware that all my testing is being done on Windows, 
and mostly using Hugs, which leaves me slightly out-of-step with most 
library developers.  (I think that's not a bad thing, but it makes me a 
little more wary.)

I'm sure there's a good solution for this.  Meanwhile, copies of my working 
code are at my web site:
   http://www.ninebynine.org/Software/HaskellUtils/
in particular:
   http://www.ninebynine.org/Software/HaskellUtils/Network/
   http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/


>>[1] here's a simple readHTTP function I've cooked up... does it look 
>>workable?:
>
>This looks like a fine addition to the HTTP module, with some minor tweaks 
>mentioned below.

Ah, I didn't consider including *that* in HTTP.

>>readHTTP :: URI -> IO String
>>readHTTP uri = withSocketsDo $ do
>>      { res <- simpleHTTP (defaultGETRequest uri)
>>      ; case res of
>>          Left  err ->
>>              return ("\nError!  failed to read "++show uri++": "++show 
>> err++"\n")
>>          Right rsp -> return $ if hasResponseData rsp
>>              then rspBody rsp
>>              else show rsp
>>      }
>
>I think it might be cleaner if the Left case and the case where there is 
>no response data called fail instead of returning a string. The user could 
>then catch those errors.

I agree that the string return is not a good general solution.

(In much of my work on the XML parser, I've been trying to avoid fail, 
because that can't be caught outside the IO monad.  But of course, this 
case is already in the IO monad, so that's not a problem.)
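
For illustration, this is roughly how a caller could catch such a failure.  
It is only a sketch, assuming a current GHC base library 
(Control.Exception.try); fetch and fetchEither are hypothetical names 
standing in for readHTTP and its caller:

```haskell
import Control.Exception (IOException, try)

-- Hypothetical stand-in for a fetch that reports problems via fail.
fetch :: Bool -> IO String
fetch ok
    | ok        = return "<doc/>"
    | otherwise = fail "Failed to read resource"

-- Turn the IO failure back into an Either for the caller.
fetchEither :: Bool -> IO (Either String String)
fetchEither ok = do
    r <- try (fetch ok) :: IO (Either IOException String)
    return (either (Left . show) Right r)
```

In the IO monad, fail raises an IOError, so the caller gets to choose 
between handling the error and letting it propagate.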

>Concerning the use of withSocketsDo, the original homepage [2] of the HTTP 
>module says:
>
>"It is quite safe to call withSocketsDo multiple times, but that technique 
>has earnt a place on the Winsock Programmers FAQ Lame List [3], since 
>winsock initialisation has performance overhead."
>
>So ideally, the programmer using readHTTP should call withSocketsDo at the 
>top level of the program. But that requires extra work to use the library. 
>I'm not sure what is the right thing to do here. Any thoughts?

This is stretching my Haskell experience, but is it possible to use 
memoization to achieve just one call?

e.g.

    socketsInitialized :: Bool
    socketsInitialized = unsafePerformIO ( withSocketsDo ( return True ) )

Then:

    readHTTP uri = seq socketsInitialized $ do ...

or

    readHTTP uri = if socketsInitialized
        then ( do ... )
        else fail "sockets not initialized"

I don't particularly like the use of unsafePerformIO here;  is it possible 
to do it more cleanly?  OTOH, it seems that the whole point here is to be 
impure with respect to 'withSocketsDo', so maybe the "unsafe" option is the 
only one?
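
A small self-contained sketch of the memoization idea, with a counter in 
place of the real socket setup so that the single evaluation can be 
observed.  initCount, initialized and withInit are hypothetical names used 
only here; the NOINLINE pragmas are the usual GHC precaution so that the 
unsafePerformIO call is not duplicated by inlining:

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Counts how often the stand-in initialiser runs (demonstration only).
{-# NOINLINE initCount #-}
initCount :: IORef Int
initCount = unsafePerformIO (newIORef 0)

-- Hypothetical stand-in for: withSocketsDo (return True)
{-# NOINLINE initialized #-}
initialized :: Bool
initialized = unsafePerformIO $ do
    modifyIORef' initCount (+ 1)
    return True

-- Forces the one-time initialisation before running the action.
withInit :: IO a -> IO a
withInit act = initialized `seq` act
```

Because initialized is a top-level constant, it is evaluated on first use 
and the result is shared by later calls to withInit.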


>>hasResponseData :: Response -> Bool
>>hasResponseData rsp = case rspCode rsp of
>>      (2,_,_)   -> True
>>      otherwise -> False
>
>Just needs some simple modifications as per the RFC 2616 quote above.

OK, it would be good to get that cleaned up.

>>defaultGETRequest uri =
>>      Request { rqURI=uri
>>              , rqBody=""
>>              , rqHeaders=[ Header HdrContentLength "0"
>>                          , Header HdrUserAgent "haskell-haxml/0.1"
>>                          ]
>>              , rqMethod=GET
>>              }
>
>If I understand RFC 2616 correctly, there may only be a Content-length 
>header if the request contains a body, which as far as I understand 
>(though I can't find that in the RFC) GET requests can't.

Maybe... I just copied that from the Browser module.  (No original thought 
here!)

...

Anyway, here's a revised version that I think incorporates your suggestions 
(I've finessed the HEAD issue somewhat by changing the name and 
specification of the response analysis function):

[[
--  Memoization of withSocketsDo to prevent multiple calls.
--  (cf. http://tangentsoft.net/wskfaq/articles/lame-list.html)
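--  NOINLINE keeps the compiler from duplicating the unsafePerformIO call.
{-# NOINLINE socketsInitialized #-}
socketsInitialized :: Bool
socketsInitialized = unsafePerformIO ( withSocketsDo $ return True )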

readHTTP :: URI -> IO String
readHTTP uri = seq socketsInitialized $ do
     { let req = defaultGETRequest uri
     ; res <- simpleHTTP req
     ; case res of
         Left  err -> fail ("Failed to read "++show uri++": "++show err)
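         --  'fail' must run as the IO action itself; under 'return $' it
         --  would be typed as a String (list monad) and yield "".
         Right rsp -> if requestCompletedOK req rsp
             then return (rspBody rsp)
             else fail (show rsp)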
     }

--  Simplified response handling:
--  determine if the expected final result from the given request
--  is available from the given response value.  (e.g. in the case of
--  a GET operation, this will be a representation of the resource.)
requestCompletedOK :: Request -> Response -> Bool
requestCompletedOK req rsp = case (rqMethod req,rspCode rsp) of
     (_,   (2,0,0)) -> True      -- OK
     (_,   (2,0,1)) -> True      -- Created
     (_,   (2,0,2)) -> False     -- Accepted (operation deferred)
     (_,   (2,0,3)) -> True      -- Non-authoritative info
     (GET, (2,0,4)) -> False     -- No-content
     (_,   (2,0,4)) -> True      -- No-content
     (GET, (2,0,5)) -> False     -- Reset content (can't happen?)
     (_,   (2,0,5)) -> True      -- Reset content
     (_,   (2,0,6)) -> True      -- Partial content (GET with range)
     (GET, (3,0,4)) -> True      -- Conditional GET, not modified
     _              -> False

defaultGETRequest :: URI -> Request
defaultGETRequest uri =
     Request { rqURI=uri
             , rqBody=""
             , rqHeaders=[ Header HdrContentLength "0"
                         , Header HdrUserAgent "haskell-haxml/0.1"
                         ]
             , rqMethod=GET
             }
]]

(This code works in an HaXml test case, but is not exhaustively tested.)

#g
--

>[1] http://www.ietf.org/rfc/rfc2616.txt
>   RFC 2616
>
>[2] http://homepages.paradise.net.nz/warrickg/haskell/http/
>   The HTTP and Browser Modules
>
>[3] http://tangentsoft.net/wskfaq/articles/lame-list.html
>   Winsock Programmer's FAQ, Articles: The Lame List



------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact


