[web-devel] Hoogle Advice

Michael Snoyman michael at snoyman.com
Mon Jan 24 22:04:39 CET 2011


On Mon, Jan 24, 2011 at 10:11 PM, Neil Mitchell <ndmitchell at gmail.com> wrote:
> Hi,
>
> I tried switch Hoogle to Wai and Warp today. The results were good,
> and once a few little things are cleared up I'm fairly certain that
> the the next Hoogle release will be based on them. As I wrote the
> code, I made some notes:
>
> 1) Why not add statusOK as an alias for status200? As someone who
> doesn't spend all day dealing with HTTP response codes, statusOK is
> far better documenting - similarly for all the other status values. I
> added statusOK as a local definition in Hoogle.

The main reason I stuck with numerical values was the redirects: there
is a bit of ambiguity about the right name for 302, 303 and 307. (Yes,
I know that the specs give a name for these, but in practice they are
treated slightly different.) Frankly, I had always intended WAI to be
something low-level that a framework would wrap around. It's a new
phenomenon that there is interest in writing directly against WAI. I
think adding some of the status codes as aliases makes sense in that
context.

> 2) Changing from HTTP to Wai, it was a shame that I had to define all
> the header keys myself - could they be included - it would have better
> performance, and give static guarantees. I added locally:
>
> hdrContentType = mkCIByteString "Content-Type"
> hdrCacheControl = mkCIByteString "Cache-Control"

Firstly, CIByteString is an instance of IsString, so you can just
write "Content-Type". While this doesn't address the performance
concerns, it does address the convenience. As far as *why* I didn't
include them, it just becomes an issue of how much to include/exclude.
I included the statuses because it's an annoyance to write Status 200
"OK", but I thought "Content-Type" was acceptable. But I really have
no strong feelings on the matter.

> 3) The documentation for ResponseFile doesn't say what happens if the
> file is not found. I am sure it must throw an exception, but for a
> very brief second I wondered if it might change my status code to 404
> on my behalf. Could the documentation be clarified.

It depends on what the sendfile package does actually. I haven't tried
this personally, so in fact I don't know what it does. I'll make a
note to test it and update the documentation accordingly.

> 4) I defined a Wai handler that threw an error, and Warp did nothing.
> Should it? Or should I just be careful and include my own exception
> handling? I'm not really sure what the right thing to do is - silently
> eating the error isn't great, but bring down my server would be even
> less appreciated.

There's already a request in place to allow you to provide a callback
function for exceptions. However, I personally prefer having the
application catch all exceptions (which is what Yesod does in
practice).

> 5) Hoogle is currently written as lazy IO + String. Converting this
> format to a response was easy with liftIO, LBS.pack and responseLBS.
> Converting back was rather tricky. I eventually came up with:
>
> responseFlatten :: Response -> IO (Status, ResponseHeaders, LBS.ByteString)
> responseFlatten r = responseEnumerator r $ \status headers ->
>    let f res E.EOF = return (status, headers, Blaze.toLazyByteString res)
>        f res (E.Chunks xs) = E.continue (f $ mconcat $ res:xs)
>    in E.continue (f mempty)
>
> It's not too long, but it has a relatively high level of complexity.
> Is there something simpler I could have done?

You could probably write something simpler using the consume iteratee
from the enumerator package. It's late at night here, so the following
may or may not fire missiles:

    responseEnumerator r $ \s hs -> do
        builders <- consume
        return (s, hs, Blaze.toLazyByteString $ mconcat builders)

> 6) I want responseFlatten for two purposes, the first of which is to
> output a CGI representation of the Response, which is easy with the
> above function. However, outputting a response in CGI format is
> trivial, and is probably the easiest way to debug/view the contents of
> a Response. Would it be worth adding showResponse :: Response -> IO
> LBS.ByteString to the Wai library, which produced the result
> representing how it would be sent in CGI format, and was also useful
> for debugging?

I think such a function would make more sense for the wai-extra
package. I already have a debug middleware over there, that kind of
function could be a good addition.

> 7) The second reason for using responseFlatten is that I want to write
> a function Response -> IO Response which modifies a response to
> rewrite "href='file://" to "href='/file/". I currently do this with
> responseFlatten, converting to a String, and then stepping through it
> Char by Char with isPrefixOf. Given the performance focus of this
> email list, I can imagine several of you are now crying. What would
> have been the Iterator/Builder approved way to do it?

Well, obviously converting to Char makes us cry, so sticking to
ByteString operations would be a little easier. But frankly, to my
knowledge, there isn't an efficient way to perform transformations on
Builders. You'll probably be best off mconcat'ing your Builders into a
single Builder, converting that to a lazy ByteString and performing
the conversion there.

Assuming that you are creating the Builder using fromLazyByteString in
the first place, the performance hit here won't really be as bad as it
seems: blaze-builder will actually use the original buffer when a
source bytestring is large enough, so there should not be a
significant overhead versus performing the transformation on the
original lazy ByteString.

> 8) The last point is the only point that is a blocker for Hoogle, and
> regards the Warp server, Windows and ghci. In ghci on Windows, running
> the network function accept doesn't terminate if you hit Ctrl+C. If
> you run Warp in the main thread ghci will never respond to you again.
> If you run Warp on a second thread then hitting Ctrl+C twice returns
> to the prompt, but the thread serving the web page is still running
> (and still serving web pages), and if you run the server again the old
> server continues to serve the pages instead. I develop Hoogle on
> Windows through ghci in server mode, so without a workaround it's
> fairly fatal.
>
> When writing Hoogle, I found that even though accept doesn't terminate
> on exceptions, if you do sClose on the socket it raises an exception
> in accept and does terminate. I experimented in a local copy of Warp
> and found the following worked very well:
>
> import Control.Concurrent
> import Control.Monad
>
> run :: Port -> Application -> IO ()
> run port app = withSocketsDo $ do
>    var <- newMVar Nothing
>    let clean = modifyMVar_ var $ \s -> maybe (return ()) sClose s >>
> return Nothing
>    forkIO $ bracket
>        (listenOn $ PortNumber $ fromIntegral port)
>        (const clean)
>        (\s -> do modifyMVar_ var (\_ -> return $ Just s);
> serveConnections port app s)
>    forever (threadDelay maxBound) `finally` clean
>
> If you hit Ctrl+C in ghci on Windows you get the message
> "Network.Socket.accept: failed (No error)" - but it terminates
> perfectly. That error could even be dropped silently if you reach the
> `finally` clause. Can this code be included in Warp?

I've also been using Warp for development purposes at work on my
Windows machine, and have run into the Ctrl-C issue. I'll review this
patch after some sleep. My initial reaction is to include it
conditionally for Windows. My guess is that very few people (if any)
are planning on running production websites using Windows/Warp, so
adding something which enhanced usability at the expense of a few
cycles would be a general win. However, for the speed-crazed masses
serving from Linux (eg, you and me apparently), I'd like to avoid any
extra code.

Thanks for taking the time to write up such a thorough email, it is
much appreciated.

Michael



More information about the web-devel mailing list