[web-devel] Type-safe URL handling

Thu Mar 25 19:25:17 EDT 2010

On Thu, Mar 25, 2010 at 12:29 PM, Michael Snoyman <michael at snoyman.com>wrote:

> OK, here are my initial code comments:
>
> * Do we want to move everything into Web.URLT? More to the point, I'm not
> sure I see the point of calling this URLT, since it doesn't really require
> any monad transformers; maybe we should call it web-routes and then the
> module would be Web.Routes?
>

I think Web.Routes is a fine name. I'll make it happen. In the rest of this
post I refer to things by the old names, but I do intend to change the
module names and rename the package to web-routes.

> * I like the PathInfo class and to/fromPathSegments. Perhaps we should
> bundle that with the decode/encodePathInfo into a single module?
>

I put PathInfo in a separate module because I am a little dubious of classes
these days. I find it a bit annoying that you can only have one PathInfo
instance per type. And I think it helps show that using PathInfo is not
actually required. But, in practice, I think having less modules is probably
a good thing in this case, since it does not affect the dependency chain at
all. Just because I *can* put every function in it's own module doesn't mean
I should. ;) Also, we probably do want people to provide PathInfo instances,
even if they don't have to..

* I'd like to minimize dependencies as much as possible for the basic
> package. The two dependencies I've noticed are Consumer and
> applicative-extras. I think the type signatures would be clearer *without*
> those packages included, eg:
>
>    fromPathSegments :: [String] -> Either ErrMsg a
>

Except that is not a usable type. fromPathSegments may consume, some, but
not all of the path segments. Consider the type:

data SiteURL = Foo Int Int

fromPathSegments is going to receive the path segments:

["Foo","1","2"]

If you wrote a parser by hand, you would want it to look a little something
like:

 do string "Foo"
      slash
      i <- fromPathSegments
      slash
      j <- fromPathSegments
     eol
     return (Foo i j)

The key concept here is that when you call fromPathSegments to get the first
argument of Foo you need to know how many of the path segments were consumed
/ are remaining, so you can pass only those segments to the second
fromPathSegments.

So you really need a type like:

   fromPathSegments :: [String] -> (Either ErrMsg a, [String])

which outputs the unconsumed path segments.

But this is obviously a ripe target for a monad of some sort -- trying keep
track of the unconsumed portions by hand seems like it would asking for
trouble...

The Consumer monad takes care of that and provides the functions you would
expect such as, next, peek, and poke. And it seems nice to be able to use
Monad, MonadPlus, Applicative, Alternative, etc, for composing
fromPathSegments into larger parsers ?

But, perhaps there is a better choice of monad, or a better way of dealing
with the problem? Or maybe it's not really a problem?

I think Failing is a pretty nifty data-type for dealing with errors. But
perhaps it is not a big win here.. The #1 thing that makes Failing better
than (Either [String] a) is it's Applicative instance. Specifically, Failing
will accumulate and return all the errors which have occurred, not the just
first failure (which is the behavior for Applicative (Either e)).

So for example, let's say you are doing are trying to lookup a bunch of keys
from the query string. The key / value pairs in the query string are
typically independent of each other. So let's say you do:

 (,) <$> lookup "foo" <*> lookup "bar"

but neither of those keys exist. With Either you will only get the error
'could not find "foo"'. But with Failing you will get the error 'could not
find "foo". could not find "bar"'.  It is nice to get a report of all the
things that are broken, instead of getting only one error at a time, fixing
it, and then getting another error, etc.

However, I am not sure if this property is all that useful which urlt. If
you are trying to parse a url like:

  (string "Foo" *> Foo) <$> fromPathSegments <*> fromPathSegments

And the parsing of "Foo" fails.. then there is no use in finding out if the
other segments parse ok -- because they are likely to be garbage. Maybe it
failed because it got the string "FOo" instead of "Foo", but more likely it
got something completely unrelated like, /bar/c/2.4.

So, perhaps Either is a better choice even with out considering
dependencies... I think that Applicative / Alternative instances for Either
are only defined in transformers in the Control.Monad.Error module -- which
is a bit annoying. But we don't actually need those to implement urlt
itself.

This brings up another detail though.

the fromPathSegments / Consumer stuff is basically implementing a parser.
Except, unlike something like parsec, we do not keep track of the current
position for reporting errors. I wonder if we should perhaps use a slightly
richer parser environment. Within a web app, once you got your to/from
instances debugged, you will never get a parse error, so having great error
messages is not essential. But, for other people linking to your site it
could be potentially helpful. Though, it seems like the current error
messages out to be sufficient given how short the urls are..

I'm not certain what exactly the type of ErrMsg should be here; I don't
> really have a problem using [String], which would be close to the definition
> of Failing.
>
> * I think it's very important to allow users to supply customized 404
> pages. Essentially, we need to augment handleWai (possibly others) with a
> (ErrMsg -> Application) parameter.
>

Yeah, there are (at least) two possibilities, add an extra param for the
handler. Or bubble the error up to the top:

handleWai_1 :: (url -> String) -> (String -> Failing url) -> String ->
([ErrorMsg] -> Application) -> ((url -> String) -> url -> Application) ->
Application
handleWai_1 fromUrl toUrl approot handleError handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo
request
        case fUrl of
          (Failure errs) -> handleError errs request
          (Success url)  -> handler (showString approot . fromUrl) url
request

handleWai_2 :: (url -> String) -> (String -> Failing url) -> String -> ((url
-> String) -> url -> Application) -> (Request -> IO (Failing Response))
handleWai_2 fromUrl toUrl approot handler =
  \request ->
     do let fUrl = toUrl $ stripOverlap approot $ S.unpack $ pathInfo
request
        case fUrl of
          (Failure errs) -> return (Failure errs)
          (Success url)  -> fmap Success $ handler (showString approot .
fromUrl) url request

The second choice is perhaps more flexible. Which do you prefer? In the
first option, the handleError function could be a Maybe value -- and if you
supply Nothing you get some default 404 page?

In happstack we have a third possiblity. The ServerMonad is an instance of
MonadPlus so we can throw out the error message and just call mzero:

implSite :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String ->
FilePath -> Site url String (m a) -> m a
implSite domain approot siteSpec =
  do r <- implSite_ domain approot siteSpec
     case r of
       (Failure _) -> mzero
       (Success a) -> return a

implSite_ :: (Functor m, Monad m, MonadPlus m, ServerMonad m) => String ->
FilePath -> Site url String (m a) -> m (Failing a)
implSite_ domain approot siteSpec =
    dirs approot $ do rq <- askRq
                      let pathInfo = intercalate "/" (rqPaths rq)
                          f        = runSite (domain ++ approot) siteSpec
pathInfo
                      case f of
                        (Failure errs) -> return (Failure errs)
                        (Success sp)   -> Success <$> (localRq (const $ rq {
rqPaths = [] }) sp)

then we can do:

 msum [ implSite "domain" "approot" siteSpec
            , default404
            ]

if implSite calls mzero, then the next handler (in this case default404) is
tried.

> * It might be nice to have "type WaiSite url = Site url String
> Application". By the way, are you certain you want to allow parameterization
> over the pathInfo type?
>

I'm not certain I don't want to allow it... I have a vague notion that I
might want to use Text sometimes instead of String. Though if I was really
committed to that then I should make toPathInfo and fromPathInfo
parameterized over pathInfo as well... So perhaps I will axe it from Site
for now. I need to change the name of that type and it's record names too I
think.

> The only packages that I feel qualified to speak about then are urlt and
> urlt-wai, and my recommendation would be:
>
> urlt contains decode/encodePathInfo, PathInfo class and related functions,
> Site and related functions. If you agree on allowing the parameterization of
> 404 errors, then also provide a default 404 error.
>

> urlt-wai contains WaiSite, handleWai and related functions.
>

Yeah, that is what I was thinking. urlt would contain what is currently in;

URLT.Base
URLT.PathInfo
URLT.HandleT
URLT.Monad
URLT.QuickCheck

QuickCheck module does not actually depend on QuickCheck, which is nice
because QC1 vs QC2 is a big problem right now.

It might also be nice to include:

URLT.TH

with depends on template-haskell. But I am not sure that depending on
template-haskell is an issue because template-haskell comes with ghc6, and
the code in URLT.TH already handles the breakage that happened with TH 2.4.

If I switch to Either instead of Failing I believe the dependencies would
be:

 base, Consumer, template-haskell, network, utf8-string

urlt-wai would just include:

URLT.Wai

- jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/web-devel/attachments/20100325/9b088384/attachment-0001.html