[Haskell-cafe] Re: Interesting new user perspective

apfelmus apfelmus at quantentunnel.de
Mon Oct 13 06:50:28 EDT 2008


Devin Mullins wrote:
> apfelmus wrote:
>> Yes. "Just" an injection problem is an understatement. And its the
>> implementation of the abstract data type that determines how fast things
>> are. Who said that it may not simply be a newtyped String ?
> 
> I think the attraction to the SafeString example is that it's simple and
> effective for the task at hand -- in other words, pragmatic. Suggesting
> that in order to avoid HTML injection people re-read the HTML spec and
> invent a complete ADT to represent all the little corner cases they
> probably won't ever use is exactly the kind of answer that would scare
> Joe Six-Pack Hockey Programmer away.

/me shrugs. Can't help it that HTML is crazily complex.

But what I wanted to say is not that you should create an algebraic data
type that exactly represents valid HTML but rather that the basic idea
underlying SafeString, namely using an abstract (≠ algebraic) data type
for HTML should be a no-brainer and the first thing to do.

To demonstrate, here's a 27 line HTML module. It's really simple, even
simple-minded and still bug-prone, yet it already eliminates escaping
and nesting bugs.

    module HTML (Html, Attr(..), toString, tag, text) where

    newtype Html = Html { toString :: String }

    data Attr = String := String

    tag      :: String -> [Attr] -> [Html] -> Html
    text     :: String -> Html

    tag name attrs body = Html $
         "<" ++ name ++ attrs' ++ ">" ++ body' ++ "</" ++ name ++ ">"
         where
         attrs' = concatMap attr attrs
         attr (name := value) =
            " " ++ name ++ "=\"" ++ escapeHtml value ++ "\""
         body'  = concatMap toString body

    text = Html . escapeHtml

    escapeHtml s = concatMap esc s
        where
        esc '<'  = "&lt;"
        esc '>'  = "&gt;"
        esc '&'  = "&amp;"
        esc '"'  = "&#34;"
        esc '\'' = "&#39;"
        esc x    = [x]

Example usage:

  *HTML> toString $ tag "b" [] [tag "i" [] [text "<>"], text "test"]
  "<b><i>&lt;&gt;</i>test</b>"

These are just the very basics, extend them ad libitum. For instance by
using  Data.DList  (i.e. ShowS) instead of  String  for O(1)
concatenation. Or reading the HTML spec and doing something about
newlines in attributes. Or checking the spec for the format of tag and
attribute names. Or making Html a forest of nodes instead of a tree.


I don't understand why there are still escaping bugs in this world.


Regards,
apfelmus



More information about the Haskell-Cafe mailing list