XMLParser

Ron de Bruijn rondebruijn@yahoo.com
Tue, 29 Jul 2003 04:04:10 -0700 (PDT)


Hi there,

I used the PARSEC parser combinator library to create
an following XML-Parser. It's not exactly XML, but
it's a usefull format for saving various data.
The big difference there is, is that I don't have
attributes, but those could easily be added, although
I don't need them.
Below is the complete code (20! lines of code  for the
core). I was excited when I saw it also worked like I
had imagined, but then I tested it sometimes with
increasingly big data. 
Using my test function below I noticed that there's
some exponentional behaviour. I don't see a reason for
this. Although I use the try combinator, it should at
max double the time, because the parser uses the other
parser when there occurs a parse error in the one. 

My definition of xmlParser looks also a bit odd, but
without at least one of the functions
of(subEnTop,topEnSub) it will fail parsing.
I also don't understand why.

Then I question myself: why does my program work with
the downloaded version, and not with the PARSEC
library that ships with GHC 5.0.4.2, I atleast thought
that Parsec hasn't changed for a long time?

Does anyone have an explanation for the above
problems?

Greets Ron

module XMLCreator where
--this module can parse XML

import Parsec
import GHC.Show
import System.Time

--datatype om XML te representeren: XML bestaat altijd
uit een topelement en een aantal nestingen
data XML a = TopElement String [XML a]
           | SubElement String a deriving Show

type Level = Int

--not finished Pretty Printer
ppXML::XML a->Level->String
ppXML (TopElement name (xml:xmls)) level = concat(take
level (repeat " "))++openTag name
ppXML _ _ = "hallo"
--ppXML (NestedElement name info) level      =

openTag::String->String
openTag s = "<"++s++">"


--run :: Show a => Parser a -> String -> IO ()
run p input= do printTijd
                case (parse p "" input) of
                 Left err -> do{ putStr "parse error
at ";
                                print err
                              }
                 Right x -> do print x
                               printTijd

--prints time(used above for printing the start time
and the endtime of the algoritm
printTijd = do time<-getClockTime
               (putStrLn.show) time

test n = run xmlParser (concat(replicate n str))
--test n = parseTest xmlParser (concat(replicate n
str))

str::String
str =
"<TopicList><TopicItem><Readed>1</Readed><ForumTitle>forumTitle</ForumTitle><TopicTitle>topicTitle</TopicTitle><TopicID>100100</TopicID><TopicStarter>someOne</TopicStarter><LastReplier>someOtherOne</LastReplier><ReplierID>1234</ReplierID><MessageID>messageID</MessageID><SubForumTitle>PW</SubForumTitle><SubForumID>14</SubForumID><Replies>2</Replies><ReplyTime>1050767736</ReplyTime><StartTime>1050762676</StartTime></TopicItem><FileInfo><Version>5</Version><TimeCreated>1059155053</TimeCreated><Application>SomeProgram</Application><About>aboutText</About></FileInfo></TopicList>
";

--determine which sign can not be in the elementname
or in the informationpiece
signs = noneOf ['<','>']

--parses a piece of xml
xmlParser::Parser ([XML String])
xmlParser =  choice [try (many1 sub),try topEnSub,try
subEnTop, many1 top]
--xmlParser = choice[try(many1 sub),many1 top]

getName::Parser String
getName = do teken '<'
             name<-many1 signs
             teken '>'
             return name
          where teken t = do skipMany space
                             char t
                             skipMany space
end::String->Parser String
end name = do skipMany space
              string ("</"++name++">")

sub::Parser (XML String)
sub = do name<-getName
         info<-many1 signs
         end name
         return (SubElement name info)

top::Parser (XML String)
top = do name<-getName
         moreXml<-xmlParser
         end name
         return (TopElement name moreXml)


subEnTop::Parser [(XML String)]
subEnTop = do x<-many1 sub
              y<-many1 top
              return (x++y)


topEnSub::Parser [(XML String)]
topEnSub = do x<-many1 top
              y<-many1 sub
              return (x++y)

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com