<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">Hello,<br>

      <br>

      Thanks for the tip!<br>

      <br>

      I'm in fact using dom-selector:<br>

      <a class="moz-txt-link-freetext" href="http://hackage.haskell.org/package/dom-selector">http://hackage.haskell.org/package/dom-selector</a><br>

      which is based on xml-conduit and html-conduit. I chose it because

      it offers CSS selectors and is generally much higher-level

      than what I would write with parsec.<br>

      <br>

      So I'm not sure whether what you wrote applies.<br>

      Your parsing function here is not pure as such:

      it's a do block, so it is ordered. What I have done so far is let

      dom-selector give me the DOM structure of the page (so the

      parsing part is done for me), and then pass that DOM structure

      to my function. The examination of that DOM structure is

      written completely without a do block: it's not ordered, it's pure. In

      that way my "parsing" (really an examination of the DOM tree) is

      completely split off from IO or any other monad.<br>
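      <br>

      A minimal sketch of that separation, using only base. The Node type
      is a hypothetical, simplified stand-in for the DOM tree (it is not
      the dom-selector or xml-conduit API), and pageLinks, elementsNamed
      and href are illustrative names:<br>

      <br>

<pre>
```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical, simplified stand-in for the DOM tree a library would return.
data Node
  = Element String [(String, String)] [Node]  -- tag name, attributes, children
  | Text String

-- Pure: collect every element (the root included) with the given tag name.
elementsNamed :: String -> Node -> [Node]
elementsNamed _ (Text _) = []
elementsNamed tag n@(Element name _ kids) =
  [n | name == tag] ++ concatMap (elementsNamed tag) kids

-- Pure: the href attribute of an element, if it has one.
href :: Node -> Maybe String
href (Element _ attrs _) = lookup "href" attrs
href (Text _)            = Nothing

-- Pure examination of the tree: no do block, no monad.
pageLinks :: Node -> [String]
pageLinks = mapMaybe href . elementsNamed "a"

-- The only impure part: obtaining the tree (hard-coded here) and printing.
main :: IO ()
main = do
  let page = Element "div" []
        [ Element "a" [("href", "/show/1")] [Text "News"]
        , Text "no link here"
        , Element "a" [("href", "/show/2")] [Text "Film"]
        ]
  mapM_ putStrLn (pageLinks page)
```
</pre>

      Because pageLinks never mentions IO, it can be tried on hand-built
      trees without fetching anything.<br>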

      <br>

      I think when you are within parsec as you mentioned, you are

      within the parsec monad (bear in mind I don't really understand

      all of this for now), and to do IO you need to go to the IO monad,

      and for that you use liftIO. In that case, that's a different

      problem from the one I'm having.<br>

      <br>

      Emmanuel<br>

      <br>

      On 12.10.2012 15:39, David McBride wrote:<br>

    </div>

    <blockquote

cite="mid:CAN+Tr43_Dt8Ex54zKzhpdMi0hoPbNS0+BCep469UErb8CpuTAA@mail.gmail.com"

      type="cite">There's a better option in my opinion.&nbsp; Use the monad

      transformer capability of the parser you are using (I'm assuming

      you are using parsec for parsing).<br>

      <br>

      If you check the Hackage docs for parsec you'll see that

      ParsecT is an instance of MonadIO (whenever its underlying monad is).&nbsp; That means at any point during

      the parsing you can call liftIO $ &lt;any IO action&gt; and use the

      result in your parsing.&nbsp; Here's an example of what that

      might look like.<br>

      <br>

      import Control.Monad.IO.Class<br>

      import Control.Monad (when)<br>

      import Text.Parsec<br>

      import Text.Parsec.Char<br>

      <br>

      parseTvStuff :: (MonadIO m) =&gt; ParsecT String u m (Char,Maybe

      ())<br>

      parseTvStuff = do<br>

      &nbsp; string "tvshow:"<br>

      &nbsp; c &lt;- anyChar<br>

      &nbsp; morestuff &lt;- if c == 'x'<br>

      &nbsp;&nbsp;&nbsp; then fmap Just $ liftIO $ putStrLn "run an http request, parse

      the result, and store the result in morestuff as a maybe"<br>

      &nbsp;&nbsp;&nbsp; else return Nothing<br>

      &nbsp; return (c,morestuff)<br>

      <br>

      So you will run an http request if you get back something that

      seems like it could be worth further parsing.&nbsp; Then you just parse

      that stuff with a separate parser and store it in your data

      structure and continue parsing the rest of the first page with the

      original parser if you wish.<br>
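      <br>

      For completeness, a sketch of how such a parser could be run. It
      repeats the parser above with a shortened placeholder message, and
      assumes the underlying monad is IO; the source name "example" and
      the input strings are made up for illustration:<br>

      <br>

<pre>
```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Text.Parsec

-- Same shape as the parser above; putStrLn stands in for a real HTTP request.
parseTvStuff :: MonadIO m => ParsecT String u m (Char, Maybe ())
parseTvStuff = do
  _ <- string "tvshow:"
  c <- anyChar
  morestuff <- if c == 'x'
    then fmap Just $ liftIO $ putStrLn "placeholder for an http request"
    else return Nothing
  return (c, morestuff)

main :: IO ()
main = do
  -- The underlying monad is IO here, so runParserT yields an IO action.
  r1 <- runParserT parseTvStuff () "example" "tvshow:x"
  print r1
  -- No IO is performed for this input: the else branch is taken.
  r2 <- runParserT parseTvStuff () "example" "tvshow:y"
  print r2
  -- Input that does not start with "tvshow:" gives a Left ParseError.
  r3 <- runParserT parseTvStuff () "example" "movie:x"
  putStrLn (either (const "parse error") show r3)
```
</pre>

      Note that the liftIO action only runs if the parser actually
      reaches it, so the IO performed depends on the input being parsed.<br>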

      <br>

      <div class="gmail_quote">On Fri, Oct 12, 2012 at 9:28 AM, Emmanuel

        Touzery <span dir="ltr">&lt;<a moz-do-not-send="true"

            href="mailto:etouzery@gmail.com" target="_blank">etouzery@gmail.com</a>&gt;</span>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          Hi,

          <div class="im"><br>

            <br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              when parsing the string representing a page, you could<br>

              save all the links you encounter.<br>

              <br>

              After the parsing you would load the linked pages and

              start<br>

              again parsing.<br>

              <br>

              You would redo this until no more links are returned or a<br>

              maximum depth is reached.<br>

            </blockquote>

            <br>

          </div>

          Thanks for the tip. That sounds much more reasonable than what

          I mentioned. It seems a bit "spaghetti" to me though in a way

          (but maybe I just have to get used to the Haskell way).<br>

          <br>

          To be more specific about what I want to do: I want to parse

          TV programs. On the first page I have the daily listing for a

          channel: start/end hour, title, category, and possibly a link.<br>

          To fully parse one TV program I can follow the link if it's

          present and get the extra info which is there (summary,

          pictures..).<br>

          <br>

          So the first scheme that comes to mind is a method which takes

          the DOM tree of the daily page and returns the list of

          programs for that day.<br>

          <br>

          Instead, what I must then do is return incomplete

          programs: the data object would have the link filled in, if

          it's available, but the summary, pictures... would be empty.<br>

          Then I have a "second pass" in the caller function, where for

          programs which have a link, I would fetch the extra page, and

          call a second function, which will fill in the extra data

          (thankfully if pictures are present I only store their URL so

          it would stop there, no need for a third pass for pictures).<br>
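          <br>

          One possible shape for this two-pass scheme, sketched with only
          base. Every name here (Listing, Details, Program, parseDetails,
          complete, fakeFetch) is hypothetical, the start/end hours are
          left out for brevity, and the fetch function is passed in as a
          parameter so that a fake one can stand in for a real HTTP
          request:<br>

          <br>

<pre>
```haskell
-- All names below are hypothetical; fakeFetch stands in for a real HTTP call.

-- What the daily listing page alone can tell us.
data Listing = Listing
  { title    :: String
  , category :: String
  , link     :: Maybe String   -- URL of the detail page, if present
  } deriving Show

-- Extra info found only on the detail page.
data Details = Details
  { summary  :: String
  , pictures :: [String]       -- picture URLs only, so no third pass
  } deriving Show

-- A fully processed program: the listing plus details when a link existed.
data Program = Program Listing (Maybe Details)
  deriving Show

-- Pure stand-in for examining the detail page (first line is the summary).
parseDetails :: String -> Details
parseDetails page = Details (takeWhile (/= '\n') page) []

-- Second pass: the only place where IO happens.
complete :: (String -> IO String) -> Listing -> IO Program
complete fetchPage l = case link l of
  Nothing  -> return (Program l Nothing)
  Just url -> do
    page <- fetchPage url
    return (Program l (Just (parseDetails page)))

main :: IO ()
main = do
  let listings = [ Listing "News" "info" Nothing
                 , Listing "Film" "movie" (Just "/film/42") ]
      fakeFetch url = return ("Summary for " ++ url ++ "\nmore text")
  programs <- mapM (complete fakeFetch) listings
  mapM_ print programs
```
</pre>

          With separate types, the first pass returns a Listing that is
          complete for what it claims to be, rather than a Program with
          empty fields; whether that is worth the extra type is a
          judgment call.<br>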

          <br>

          It annoys me that the first function returns "incomplete"

          objects... It somehow feels wrong.<br>

          <br>

          Now that I've described my problem in more detail, maybe you

          can think of a better way of doing that?<br>

          <br>

          And otherwise I guess this is the policy when writing Haskell

          code: absolutely avoid spreading impure/IO tainted code, even

          if it maybe negatively affects the general structure of the

          program?<br>

          <br>

          Thanks again for the tip though! That's definitely what I'll

          do if nothing better is suggested. It is actually probably the

          best way to do that if you want to separate IO from "pure"

          code.<span class="HOEnZb"><font color="#888888"><br>

              <br>

              Emmanuel</font></span>

          <div class="HOEnZb">

            <div class="h5"><br>

              <br>

              _______________________________________________<br>

              Beginners mailing list<br>

              <a moz-do-not-send="true"

                href="mailto:Beginners@haskell.org" target="_blank">Beginners@haskell.org</a><br>

              <a moz-do-not-send="true"

                href="http://www.haskell.org/mailman/listinfo/beginners"

                target="_blank">http://www.haskell.org/mailman/listinfo/beginners</a><br>

            </div>

          </div>

        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>