1. Oops - I overlooked the fact that the redirectCount field of a Request is exported. (It isn't listed in the <a href="http://hackage.haskell.org/packages/archive/http-conduit/1.2.0/doc/html/Network-HTTP-Conduit.html">documentation</a>, probably because the constructor itself isn't exported. This seems like a flaw in Haddock...) Silly me. No need to export httpRaw.<div>
<br></div><div>2. I think that stuffing many arguments into the 'http' function is ugly. However, I'm not sure that the number of arguments to 'http' could ever grow unreasonably large. Perhaps I have bad foresight, but I personally feel that adding cookies to the http request will be the last thing that we will need to add. Putting a bound on this growth of arguments makes me more willing to think about this option. On the other hand, using a BrowserAction to modify internal state is very elegant. Which approach do you think is best? I think I'm leaning toward the upper-level Browser module idea.</div>
<div><br></div><div>If there were to be a higher-level HTTP library, I would argue that the redirection code should be moved into it, and the only high-level function that the Network.HTTP.Conduit module would export is 'http' (or httpRaw). What do you think about this?</div>
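<div><br></div><div>A minimal sketch of how such a higher-level layer might follow redirects while carrying cookies along the chain and recording every URL visited. This is a pure model, not http-conduit code: the names Url, Cookie, Step and followRedirects are all hypothetical, and the 'fetch' argument merely stands in for a low-level, non-redirecting call such as httpRaw.</div>

```haskell
-- Pure model of redirect following. Every name here is an assumption
-- made for illustration; none of them comes from http-conduit.

type Url = String
type Cookie = (String, String)  -- (name, value)

-- Outcome of a single, non-redirecting HTTP transaction.
data Step
  = Redirect Url [Cookie]  -- 3xx: next location plus any cookies set
  | Final String [Cookie]  -- anything else: body plus any cookies set

-- Follow at most 'limit' redirects. Returns the chain of URLs visited
-- (in order), the accumulated cookies, and the final body if the chain
-- ended before the limit was exhausted.
followRedirects
  :: Int
  -> (Url -> [Cookie] -> Step)  -- stand-in for a low-level call like httpRaw
  -> Url
  -> ([Url], [Cookie], Maybe String)
followRedirects limit fetch = go limit [] []
  where
    go n visited jar url =
      case fetch url jar of
        Final body new -> (reverse (url : visited), merge new jar, Just body)
        Redirect next new
          | n > 0     -> go (n - 1) (url : visited) (merge new jar) next
          | otherwise -> (reverse (url : visited), merge new jar, Nothing)

    -- Newer cookies replace older ones that share a name.
    merge new old = new ++ filter (\(k, _) -> k `notElem` map fst new) old
```

<div>Because the cookie list is threaded through each call to 'fetch', a cookie set by the first response is visible to the request that follows the redirect, and the caller gets the whole URL chain back without 'http' needing any extra arguments.</div>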
<div><br></div><div>Thanks for helping me out with this,</div><div>Myles C. Maxfield</div><div><br></div><div><div class="gmail_quote">On Sun, Jan 22, 2012 at 9:56 PM, Michael Snoyman <span dir="ltr"><<a href="mailto:michael@snoyman.com">michael@snoyman.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Sun, Jan 22, 2012 at 11:07 PM, Myles C. Maxfield<br>
<div><div class="h5"><<a href="mailto:myles.maxfield@gmail.com">myles.maxfield@gmail.com</a>> wrote:<br>
> Replies are inline. Thanks for the quick and thoughtful response!<br>
><br>
> On Sat, Jan 21, 2012 at 8:56 AM, Michael Snoyman <<a href="mailto:michael@snoyman.com">michael@snoyman.com</a>><br>
> wrote:<br>
>><br>
>> Hi Myles,<br>
>><br>
>> These sound like two solid features, and I'd be happy to merge in code to<br>
>> support it. Some comments below.<br>
>><br>
>> On Sat, Jan 21, 2012 at 8:38 AM, Myles C. Maxfield<br>
>> <<a href="mailto:myles.maxfield@gmail.com">myles.maxfield@gmail.com</a>> wrote:<br>
>>><br>
>>> To: Michael Snoyman, author and maintainer of http-conduit<br>
>>> CC: haskell-cafe<br>
>>><br>
>>> Hello!<br>
>>><br>
>>> I am interested in contributing to the http-conduit library. I've been<br>
>>> using it for a little while and reading through its source, but have felt<br>
>>> that it could be improved with two features:<br>
>>><br>
>>> Allowing the caller to know the final URL that ultimately resulted in the<br>
>>> HTTP Source. Because httpRaw is not exported, the caller can't even<br>
>>> re-implement the redirect-following code themselves. Ideally, the caller<br>
>>> would be able to know not only the final URL, but also the entire chain of<br>
>>> URLs that led to the final request. I was thinking that it would be even<br>
>>> cooler if the caller could be notified of these redirects as they happen in<br>
>>> another thread. There are a couple ways to implement this that I have been<br>
>>> thinking about:<br>
>>><br>
>>> A straightforward way would be to add a [W.Ascii] to the type of<br>
>>> Response, and getResponse can fill in this extra field. getResponse already<br>
>>> knows about the Request so it can tell if the response should be gunzipped.<br>
>><br>
>> What would be in the [W.Ascii], a list of all paths redirected to? Also,<br>
>> I'm not sure what gunzipping has to do with this; can you clarify?<br>
>><br>
><br>
> Yes; my idea was to make the [W.Ascii] represent the list of all URLs<br>
> redirected to, in order.<br>
><br>
> My comment about gunzipping is only tangentially related. I meant that in<br>
> the latest version of the code on GitHub, the getResponse function already<br>
> takes a Request as an argument. This means that the getResponse function<br>
> already knows what URL its data is coming from, so modifying the getResponse<br>
> function to return that URL is simple. (I mentioned gunzip because, as far<br>
> as I can tell, the reason that getResponse already takes a Request is so<br>
> that the function can tell if the request should be gunzipped.)<br>
>>><br>
>>> It would be nice for the caller to be able to know in real time what URLs<br>
>>> the request is being redirected to. A possible way to do this would be for<br>
>>> the 'http' function to take an extra argument of type (Maybe<br>
>>> (Control.Concurrent.Chan W.Ascii)) which httpRaw can push URLs into. If the<br>
>>> caller doesn't want to use this variable, they can simply pass Nothing.<br>
>>> Otherwise, the caller can create an IO thread which reads the Chan until<br>
>>> some termination condition is met (Perhaps this will change the type of the<br>
>>> extra argument to (Maybe (Chan (Maybe W.Ascii)))). I like this solution,<br>
>>> though I can see how it could be considered too heavyweight.<br>
>><br>
>><br>
>> I do think it's too heavyweight. I think if people really want lower-level<br>
>> control of the redirects, they should turn off automatic redirect and allow<br>
>> 3xx responses.<br>
><br>
> Yeah, that totally makes more sense. As it stands, however, httpRaw isn't<br>
> exported, so a caller has no way of knowing about each individual HTTP<br>
> transaction. Exporting httpRaw solves the problem I'm trying to solve. If we<br>
> export httpRaw, should we also make 'http' return the URL chain? Doing both<br>
> is probably the best solution, IMHO.<br>
<br>
</div></div>What's the difference between calling httpRaw and calling http with<br>
redirections turned off?<br>
<div><div class="h5"><br>
>>><br>
>>> Making the redirection aware of cookies. There are redirects around the<br>
>>> web where the first URL returns a Set-Cookie header and a 3xx code which<br>
>>> redirects to another site that expects the cookie that the first HTTP<br>
>>> transaction set. I propose to add an (IORef to a Data.Set of Cookies) to the<br>
>>> Manager datatype, letting the Manager act as a cookie store as well as a<br>
>>> repository of available TCP connections. httpRaw could deal with the cookie<br>
>>> store. Network.HTTP.Types does not declare a Cookie datatype, so I would<br>
>>> probably be adding one. I would probably take it directly from<br>
>>> Network.HTTP.Cookie.<br>
>><br>
>> Actually, we already have the cookie package for this. I'm not sure if<br>
>> putting the cookie store in the manager is necessarily the right approach,<br>
>> since I can imagine wanting to have separate sessions while reusing the same<br>
>> connections. A different approach could be adding a list of Cookies to both<br>
>> the Request and Response.<br>
><br>
> Ah, looks like you're the maintainer of that package as well! I didn't<br>
> realize it existed. I should have, though; Yesod must need to know about<br>
> cookies somehow.<br>
><br>
> As the http-conduit package stands, the headers of the original Request can<br>
> be set, and the headers of the last Response can be read. Because cookies<br>
> are implemented on top of headers, the caller knows about the cookies before<br>
> and after the redirection chain. I'm more interested in the preservation of<br>
> cookies within the redirection chain. As discussed earlier, exposing the<br>
> httpRaw function allows the entire redirection chain to be handled by the<br>
> caller, which alleviates the problem.<br>
><br>
> That being said, however, the simpleHttp function (and all functions built<br>
> upon 'http' inside of http-conduit) should probably respect cookies inside<br>
> redirection chains. Under the hood, Network.Browser does this by having the<br>
> State monad keep track of these cookies (as well as the connection pool) and<br>
> making HTTP requests mutate that State, but that's a pretty different<br>
> architecture than Network.HTTP.Conduit.<br>
><br>
> One way I can think to do this would be to let the user supply a CookieStore<br>
> (probably implemented as a (Data.Set Web.Cookie.SetCookie)) and receive a<br>
> (different) CookieStore from the 'http' function. That way, the caller can<br>
> manage the CookieStores independently from the connection pool. The downside<br>
> is that it's one more bit of ugliness the caller has to deal with. How do<br>
> you feel about this? You probably have a better idea :-)<br>
<br>
</div></div>The only idea was to implement an extra layer of cookie-aware functions<br>
in a separate Browser module. That's been the running assumption for a<br>
while now, since HTTP does it, but I'm not opposed to taking a<br>
different approach.<br>
<br>
It could be that the big mistake in all this was putting redirection<br>
at the layer of the API that I did. Yitz Gale pointed out that in<br>
Python, they have the low-level API and the high-level API, the latter<br>
dealing with both redirection and cookies.<br>
<br>
Anyway, here's one possible approach to the whole situation: `Request`<br>
could have an extra record on it of type `Maybe (IORef (Set<br>
SetCookie))`. When `http` is called, if the record is `Nothing`, a new<br>
value is created. Every time a request is made, the value is updated<br>
accordingly. That way, redirects will respect cookies for the current<br>
sessions, and if you want to keep a longer-term session, you can keep<br>
reusing the record in different `Request`s. We can also add some<br>
convenience functions to automatically reuse the cookie set.<br>
<span class="HOEnZb"><font color="#888888"><br>
Michael<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
>> I'd be happy to do both of these things, but I'm hoping for your input on<br>
>> how to go about this endeavor. Are these features even worth pursuing?<br>
>> Should I be going about this entirely differently?<br>
>><br>
>> Thanks,<br>
>> Myles C. Maxfield<br>
>><br>
>> P.S. I'm curious about the lack of Network.URI throughout<br>
>> Network.HTTP.Conduit. Is there a particular design decision that led you to<br>
>> use raw ascii strings?<br>
><br>
><br>
> Because there are plenty of URIs that are valid that we don't handle at all,<br>
> e.g., ftp.<br>
><br>
> I'm a little surprised by this, since you can easily test for unhandled URIs<br>
> because they're already parsed. Whatever; it doesn't really matter to me, I<br>
> was just surprised by it.<br>
><br>
> Michael<br>
><br>
> Thanks again for the feedback! I'm hoping to make a difference :]<br>
><br>
> --Myles<br>
</div></div></blockquote></div><br></div>