[Haskell-cafe] Downloading web page in Haskell

Sterling Clover s.clover at gmail.com
Sat Nov 20 18:51:24 EST 2010


On Nov 20, 2010, at 5:10 PM, Yitzchak Gale wrote:

> José Romildo Malaquias wrote:
>> Web browsers like Firefox and Opera do not seem to have the same
>> problem with this web page.
>> I would like to be able to download this page from Haskell.
> 
> Hi Romildo,
> 
> This web page serves the head, including a lot of JavaScript,
> and the first few hundred bytes of the body, then pauses.
> That causes web browsers to begin loading and executing
> the JavaScript. Apparently, the site only continues serving
> the rest of the page if the JavaScript is actually loaded and
> executed. If not, it aborts.

Actually, I think it's just a misconfigured proxy. The curl executable fails at the same point, but a curl --compressed call succeeds. The curl bindings don't automatically request and decompress gzip data, so you could either set the Accept-Encoding: gzip header yourself and pipe the output through the appropriate decompression routine or, more simply, fetch the page by using System.Process to drive the curl binary directly.
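A minimal sketch of the System.Process approach, assuming a curl binary is on the PATH. The names curlArgs and fetchPage and the example.com URL are placeholders of mine, not anything from the site in question, and the output is read as a String in the current locale's encoding, so a page in a different encoding may need a ByteString-based variant instead:

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Arguments handed to the curl binary (hypothetical helper).
-- --compressed asks the server for a compressed response and makes curl
-- decompress it; --silent and --show-error hide the progress meter but
-- keep error messages.
curlArgs :: String -> [String]
curlArgs url = ["--silent", "--show-error", "--compressed", url]

-- | Fetch a page by running curl as a subprocess (hypothetical helper).
-- Returns Left with curl's exit code and stderr text on failure.
fetchPage :: String -> IO (Either String String)
fetchPage url = do
  (code, out, err) <- readProcessWithExitCode "curl" (curlArgs url) ""
  return $ case code of
    ExitSuccess   -> Right out
    ExitFailure n -> Left ("curl exited with code " ++ show n ++ ": " ++ err)

main :: IO ()
main = do
  -- Placeholder URL; substitute the page you actually want.
  result <- fetchPage "http://example.com/"
  case result of
    Left msg   -> putStrLn ("Download failed: " ++ msg)
    Right body -> putStrLn ("Downloaded " ++ show (length body) ++ " characters")
```

readProcessWithExitCode is used rather than readProcess so that a non-zero exit from curl comes back as a value you can inspect instead of an exception.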

Cheers,
Sterl

