[Haskell-cafe] Is it possible to have constant-space JSON decoding?

Malcolm Wallace malcolm.wallace at me.com
Fri Dec 7 10:57:49 CET 2012


See also the incremental XML parser in HaXml, described in "Partial parsing: combining choice with commitment", IFL 2006.  It has constant space usage (for some patterns of usage), even with extremely large inputs.

http://www.google.co.uk/url?sa=t&rct=j&q=malcolm+wallace+partial+parsing&source=web&cd=2&ved=0CEEQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.135.7512%26rep%3Drep1%26type%3Dpdf&ei=Db3BUNmiOsfS4QTAkoDYAw&usg=AFQjCNHHywUCvaFv8eBoQ-x9jj4GOMHo2w

On 5 Dec 2012, at 05:37, Johan Tibell wrote:

> Hi Oleg,
> 
> On Tue, Dec 4, 2012 at 9:13 PM,  <oleg at okmij.org> wrote:
>> I am doing, for several months, constant-space processing of large XML
>> files using iteratees. The file contains many XML elements (which are
>> a bit complex than a number). An element can be processed
>> independently. After the parser finished with one element, and dumped
>> the related data, the processing of the next element starts anew, so
>> to speak. No significant state is accumulated for the overall parsing
>> sans the counters of processed and bad elements, for statistics. XML
>> is somewhat like JSON, only more complex: an XML parser has to deal
>> with namespaces, parsed entities, CDATA sections and the other
>> interesting stuff. Therefore, I'm quite sure there should not be
>> fundamental problems in constant-space parsing of JSON.
>> 
>> BTW, the parser itself is described there
>>        http://okmij.org/ftp/Streams.html#xml
> 
> It certainly is possible (using a SAX style parser). What you can't
> have (I think) is a function:
> 
>    decode :: FromJSON a => ByteString -> Maybe a
> 
> and constant-memory parsing at the same time. The return type here
> says that we will return Nothing if parsing fails. We can only do so
> after looking at the whole input (otherwise how would we know if it's
> malformed).
> 
> The use cases aeson was designed for (which I bet is the majority use
> case) is parsing smaller messages sent over the network (i.e. in web
> service APIs) so this is the only mode of parsing it supplies.
> 
> -- Johan
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe




More information about the Haskell-Cafe mailing list