[Haskell-cafe] Haskell & monads for newbies

Andrew Coppin andrewcoppin at btinternet.com
Sun Jul 15 17:06:42 EDT 2007


Paul Moore wrote:
> Haskell handles this with laziness. The canonical example is counting
> characters in a file, where you just grab the whole file, and use
> length. An imperative programmer's intuition says that this wastes
> huge amounts of memory compared to reading character by character and
> incrementing a count. Lazy I/O means that only a small, bounded part of
> the file needs to be in RAM at any one time, without the programmer
> needing to do the bookkeeping.
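A minimal sketch of the lazy character count described above, assuming the input arrives on an ordinary `Handle`. `countChars` is a hypothetical name, not a library function. `hGetContents` hands back the file contents incrementally, and `length` consumes them as it goes, so characters that have already been counted become garbage straight away.

```haskell
import System.IO

-- Count the characters on a handle by reading its contents lazily.
-- hGetContents produces the String on demand; `length` walks it once,
-- and nothing else keeps a reference to the already-counted prefix.
countChars :: Handle -> IO Int
countChars h = do
  s <- hGetContents h
  let n = length s
  -- Force the count before returning, so it is computed while the
  -- handle is still open (e.g. when called inside withFile).
  n `seq` return n
```

Usage would be something like `main = countChars stdin >>= print`.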

Indeed, I had *this* conversation with Mr C++ as well... He proudly 
showed off a 3-page alphabet soup of C++ which allows him to do 
bit-level processing of a file as if it's really a collection of bits. 
And I said that in my program, I just grab a list of bytes and convert 
it into a list of bits. And he was like "wow - that's going to waste a 
heck of a lot of RAM..."

But using the magic of getContents... actually no, it isn't. ;-)
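A rough sketch of the "list of bytes to list of bits" idea. The post doesn't say exactly how the bytes were read; this version assumes lazy `ByteString` input (from the `bytestring` package that ships with GHC) and most-significant-bit-first ordering. `byteToBits`, `bytesToBits` and `countOnes` are hypothetical names. Because `concatMap` is lazy, bits are produced only as they are consumed, so the full bit list never has to exist in memory at once.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Bits (testBit)
import Data.Word (Word8)

-- Expand one byte into its 8 bits, most significant bit first
-- (the ordering is an assumption for this sketch).
byteToBits :: Word8 -> [Bool]
byteToBits w = [testBit w i | i <- [7, 6 .. 0]]

-- Lazily turn a stream of bytes into a stream of bits.
bytesToBits :: [Word8] -> [Bool]
bytesToBits = concatMap byteToBits

-- Hypothetical example consumer: count the 1 bits in the input,
-- streaming over it rather than materialising every bit.
countOnes :: BL.ByteString -> Int
countOnes = length . filter id . bytesToBits . BL.unpack
```

Wired to stdin, this would be `main = BL.getContents >>= print . countOnes`.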

> If lazy I/O was publicised in this way, as separation of concerns (I/O
> and processing) with the compiler and language handling the work of
> minimising memory use and avoiding unnecessary I/O, then the message
> might get through better. However, the only article I've ever
> seen taking this approach (http://blogs.nubgames.com/code/?p=22)
> didn't seem to get a good reception in the Haskell community, sparking
> comments that hGetContents and similar functions had a number of
> issues which made them "bad practice". The result was to leave me with
> a feeling that separating I/O and processing in Haskell really was
> hard, but I never quite understood why...
>
> So I guess that leaves me with the question: is separating I/O and
> processing really the right thing to do (in terms of memory usage and
> performance) in Haskell, and if so, why isn't it advertised more?

It's something I use all the time...
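A small sketch of that separation, with a hypothetical `numberLines` as the processing step: all the logic is a pure `String -> String` function, and the Prelude's `interact` does the I/O. Because `lines`, `zipWith` and `unlines` are all lazy, output can start before the input has been fully read, which is what keeps memory use flat.

```haskell
-- Pure processing: prefix each line with its line number.
-- No I/O here; the function only sees and returns a String.
numberLines :: String -> String
numberLines = unlines . zipWith label [1 :: Int ..] . lines
  where
    label n l = show n ++ ": " ++ l
```

The whole program is then just `main = interact numberLines`.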

Of course, as soon as you want to scan the data *twice*... well, if you 
do it in the obvious way, the list stays referenced until the second 
scan starts, so the GC can't free any of it, and the whole file (possibly 
many MB, or even GB) ends up held in RAM ready for the second pass.
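A sketch of the problem and the usual workaround, using a hypothetical "count characters and newlines" task. In `twoPass`, the string `s` is still needed by the second traversal while the first one runs, so none of it can be collected. `onePass` fuses both counts into a single strict left fold (`foldl'` plus bang patterns on the accumulator), so each character can be dropped as soon as it has been seen.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Two traversals: while one component is being computed, `s` is still
-- referenced by the other, so the whole string stays resident.
twoPass :: String -> (Int, Int)
twoPass s = (length s, length (filter (== '\n') s))

-- One traversal with a strict accumulator: constant extra space,
-- and the input can be garbage-collected as it is consumed.
onePass :: String -> (Int, Int)
onePass = foldl' step (0, 0)
  where
    step (!chars, !newlines) c =
      (chars + 1, if c == '\n' then newlines + 1 else newlines)
```

Both give the same answer; the difference is only in how much of the input has to be kept alive while computing it.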

I have a vague recollection of somebody muttering something about 
ByteStrings and memory-mapped files...?
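For the ByteString half of that, a minimal sketch (memory mapping itself is not shown, since it needs a package outside `base` and `bytestring`). `statsStrict` is a hypothetical name. Reading the file into one strict `ByteString` costs roughly one byte of RAM per byte of file, rather than a heap-allocated list cell per `Char`, so keeping it around for a second scan is far cheaper.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Read the whole file into a single packed buffer, then scan it twice:
-- once for the length and once to count newlines. Both passes read the
-- same compact buffer, so no per-character list cells are retained.
statsStrict :: FilePath -> IO (Int, Int)
statsStrict path = do
  bs <- BC.readFile path
  return (BC.length bs, BC.count '\n' bs)
```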


