FPS again

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sat Jul 15 12:04:26 EDT 2006


On Sat, 2006-07-15 at 19:16 +0400, Bulat Ziganshin wrote:
> Hello Donald,
> 
> can you test that this implementation
>   lines = split 0x0a
> is as fast as existing (long) ones both for Lazy and Strict ByteString?

It might actually be the other way around, that the split implementation
could benefit from the work that went into the optimisation of the lines
function. I spent quite some time trying to optimise the lines
implementation, at least for the Lazy module. To get better performance
it relies on the assumption that many lines fit into a chunk. That may
not be true for uses of split in general. It's worth investigating.
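
One thing any such comparison has to account for is that split and lines do not agree on input ending in a newline: split keeps an empty final piece, lines does not. A minimal sketch of the difference follows. The names linesViaSplit and linesViaSplit' are made up for illustration and are not part of fps; the second one is just one possible way to patch up the result.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- Hypothetical name: the one-line definition proposed in the quoted mail.
linesViaSplit :: B.ByteString -> [B.ByteString]
linesViaSplit = B.split 0x0a

-- Hypothetical variant: drop the empty final piece that split leaves
-- behind when the input ends with a newline, so the result matches
-- what C.lines returns.
linesViaSplit' :: B.ByteString -> [B.ByteString]
linesViaSplit' bs = case B.split 0x0a bs of
  [] -> []
  ps | B.null (last ps) -> init ps
     | otherwise        -> ps
```

The extra pass over the list in linesViaSplit' is not free, so it would need to be benchmarked alongside the existing lines before drawing any conclusion.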

Btw, you can run the benchmarks too, they are included in the fps repo.

> also, is not it faster to use the following implementation:
>   isSpaceWord8 = (spacesFlagsArray!)?

Benchmark it and tell us which is faster.
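
For anyone who wants to try, here is a minimal sketch of the two candidates side by side. Assumptions: the table is built from Data.Char.isSpace over the Latin-1 range, which may or may not match the set of bytes the library treats as space, and isSpaceTable and isSpaceCompare are illustrative names only.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Char (chr, isSpace)
import Data.Word (Word8)

-- 256-entry lookup table, one flag per byte value.
-- Assumption: "space" means whatever Data.Char.isSpace says for Latin-1.
spacesFlagsArray :: UArray Word8 Bool
spacesFlagsArray =
  listArray (0, 255)
    [isSpace (chr (fromIntegral w)) | w <- [0 .. 255 :: Word8]]

-- Table-based test, as in the quoted suggestion.
isSpaceTable :: Word8 -> Bool
isSpaceTable = (spacesFlagsArray !)

-- Comparison-based test covering the same byte values:
-- space, tab through carriage return, and no-break space.
isSpaceCompare :: Word8 -> Bool
isSpaceCompare w = w == 0x20 || (w >= 0x09 && w <= 0x0d) || w == 0xa0
```

Which of the two wins depends on how the compiler treats the bounds check and the branches, so only a measurement will tell.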

> also, i propose to move getLine/getContents/putStr/interact/readFile-type
> functions into .Char8 modules (both for strict and lazy bytestrings),
> because these functions are encoding-dependent and work with texts
> (as opposite to hGet/hPut which works with raw binary data blocks).

Yes, getLine and putStrLn are encoding dependent (they know the encoding
of '\n'). getContents, putStr, readFile, interact etc. are
encoding-independent, they're just the same as hGet/hPut, working on
binary data blocks. Indeed putStr = hPut stdout.

> in particular, i tried to implement Lazy.hGetLines as 'hGetContents >>= lines'
> but it was impossible because 'lines' function is defined only in
> Lazy.Char8 module

Yes, that's the way it should be. And of course there is no need for
hGetLines in the Lazy module since it is just hGetContents >>= lines.
In my opinion the hGetLines in the other module should be removed too as
it's just a special case of what the Lazy module does.
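
Spelled out so that it type-checks, the shorthand above amounts to something like the sketch below. It assumes lines comes from the Lazy.Char8 module, as discussed; hGetLinesLazy and countLines are hypothetical names used only for this example.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import System.IO (Handle, IOMode (ReadMode), withFile)

-- Hypothetical helper: read the handle lazily and split into lines.
hGetLinesLazy :: Handle -> IO [L.ByteString]
hGetLinesLazy h = fmap LC.lines (L.hGetContents h)

-- Usage example: count the lines of a file. The length is forced
-- before withFile returns, so the lazy read finishes while the
-- handle is still open.
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  ls <- hGetLinesLazy h
  return $! length ls
```

Since the whole thing is a one-liner on top of hGetContents, there is little to gain from exporting it separately.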

> i send you a bunch of small patches that fixes I/O part of library,
> providing the same set of operations for lazy and strict bytestrings,
> for ghc and non-ghc platforms
> 
> also, i run into small problems using FPS repository to development
> (seems that i'm first windows developer of the lib). First, i propose
> to change darcs 'prefs' file to the following:
> 
> test cd tests && make fast
> 
> - it should work both on unix and windows

Fair enough. :-)

> second, i've changed 'time' calls in tests/Makefile to use my own 't'
> utility instead of 'time'. but of course it's not universal solution.
> at least, 'time' in windows shell (cmd.exe) is _built-in_ utility that
> don't have anything common with unix 'time' :)


Duncan
