HEAD: Deterioration in ByteString I/O

Daniel Fischer daniel.is.fischer at web.de
Wed Sep 8 19:28:04 EDT 2010


On Wednesday 08 September 2010 23:55:35, Don Stewart wrote:
> simonpj:
> > |  > ghc-6.12.3:
> > |  >       89,330,672 bytes allocated in the heap
> > |  >           15,092 bytes copied during GC
> > |  >           35,980 bytes maximum residency (1 sample(s))
> > |  >           29,556 bytes maximum slop
> > |  >                2 MB total memory in use (0 MB lost due to
> > |  > fragmentation)
> > |  >
> > |  > ghc-6.13.20100831:
> > |  >      475,305,720 bytes allocated in the heap
> > |  >           89,272 bytes copied during GC
> > |  >           68,860 bytes maximum residency (1 sample(s))
> > |  >           29,444 bytes maximum slop
> > |  >                2 MB total memory in use (0 MB lost due to
> > |  > fragmentation)
> > |
> > |  Can you put your benchmark code somewhere?  Likely a GHC
> > | regression.
> >
> > Indeed bad. If someone could characterise the regression more
> > precisely (e.g. "fusion isn't happening here") that would be jolly
> > helpful.
>
> Shouldn't be fusion. Is this a straight IO function. Something to do
> with buffering/encoding?

Maybe the following observation helps:

ghc-6.13.20100831 reads lazy ByteStrings in chunks of 8192 bytes.

If I understand correctly, that means (since defaultChunkSize = 32760)
- bytestring allocates a 32K buffer to be filled and asks ghc for 32760 
bytes in that buffer
- ghc asks the OS for 8192 bytes (and usually gets them)
- upon receiving fewer bytes than requested, bytestring copies them to a 
new smaller buffer
- since the number of bytes received is a multiple of ghc's allocation 
block size (which I believe is 4K), there's no space for the bookkeeping 
overhead, hence the new buffer takes up 12K instead of 8, resulting in 44K 
allocation for 8K bytes

That factor of 5.5 corresponds pretty well with the allocation figures 
above, and the extra copying explains the approximate doubling of I/O time.

Trying to find out why ghc asks the OS for only 8192 bytes instead of 32760 
hasn't brought enlightenment yet.

Cheers,
Daniel

Excerpt of strace log:

read(3, "%!PS-Adobe-2.0\n%%Title: nbench\n%"..., 8192) = 8192
open("/usr/lib/gconv/UTF-32.so", O_RDONLY) = 4
read(4, 
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\4\0\0004\0\0\0"..., 512) 
= 512
fstat64(4, {st_mode=S_IFREG|0755, st_size=9672, ...}) = 0
mmap2(NULL, 12328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 
0xb7852000
fadvise64(4, 0, 12328, POSIX_FADV_WILLNEED) = 0
mmap2(0xb7854000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 4, 0x1) = 0xb7854000
close(4)                                = 0
mprotect(0xb7854000, 4096, PROT_READ)   = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbff37cb0) = -1 ENOTTY 
(Inappropriate ioctl for device)
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "", 0)                         = 0
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "%!PS-Adobe-2.0\n%%Title: nbench\n%"..., 8192) = 8192
read(3, " 20.000000 lineto\n121.153524 20."..., 8192) = 8192
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "", 0)                         = 0
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, " 20.000000 lineto\n121.153524 20."..., 8192) = 8192
read(3, "30.542315 21.394403 lineto\n125.3"..., 8192) = 8192
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "", 0)                         = 0
select(2, [], [1], NULL, {0, 0})        = 1 (out [1], left {0, 0})
write(1, "30.542315 21.394403 lineto\n125.3"..., 8192) = 8192
read(3, "neto\n308.929337 21.969871 lineto"..., 8192) = 8192


More information about the Glasgow-haskell-users mailing list