Unboxed arrays of Char are broken

Manuel M. T. Chakravarty chak@cse.unsw.edu.au
Sat, 30 Dec 2000 10:40:30 +1100


qrczak@knm.org.pl (Marcin 'Qrczak' Kowalczyk) wrote,

> Both {read,write}CharArray and overloaded MArray operations work on
> 32-bit Chars (by using {read,write}IntArray# - well, they waste half
> of memory on 64-bit machines).
> 
> I once made newCharArray allocate an array appropriately sized for
> this. But now it allocates only bytes. This is inconsistent with
> read/write, and thus using unboxed arrays of Chars corrupts memory.
> 
> There is a comment in ArrayBase.hs saying:
> -- Char arrays really contain only 8-bit bytes for compatibility.
> 
> I guess the following is the right solution:
> 
>   ByteArrays operated on with {new,read,write}CharArray are indeed
>   assumed somewhere to contain 8-bit bytes. MArray operations don't
>   have the compatibility baggage. In this case {new,read,write}Array in
>   Char instance should be fixed (to use {new,read,write}IntArray) and
>   {read,write}CharArray should be fixed (to use {read,write}CharArray#
>   again).

Sounds reasonable to me.

> BTW. I am surprised to discover that newArray does not take the initial
> value as an argument and fills the array with bottoms (when boxed) or
> probably garbage (or zeros? - when unboxed). IMHO usually a concrete
> value is needed to initialize the array with, and now code must do
> an explicit loop after creation. Is that for speeding up allocation
> of uninitialized unboxed arrays? If so, it slows down allocation of
> initialized boxed arrays with values other than bottom, because of
> two loops instead of one. 

I don't understand why having allocation without
initialisation slows down allocation of initialised, boxed
arrays.  Two loops are used to catch the case where some
array locations are not initialised by the index/values
pairs given when creating the array, but what does array
allocation have to do with this?  Or in other words, how do
you want to get rid of one loop?

BTW, if you really want to speed this up, cache usage is the
most important issue to optimise.  Two loops over one array
where the array fits into the 1st level cache is not much
slower than just one loop.  On the other hand, it gets
really slow when the array doesn't fit into the 2nd level
cache (or 3rd level cache if you got one).  In this case,
the best way to optimise this is to break the array
traversal into blocks, each of which should fit into the
cache (ideally into the 1st level).  Then, do both loops for
one block before moving to the next block.  This halfs the
number of cache misses, which usually gives you a dramatic
speed up.  Note that this optimisation is possible only if
newArray does *not* initialise the new array.

Unfortunately, Haskell's array constructor doesn't let you
do this optimisation without sorting the index/value pairs
first.  It might be possible to achieve a similar effect by
maintaining a bit map specifying which cache lines in the
new array have already been initialised and do the
initialisation of a cache line just before the first value
of the index/value pair is written to that particular cache
line.  Then, you might get reasonable performance whenever
the index/value pair list is more or less sorted.

Overall, the Haskell array interface isn't really great for
performance, but it is definitely not the array
initilisation loop that causes the problems.

> I would have the initializing version as
> the basic one and provide newUninitializedArray or something if it's
> really needed. 

I definitely need the non-initialising version and, as I
said, I don't believe that the initialising version would
buy us something substantial.

Cheers,
Manuel