Difference between revisions of "Arrays"

From HaskellWiki
Jump to navigation Jump to search
(Initial version)
(formatted code)
Line 52: Line 52:
 
IO monad:
 
IO monad:
   
import Data.Array.IO
+
import Data.Array.IO
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)
+
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)
readArray arr 1 >>= print
+
readArray arr 1 >>= print
writeArray arr 1 64
+
writeArray arr 1 64
readArray arr 1 >>= print
+
readArray arr 1 >>= print
   
 
This program creates array of 10 elements with 37 as initial
 
This program creates array of 10 elements with 37 as initial
Line 72: Line 72:
 
array types allows to work with modifiable arrays in state monad:
 
array types allows to work with modifiable arrays in state monad:
   
import Control.Monad.ST
+
import Control.Monad.ST
import Data.Array.ST
+
import Data.Array.ST
main = print $ runST
+
main = print $ runST
(do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
+
(do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
a <- readArray arr 1
+
a <- readArray arr 1
writeArray arr 1 216
+
writeArray arr 1 216
b <- readArray arr 1
+
b <- readArray arr 1
return (a,b)
+
return (a,b)
)
+
)
   
 
Believe you or not, but now you know all that needed to _use_ any
 
Believe you or not, but now you know all that needed to _use_ any
Line 148: Line 148:
 
All main array types in this library has their unboxed counterparts:
 
All main array types in this library has their unboxed counterparts:
   
Array - UArray (module Data.Array.Unboxed)
+
Array - UArray (module Data.Array.Unboxed)
IOArray - IOUArray (module Data.Array.IO)
+
IOArray - IOUArray (module Data.Array.IO)
STArray - STUArray (module Data.Array.ST)
+
STArray - STUArray (module Data.Array.ST)
DiffArray - DiffUArray (module Data.Array.Diff)
+
DiffArray - DiffUArray (module Data.Array.Diff)
   
 
So, basically replacing boxed arrays in your program with unboxed ones
 
So, basically replacing boxed arrays in your program with unboxed ones
Line 178: Line 178:
 
retured by the function passed as argument to 'withStorableArray'.
 
retured by the function passed as argument to 'withStorableArray'.
   
{-# OPTIONS_GHC -fglasgow-exts #-}
+
{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Array.Storable
+
import Data.Array.Storable
import Foreign.Ptr
+
import Foreign.Ptr
import Foreign.C.Types
+
import Foreign.C.Types
  +
 
main = do arr <- newArray (1,10) 37 :: IO (StorableArray Int Int)
+
main = do arr <- newArray (1,10) 37 :: IO (StorableArray Int Int)
readArray arr 1 >>= print
+
readArray arr 1 >>= print
withStorableArray arr $ \ptr ->
+
withStorableArray arr $ \ptr ->
memset ptr 0 40
+
memset ptr 0 40
readArray arr 1 >>= print
+
readArray arr 1 >>= print
  +
 
foreign import ccall unsafe "string.h" memset :: Ptr a -> CInt -> CSize -> IO ()
+
foreign import ccall unsafe "string.h"
  +
memset :: Ptr a -> CInt -> CSize -> IO ()
   
   
Line 228: Line 229:
 
that it is the Last Word of Truth :-)
 
that it is the Last Word of Truth :-)
   
disclaimer: i don't know how to format code/tables, so if you know - please edit this page accordingly. i also not native english speaker, so that this page can contain all sorts of spelling errors. to the final, i never used arrays in my programs (except for parallel arrays), so i can't guarantee that this page don't contains any other types of bugs :-)
+
disclaimer: i'm not native english speaker, so that this page can contain all sorts of spelling errors. moreover, i never used arrays in my programs (except for parallel arrays), so i can't guarantee that this page don't contains any other types of bugs :-)

Revision as of 10:58, 9 January 2006

Haskell'98 supports just one array constructor type, namely Array (see http://haskell.org/onlinereport/array.html). It creates immutable boxed arrays. "Immutable" means that these arrays, like any other pure functional data structures, have contents fixed at construction time - you can't modify it, only query. There is a "modification" operations, but they just return new array and don't modify an original one. This makes possible using Arrays in pure functional code along with lists. "Boxed" means that array elements are just ordinary Haskell (lazy) values, which are evaluated on need, and even can contain bottom (undefined) value. You can learn how to use these arrays at http://haskell.org/tutorial/arrays.html and i recommend you to read this before proceeding to rest of this page

Nowadays three Haskell compilers - GHC, Hugs and NHC - shipped with the same set of Hierarchical Libraries (http://www.haskell.org/ghc/docs/latest/html/libraries/index.html), and these libraries contains new implementation of arrays, which is backward compatible with the Haskell'98 one, but contains far more features. Suffice to say that these libraries supports 9 types of array constructors: Array, UArray, IOArray, IOUArray, STArray, STUArray, DiffArray, DiffUArray and StorableArray. It is no wonder that new arrays library make so much confusion for haskellers, although basically it is very simple - it provides only two interfaces, one of that you already know.

Immutable arrays (module Data.Array.IArray)

The first interface, provided by the new arrays library, is defined by type class IArray (which stands for "immutable array" and defined in module Data.Array.IArray - see http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-IArray.html) and contains just the same operations that was defined for Array in Haskell'98. The only difference is that now it is a typeclass and there are 4 array type constructors, which implement this interface: Array, UArray, DiffArray, DiffUArray. We will describe later differences between them and cases when other types are preferred to use instead of good old Array. Also note that to use Array type constructor together with other new array types, you need to import Data.Array.IArray module instead of Data.Array


Mutable IO arrays (module Data.Array.IO)

Second interface defined by the type class MArray (which stands for "mutable array" and defined in module Data.Array.MArray - see http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-MArray.html) and contains operations to update array elements in-place. Mutable arrays are very like to IORefs, only containing multiple values. Type constructors for mutable arrays are IOArray and IOUArray and operations which create, update and query these arrays all belongs to IO monad:

import Data.Array.IO
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)
          readArray  arr 1 >>= print
          writeArray arr 1 64
          readArray  arr 1 >>= print

This program creates array of 10 elements with 37 as initial values. Then it reads and prints first element of array. After that program modifies first element of array and then reads and prints it again. Type definition in second line is necessary because our little program don't allow compiler to determine concrete type of `arr`.


Mutable arrays in ST monad (module Data.Array.ST)

Just like IORef has more general cousine - STRef, IOArray has more general version - STArray (and IOUArray dubbed by STUArray). These array types allows to work with modifiable arrays in state monad:

import Control.Monad.ST
import Data.Array.ST
main = print $ runST
          (do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
              a <- readArray arr 1
              writeArray arr 1 216
              b <- readArray arr 1
              return (a,b)
          )

Believe you or not, but now you know all that needed to _use_ any array type. Unless you are interested in speed issues, just use Array, IOArray and STArray where appropriate. The following topics are almost exclusively about selecting proper array type to make program run faster.


DiffArray (module Data.Array.Diff)

As we already stated, update operation on immutable arrays (IArray) just creates new copy of array, what is very inefficient, but it is pure operation what can be used in pure functions. On the other hand, updates on mutable arrays (MArray) are efficient but can be done only in monadic code. DiffArray combines the best of both worlds - it supports interface of IArray and therefore can be used in pure functional way, but internally used an efficient updating of MArrays.

How that trick works? DiffArray has pure external interface, but internally it represented as the reference to IOArray.

When the '//' operator is applied to a diff array, its contents are physically updated in place. The old array silently changes its representation without changing the visible behavior: it stores a link to the new current array along with the difference to be applied to get the old contents.

So if a diff array is used in a single-threaded style, i.e. after '//' application the old version is no longer used, a!i takes O(1) time and a//d takes O(length d). Accessing elements of older versions gradually becomes slower.

Updating an array which is not current makes a physical copy. The resulting array is unlinked from the old family. So you can obtain a version which is guaranteed to be current and thus have fast element access by a//[].

Library provides two "differential" array costructors - DiffArray, made internally from IOArray, and DiffUArray, based on IOUArray. But if you need, you can construct new "differential" array types from any 'MArray' types living in the 'IO' monad. See the module internals for further details


Unboxed arrays

Unboxed arrays are like arrays in C - they contains just the plain values without extra level of indirection, so that, for example, array of 1024 values of type Int32 will use only 4 kb of memory. Moreover, indexing of such arrays works significantly faster.

Of course, unboxed arrays have their own disadvantages. First, unboxed arays can be made only of plain values having fixed size - Int, Word, Char, Bool, Ptr, Double (see full list on http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Unboxed.html). You can even implement yourself unboxed arrays interface for other simple types, including enumerations. But Integer, String and any other types defined with variants cannot form the unboxed arrays. Second, all elements in unboxed array are evaluated when array is created, so you can't use benefits of lazy evaluation for elements of such array. Nevertheless, unboxed arrays are very useful optimization instrument, so i recommend to use them as much as possible.

All main array types in this library has their unboxed counterparts:

Array - UArray          (module Data.Array.Unboxed)
IOArray - IOUArray      (module Data.Array.IO)
STArray - STUArray      (module Data.Array.ST)
DiffArray - DiffUArray  (module Data.Array.Diff)

So, basically replacing boxed arrays in your program with unboxed ones is very simple - just add 'U' to type signatures and you are done! If you changed Array to UArray, you also need to add "Data.Array.Unboxed" to your imports list


StorableArray (module Data.Array.Storable)

A storable array is an IO-mutable array which stores its contents in a contiguous memory block living in the C heap. Elements are stored according to the class 'Storable'. You can obtain the pointer to the array contents to manipulate elements from languages like C.

It is similar to 'IOUArray' (in particular, it implements the same MArray interface) but slower. Its advantage is that it's compatible with C. Memory address of storable arrays are fixed, so you can pass them to C routines.

The pointer to the array contents is obtained by 'withStorableArray'. The idea is similar to 'ForeignPtr' (used internally here). The pointer should be used only during execution of the 'IO' action retured by the function passed as argument to 'withStorableArray'.

{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Array.Storable
import Foreign.Ptr
import Foreign.C.Types

main = do arr <- newArray (1,10) 37 :: IO (StorableArray Int Int)
          readArray arr 1 >>= print
          withStorableArray arr $ \ptr ->
              memset ptr 0 40
          readArray arr 1 >>= print

foreign import ccall unsafe "string.h" 
    memset  :: Ptr a -> CInt -> CSize -> IO ()


If you want to use this pointer afterwards, ensure that you call 'touchStorableArray' AFTER the last use of the pointer, so that the array will be not freed too early.


The Haskell Array Preprocessor (STPP)

Using in Haskell mutable arrays (IO and ST ones) is not very handy. But there is one tool which adds syntax sugar and makes using of such arrays very close to that in imperative languages. It is written by Hal Daume III and you can get it as http://www.isi.edu/~hdaume/STPP/stpp.tar.gz

Using this tool, you can index array elements in arbitrary complex expressions with just "arr[|i|]" notation and this preprocessor will automatically convert such syntax forms to appropriate calls to 'readArray' and 'writeArray'. Multi-dimensional arrays are also supported, with indexing in the form "arr[|i|][|j|]". See further descriptions at http://www.isi.edu/~hdaume/STPP/


Unsafe indexing, freezing/thawing, running over array elements

GHC-specific topics

Parallel arrays (module GHC.PArr)

Welcome to machine: Array#, MutableArray#, ByteArray#, MutableByteArray#, pinned and moveable byte arrays

Notes for contributors to this page

if you have any questions, please ask at the IRC/maillist. if you have any answers, please submit them directly to this page. please don't sign your contributions, so that anyone will feel free to further improve this page. but if you are compiler/Array libraries author - please sign your text to let us know that it is the Last Word of Truth :-)

disclaimer: i'm not native english speaker, so that this page can contain all sorts of spelling errors. moreover, i never used arrays in my programs (except for parallel arrays), so i can't guarantee that this page don't contains any other types of bugs :-)