Difference between revisions of "Numeric Haskell: A Repa Tutorial"

From HaskellWiki
Jump to navigation Jump to search
Line 1: Line 1:
[http://hackage.haskell.org/package/repa Repa] is a Haskell library for high performance, regular, multi-dimensional parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel provided you supply +RTS -Nwhatever on the command line when running the program.
+
[http://hackage.haskell.org/package/repa Repa] is a Haskell library for
  +
high performance, regular, multi-dimensional parallel arrays. All
  +
numeric data is stored unboxed. Functions written with the Repa
  +
combinators are automatically parallel provided you supply "+RTS -N" on
  +
the command line when running the program.
   
  +
This document provides a tutorial on array programming in Haskell using
See also the [http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial vector tutorial].
 
  +
the repa package.
  +
 
''Note:'' a companion tutorial to this is provided in [http://www.haskell.org/haskellwiki/Numeric_Haskell:_A_Vector_Tutorial vector tutorial].
   
 
= Quick Tour =
 
= Quick Tour =
Line 39: Line 46:
 
* [http://hackage.haskell.org/package/repa-examples repa-examples]
 
* [http://hackage.haskell.org/package/repa-examples repa-examples]
   
== Index Types ==
+
== Index types and shapes ==
   
  +
Before we can get started manipulating arrays, we need a grasp of repa's
Much like the classic 'array' library in Haskell, repa-based arrays are
+
notion of array shape. Much like the classic 'array' library in Haskell, repa-based arrays are
 
parameterized via a type which determines the dimension of the array, and the type of its index. However, while classic arrays take tuples to represent multiple dimensions, Repa
 
parameterized via a type which determines the dimension of the array, and the type of its index. However, while classic arrays take tuples to represent multiple dimensions, Repa
 
arrays use a [http://hackage.haskell.org/packages/archive/repa/2.0.0.3/doc/html/Data-Array-Repa-Shape.html#t:Shape richer type language] for array indices and shapes.
 
arrays use a [http://hackage.haskell.org/packages/archive/repa/2.0.0.3/doc/html/Data-Array-Repa-Shape.html#t:Shape richer type language] for array indices and shapes.
Line 63: Line 71:
 
Array DIM2 Double
 
Array DIM2 Double
   
is a two-dimensional array of doubles, indexed via `Int` keys.
+
is a two-dimensional array of doubles, indexed via `Int` keys, while
  +
  +
Array Z Double
  +
  +
is a zero-dimension object (i.e. a point) holding a Double.
   
 
Many operations over arrays are polymorphic in the shape / dimension component.
 
Many operations over arrays are polymorphic in the shape / dimension component.
Others require operating on the shape itself, rather than the array.
+
Others require operating on the shape itself, rather than the array. A
  +
typeclass, <hask>Shape</hask>, lets us operate uniformally over arrays
  +
with different shape.
  +
  +
== Shapes ==
   
 
To build values of `shape` type, we can use the `Z` and `:.` constructors:
 
To build values of `shape` type, we can use the `Z` and `:.` constructors:
Line 115: Line 131:
   
 
The type of `x` is inferred as:
 
The type of `x` is inferred as:
 
   
 
> :t x
 
> :t x
Line 143: Line 158:
 
To access elements in repa arrays, you provide an array and a shape, to access the element:
 
To access elements in repa arrays, you provide an array and a shape, to access the element:
   
(!) :: (Shape sh, Elt a) => Array sh a -> sh -> a
+
(!) :: (Shape sh, Elt a) => Array sh a -> sh -> a
   
  +
So:
== Modifying arrays ==
 
   
  +
> let x = fromList (Z :. (10::Int)) [1..10]
== Generating arrays ==
 
  +
> x ! (Z :. 2)
  +
3.0
   
  +
Note that we can't, even for one-dimensional arrays, give just a bare literal as the shape:
   
  +
> x ! 2
   
  +
No instance for (Num (Z :. Int))
== Modifying arrays ==
 
  +
arising from the literal `2'
   
  +
as the Z type isn't in the Num class.
== Indexing arrays ==
 
   
  +
What if the index is out of bounds, though?
To access elements in repa arrays, you provide an array and a shape, to access the element:
 
  +
  +
> x ! (Z :. 11)
  +
*** Exception: ./Data/Vector/Generic.hs:222 ((!)): index out of bounds (11,10)
  +
  +
an exception is thrown. An altnerative is to indexing functions that return a Maybe:
  +
 
(!?) :: (Shape sh, Elt a) => Array sh a -> sh -> Maybe a
  +
  +
An example:
  +
  +
> x !? (Z :. 9)
  +
Just 10.0
  +
  +
> x !? (Z :. 11)
  +
Nothing
  +
 
== Operations on arrays ==
  +
  +
=== Numeric operations: negation, addition, subtraction, multiplication ===
  +
  +
Repa arrays are instances of the <code>Num</code>. This means that operations
 
on numerical elements are lifted automagically onto arrays of such elements:
  +
 
For example, <hask>(+)</hask> on two double values corresponds to element-wise addition,
  +
<hask>(+)</hask>, of the two arrays of doubles:
  +
  +
> let x = fromList (Z :. (10::Int)) [1..10]
  +
> x + x
  +
[2.0,4.0,6.0,8.0,10.0,12.0,14.0,16.0,18.0,20.0]
   
  +
Other operations from the Num class work just as well:
(!) :: (Shape sh, Elt a) => Array sh a -> sh -> a
 
   
  +
> -x
= Syntax =
 
  +
[-1.0,-2.0,-3.0,-4.0,-5.0,-6.0,-7.0,-8.0,-9.0,-10.0]
   
  +
> x ^ 3
Repa arrays are instances of `Num`. This means that operations on numerical elements are lifted automagically onto arrays of such elements:
 
  +
[1.0,8.0,27.0,64.0,125.0,216.0,343.0,512.0,729.0,1000.0]
   
  +
> x * x
For example, `(+)` on two double values corresponds to zip-wise `(+)` on two arrays of doubles.
 
  +
[1.0,4.0,9.0,16.0,25.0,36.0,49.0,64.0,81.0,100.0]

Revision as of 21:21, 9 May 2011

Repa is a Haskell library for high performance, regular, multi-dimensional parallel arrays. All numeric data is stored unboxed. Functions written with the Repa combinators are automatically parallel provided you supply "+RTS -N" on the command line when running the program.

This document provides a tutorial on array programming in Haskell using the repa package.

Note: a companion tutorial to this is provided in vector tutorial.

Quick Tour

Repa (REgular PArallel arrays) is an advanced, multi-dimensional parallel arrays library for Haskell, with a number of distinct capabilities:

  • The arrays are "regular" (i.e. dense and rectangular); and
  • Functions may be written that are polymorphic in the shape of the array;
  • Many operations on arrays are accomplished by changing only the shape of the array (without copying elements);
  • The library will automatically parallelize operations over arrays.

This is a quick start guide for the package.

Importing the library

Download the `repa` package:

  $ cabal install repa

and import it qualified:

  import qualified Data.Array.Repa as R

The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.

Note: Operations that involve writing new index types for Repa arrays will require the '-XTypeOperators' language extension.

For non-core functionality, a number of related packages are available:

* repa-bytestring
* repa-io
* repa-algorithms

and example algorithms in:

* repa-examples

Index types and shapes

Before we can get started manipulating arrays, we need a grasp of repa's notion of array shape. Much like the classic 'array' library in Haskell, repa-based arrays are parameterized via a type which determines the dimension of the array, and the type of its index. However, while classic arrays take tuples to represent multiple dimensions, Repa arrays use a richer type language for array indices and shapes.

Index types consist of two parts:

  • a dimension component; and
  • an index type

The most common dimensions are given by the shorthand names:

   type DIM0 = Z
   type DIM1 = DIM0 :. Int
   type DIM2 = DIM1 :. Int
   type DIM3 = DIM2 :. Int
   type DIM4 = DIM3 :. Int
   type DIM5 = DIM4 :. Int

thus,

   Array DIM2 Double

is a two-dimensional array of doubles, indexed via `Int` keys, while

   Array Z Double

is a zero-dimension object (i.e. a point) holding a Double.

Many operations over arrays are polymorphic in the shape / dimension component. Others require operating on the shape itself, rather than the array. A typeclass, Shape, lets us operate uniformally over arrays with different shape.

Shapes

To build values of `shape` type, we can use the `Z` and `:.` constructors:

   > Z         -- the zero-dimension
   Z

For arrays of non-zero dimension, we must give a size. A common error is to leave off the type of the size,

   > :t Z :. 10
   Z :. 10 :: Num head => Z :. head

For arrays of non-zero dimension, we must give a size. A common error is to leave off the type of the size,

   > :t Z :. 10
   Z :. 10 :: Num head => Z :. head

leading to annoying type errors about unresolved instances, such as:

   No instance for (Shape (Z :. head0))

To select the correct instance, you will need to annotate the size literals with their concrete type:

   > :t Z :. (10 :: Int)
   Z :. (10 :: Int) :: Z :. Int

is the shape of 1D arrays of length 10, indexed via Ints.

Generating arrays

New repa arrays ("arrays" from here on) can be generated in many ways:

   $ ghci
   GHCi, version 7.0.3: http://www.haskell.org/ghc/  :? for help
   Loading package ghc-prim ... linking ... done.
   Loading package integer-gmp ... linking ... done.
   Loading package base ... linking ... done.
   Loading package ffi-1.0 ... linking ... done.
   Prelude> :m + Data.Array.Repa

They may be constructed from lists:

A one dimensional array of length 10, here, given the shape `(Z :. 10)`:

   > let x = fromList (Z :. (10::Int)) [1..10]
   > x
   [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]

The type of `x` is inferred as:

   > :t x
   x :: Array (Z :. Int) Double

which we can read as "an array of dimension 1, indexed via Int keys, holding elements of type Double"

We could also have written the type as:

   x :: Array DIM1 Double

The same data may also be treated as a two dimensional array:

   > let x = fromList (Z :. (5::Int) :. (2::Int)) [1..10]
   > x
   [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]

which would have the type:

   x :: Array ((Z :. Int) :. Int) Double

or

   x :: Array DIM2 Double

Indexing arrays

To access elements in repa arrays, you provide an array and a shape, to access the element:

   (!) :: (Shape sh, Elt a) => Array sh a -> sh -> a

So:

   > let x = fromList (Z :. (10::Int)) [1..10]
   > x ! (Z :. 2)
   3.0    

Note that we can't, even for one-dimensional arrays, give just a bare literal as the shape:

   > x ! 2
       No instance for (Num (Z :. Int))
         arising from the literal `2'

as the Z type isn't in the Num class.

What if the index is out of bounds, though?

   > x ! (Z :. 11)
   *** Exception: ./Data/Vector/Generic.hs:222 ((!)): index out of bounds (11,10)

an exception is thrown. An altnerative is to indexing functions that return a Maybe:

   (!?) :: (Shape sh, Elt a) => Array sh a -> sh -> Maybe a

An example:

   > x !? (Z :. 9)
   Just 10.0
   > x !? (Z :. 11)
   Nothing

Operations on arrays

Numeric operations: negation, addition, subtraction, multiplication

Repa arrays are instances of the Num. This means that operations on numerical elements are lifted automagically onto arrays of such elements:

For example, (+) on two double values corresponds to element-wise addition, (+), of the two arrays of doubles:

   > let x = fromList (Z :. (10::Int)) [1..10]
   > x + x
   [2.0,4.0,6.0,8.0,10.0,12.0,14.0,16.0,18.0,20.0]

Other operations from the Num class work just as well:

   > -x
   [-1.0,-2.0,-3.0,-4.0,-5.0,-6.0,-7.0,-8.0,-9.0,-10.0]
   > x ^ 3
   [1.0,8.0,27.0,64.0,125.0,216.0,343.0,512.0,729.0,1000.0]
   > x * x
   [1.0,4.0,9.0,16.0,25.0,36.0,49.0,64.0,81.0,100.0]