# Numeric Haskell: A Vector Tutorial

### From HaskellWiki

DonStewart (Talk | contribs) |
DonStewart (Talk | contribs) |
||

Line 287: | Line 287: | ||

./vector data 0.08s user 0.01s system 98% cpu 0.088 total |
./vector data 0.08s user 0.01s system 98% cpu 0.088 total |
||

− | == Basic Operations == |
+ | === Random numbers === |

+ | |||

+ | If we can parse from a file, we can also fill a vector with random numbers. |
||

+ | We'll use the [http://hackage.haskell.org/package/mersenne-random mersenne-random] package: |
||

+ | |||

+ | $ cabal install mersenne-random |
||

+ | |||

+ | We can then use MTRandom class to generate random vectors of different types: |
||

+ | |||

+ | <haskell> |
||

+ | import qualified Data.Vector.Unboxed as U |
||

+ | import System.Random.Mersenne |
||

+ | import Control.Monad |
||

+ | |||

+ | main = do |
||

+ | -- create a new source of randomness |
||

+ | -- andan infinite list of randoms |
||

+ | g <- newMTGen Nothing |
||

+ | rs <- randoms g |
||

+ | |||

+ | -- fill a vector with the first 10 random Ints |
||

+ | let a = U.fromList (take 10 rs) :: U.Vector Int |
||

+ | |||

+ | -- print the sum |
||

+ | print (U.sum a) |
||

+ | |||

+ | -- print each element |
||

+ | forM_ (U.toList a) print |
||

+ | </haskell> |
||

+ | |||

+ | Running it: |
||

+ | |||

+ | <haskell> |
||

+ | $ runhaskell B.hs |
||

+ | -56426044567146682 |
||

+ | -2144043065897064806 |
||

+ | -2361203915295681429 |
||

+ | 1023751035988638668 |
||

+ | -5145147152582103336 |
||

+ | 6545758323081548799 |
||

+ | -7630294342751488332 |
||

+ | -5861937811333342436 |
||

+ | -3198510304070719259 |
||

+ | 8949914511577398116 |
||

+ | -8681457396993884283 |
||

+ | </haskell> |
||

+ | |||

+ | == Transformations on Vectors == |
||

+ | |||

+ | A primary operation on arrays are the zip class of functions. |
||

+ | |||

+ | <haskell> |
||

+ | > let a = fromList [20,30,40,50] |
||

+ | |||

+ | > let b = enumFromN 0 4 |
||

+ | |||

+ | > a |
||

+ | fromList [20,30,40,50] |
||

+ | |||

+ | > b |
||

+ | fromList [0,1,2,3] |
||

+ | |||

+ | > Data.Vector.zipWith (-) a b |
||

+ | fromList [20,29,38,47] |
||

+ | </haskell> |
||

+ | |||

+ | We can also, of course, apply a function to each element of a vector (map): |
||

+ | |||

+ | <haskell> |
||

+ | > Data.Vector.map (^2) b |
||

+ | fromList [0,1,4,9] |
||

+ | |||

+ | > Data.Vector.map (\e -> 10 * sin (fromIntegral e)) a |
||

+ | fromList [9.129452507276277,-9.880316240928618,7.451131604793488,-2.6237485370392877] |
||

+ | |||

+ | > Data.Vector.map (< 35) a |
||

+ | fromList [True,True,False,False] |
||

+ | </haskell> |
||

+ | |||

+ | === Folds: Sums, Products, Min, Max === |
||

+ | |||

+ | Many special purpose folds (reductions) on vectors are available: |
||

+ | |||

+ | <haskell> |
||

+ | > let a = enumFromN 1 100 |
||

+ | |||

+ | > Data.Vector.sum a |
||

+ | 5050 |
||

+ | |||

+ | > Data.Vector.product a |
||

+ | 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000 |
||

+ | |||

+ | > Data.Vector.minimum a |
||

+ | 1 |
||

+ | |||

+ | > Data.Vector.maximum a |
||

+ | 100 |
||

+ | </haskell> |
||

+ | |||

== Indexing, Slicing and Iterating == |
== Indexing, Slicing and Iterating == |
||

+ | |||

+ | One dimensional arrays can be indexed, sliced and iterated over pretty much like lists. |
||

+ | |||

+ | Because Haskell values are by default immutable, all slice operations are zero-copying. |
||

+ | |||

+ | <haskell> |
||

+ | > let a = enumFromN 0 10 |
||

+ | |||

+ | > a |
||

+ | fromList [0,1,2,3,4,5,6,7,8,9] |
||

+ | |||

+ | > let b = Data.Vector.map (^3) a |
||

+ | |||

+ | > b ! 2 |
||

+ | 8 |
||

+ | |||

+ | > slice 2 3 b |
||

+ | fromList [8,27,64] |
||

+ | </haskell> |
||

+ | |||

+ | slice takes 3 arguments: the initial index to slice from, the number of elements to slice, and the vector to operate on. |
||

+ | |||

+ | A number of special-purpose slices are also available: |
||

+ | |||

+ | <haskell> |
||

+ | > Data.Vector.init b |
||

+ | fromList [0,1,8,27,64,125,216,343,512] |
||

+ | |||

+ | > Data.Vector.tail b |
||

+ | fromList [1,8,27,64,125,216,343,512,729] |
||

+ | |||

+ | > Data.Vector.take 3 b |
||

+ | fromList [0,1,8] :: Data.Vector.Vector |
||

+ | |||

+ | > Data.Vector.drop 3 b |
||

+ | fromList [27,64,125,216,343,512,729] :: Data.Vector.Vector |
||

+ | </haskell> |
||

+ | |||

== Bulk operations == |
== Bulk operations == |
||

== Stacking together different arrays == |
== Stacking together different arrays == |

## Revision as of 02:11, 20 February 2010

The vector library has an API similar to the famous Haskell list library, with many of the same names.

This tutorial is modelled on the NumPy tutorial.

# 1 Quick Tour

Here is a quick overview to get you started.

## 1.1 Importing the library

Download the vector package:

$ cabal install vector

and import it as, for boxed arrays:

import qualified Data.Vector as V

or:

import qualified Data.Vector.Unboxed as V

for unboxed arrays. The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.

## 1.2 Generating Vectors

New vectors can be generated in many ways:

$ ghci GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Loading package ffi-1.0 ... linking ... done. Prelude> :m + Data.Vector -- Generating a vector from a list: Prelude Data.Vector> let a = fromList [10, 20, 30, 40] Prelude Data.Vector> a fromList [10,20,30,40] :: Data.Vector.Vector -- Or filled from a sequence Prelude Data.Vector> enumFromStepN 10 10 4 fromList [10,20,30,40] :: Data.Vector.Vector -- A vector created from four consecutive values Prelude Data.Vector> enumFromN 10 4 fromList [10,11,12,13] :: Data.Vector.Vector

You can also build vectors using operations similar to lists:

-- The empty vector Prelude Data.Vector> empty fromList [] :: Data.Vector.Vector -- A vector of length one Prelude Data.Vector> singleton 2 fromList [2] :: Data.Vector.Vector -- A vector of length 10, filled with the value '2' -- Note that to disambiguate names, -- and avoid a clash with the Prelude, -- with use the full path to the Vector module Prelude Data.Vector> Data.Vector.replicate 10 2 fromList [2,2,2,2,2,2,2,2,2,2] :: Data.Vector.Vector

In general, you may construct new vectors by applying a function to the index space:

Prelude Data.Vector> generate 10 (^2) fromList [0,1,4,9,16,25,36,49,64,81] :: Data.Vector.Vector

Vectors may have more than one dimension:

-- Here we create a two dimensional vector, 10 columns, -- each row filled with the row index. Prelude Data.Vector> let x = generate 10 (\n -> Data.Vector.replicate 10 n) -- The type is "Vector of Vector of Ints" Prelude Data.Vector> :t x x :: Vector (Vector Int)

Vectors may be grown or shrunk arbitrarily:

Prelude Data.Vector> let y = Data.Vector.enumFromTo 0 11 Prelude Data.Vector> y fromList [0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector -- Take the first 3 elements as a new vector Prelude Data.Vector> Data.Vector.take 3 y fromList [0,1,2] :: Data.Vector.Vector -- Duplicate and join the vector Prelude Data.Vector> y Data.Vector.++ y fromList [0,1,2,3,4,5,6,7,8,9,10,11,0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector

## 1.3 Modifying vectors

Just as with lists, you can iterate (map) over arrays, reduce them (fold), filter them, or join them in various ways:

-- mapping a function over the elements of a vector Prelude Data.Vector> Data.Vector.map (^2) y fromList [0,1,4,9,16,25,36,49,64,81,100,121] :: Data.Vector.Vector -- Extract only the odd elements from a vector Prelude Data.Vector> Data.Vector.filter odd y fromList [1,3,5,7,9,11] :: Data.Vector.Vector -- Reduce a vector Prelude Data.Vector> Data.Vector.foldl (+) 0 y 66 -- Take a scan (partial results from a reduction): Prelude Data.Vector> Data.Vector.scanl (+) 0 y fromList [0,0,1,3,6,10,15,21,28,36,45,55,66] :: Data.Vector.Vector -- Zip two arrays pairwise, into an array of pairs Prelude Data.Vector> Data.Vector.zip y y fromList [(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10),(11,11)] :: Data.Vector.Vector

## 1.4 Indexing vectors

And like all good arrays, you can index them in various ways:

-- Take the first element Prelude Data.Vector> Data.Vector.head y 0 -- Take the last element Prelude Data.Vector> Data.Vector.tail y fromList [1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector -- Take an arbitrary element Prelude Data.Vector> y ! 4 4

# 2 The Tutorial

The vector package provides a several types of array. The most general interface is via Data.Vector, which provides for boxed arrays, holding any type.

There are also more specialized array types:

which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr).

In all cases, the operations are subject to loop fusion. That is, if you compose two functions,

map f . map g

the compiler will rewrite it into a single traversal:

map (f . g)

saving time and space.

## 2.1 Simple example

You can create the arrays in many ways, for example, from a regular Haskell list:

let a = fromList [2,3,4] Prelude Data.Vector> a fromList [2,3,4] :: Data.Vector.Vector Prelude Data.Vector> :t a a :: Vector Integer

GHCi will print the contents of the vector as executable code.

To create a multidimensional array, you can use a nested list generator to fill it:

Prelude Data.Vector> let x = fromList [ fromList [1 .. x] | x <- [1..10] ] Prelude Data.Vector> :t x x :: Vector (Vector Integer)

-- XXX TODO need a better printing function for multidimensional arrays.

You can also just create arrays filled with zeroes:

Prelude Data.Vector> Data.Vector.replicate 10 0 fromList [0,0,0,0,0,0,0,0,0,0] :: Data.Vector.Vector

And you can fill arrays from a sequence generator:

Prelude Data.Vector> enumFromN 1 10 fromList [1,2,3,4,5,6,7,8,9,10] :: Data.Vector.Vector Prelude Data.Vector> enumFromStepN 0 10 10 fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector

## 2.2 Array Types

### 2.2.1 Boxed Arrays

### 2.2.2 Unboxed Arrays

### 2.2.3 Pure Arrays

### 2.2.4 Impure Arrays

### 2.2.5 Some examples

The most important attributes of an array are available in O(1) time, such as the size (length),

-- how big is the array? Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10] Prelude Data.Vector> Data.Vector.length a 10 -- is the array empty? Prelude Data.Vector> Data.Vector.null a False

## 2.3 Array Creation

### 2.3.1 An example: filling a vector from a file

We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable:

{-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as L import qualified Data.Vector as U import System.Environment main = do [f] <- getArgs s <- L.readFile f print . U.sum . parse $ s -- Fill a new vector from a file containing a list of numbers. parse = U.unfoldr step where step !s = case L.readInt s of Nothing -> Nothing Just (!k, !t) -> Just (k, L.tail t)

Note the use of bang patterns to ensure the parsing accumulated state is produced strictly.

Create a data file filled with 1 million integers:

$ seq 1 1000000 > data

Compile with -Odph (enables special optimizations to help fusion):

$ ghc -Odph --make vector.hs

Run:

$ time ./vector data 500000500000 ./vector data 0.08s user 0.01s system 98% cpu 0.088 total

### 2.3.2 Random numbers

If we can parse from a file, we can also fill a vector with random numbers. We'll use the mersenne-random package:

$ cabal install mersenne-random

We can then use MTRandom class to generate random vectors of different types:

import qualified Data.Vector.Unboxed as U import System.Random.Mersenne import Control.Monad main = do -- create a new source of randomness -- andan infinite list of randoms g <- newMTGen Nothing rs <- randoms g -- fill a vector with the first 10 random Ints let a = U.fromList (take 10 rs) :: U.Vector Int -- print the sum print (U.sum a) -- print each element forM_ (U.toList a) print

Running it:

$ runhaskell B.hs -56426044567146682 -2144043065897064806 -2361203915295681429 1023751035988638668 -5145147152582103336 6545758323081548799 -7630294342751488332 -5861937811333342436 -3198510304070719259 8949914511577398116 -8681457396993884283

## 2.4 Transformations on Vectors

A primary operation on arrays are the zip class of functions.

> let a = fromList [20,30,40,50] > let b = enumFromN 0 4 > a fromList [20,30,40,50] > b fromList [0,1,2,3] > Data.Vector.zipWith (-) a b fromList [20,29,38,47]

We can also, of course, apply a function to each element of a vector (map):

> Data.Vector.map (^2) b fromList [0,1,4,9] > Data.Vector.map (\e -> 10 * sin (fromIntegral e)) a fromList [9.129452507276277,-9.880316240928618,7.451131604793488,-2.6237485370392877] > Data.Vector.map (< 35) a fromList [True,True,False,False]

### 2.4.1 Folds: Sums, Products, Min, Max

Many special purpose folds (reductions) on vectors are available:

> let a = enumFromN 1 100 > Data.Vector.sum a 5050 > Data.Vector.product a 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000 > Data.Vector.minimum a 1 > Data.Vector.maximum a 100

## 2.5 Indexing, Slicing and Iterating

One dimensional arrays can be indexed, sliced and iterated over pretty much like lists.

Because Haskell values are by default immutable, all slice operations are zero-copying.

> let a = enumFromN 0 10 > a fromList [0,1,2,3,4,5,6,7,8,9] > let b = Data.Vector.map (^3) a > b ! 2 8 > slice 2 3 b fromList [8,27,64]

slice takes 3 arguments: the initial index to slice from, the number of elements to slice, and the vector to operate on.

A number of special-purpose slices are also available:

> Data.Vector.init b fromList [0,1,8,27,64,125,216,343,512] > Data.Vector.tail b fromList [1,8,27,64,125,216,343,512,729] > Data.Vector.take 3 b fromList [0,1,8] :: Data.Vector.Vector > Data.Vector.drop 3 b fromList [27,64,125,216,343,512,729] :: Data.Vector.Vector