<font face="verdana,sans-serif">Hello cafe,<br><br>I've recently started learning about CUDA and heterogeneous programming, and have been using Accelerate [1] to help me out. Right now, I'm running into trouble because I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.<br>
<br>Here's what I have so far:<br><br>import qualified Data.Array.Accelerate as A<br>import Data.Array.Accelerate ( (:.)(..)<br> , Acc<br> , Vector<br> , Scalar<br>
, Elt<br> , fold<br> , slice<br> , constant<br> , Array<br> , Z(..), DIM1, DIM2<br>
, fromList<br> , All(..)<br> , generate<br> , lift, unlift<br> , shape<br> )<br>
import Data.Array.Accelerate.Interpreter ( run )<br><br>dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)<br>dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys<br><br>type Matrix a = Array DIM2 a<br>
<br>getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)<br>getRow n mat = slice mat . constant $ Z :. n :. All<br><br>-- Naive matrix multiplication:<br>--<br>-- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'<br>
matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)<br>matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $<br> \ix -><br> let (Z :. i :. j) = unlift ix<br>
in getRow i a `dotP` getRow j b<br> where<br> b = A.transpose b' -- I assume row indexing is faster than column indexing...<br> (Z :. nrows :. _ ) = unlift $ shape a<br> (Z :. _ :. ncols) = unlift $ shape b<br>
<br><br>This, of course, gives me type errors right now because I'm calling getRow and dotP from within the generating function, which expects Exp[ression]s, not Acc[elerated computation]s.<br><br>So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?<br>
<br>Thanks for your help,<br> - Clark<br><br>[1] <a href="http://hackage.haskell.org/package/accelerate">http://hackage.haskell.org/package/accelerate</a><br></font>