[GHC] #4096: New primops for indexing: index*OffAddrUsing# etc

GHC cvs-ghc at haskell.org
Tue May 25 11:36:53 EDT 2010


#4096: New primops for indexing: index*OffAddrUsing# etc
---------------------------------+------------------------------------------
    Reporter:  simonpj           |        Owner:              
        Type:  feature request   |       Status:  new         
    Priority:  normal            |    Milestone:              
   Component:  Compiler          |      Version:  6.12.2      
    Keywords:                    |   Difficulty:              
          Os:  Unknown/Multiple  |     Testcase:              
Architecture:  Unknown/Multiple  |      Failure:  None/Unknown
---------------------------------+------------------------------------------
 Here's an email from Roman, and response from Simon.  Making a ticket so
 it's kept as an idea.

 In package vector, primitive vectors (the ones that `Data.Vector.Unboxed`
 is built on top of) are represented as follows (`ByteArray` and friends
 are wrappers for various GHC primitives provided by package primitive):
 {{{
 data Vector a = Vector Int               -- offset into the ByteArray
                        Int               -- length
                        ByteArray         -- data
 }}}
 This representation supports cheap slicing which is quite crucial.
 However, indexing into such vectors is a bit more expensive than
 necessary:
 {{{
 index (Vector i _ arr) j = indexByteArray arr (i+j)
 }}}
 Ultimately, this requires 2 additions to get the element's address:
 {{{
   <base address off the ByteArray> + ((i + j) * <size of element>)
 }}}
 I'd like to always allocate pinned `ByteArrays` and store the starting
 address of the vector instead of the offset:
 {{{
 data Vector a = Vector Addr
                        Int
                        ByteArray
 }}}
 This would make indexing cheaper as it would require only one addition:
 {{{
 index (Vector addr i _) = indexOffAddr addr i
 }}}
 This is quite a big deal if indexing happens in an inner loop (some
 algorithms become up to 20% faster). Of course, the backend could optimise
 the offset-based version by performing partial redundancy elimination but
 it doesn't and it probably wouldn't get all interesting cases even if it
 did. So the second version is better.

 The problem is that I can't implement it because I must touch the
 `ByteArray` after accessing the memory. This results in code like this
 which hides the constructor, breaking various optimisations:
 {{{
   case indexIntOffAddr# addr# i# of { n# ->
   case touch# arr# realWorld# of { _ -> I# n# }}
 }}}
 After thinking about this for a while, I came up with two possible
 solutions. One is to provide a "pure" version of touch#:
 {{{
   use# :: o -> o' -> o'
 }}}
 such that use# x y = y. This would change the code above to:
 {{{
   I# (use# arr# (indexIntOffAddr# addr# i#))
 }}}
 I don't know how to implement this, though, because use# would have to be
 able to return arbitrary (unboxed) types and the code generator doesn't
 really seem to support this.

 A perhaps simpler solution is to add a new set of primitives:
 {{{
   indexIntOffAddrUsing# :: o -> Addr# -> Int# -> Int#
   ...
 }}}
 These would take an additional argument which they'd touch and otherwise
 ignore. The code would then become:
 {{{
   I# (indexIntOffAddrUsing# arr# addr# i#)
 }}}
 Incidentally, the `index*OffAddr#` primitives don't seem to be used
 anywhere.  [Not true: there are a handful of uses of `index*OffAddr#` in
 libraries/base, mainly GHC.Base.]

 Simon replies:
 `indexIntOffAddrUsing#` seems like the way to go, I can't think of a
 better alternative.

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/4096>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the Glasgow-haskell-bugs mailing list