[Haskell-cafe] Re: trouble compiling "import GHC.Prim(MutableByteArray#, ....." (building regex-tdfa from darcs) -- what's that # sign doing?

Sun Aug 19 18:25:49 EDT 2007

Stefan O'Rear wrote:
> On Fri, Aug 17, 2007 at 04:27:29PM -0400, Thomas Hartman wrote:
>> trying to compile regex-tdfa, I ran into another issue. (earlier I had a 
>> cabal problem but that's resolved.)
>>
>> there's a line that won't compile, neither for ghc 6.6.1 nor 6.7 
>>
>> import 
>> GHC.Prim(MutableByteArray#,RealWorld,Int#,sizeofMutableByteArray#,unsafeCoerce#)
>>
>> so the fresh darcs regex tdfa package won't build.
>>
>> This line (line 16 below) causes this error for 
>>
>>   ghc -e '' RunMutState.hs
>>
>> for both ghc 6.1 and 6.7 
> 
> There are at least two things going on here.
> 
> 1. GHC-specific unboxed identifiers have a # in the name.   I think this
>    is a relic from back when the only reasonable way to namespace was to
>    modify your compiler to add extra identifier characters, and use them
>    in all non-portable identifiers.  In any case, you have to enable the
>    -fglasgow-exts option (or -XMagicHash in recent 6.7) to allow imports
>    of such identifiers.
> 
> 2. Explicitly importing GHC.Prim has been discouraged for as long as I
>    can remember, and GHC HQ has finally made good on the promise to make
>    it impossible.  Code which imports it has a bug already, which can be
>    fixed by switching to GHC.Exts.  (Why?  GHC.Prim is wired into the
>    compiler, while GHC.Exts is a normal Haskell module, so by using
>    GHC.Exts you are insulated from questions of what is primitive and
>    what is derived but still unportable.  Yes, this does change.)
> 
> Stefan
> 
> 

Hi,

  I wrote regex-tdfa, and since I don't use beyond GHC 6.6.1 I had not seen this
problem emerge.  The use of GHC.Prim and CPP is intimitely linked:

from
http://darcs.haskell.org/packages/regex-unstable/regex-tdfa/Text/Regex/TDFA/RunMutState.hs

> 
> #ifdef __GLASGOW_HASKELL__
> foreign import ccall unsafe "memcpy"
>     memcpy :: MutableByteArray# RealWorld -> MutableByteArray# RealWorld -> Int# -> IO ()
> 
> {-# INLINE copySTU #-}
> copySTU :: (Show i,Ix i,MArray (STUArray s) e (ST s)) => STUArray s i e -> STUArray s i e -> ST s ()
> copySTU (STUArray _ _ msource) (STUArray _ _ mdest) =
> -- do b1 <- getBounds s1
> --  b2 <- getBounds s2
> --  when (b1/=b2) (error ("\n\nWTF copySTU: "++show (b1,b2)))
>   ST $ \s1# ->
>     case sizeofMutableByteArray# msource        of { n# ->
>     case unsafeCoerce# memcpy mdest msource n# s1# of { (# s2#, () #) ->
>     (# s2#, () #) }}
> 
> #else /* !__GLASGOW_HASKELL__ */
> 
> copySTU :: (MArray (STUArray s) e (ST s))=> STUArray s Tag e -> STUArray s Tag e -> ST s ()
> copySTU source destination = do
>   b@(start,stop) <- getBounds source
>   b' <- getBounds destination
>   -- traceCopy ("> copySTArray "++show b) $ do
>   when (b/=b') (fail $ "Text.Regex.TDFA.RunMutState copySTUArray bounds mismatch"++show (b,b'))
>   forM_ (range b) $ \index ->
>     unsafeRead source index >>= unsafeWrite destination index
> #endif /* !__GLASGOW_HASKELL__ */

The entire point of using the ST monad is manage memory more efficiently than
with (U)Array.  The copySTU simply uses a "memcpy" to copy the whole source
array into the destination efficiently.  This lets me re-use the already
allocated destination array.  If there had been a high level "copyMArray" then
this would not have been needed.  The CPP is used to let non-GHC compilers copy
element by element.  The *right* solution is to patch the STUArray and/or MArray
code to do this behind the scenes.

So how does one get the array pointer without GHC.Prim in 6.7 ?

-- 
Chris