Personal tools

HSFFIG/Tutorial

From HaskellWiki

< HSFFIG(Difference between revisions)
Jump to: navigation, search
m (Narrowed the table)
m (formatting)
 
(10 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= Purpose =
+
== Program input ==
 
'''Hsffig''' is a tool to convert a '''C''' header file (.h) into Haskell code containing FFI import statements for all entities whose declarations are found in the header file. Current release version of '''hsffig''' is 1.0 (release date 07/30/2005).
 
 
= Getting HSFFIG =
 
 
== Downloading ==
 
 
The project homepage is located at http://hsffig.sourceforge.net. '''Hsffig''' source code may be downloaded from the "File Releases" section of the web page http://www.sourceforge.net/projects/hsffig. Source tarball file name is <tt>hsffig-1.0.tar.gz</tt>. [http://www.darcs.net Darcs] users may check out from the repos at:
 
 
* darcs get http://hsffig.sourceforge.net/repos/hsffig-1.0 ("stable" snapshot)
 
 
* darcs get http://www.golubovsky.org/repos/hsffig-1.0 ("unstable" snapshot)
 
 
== Building from Source ==
 
 
'''Hsffig''' has been released as a [http://www.haskell.org/cabal Cabalized] package. The distribution tarball file contains a minimal version of '''Cabal''' setup program, so it is not necessary to have '''Cabal''' installed in order to install '''hsffig'''. Even more, it is advised to use the '''Cabal''' setup program (further referred to as ''cabal-setup'') as it was slightly tailored to the details of '''hsffig''' installation.
 
 
# After unpacking the tarball, go to the root directory of the distibution and run <tt>make</tt>. This builds the ''cabal-setup'' executable necessary for proper package build and installation.
 
# Run <tt>./cabal-setup configure</tt>. Alex is required to compile '''hsffig''', so it is necessary to provide path to Alex executable (the ''--with-alex='' option of <tt>./cabal-setup configure</tt>). This creates the package build configuration file.
 
# Run <tt>./cabal-setup build</tt>. This builds the '''hsffig''' library, and all executables.
 
# Run <tt>./cabal-setup install</tt>. This installs the library and the executables, and registers the package with GHC.
 
 
At this point, '''hsffig''' is ready to use.
 
 
 
= Program Input =
 
   
 
'''Hsffig''' acts as a filter (in Unix sense) which means that it consumes all its input information on its standard input (stdin), and produces its output on standard output (stdout). No other files are involved.
 
'''Hsffig''' acts as a filter (in Unix sense) which means that it consumes all its input information on its standard input (stdin), and produces its output on standard output (stdout). No other files are involved.
Line 11: Line 11:
 
The ''-E'' option instructs '''gcc''' only to preprocess (no compilation) its input file, and the ''-dD'' option instructs it to emit ''#define'' preprocessor statements along with the preprocessed header (without ''-dD'', autogenerated binding module will not contain import statements for constants defined this way).
 
The ''-E'' option instructs '''gcc''' only to preprocess (no compilation) its input file, and the ''-dD'' option instructs it to emit ''#define'' preprocessor statements along with the preprocessed header (without ''-dD'', autogenerated binding module will not contain import statements for constants defined this way).
   
== Preprocessor Requirements ==
+
=== Preprocessor requirements ===
   
 
It is assumed that majority of '''hsffig''' users will use tools from the GNU Compiler Collection (GCC), whose preprocessor complies with the '''hsffig''' requirements. For other preprocessors from other vendors, the following is expected:
 
It is assumed that majority of '''hsffig''' users will use tools from the GNU Compiler Collection (GCC), whose preprocessor complies with the '''hsffig''' requirements. For other preprocessors from other vendors, the following is expected:
Line 27: Line 27:
 
If compilant preprocessor is not available, and control lines like discussed above cannot appear in the preprocessor output (with '''gcc''', this may be simulated with ''-P'' option), a special token ''@@MODULENAME@@'' will be used in the Module statement. Later, it may be substituted with anything desirable with a program like ''sed(1)''. Also, if '''hsffig''' is unable to determine the header file name from its input, constants defined via ''#define'' will not be imported.
 
If compilant preprocessor is not available, and control lines like discussed above cannot appear in the preprocessor output (with '''gcc''', this may be simulated with ''-P'' option), a special token ''@@MODULENAME@@'' will be used in the Module statement. Later, it may be substituted with anything desirable with a program like ''sed(1)''. Also, if '''hsffig''' is unable to determine the header file name from its input, constants defined via ''#define'' will not be imported.
   
= Program Output =
+
== Program output ==
   
 
'''Hsffig''' outputs hsc code on its standard output, to be redirected to a file or (possibly) to some other filter program. The output is ready for processing by '''hsc2hs'''. Description of '''hsc2hs''' input file format may be found here:
 
'''Hsffig''' outputs hsc code on its standard output, to be redirected to a file or (possibly) to some other filter program. The output is ready for processing by '''hsc2hs'''. Description of '''hsc2hs''' input file format may be found here:
Line 46: Line 46:
 
In first lines of generated hsc code, an ''#include'' statement is present to include the same header file that was consumed by '''hsffig'''. If '''hsffig''' is unable to determine header file name from its input (see Preprocessor Requirements), a special token ''@@INCLUDEFILE@@'' will be placed with the ''#include'' statement instead. Similarly to the module name substitute, it may be later converted into a real header file name with a program like '''sed'''.
 
In first lines of generated hsc code, an ''#include'' statement is present to include the same header file that was consumed by '''hsffig'''. If '''hsffig''' is unable to determine header file name from its input (see Preprocessor Requirements), a special token ''@@INCLUDEFILE@@'' will be placed with the ''#include'' statement instead. Similarly to the module name substitute, it may be later converted into a real header file name with a program like '''sed'''.
   
== Special Definitions for Hsc2hs ==
+
== Special definitions for hsc2hs ==
   
 
In generated hsc code, the following '''hsc2hs''' macros are used to access members of structures/unions:
 
In generated hsc code, the following '''hsc2hs''' macros are used to access members of structures/unions:
Line 117: Line 117:
 
Specifyiing ''/dev/null'' as template file name causes '''hsc2hs''' not to use any template file.
 
Specifyiing ''/dev/null'' as template file name causes '''hsc2hs''' not to use any template file.
   
== Splitting Large Modules ==
+
=== Splitting large modules ===
   
 
Considering that '''hsffig''' generates import statements for everything it encounters in all header files included within the toplevel header file, amount of generated Haskell code may be quite large, so it cannot be compiled by '''ghc''' at once due to memory limitations. To overcome this, a special utility, '''splitter''' is included with '''hsffig'''. In generated hsc code, the following comments are inserted:
 
Considering that '''hsffig''' generates import statements for everything it encounters in all header files included within the toplevel header file, amount of generated Haskell code may be quite large, so it cannot be compiled by '''ghc''' at once due to memory limitations. To overcome this, a special utility, '''splitter''' is included with '''hsffig'''. In generated hsc code, the following comments are inserted:
Line 174: Line 174:
 
It is recommended to use '''ghc''' with ''--make'' option: this forces the compiler to chase all module dependencies automatically. Since all the modules created by '''splitter''' contain correct import statements, no module will be left behind.
 
It is recommended to use '''ghc''' with ''--make'' option: this forces the compiler to chase all module dependencies automatically. Since all the modules created by '''splitter''' contain correct import statements, no module will be left behind.
   
= FFI Import Details =
+
== FFI import details ==
   
== Naming Conventions ==
+
=== Naming conventions ===
   
 
A significant difference between '''C''' and Haskell is that '''C''' does not force starting character of its variable, function, or type identifiers to be of certain case while Haskell has requirements of this kind. To resolve this problem, the following rules apply when an identifier declared in a '''C''' library is imported in a Haskell application.
 
A significant difference between '''C''' and Haskell is that '''C''' does not force starting character of its variable, function, or type identifiers to be of certain case while Haskell has requirements of this kind. To resolve this problem, the following rules apply when an identifier declared in a '''C''' library is imported in a Haskell application.
Line 315: Line 315:
 
The table summarizes all prefixes '''hsffig''' adds to imported identifiers, and corresponding types of import. Flags '''Yes''' and '''No''' show whether given prefix and import purpose is suitable for each sort of imported entity.
 
The table summarizes all prefixes '''hsffig''' adds to imported identifiers, and corresponding types of import. Flags '''Yes''' and '''No''' show whether given prefix and import purpose is suitable for each sort of imported entity.
   
== Typedefs ==
+
=== Typedefs ===
   
 
'''C''' types declared with the ''typedef'' statement are imported as type aliases prefixed with "T_". These type aliases may be more convenient to use in certain situations.
 
'''C''' types declared with the ''typedef'' statement are imported as type aliases prefixed with "T_". These type aliases may be more convenient to use in certain situations.
Line 321: Line 321:
 
The following '''C''' declaration:
 
The following '''C''' declaration:
   
{{{
+
<pre>
 
typedef union {
 
typedef union {
 
struct {
 
struct {
Line 331: Line 331:
 
int y;
 
int y;
 
} *PBF;
 
} *PBF;
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
type T_PBF = Ptr (U_9)
 
type T_PBF = Ptr (U_9)
   
Line 343: Line 343:
 
-- instances to access S_5 and U_9 members follow
 
-- instances to access S_5 and U_9 members follow
   
}}}
+
</pre>
   
 
In the following Haskell program:
 
In the following Haskell program:
   
{{{
+
<pre>
 
module Main where
 
module Main where
   
Line 363: Line 363:
 
y <- (bfu --> V_y)
 
y <- (bfu --> V_y)
 
putStrLn $ "y = " ++ (show y)
 
putStrLn $ "y = " ++ (show y)
}}}
+
</pre>
   
 
it is necessary to declare the type of ''bfu'' because there is no foreign import declaration that would drive the type inference mechanism. Since the union is declared anonymously, the type derived by '''hsffig''' contains a numeric identifier that is hard to remember. Imported ''typedef'' helps because it may be used instead, and is controlled by a developer (''Ptr U_9'' couls be used with the same effect).
 
it is necessary to declare the type of ''bfu'' because there is no foreign import declaration that would drive the type inference mechanism. Since the union is declared anonymously, the type derived by '''hsffig''' contains a numeric identifier that is hard to remember. Imported ''typedef'' helps because it may be used instead, and is controlled by a developer (''Ptr U_9'' couls be used with the same effect).
Line 369: Line 369:
 
Per '''FFI Addendum''', the following typedefs are imported as they are (the Addendum defines names of Haskell types equivalent to them):
 
Per '''FFI Addendum''', the following typedefs are imported as they are (the Addendum defines names of Haskell types equivalent to them):
   
* {{{ptrdiff_t}}}
+
* ptrdiff_t
* {{{size_t}}}
+
* size_t
* {{{wchar_t}}}
+
* wchar_t
* {{{sig_atomic_t}}}
+
* sig_atomic_t
* {{{clock_t}}}
+
* clock_t
* {{{time_t}}}
+
* time_t
* {{{FILE}}}
+
* FILE
* {{{fpos_t}}}
+
* fpos_t
* {{{jmp_buf}}}
+
* jmp_buf
   
== Constants (#define) ==
+
=== Constants (#define) ===
   
 
For each constant declared with ''#define'', generated hsc code looks as follows:
 
For each constant declared with ''#define'', generated hsc code looks as follows:
   
{{{
+
<pre>
 
#define BIG_ENDIAN __BIG_ENDIAN
 
#define BIG_ENDIAN __BIG_ENDIAN
 
#define PDP_ENDIAN __PDP_ENDIAN
 
#define PDP_ENDIAN __PDP_ENDIAN
Line 390: Line 390:
 
#define S_IRGRP (S_IRUSR >> 3)
 
#define S_IRGRP (S_IRUSR >> 3)
 
#define S_IROTH (S_IRGRP >> 3)
 
#define S_IROTH (S_IRGRP >> 3)
}}}
+
</pre>
   
 
becomes
 
becomes
   
{{{
+
<pre>
 
c_BIG_ENDIAN = #const BIG_ENDIAN
 
c_BIG_ENDIAN = #const BIG_ENDIAN
 
c_PDP_ENDIAN = #const PDP_ENDIAN
 
c_PDP_ENDIAN = #const PDP_ENDIAN
Line 401: Line 401:
 
c_S_IRGRP = #const S_IRGRP
 
c_S_IRGRP = #const S_IRGRP
 
c_S_IROTH = #const S_IROTH
 
c_S_IROTH = #const S_IROTH
}}}
+
</pre>
   
 
that is, '''hsffig''' does not deal with actual value of the constant: it is left to '''hsc2hs'''.
 
that is, '''hsffig''' does not deal with actual value of the constant: it is left to '''hsc2hs'''.
Line 407: Line 407:
 
However, arbitrary ''#define'' cannot be imported this way. Some preprocessor macros have parameters, or contain '''C''' statements, or otherwise cannot be assignment expression's RHS. To rule such preprocessor macros out, for each ''#define'' '''hsffig''' encounters on its input, a short '''C''' program is created, like this:
 
However, arbitrary ''#define'' cannot be imported this way. Some preprocessor macros have parameters, or contain '''C''' statements, or otherwise cannot be assignment expression's RHS. To rule such preprocessor macros out, for each ''#define'' '''hsffig''' encounters on its input, a short '''C''' program is created, like this:
   
{{{
+
<pre>
 
#include "header.h"
 
#include "header.h"
 
static int a = BIG_ENDIAN;
 
static int a = BIG_ENDIAN;
}}}
+
</pre>
   
for {{{#define BIG_ENDIAN __BIG_ENDIAN}}}. The command line for '''gcc''' is:
+
for <tt>#define BIG_ENDIAN __BIG_ENDIAN</tt>. The command line for '''gcc''' is:
   
{{{
+
<pre>
 
gcc -pipe -x c -q -fsyntax-only - 2>/dev/null
 
gcc -pipe -x c -q -fsyntax-only - 2>/dev/null
}}}
+
</pre>
   
 
If this program passes syntax check, the constant is imported, otherwise not imported.
 
If this program passes syntax check, the constant is imported, otherwise not imported.
Line 425: Line 425:
 
Constants are not in the '''IO''' monad, and may be used in any function. Bit operations (from '''Foreign.C.Bits''') may be applied to them:
 
Constants are not in the '''IO''' monad, and may be used in any function. Bit operations (from '''Foreign.C.Bits''') may be applied to them:
   
{{{
+
<pre>
 
dbperm = c_S_IRUSR .|. c_S_IWUSR .|. c_S_IRGRP .|. c_S_IWGRP .|. c_S_IROTH
 
dbperm = c_S_IRUSR .|. c_S_IWUSR .|. c_S_IRGRP .|. c_S_IWGRP .|. c_S_IROTH
}}}
+
</pre>
   
 
may be used to specify file permissions {{{rw-rw-r--}}}
 
may be used to specify file permissions {{{rw-rw-r--}}}
   
== Constants (enum) ==
+
=== Constants (enum) ===
   
 
Enumerations are imported similarly to preprocessor-defined constants, but identifiers are prefixed with "e_".
 
Enumerations are imported similarly to preprocessor-defined constants, but identifiers are prefixed with "e_".
Line 437: Line 437:
 
An example showing usage of both ''#define'' and enumeration constants:
 
An example showing usage of both ''#define'' and enumeration constants:
   
{{{
+
<pre>
 
dbop <- dbp --> V_open
 
dbop <- dbp --> V_open
 
ret <- withCString "access.db" $ \dbname -> do
 
ret <- withCString "access.db" $ \dbname -> do
Line 448: Line 448:
 
(fromIntegral dbperm)
 
(fromIntegral dbperm)
 
return r
 
return r
}}}
+
</pre>
   
 
''fromIntegral'' may be necessary to apply to constants when calling foreign functions because the Haskell compiler infers the type for the following sort of expressions:
 
''fromIntegral'' may be necessary to apply to constants when calling foreign functions because the Haskell compiler infers the type for the following sort of expressions:
   
{{{
+
<pre>
 
c_DB_CREATE = 1
 
c_DB_CREATE = 1
 
e_DB_BTREE = 1
 
e_DB_BTREE = 1
}}}
+
</pre>
   
 
as ''Integer'' while the function referred to by ''dbop'' in the example above requires ''CInt'', ''CUInt'', ''CInt'' for its last three arguments.
 
as ''Integer'' while the function referred to by ''dbop'' in the example above requires ''CInt'', ''CUInt'', ''CInt'' for its last three arguments.
   
== Variables ==
+
=== Variables ===
   
=== Regular Variables ===
+
==== Regular variables ====
   
 
Regular variables contain either values or pointers to other variables (not functions). For this sort of imports, the following identifiers are created:
 
Regular variables contain either values or pointers to other variables (not functions). For this sort of imports, the following identifiers are created:
   
* a pointer to the variable itself: prefix is "p_"
+
* a pointer to the variable itself: prefix is "p_"
* a nullary function returning variable's value in the '''IO''' monad: prefix is "v_"
+
* a nullary function returning variable's value in the '''IO''' monad: prefix is "v_"
* a unary function taking a value and assigning it to the variable (in the '''IO''' monad): prefix is "s_"
+
* a unary function taking a value and assigning it to the variable (in the '''IO''' monad): prefix is "s_"
   
 
So, for a variable declared as
 
So, for a variable declared as
   
{{{
+
<pre>
 
extern int a;
 
extern int a;
}}}
+
</pre>
   
 
FFI import declarations will be as follows:
 
FFI import declarations will be as follows:
   
{{{
+
<pre>
 
foreign import ccall "header.h &a"
 
foreign import ccall "header.h &a"
 
___2___ :: Ptr (CInt)
 
___2___ :: Ptr (CInt)
Line 483: Line 483:
 
v_a = peek ___2___
 
v_a = peek ___2___
 
s_a = poke ___2___
 
s_a = poke ___2___
}}}
+
</pre>
   
 
where ___''2''___ is one of internal unique names that '''hsffig''' assigns to all identifiers it encounters in the header file.
 
where ___''2''___ is one of internal unique names that '''hsffig''' assigns to all identifiers it encounters in the header file.
Line 489: Line 489:
 
So, to retrieve value from a variable, ''v_a'' is used:
 
So, to retrieve value from a variable, ''v_a'' is used:
   
{{{
+
<pre>
 
x <- v_a -- equivalent to peek p_a
 
x <- v_a -- equivalent to peek p_a
}}}
+
</pre>
   
 
To update the variable value, ''s_a'' is used:
 
To update the variable value, ''s_a'' is used:
   
{{{
+
<pre>
 
s_a 4 -- equivalent to poke p_a 4
 
s_a 4 -- equivalent to poke p_a 4
}}}
+
</pre>
   
 
If a variable contains pointer to a value, e. g
 
If a variable contains pointer to a value, e. g
   
{{{
+
<pre>
 
extern int *ap;
 
extern int *ap;
}}}
+
</pre>
   
 
then it is imported as follows:
 
then it is imported as follows:
   
{{{
+
<pre>
 
foreign import ccall "header.h &ap"
 
foreign import ccall "header.h &ap"
 
___3___ :: Ptr (Ptr (CInt))
 
___3___ :: Ptr (Ptr (CInt))
Line 513: Line 513:
 
v_ap = peek ___3___
 
v_ap = peek ___3___
 
s_ap = poke ___3___
 
s_ap = poke ___3___
}}}
+
</pre>
   
 
so to dereference the pointer, additional ''peek'' is needed:
 
so to dereference the pointer, additional ''peek'' is needed:
   
{{{
+
<pre>
 
c <- v_ap >>= peek
 
c <- v_ap >>= peek
}}}
+
</pre>
   
 
and ''s_ap'' will update the variable itself with new address, rather than the value it points to.
 
and ''s_ap'' will update the variable itself with new address, rather than the value it points to.
=== Function Pointers ===
+
==== Function pointers ====
   
 
Function pointers contain pointers to executable code.
 
Function pointers contain pointers to executable code.
Line 528: Line 528:
 
For this sort of imports, the following identifiers are created:
 
For this sort of imports, the following identifiers are created:
   
* pointer to the variable itself: prefix is "p_"
+
* pointer to the variable itself: prefix is "p_"
* a function the variable points to (with proper arity and type signature): prefix is "x_"
+
* a function the variable points to (with proper arity and type signature): prefix is "x_"
* a nullary function returning the function pointed to by the variable (in the '''IO''' monad): prefix is "v_"
+
* a nullary function returning the function pointed to by the variable (in the '''IO''' monad): prefix is "v_"
* a unary function updating the variable with new function reference: prefix is "s_"
+
* a unary function updating the variable with new function reference: prefix is "s_"
* optionally: wrapper functions for arguments taking function references: prefix is "w_"
+
* optionally: wrapper functions for arguments taking function references: prefix is "w_"
   
 
A variable declared as:
 
A variable declared as:
   
{{{
+
<pre>
 
int (*c)(long,int);
 
int (*c)(long,int);
}}}
+
</pre>
   
 
is imported as follows:
 
is imported as follows:
   
{{{
+
<pre>
 
foreign import ccall "header.h &c"
 
foreign import ccall "header.h &c"
 
___8___ :: Ptr (FunPtr (CLong -> CInt -> IO CInt))
 
___8___ :: Ptr (FunPtr (CLong -> CInt -> IO CInt))
Line 553: Line 553:
 
v_c = peek ___8___ >>= (return . ___8___mk___)
 
v_c = peek ___8___ >>= (return . ___8___mk___)
 
s_c = \s -> ___8___wr___ s >>= poke ___8___
 
s_c = \s -> ___8___wr___ s >>= poke ___8___
}}}
+
</pre>
   
 
Note that the function referred to by ''c'' returns '''IO CInt'''. It is assumed that all functions imported from a '''C''' library are unsafe, and therefore forced to be monadic. Based on the same assumption, all calls are considered "safe": this also allows calling Haskell functions from '''C''' code.
 
Note that the function referred to by ''c'' returns '''IO CInt'''. It is assumed that all functions imported from a '''C''' library are unsafe, and therefore forced to be monadic. Based on the same assumption, all calls are considered "safe": this also allows calling Haskell functions from '''C''' code.
   
A variable declared as a function pointer in '''C''', from the Haskell point of view contains a '''Fun''''''Ptr''' i. e. virtual address of some foreign function's entry point. This cannot be used by a Haskell application directly, so a special ''dynamic import'' function needs to be declared. Such a pseudo-function (only its name and type signature need to be declared: the Haskell compiler does the rest) converts a '''Fun''''''Ptr''' into a function usable by Haskell applications. Another pseudo-function, ''wrapper import'' dose the opposite: converts a Haskell function into a monadic value containing a '''Fun''''''Ptr''' i. e. a virtual address of an entry point usable by foreign functions (useful for callbacks).
+
A variable declared as a function pointer in <b>C</b> , from the Haskell point of view contains a '''FunPtr''' i. e. virtual address of some foreign function's entry point. This cannot be used by a Haskell application directly, so a special ''dynamic import'' function needs to be declared. Such a pseudo-function (only its name and type signature need to be declared: the Haskell compiler does the rest) converts a '''FunPtr''' into a function usable by Haskell applications. Another pseudo-function, ''wrapper import'' does the opposite: converts a Haskell function into a monadic value containing a '''FunPtr''' i. e. a virtual address of an entry point usable by foreign functions (useful for callbacks).
   
 
For the variable - function pointer ''c'', ''p_c'' returns pointer to the variable, just like for a regular variable. However, ''v_c'' returns the variable value passed through the dynamic import pseudo-function, so it may be used directly in Haskell applications. The setter function, ''s_c'' takes a Haskell function with proper type signature and updates the ''c'' variable value with new function address. The function ''x_c'' is the foreign function ''c'' points to.
 
For the variable - function pointer ''c'', ''p_c'' returns pointer to the variable, just like for a regular variable. However, ''v_c'' returns the variable value passed through the dynamic import pseudo-function, so it may be used directly in Haskell applications. The setter function, ''s_c'' takes a Haskell function with proper type signature and updates the ''c'' variable value with new function address. The function ''x_c'' is the foreign function ''c'' points to.
Line 563: Line 563:
 
The difference between ''v_c'' and ''x_c'' is illustrated below:
 
The difference between ''v_c'' and ''x_c'' is illustrated below:
   
{{{
+
<pre>
 
pc1 <- v_c
 
pc1 <- v_c
 
r1 <- pc1 33::CLong 21::CInt
 
r1 <- pc1 33::CLong 21::CInt
Line 571: Line 571:
 
r2 <- x_c 33::CLong 21::CInt
 
r2 <- x_c 33::CLong 21::CInt
   
}}}
+
</pre>
   
 
''v_c'' is a monadic value; ''x_c'' is not.
 
''v_c'' is a monadic value; ''x_c'' is not.
Line 579: Line 579:
 
Consider the following:
 
Consider the following:
   
{{{
+
<pre>
 
typedef int (*F2) (long);
 
typedef int (*F2) (long);
 
typedef float (*X2) (double);
 
typedef float (*X2) (double);
 
F2 (*ee) (X2);
 
F2 (*ee) (X2);
}}}
+
</pre>
   
 
The variable ''ee'' is a pointer to a function taking a function from ''double'' to ''float'', and returning a function from ''long'' to ''int''.
 
The variable ''ee'' is a pointer to a function taking a function from ''double'' to ''float'', and returning a function from ''long'' to ''int''.
Line 589: Line 589:
 
The import looks as follows:
 
The import looks as follows:
   
{{{
+
<pre>
 
foreign import ccall "header.h &ee"
 
foreign import ccall "header.h &ee"
 
___40___ :: Ptr (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
 
___40___ :: Ptr (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
Line 604: Line 604:
 
foreign import ccall "wrapper"
 
foreign import ccall "wrapper"
 
w_ee_1 :: (CDouble -> IO CFloat) -> IO (FunPtr (CDouble -> IO CFloat))
 
w_ee_1 :: (CDouble -> IO CFloat) -> IO (FunPtr (CDouble -> IO CFloat))
}}}
+
</pre>
   
 
An additional import declaration, ''w_ee_1'' has been created. It may be used if ''ee'' is called, and some Haskell function is passed as a parameter:
 
An additional import declaration, ''w_ee_1'' has been created. It may be used if ''ee'' is called, and some Haskell function is passed as a parameter:
   
{{{
+
<pre>
 
mydoublefloat :: CDouble -> IO CFloat
 
mydoublefloat :: CDouble -> IO CFloat
 
mydoublefloat = {-- function body here --}
 
mydoublefloat = {-- function body here --}
Line 619: Line 619:
   
 
foo <- w_ee_1 mydoublefloat >>= x_ee
 
foo <- w_ee_1 mydoublefloat >>= x_ee
}}}
+
</pre>
   
Note that in the current version, '''hsffig''' does not create a dynamic import when a function imported returns a '''Fun''''''Ptr'''.
+
Note that in the current version, '''hsffig''' does not create a dynamic import when a function imported returns a '''FunPtr'''.
   
 
Name of autogenerated ''wrapper import'' for function arguments ends with "_1". If a function takes more than one argument - function then more than one ''wrapper import'' will be created (one per function type). For example:
 
Name of autogenerated ''wrapper import'' for function arguments ends with "_1". If a function takes more than one argument - function then more than one ''wrapper import'' will be created (one per function type). For example:
   
{{{
+
<pre>
 
void (*psetpdf)(double (*df)(double,double),int,long,void(*)(int));
 
void (*psetpdf)(double (*df)(double,double),int,long,void(*)(int));
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
foreign import ccall "header.h &psetpdf"
 
foreign import ccall "header.h &psetpdf"
 
___6___ :: Ptr (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
 
___6___ :: Ptr (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
Line 649: Line 649:
 
foreign import ccall "wrapper"
 
foreign import ccall "wrapper"
 
w_psetpdf_2 :: (CInt -> IO ()) -> IO (FunPtr (CInt -> IO ()))
 
w_psetpdf_2 :: (CInt -> IO ()) -> IO (FunPtr (CInt -> IO ()))
}}}
+
</pre>
   
 
The function pointed to by ''setpdf'' takes two arguments-functions of different types, therefore two ''wrapper imports'' have been created.
 
The function pointed to by ''setpdf'' takes two arguments-functions of different types, therefore two ''wrapper imports'' have been created.
Line 655: Line 655:
 
Enumeration of wrappers created for arguments does not correspond to positional numbers of arguments.
 
Enumeration of wrappers created for arguments does not correspond to positional numbers of arguments.
   
== Functions ==
+
=== Functions ===
   
 
Functions are imported under their own names prefixed by "f_". Similarly to the situation described above (variables holding function pointers), additional wrappers will be created for arguments representing other functions. Examples of import:
 
Functions are imported under their own names prefixed by "f_". Similarly to the situation described above (variables holding function pointers), additional wrappers will be created for arguments representing other functions. Examples of import:
   
{{{
+
<pre>
 
double dplus(double,double);
 
double dplus(double,double);
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
foreign import ccall "static header.h dplus"
 
foreign import ccall "static header.h dplus"
 
f_dplus :: CDouble -> CDouble -> IO CDouble
 
f_dplus :: CDouble -> CDouble -> IO CDouble
}}}
+
</pre>
   
 
A function taking another function as argument:
 
A function taking another function as argument:
   
{{{
+
<pre>
 
int db_env_set_func_yield (int (*)(void)) ;
 
int db_env_set_func_yield (int (*)(void)) ;
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
foreign import ccall "static header.h db_env_set_func_yield"
 
foreign import ccall "static header.h db_env_set_func_yield"
 
f_db_env_set_func_yield :: FunPtr (IO CInt) -> IO CInt
 
f_db_env_set_func_yield :: FunPtr (IO CInt) -> IO CInt
 
foreign import ccall "wrapper"
 
foreign import ccall "wrapper"
 
w_db_env_set_func_yield_1 :: (IO CInt) -> IO (FunPtr (IO CInt))
 
w_db_env_set_func_yield_1 :: (IO CInt) -> IO (FunPtr (IO CInt))
}}}
+
</pre>
   
 
Rules for creation and naming of ''wrapper imports'' are the same as defined for variables containing function pointers.
 
Rules for creation and naming of ''wrapper imports'' are the same as defined for variables containing function pointers.
   
== Structures and Unions ==
+
=== Structures and unions ===
   
=== Basic Information ===
+
==== Basic information ====
   
 
'''C''' structures/unions are different from Haskell records with labeled fields: different structure types may contain members with same names. Therefore direct mapping of '''C''' structures to Haskell records is not possible.
 
'''C''' structures/unions are different from Haskell records with labeled fields: different structure types may contain members with same names. Therefore direct mapping of '''C''' structures to Haskell records is not possible.
Line 695: Line 695:
 
Some structures are declared with ''struct'' statement:
 
Some structures are declared with ''struct'' statement:
   
{{{
+
<pre>
 
struct foo {
 
struct foo {
 
int bar;
 
int bar;
 
long baz;
 
long baz;
 
};
 
};
}}}
+
</pre>
   
 
Some structures are declared with ''typedef'' statement:
 
Some structures are declared with ''typedef'' statement:
   
{{{
+
<pre>
 
typedef struct foo {
 
typedef struct foo {
 
int bar;
 
int bar;
 
long baz;
 
long baz;
 
} foo_t;
 
} foo_t;
}}}
+
</pre>
   
 
In some cases, structure name in the ''typedef'' statement may be omitted:
 
In some cases, structure name in the ''typedef'' statement may be omitted:
   
{{{
+
<pre>
 
typedef struct {
 
typedef struct {
 
int bar;
 
int bar;
 
long baz;
 
long baz;
 
} foo_t;
 
} foo_t;
}}}
+
</pre>
   
 
There also may be structures/unions declared within another structures/unions without explicit structure type name declaration:
 
There also may be structures/unions declared within another structures/unions without explicit structure type name declaration:
   
{{{
+
<pre>
 
typedef struct {
 
typedef struct {
 
int bar;
 
int bar;
Line 731: Line 731:
 
} str_anon;
 
} str_anon;
 
} foo_t;
 
} foo_t;
}}}
+
</pre>
   
=== Datatypes and Combinators for Structure Members Access ===
+
==== Datatypes and combinators for structure members access ====
   
 
Structure types are imported using the ''newtype'' Haskell statement. Type name is prefixed with "S_" for structures and "U_" for unions. If a structure was declared as ''struct foo'', Haskell type name will be ''S_foo''. If a structure was declared anonymously (''struct'' without identifier), a unique numeric identifier will be used as structure type name, e. g. S_65.
 
Structure types are imported using the ''newtype'' Haskell statement. Type name is prefixed with "S_" for structures and "U_" for unions. If a structure was declared as ''struct foo'', Haskell type name will be ''S_foo''. If a structure was declared anonymously (''struct'' without identifier), a unique numeric identifier will be used as structure type name, e. g. S_65.
   
A imported structure type declaration looks like: {{{newtype S_foo = S_foo ()}}}. Indeed, this type name serves only identification purpose.
+
A imported structure type declaration looks like: <tt>newtype S_foo = S_foo ()</tt>. Indeed, this type name serves only identification purpose.
   
 
In practice, Haskell application developers using '''hsffig''' do not need to memorize these clumsy type identifiers: Haskell type inference system carries most of the burden.
 
In practice, Haskell application developers using '''hsffig''' do not need to memorize these clumsy type identifiers: Haskell type inference system carries most of the burden.
Line 743: Line 743:
 
To access imported structure members, two algebraic data types are declared for each member identifier '''hsffig''' encounters in its input:
 
To access imported structure members, two algebraic data types are declared for each member identifier '''hsffig''' encounters in its input:
   
* V_''member_id'': to get/set value of a structure member ''member_id''
+
* V_''member_id'': to get/set value of a structure member ''member_id''
* X_''member_id'': if a member holds function pointer, to invoke the function ''member_id'' points to
+
* X_''member_id'': if a member holds function pointer, to invoke the function ''member_id'' points to
   
 
Unlike variables, there is no "S_" prefixed item.
 
Unlike variables, there is no "S_" prefixed item.
   
A multi-parameter class with functional dependencies is declared to enable access to structures/unions members. Because of this, Haskell-98 extensions must be enabled in the compiler processing output of '''hsffig'''. The definition of this class is placed in the ''HSFFIG.Field''''''Access'' module of the Haskell library coming with the package.
+
A multi-parameter class with functional dependencies is declared to enable access to structures/unions members. Because of this, Haskell-98 extensions must be enabled in the compiler processing output of '''hsffig'''. The definition of this class is placed in the ''HSFFIG.FieldAccess'' module of the Haskell library coming with the package.
   
{{{
+
<pre>
 
class FieldAccess a b c | a c -> b where
 
class FieldAccess a b c | a c -> b where
 
(==>) :: Ptr a -> c -> b
 
(==>) :: Ptr a -> c -> b
Line 758: Line 758:
 
(-->) _ _ = error " illegal context for -->"
 
(-->) _ _ = error " illegal context for -->"
 
(<--) _ _ = error " illegal context for <--"
 
(<--) _ _ = error " illegal context for <--"
}}}
+
</pre>
   
The functional dependencies {{{a b c | a c -> b}}} specify that type of ''b'' depends entirely on the types ''a'' and ''c'', and there may be only one type of ''b'' for every possible combination of ''a'' and ''c''.
+
The functional dependencies <tt>a b c | a c -> b</tt> specify that type of ''b'' depends entirely on the types ''a'' and ''c'', and there may be only one type of ''b'' for every possible combination of ''a'' and ''c''.
   
 
Indeed, ''a'' is the type of an imported structure/union. The type ''c'' is the data type associated with a structure/union member identifier. The type ''b'' is type of the member as represented to a Haskell program. In a '''C''' structure/union cannot be more than one member with same name (and different types). That's what the functional dependencies mean in this case. This information is sufficient for the Haskell compiler to infer the type of a value retrieved from a structure/union member.
 
Indeed, ''a'' is the type of an imported structure/union. The type ''c'' is the data type associated with a structure/union member identifier. The type ''b'' is type of the member as represented to a Haskell program. In a '''C''' structure/union cannot be more than one member with same name (and different types). That's what the functional dependencies mean in this case. This information is sufficient for the Haskell compiler to infer the type of a value retrieved from a structure/union member.
Line 768: Line 768:
 
So, for a simple structure (note that the structure type and one of its members have the same name):
 
So, for a simple structure (note that the structure type and one of its members have the same name):
   
{{{
+
<pre>
 
struct abc {
 
struct abc {
 
int abc;
 
int abc;
 
};
 
};
}}}
+
</pre>
   
 
the datatypes are:
 
the datatypes are:
   
{{{
+
<pre>
 
newtype S_abc = S_abc () -- for the structure itself
 
newtype S_abc = S_abc () -- for the structure itself
 
data V_abc = V_abc deriving (Show) -- for the member `abc'
 
data V_abc = V_abc deriving (Show) -- for the member `abc'
 
data X_abc = X_abc deriving (Show) -- not used
 
data X_abc = X_abc deriving (Show) -- not used
}}}
+
</pre>
   
 
To get/set the member ''abc'' of the structure ''abc'', the following instance is defined:
 
To get/set the member ''abc'' of the structure ''abc'', the following instance is defined:
   
{{{
+
<pre>
 
instance FieldAccess S_abc (CInt) V_abc where
 
instance FieldAccess S_abc (CInt) V_abc where
 
z --> V_abc = (#peek __quote__(struct abc), abc) z
 
z --> V_abc = (#peek __quote__(struct abc), abc) z
 
(z, V_abc) <-- v = (#poke __quote__(struct abc), abc) z v
 
(z, V_abc) <-- v = (#poke __quote__(struct abc), abc) z v
}}}
+
</pre>
   
So, for this instance, ''a'' = {{{S_abc}}}, ''b'' = {{{CInt}}}, and ''c'' = {{{V_abc}}}. Which is semantically equivalent to the above declaration of the member ''abc'' of the ''struct abc''.
+
So, for this instance, ''a'' = <tt>S_abc</tt>, ''b'' = <tt>CInt</tt>, and ''c'' = <tt>V_abc</tt>. Which is semantically equivalent to the above declaration of the member ''abc'' of the ''struct abc''.
   
The {{{-->}}} combinator retrieves value of the given member from the structure pointed to by the given pointer. Thus, to access the value of ''abc'', a Haskell application contains:
+
The <tt>--></tt> combinator retrieves value of the given member from the structure pointed to by the given pointer. Thus, to access the value of ''abc'', a Haskell application contains:
   
{{{
+
<pre>
 
r <- z --> V_abc
 
r <- z --> V_abc
}}}
+
</pre>
   
 
Because of the multiparameter class and the instance declaration, the compiler is able to determine the type of the {{{z --> V_abc}}} expression correctly.
 
Because of the multiparameter class and the instance declaration, the compiler is able to determine the type of the {{{z --> V_abc}}} expression correctly.
Line 802: Line 802:
 
Assume another structure declaration:
 
Assume another structure declaration:
   
{{{
+
<pre>
 
struct def {
 
struct def {
 
void (*abc)(char *,int);
 
void (*abc)(char *,int);
 
};
 
};
}}}
+
</pre>
   
 
perfectly legal from the '''C''' standpoint. Not only do the two different structures have a member with the same name ''abc'', but these members have different types: one is a scalar integer value; another is a function pointer.
 
perfectly legal from the '''C''' standpoint. Not only do the two different structures have a member with the same name ''abc'', but these members have different types: one is a scalar integer value; another is a function pointer.
Line 812: Line 812:
 
The import for ''struct def'' is (its part of the interest):
 
The import for ''struct def'' is (its part of the interest):
   
{{{
+
<pre>
 
newtype S_def = S_def ()
 
newtype S_def = S_def ()
 
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
 
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
 
z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___60___S_def___mk)
 
z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___60___S_def___mk)
 
(z, V_abc) <-- v = (___60___S_def___wr v) >>= (#poke __quote__(struct def), abc) z
 
(z, V_abc) <-- v = (___60___S_def___wr v) >>= (#poke __quote__(struct def), abc) z
}}}
+
</pre>
   
For this instance, ''a'' = {{{S_def}}}, ''b'' = {{{Ptr (CChar) -> CInt -> IO ()}}}, and ''c'' = {{{V_abc}}} i. e. the same as for ''abc'' in ''struct abc''.
+
For this instance, ''a'' = <tt>S_def</tt>, ''b'' = <tt>Ptr (CChar) -> CInt -> IO ()</tt>, and ''c'' = <tt>V_abc</tt> i. e. the same as for ''abc'' in ''struct abc''.
   
 
So, in
 
So, in
   
{{{
+
<pre>
 
r <- z --> abc
 
r <- z --> abc
 
t <- x --> abc
 
t <- x --> abc
}}}
+
</pre>
   
 
if ''x'' and ''z'' point to structures ''abc'' and ''def'' respectively, types of values bound to ''r'' and ''t'' will be different.
 
if ''x'' and ''z'' point to structures ''abc'' and ''def'' respectively, types of values bound to ''r'' and ''t'' will be different.
   
The {{{<--}}} combinator is used for updating structure/union members with new values. This is a "real" update, not simulated by Haskell state mechanisms.
+
The <tt><--</tt> combinator is used for updating structure/union members with new values. This is a "real" update, not simulated by Haskell state mechanisms.
   
 
The code
 
The code
   
{{{
+
<pre>
 
(z,V_abc) <-- 4 -- fixme: is fromIntegral needed for a constant?
 
(z,V_abc) <-- 4 -- fixme: is fromIntegral needed for a constant?
}}}
+
</pre>
   
 
is equivalent to '''C''' code:
 
is equivalent to '''C''' code:
   
{{{
+
<pre>
 
z -> abc = 4;
 
z -> abc = 4;
}}}
+
</pre>
   
The {{{==>}}} combinator is used to call functions pointed to by structre/union members. While {{{-->}}} may be applied only to "V_" prefixed identifiers, the {{{==>}}} combinator may be applied only to "X_" prefixed identifiers.
+
The <tt>==></tt> combinator is used to call functions pointed to by structre/union members. While <tt>--></tt> may be applied only to "V_" prefixed identifiers, the <tt>==></tt> combinator may be applied only to "X_" prefixed identifiers.
   
 
The difference between "V_" and "X_" prefixed structure member identifiers is similar to that for imported variables holding function pointers:
 
The difference between "V_" and "X_" prefixed structure member identifiers is similar to that for imported variables holding function pointers:
   
{{{
+
<pre>
 
fclose <- dbp --> V_close
 
fclose <- dbp --> V_close
 
ret <- fclose dbp 0
 
ret <- fclose dbp 0
Line 855: Line 855:
   
 
ret <- (dbp ==> X_close) dbp 0
 
ret <- (dbp ==> X_close) dbp 0
}}}
+
</pre>
   
 
The following is complete set of import statements that '''hsffig''' generates for ''struct def'':
 
The following is complete set of import statements that '''hsffig''' generates for ''struct def'':
   
{{{
+
<pre>
 
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
 
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
 
z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___62___S_def___mk)
 
z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___62___S_def___mk)
Line 873: Line 873:
 
r <- x _1 _2
 
r <- x _1 _2
 
return r
 
return r
}}}
+
</pre>
   
So, {{{z ==> X_abc}}} returns a function with type {{{Ptr (CChar) -> CInt -> IO ()}}}. Because instance with ''V_abc'' is defined with only {{{-->}}} combinator, and instance with ''X_abc'' is defined with only {{{==>}}} combinator, and instance with ''X_abc'' is declared only for members holding pointers to functions, attempt to use wrong combinator-member combination will result in compilation or runtime error.
+
So, <tt>z ==> X_abc></tt> returns a function with type <tt>Ptr (CChar) -> CInt -> IO ()</tt>. Because instance with ''V_abc'' is defined with only <tt>--></tt> combinator, and instance with ''X_abc'' is defined with only <tt>==></tt> combinator, and instance with ''X_abc'' is declared only for members holding pointers to functions, attempt to use wrong combinator-member combination will result in compilation or runtime error.
   
Members of structures/unions which are structures/unions themselves are imported so that {{{-->}}} combinator returns a '''Ptr''' to the member, and {{{<--}}} combinator is not allowed. Example (from ''bfd.h''):
+
Members of structures/unions which are structures/unions themselves are imported so that <tt>--></tt> combinator returns a '''Ptr''' to the member, and <tt><--</tt> combinator is not allowed. Example (from ''bfd.h''):
   
{{{
+
<pre>
 
typedef struct lineno_cache_entry {
 
typedef struct lineno_cache_entry {
 
unsigned int line_number; /* Linenumber from start of function*/
 
unsigned int line_number; /* Linenumber from start of function*/
Line 887: Line 887:
 
} u;
 
} u;
 
} alent;
 
} alent;
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
newtype S_lineno_cache_entry = S_lineno_cache_entry ()
 
newtype S_lineno_cache_entry = S_lineno_cache_entry ()
 
newtype U_40 = U_40 ()
 
newtype U_40 = U_40 ()
Line 919: Line 919:
 
z --> V_sizeof = return $ (#size __quote__(union {struct symbol_cache_entry *sym; unsigned long offset; }))
 
z --> V_sizeof = return $ (#size __quote__(union {struct symbol_cache_entry *sym; unsigned long offset; }))
   
}}}
+
</pre>
   
 
So, to achieve the same as '''C''' code below does:
 
So, to achieve the same as '''C''' code below does:
   
{{{
+
<pre>
 
alent *pal;
 
alent *pal;
   
Line 929: Line 929:
   
 
off = pal -> u.offset
 
off = pal -> u.offset
}}}
+
</pre>
   
 
a Haskell program would contain:
 
a Haskell program would contain:
   
{{{
+
<pre>
 
pal <- -- whatever returns IO (Ptr S_lineno_cache_entry)
 
pal <- -- whatever returns IO (Ptr S_lineno_cache_entry)
   
 
off <- (pal --> V_u) >>= \u -> (u --> V_offset)
 
off <- (pal --> V_u) >>= \u -> (u --> V_offset)
}}}
+
</pre>
   
=== Bit Fields ===
+
==== Bit fields ====
   
Bit fields (declared as {{{int somefield:n}}}) are supported by '''hsffig''', but there are some additional things that need attention.
+
Bit fields (declared as <tt>int somefield:n</tt>) are supported by '''hsffig''', but there are some additional things that need attention.
   
There is no such thing as ''offsetof'' for bit fields. To implement the {{{-->}}} and {{{<--}}} combinators, special inline '''C''' functions are embedded in the hsc code '''hsffig''' generates. This is achieved with the ''#def'' macro that '''hsc2hs''' recognizes.
+
There is no such thing as ''offsetof'' for bit fields. To implement the <tt>--></tt> and <tt><--</tt> combinators, special inline '''C''' functions are embedded in the hsc code '''hsffig''' generates. This is achieved with the ''#def'' macro that '''hsc2hs''' recognizes.
   
 
These functions are placed in a special '''C''' file. This file must be compiled and linked to the final executable. Consider the following '''C''' header (''bf.h''):
 
These functions are placed in a special '''C''' file. This file must be compiled and linked to the final executable. Consider the following '''C''' header (''bf.h''):
   
{{{
+
<pre>
 
union bitfields {
 
union bitfields {
 
struct {
 
struct {
Line 957: Line 957:
 
int y;
 
int y;
 
};
 
};
}}}
+
</pre>
   
 
and the Haskell program (''bftest.hs''):
 
and the Haskell program (''bftest.hs''):
   
{{{
+
<pre>
 
-- Test for bit fields.
 
-- Test for bit fields.
   
Line 979: Line 979:
 
y <- (bfu --> V_y)
 
y <- (bfu --> V_y)
 
putStrLn $ "y = " ++ (show y)
 
putStrLn $ "y = " ++ (show y)
}}}
+
</pre>
   
 
When '''hsffig''' processes ''bf.h'' it creates ''BF_H.hsc''. When '''hsc2hs''' processes ''BF_H.hsc'' it creates ''BF_H.hs'', and additionally ''BF_H_hsc.h'' and ''BF_H_hsc.c''. The latter is recommended to compile with '''ghc''':
 
When '''hsffig''' processes ''bf.h'' it creates ''BF_H.hsc''. When '''hsc2hs''' processes ''BF_H.hsc'' it creates ''BF_H.hs'', and additionally ''BF_H_hsc.h'' and ''BF_H_hsc.c''. The latter is recommended to compile with '''ghc''':
   
{{{
+
<pre>
 
ghc -c BF_H_hsc.c
 
ghc -c BF_H_hsc.c
}}}
+
</pre>
   
 
which provides correct compilation environment, so necessary paths for the runtime header files will be passed to '''gcc''' properly.
 
which provides correct compilation environment, so necessary paths for the runtime header files will be passed to '''gcc''' properly.
Line 991: Line 991:
 
After all, the executable may be compiled and linked:
 
After all, the executable may be compiled and linked:
   
{{{
+
<pre>
 
ghc -ffi -fglasgow-exts -package HSFFIG --make bftest.hs BF_H_hsc.o -o bftest
 
ghc -ffi -fglasgow-exts -package HSFFIG --make bftest.hs BF_H_hsc.o -o bftest
}}}
+
</pre>
   
 
If ''bftest'' is run, it prints "9", as one might expect (at least on the x86 architecture).
 
If ''bftest'' is run, it prints "9", as one might expect (at least on the x86 architecture).
   
=== Foreign Structures/Unions as Storables and Alloca ===
+
==== Foreign structures/unions as storables and alloca ====
   
 
It is often necessary to pre-allocate a foreign ('''C''') structure/union, so a foreign function can use it. The ''alloca'' and ''malloc'' functions are defined in Haskell FFI for this purpose. They require an argument belonging to the class ''Storable''.
 
It is often necessary to pre-allocate a foreign ('''C''') structure/union, so a foreign function can use it. The ''alloca'' and ''malloc'' functions are defined in Haskell FFI for this purpose. They require an argument belonging to the class ''Storable''.
Line 1,003: Line 1,003:
 
So, for every structure type imported by '''hsffig''', iinstance of ''Storable'' is declared (for ''struct abc'' used in the example above):
 
So, for every structure type imported by '''hsffig''', iinstance of ''Storable'' is declared (for ''struct abc'' used in the example above):
   
{{{
+
<pre>
 
instance Storable S_abc where
 
instance Storable S_abc where
 
sizeOf _ = (#size __quote__(struct abc))
 
sizeOf _ = (#size __quote__(struct abc))
Line 1,009: Line 1,009:
 
peek _ = error $ "peek and poke cannot be used with struct abc"
 
peek _ = error $ "peek and poke cannot be used with struct abc"
 
poke _ = error $ "peek and poke cannot be used with struct abc"
 
poke _ = error $ "peek and poke cannot be used with struct abc"
}}}
+
</pre>
   
 
These four methods are minimally required implementation of ''Storable'' per the FFI Addendum.
 
These four methods are minimally required implementation of ''Storable'' per the FFI Addendum.
   
Actually, only ''sizeOf'' plays a role here, returning the same value as '''C''' expression {{{sizeof (struct abc)}}} would return.
+
Actually, only ''sizeOf'' plays a role here, returning the same value as '''C''' expression <tt>sizeof (struct abc)</tt> would return.
   
 
An example below (retrieval of data from Berkeley DB) shows the use of everything discussed above:
 
An example below (retrieval of data from Berkeley DB) shows the use of everything discussed above:
   
{{{
+
<pre>
 
(ret,
 
(ret,
 
ks,
 
ks,
Line 1,035: Line 1,035:
 
return (r, ks, vs)
 
return (r, ks, vs)
 
else return (r, undefined, undefined)
 
else return (r, undefined, undefined)
}}}
+
</pre>
   
It may be noted again that a Haskell application developer does not need to remember type identifiers of foreign structures involved. The compiler, using FFI import statements and the multiparameter class' functional dependencies, correctly infers all necessary type information. In fact, type of ''dbkey'' and ''dbdata'' in the example above should be ''Ptr S''_''''''_''''''_''db''_''dbt'' (as defined in the header: {{{typedef struct __db_dbt { ... } DBT;}}}). But this type is not mentioned explicitly in the code.
+
It may be noted again that a Haskell application developer does not need to remember type identifiers of foreign structures involved. The compiler, using FFI import statements and the multiparameter class' functional dependencies, correctly infers all necessary type information. In fact, type of ''dbkey'' and ''dbdata'' in the example above should be ''Ptr S''_''''''_''''''_''db''_''dbt'' (as defined in the header: <tt>typedef struct __db_dbt { ... } DBT;</tt>). But this type is not mentioned explicitly in the code.
   
= Compilation of Applications Using Bindings Created with HSFFIG =
+
== Compilation of applications using bindings created with HSFFIG ==
   
 
Applications using bindings created with '''hsffig''' must be compiled/linked with the option ''-package HSFFIG'' passed to '''ghc'''. In the example below, the application consists of a single module containing the ''main'' funstion: ''main.hs'', and the '''C''' library header file is ''header.h''. The library file name is ''library.a''.
 
Applications using bindings created with '''hsffig''' must be compiled/linked with the option ''-package HSFFIG'' passed to '''ghc'''. In the example below, the application consists of a single module containing the ''main'' funstion: ''main.hs'', and the '''C''' library header file is ''header.h''. The library file name is ''library.a''.
Line 1,045: Line 1,045:
 
The ''main.hs'' file must contain an import statement:
 
The ''main.hs'' file must contain an import statement:
   
{{{
+
<pre>
 
import HEADER_H
 
import HEADER_H
}}}
+
</pre>
   
 
When '''hsffig''' processes ''header.h'' it creates ''HEADER_H.hsc''. When '''hsc2hs''' processes ''HEADER_H.hsc'' it creates ''HEADER_H.hs'', and additionally ''HEADER_H_hsc.h'' and ''HEADER_H_hsc.c''. The latter is recommended to compile with '''ghc''':
 
When '''hsffig''' processes ''header.h'' it creates ''HEADER_H.hsc''. When '''hsc2hs''' processes ''HEADER_H.hsc'' it creates ''HEADER_H.hs'', and additionally ''HEADER_H_hsc.h'' and ''HEADER_H_hsc.c''. The latter is recommended to compile with '''ghc''':
   
{{{
+
<pre>
 
ghc -c HEADER_H_hsc.c
 
ghc -c HEADER_H_hsc.c
}}}
+
</pre>
   
 
which provides correct compilation environment, so necessary paths for the runtime header files will be passed to '''gcc''' properly.
 
which provides correct compilation environment, so necessary paths for the runtime header files will be passed to '''gcc''' properly.
Line 1,059: Line 1,059:
 
After all, the executable may be compiled and linked:
 
After all, the executable may be compiled and linked:
   
{{{
+
<pre>
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_H_hsc.o library.a -o main
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_H_hsc.o library.a -o main
}}}
+
</pre>
   
 
The library of the package ''HSFFIG'' contains definition of the multiparameter class (''HSFFIG.Field''''''Access.Field''''''Access'') and the combinators necessary to access members of '''C''' structures/unions (see above). It is not necessary to directly import the '''hsffig''' modules in the application modules because the binding module re-exports them. It is however necessary to use the ''-fglasgow-exts'' options of '''ghc'''.
 
The library of the package ''HSFFIG'' contains definition of the multiparameter class (''HSFFIG.Field''''''Access.Field''''''Access'') and the combinators necessary to access members of '''C''' structures/unions (see above). It is not necessary to directly import the '''hsffig''' modules in the application modules because the binding module re-exports them. It is however necessary to use the ''-fglasgow-exts'' options of '''ghc'''.
Line 1,067: Line 1,067:
 
So, in the simplest case, the following steps are necessary:
 
So, in the simplest case, the following steps are necessary:
   
{{{
+
<pre>
 
gcc -E -dD header.h | hsffig > HEADER_H.hsc
 
gcc -E -dD header.h | hsffig > HEADER_H.hsc
   
Line 1,075: Line 1,075:
   
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main
}}}
+
</pre>
   
 
In some cases, size of the autogenerated Haskell code of the binding module is too large so it cannot be compiled due to lack of system resources. In this case, the binding module has to be split (see above, '''Splitting Large Modules'''). Steps to compile and build an executable now are:
 
In some cases, size of the autogenerated Haskell code of the binding module is too large so it cannot be compiled due to lack of system resources. In this case, the binding module has to be split (see above, '''Splitting Large Modules'''). Steps to compile and build an executable now are:
   
{{{
+
<pre>
 
gcc -E -dD header.h | hsffig > HEADER_H.hsc
 
gcc -E -dD header.h | hsffig > HEADER_H.hsc
   
Line 1,089: Line 1,089:
   
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main
 
ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main
}}}
+
</pre>
   
 
The only one step added is running ''splitter'' over the Haskell source code generated by '''hsc2hs'''. The way the final executable is linked does not change.
 
The only one step added is running ''splitter'' over the Haskell source code generated by '''hsc2hs'''. The way the final executable is linked does not change.
   
= Known Problems and Workarounds =
+
== Known problems and workarounds ==
   
== Variadic Functions ==
+
=== Variadic functions ===
   
'''C''' functions whose argument list ends with an ellipsis (...) in other words taking variable number of arguments cannot be imported by '''hsffig'''. A common example is ''printf''. If '''hsffig''' encounters such a function, it places a comment in its output explaining why the function could not be imported. However if a variable or a structure/union member is defined as a pointer to a variadic function is imported as a pointer to a function with type '''IO ()''' i. e. {{{void (*p)(void)}}}. This preserves the ability to query/set function pointer from a Haskell program.
+
'''C''' functions whose argument list ends with an ellipsis (...) in other words taking variable number of arguments cannot be imported by '''hsffig'''. A common example is ''printf''. If '''hsffig''' encounters such a function, it places a comment in its output explaining why the function could not be imported. However if a variable or a structure/union member is defined as a pointer to a variadic function is imported as a pointer to a function with type '''IO ()''' i. e. <tt>void (*p)(void)</tt>. This preserves the ability to query/set function pointer from a Haskell program.
   
 
So:
 
So:
   
{{{
+
<pre>
 
int (*pvard)(int,int,...);
 
int (*pvard)(int,int,...);
   
 
int vard (int,int,...);
 
int vard (int,int,...);
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
foreign import ccall "header.h &pvard"
 
foreign import ccall "header.h &pvard"
 
___58___ :: Ptr (FunPtr (IO ()))
 
___58___ :: Ptr (FunPtr (IO ()))
Line 1,127: Line 1,127:
 
-- because of the following reason(s):
 
-- because of the following reason(s):
 
-- function is variadic
 
-- function is variadic
}}}
+
</pre>
   
== Functions/Variables of Structure Type ==
+
=== Functions/variables of structure type ===
   
If a function or a variable has a structure (not structure pointer) type, '''hsffig''' cannot import it. If '''hsffig''' encounters such a function, it places a comment in its output explaining why the function could not be imported. However, if a structure/union member is itself a structure/union, it is imported so that the {{{-->}}} combinator returns a '''Ptr''' to that member. However the {{{<--}}} combinator is not defined for such members.
+
If a function or a variable has a structure (not structure pointer) type, '''hsffig''' cannot import it. If '''hsffig''' encounters such a function, it places a comment in its output explaining why the function could not be imported. However, if a structure/union member is itself a structure/union, it is imported so that the <tt>--></tt> combinator returns a '''Ptr''' to that member. However the <tt><--</tt> combinator is not defined for such members.
   
 
So:
 
So:
   
{{{
+
<pre>
 
struct {int a,b,c,d;} *allocs(int,int,int,int);
 
struct {int a,b,c,d;} *allocs(int,int,int,int);
   
 
struct {int a,b,c,d;} directs(int,int,int,int);
 
struct {int a,b,c,d;} directs(int,int,int,int);
}}}
+
</pre>
   
 
imports as:
 
imports as:
   
{{{
+
<pre>
 
foreign import ccall "static header.h allocs"
 
foreign import ccall "static header.h allocs"
 
f_allocs :: CInt -> CInt -> CInt -> CInt -> IO (Ptr (S_56))
 
f_allocs :: CInt -> CInt -> CInt -> CInt -> IO (Ptr (S_56))
Line 1,154: Line 1,154:
 
-- function takes/returns structure(s) directly
 
-- function takes/returns structure(s) directly
 
--
 
--
}}}
+
</pre>
   
 
Note that while these two functions have return types looking the same, they are not the same from the '''C''' compiler standpoint, nor are they the same from the '''hsffig''' standpoint.
 
Note that while these two functions have return types looking the same, they are not the same from the '''C''' compiler standpoint, nor are they the same from the '''hsffig''' standpoint.
   
== Unresolved Symbols When Linking ==
+
=== Unresolved symbols When linking ===
   
 
As mentioned above, '''hsffig''' generates FFI import declarations for every function it finds a prototype for. It may rarely happen, the header file has prototypes for functions not defined by the library the header is related to. As an example, consider the following short program printing "Hello World" using the low-level ''write'' function of the standard '''C''' POSIX library:
 
As mentioned above, '''hsffig''' generates FFI import declarations for every function it finds a prototype for. It may rarely happen, the header file has prototypes for functions not defined by the library the header is related to. As an example, consider the following short program printing "Hello World" using the low-level ''write'' function of the standard '''C''' POSIX library:
   
{{{
+
<pre>
 
module Main where
 
module Main where
   
Line 1,169: Line 1,169:
 
main = withCStringLen "Hello World\n" $ \hello -> do
 
main = withCStringLen "Hello World\n" $ \hello -> do
 
f_write (fromIntegral 0) (fst hello) (fromIntegral $ snd hello)
 
f_write (fromIntegral 0) (fst hello) (fromIntegral $ snd hello)
}}}
+
</pre>
   
The prototype for ''write'' is contained in the ''unistd.h'' include file usually located in ''/usr/include''. However if the program above is compiled and linked the "usual" way, two symbols remain unresolved by the linker: {{{__}}}''ftruncate'', and ''pthread_atfork'' *. As a result, the linker fails to create the executable file.
+
The prototype for ''write'' is contained in the ''unistd.h'' include file usually located in ''/usr/include''. However if the program above is compiled and linked the "usual" way, two symbols remain unresolved by the linker: <tt>__</tt>''ftruncate'', and ''pthread_atfork'' *. As a result, the linker fails to create the executable file.
   
 
The possible workarounds are:
 
The possible workarounds are:
   
* To find the library where missing symbols are defined, and specify it on the '''ghc''' command line to link the executable.
+
* To find the library where missing symbols are defined, and specify it on the '''ghc''' command line to link the executable.
* To tell the linker to ignore the unresolved symbols and build the executable no matter what.
+
* To tell the linker to ignore the unresolved symbols and build the executable no matter what.
* To use the ''-split-objs'' option that '''ghc''' provides ('''Note:''' See also HsffigLinkageOptimization for more detailed discussion of object file splitting possibilities).
+
* To use the ''-split-objs'' option that '''ghc''' provides ('''Note:''' See also HsffigLinkageOptimization for more detailed discussion of object file splitting possibilities).
   
 
The first option provides the best consistency, but it may be hard sometimes to find the correct library, especially for such low-level functions perhaps not assumed to be called directly.
 
The first option provides the best consistency, but it may be hard sometimes to find the correct library, especially for such low-level functions perhaps not assumed to be called directly.
Line 1,195: Line 1,195:
 
The Makefile for this example program is shown below. It features the second and the third ways to work around the unresolved symbols problem.
 
The Makefile for this example program is shown below. It features the second and the third ways to work around the unresolved symbols problem.
   
{{{
+
<pre>
 
GCC = gcc
 
GCC = gcc
 
GHC = ghc
 
GHC = ghc
Line 1,235: Line 1,235:
 
clean:
 
clean:
 
rm -rf syscall-1 syscall-2 libUNISTD_H.a UNISTD_H* UNISTD_H_split *.o *.hi
 
rm -rf syscall-1 syscall-2 libUNISTD_H.a UNISTD_H* UNISTD_H_split *.o *.hi
}}}
+
</pre>
   
 
----
 
----
Line 1,241: Line 1,241:
   
 
----
 
----
[DimitryGolubovsky]
+
[[DimitryGolubovsky]]
 
----
 
----
[CategoryTutorial], [CategoryTools]
+
  +
[[Category:Tutorials]]

Latest revision as of 11:47, 3 June 2009

Contents

[edit] 1 Program input

Hsffig acts as a filter (in Unix sense) which means that it consumes all its input information on its standard input (stdin), and produces its output on standard output (stdout). No other files are involved.

The hsffig program is invoked as follows:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

The -E option instructs gcc only to preprocess (no compilation) its input file, and the -dD option instructs it to emit #define preprocessor statements along with the preprocessed header (without -dD, autogenerated binding module will not contain import statements for constants defined this way).

[edit] 1.1 Preprocessor requirements

It is assumed that majority of hsffig users will use tools from the GNU Compiler Collection (GCC), whose preprocessor complies with the hsffig requirements. For other preprocessors from other vendors, the following is expected:

  • The very first line of preprocessor output will contain the following construct:
# 1 header.h

where header.h is the name of the "toplevel" header file; number following after the hash sign does not matter.

Since hsffig does not take any command line arguments, the only way to determine the name of the module is the preprocessor output itself. When hsffig encounters a line like shown above (starting with hash sign and number), it derives the autogenerated Haskell module name by stripping the directory part from the header file name (as in basename(1)), uppercasing all characters (therefore the header file name cannot start with a digit), and replacing dots with underscores. Therefore, for a header file header.h, the autogenerated module will be named HEADER_H.

If compilant preprocessor is not available, and control lines like discussed above cannot appear in the preprocessor output (with gcc, this may be simulated with -P option), a special token @@MODULENAME@@ will be used in the Module statement. Later, it may be substituted with anything desirable with a program like sed(1). Also, if hsffig is unable to determine the header file name from its input, constants defined via #define will not be imported.

[edit] 2 Program output

Hsffig outputs hsc code on its standard output, to be redirected to a file or (possibly) to some other filter program. The output is ready for processing by hsc2hs. Description of hsc2hs input file format may be found here:

http://haskell.org/ghc/docs/6.2.2/html/users_guide/hsc2hs.html

The output file will contain the following sections:

  • Special Definitions for hsc2hs
  • Module header
  • Constants defined with #define
  • For each structure/union imported, an instance of the above mentioned class to access each member
  • Constants defined via enumerations
  • Import declarations for each function/extern variable declared

The preprocessor outputs preprocessed contents of not only the header file specified on its command line, but also of all other header files included within. This results in importing of some extra stuff (for instance, if a header file has #include <stdio.h> inside, everything defined in stdio.h will be imported as well). This merely gives the Haskell compiler which "connects" the library to an application written in Haskell the same look as the C compiler would have if it were "connecting" the library to an application written in C.

In first lines of generated hsc code, an #include statement is present to include the same header file that was consumed by hsffig. If hsffig is unable to determine header file name from its input (see Preprocessor Requirements), a special token @@INCLUDEFILE@@ will be placed with the #include statement instead. Similarly to the module name substitute, it may be later converted into a real header file name with a program like sed.

[edit] 3 Special definitions for hsc2hs

In generated hsc code, the following hsc2hs macros are used to access members of structures/unions:

  • #peek
  • #poke
  • #ptr

These macros take type declaration of the structure whose members are being accessed, as its first argument.

Consider the following structure declaration (a slightly modified fragment of BerkeleyDB's db.h):


struct __dbc {
        DB *dbp;                        /* Related DB access method. */
        DB_TXN   *txn;                  /* Associated transaction. */

        struct {
                DBC *tqe_next, **tqe_prev;
        } links;
        /* . . . other members . . . */
}

To access members of the {{{struct ___dbc}}}, e. g. to peek a value of dbp, the following hsc2hs macro will be used:

(#peek struct __dbc, dbp)

But links is declared "anonymously", i. e. its structure type is not named. The only way to construct a #peek/#poke/#ptr macro is to put the whole structure declaration as its first argument:

(#peek struct {DBC *tqe_next, **tqe_prev;}, tqe_next)

This will cause an error because the C preprocessor macro that hsc2hs converts the above construct into, takes only two arguments, but comma in the structure declaration will be recognized as argument separator, i. e. there will be three arguments.

To work this around, hsffig makes the following declaration at the beginning of generated hsc code:

#ifndef __quote__
#define __quote__(x...) x
#endif

This is a gcc feature called "macro varargs". The only thing this macro does is to represent its arguments whatever is their number, as a single token for the preprocessor.

So the macros from the example above will look:

(#peek __quote__(struct __dbc), dbp)
(#peek __quote__(struct {DBC *tqe_next, **tqe_prev;}), tqe_next)

Additionally, the template file that hsc2hs needs to convert hsc to Haskell code needs to be modified. It uses the offsetof macro which also takes two arguments, but passing anonymously declared structure type containing commas will cause the same problems. So in the modified template file, the same __quote__ macro is used with offsetof:

#define hsc_peek(t, f) \
    printf ("(\\hsc_ptr -> peekByteOff hsc_ptr %ld)", (long) offsetof (__quote__(t), f));

These special definitions are placed in the very beginning of every file that hsffig generates. A separate template file for hsc2hs is no longer needed. So, to generate importable Haskell code, a command like one below needs to be issued:

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs

Specifyiing /dev/null as template file name causes hsc2hs not to use any template file.

[edit] 3.1 Splitting large modules

Considering that hsffig generates import statements for everything it encounters in all header files included within the toplevel header file, amount of generated Haskell code may be quite large, so it cannot be compiled by ghc at once due to memory limitations. To overcome this, a special utility, splitter is included with hsffig. In generated hsc code, the following comments are inserted:


-- Split begin/DB_H

{-# OPTIONS -fglasgow-exts -ffi #-}
#include "db.h"
#ifndef __quote__
#define __quote__(x...) x
#endif

module DB_H(
  module DB_H,
{-- #SPLIT#
  module DB_H_C,
  module DB_H_S,
  module DB_H_F,
  module DB_H_E,
  module DB_H_S_cnt,
#SPLIT# --}
  module Foreign,
  module Foreign.C.String,
  module Foreign.C.Types) where

import Foreign
import Foreign.Ptr
import Foreign.C.Types
import Foreign.C.String
{-- #SPLIT#
import DB_H_C
import DB_H_S
import DB_H_F
import DB_H_E
import DB_H_S_cnt
#SPLIT# --}

-- Split end

This example is found in the very begining of generated hsc code. The splitter program reads its input file line by line. Comments like {{{-- Split begin/DB_H}}} redirect all subsequent lines in a file whose name is composed of characters following after '/' with '.hs' appended. Comments like {{{-- Split end}}} close output file: input lines are lost until another Split begin comment is met. Lines {{{ {-- #SPLIT# }}} and {{{ #SPLIT# --} }}} are removed from output, so enclosed pieces of Haskell code are uncommented.

As a result, one large module is split into many parts:

  • The "Root" module to be imported into an application: name derived from the header file name, see above, e. g. DB_H
  • A separate module containing import statements for constants declared with #define: name is derived from the header file name and postfixed with "_C", e. g. DB_H_C
  • A separate module containing import statements for constants declared via enumerations: name is postfixed with "_E", e. g. DB_H_E
  • A separate module containing import statements for variables and functions: name is postfixed with "_F", e. g. DB_H_F
  • A separate module containing declaration of a multiparameter class and datatypes necessary to access members of structures/unions: name is postfixed with "_S_cnt", e. g. DB_H_S_cnt (Note: as of the patchlevel 2, instead of a single module postfixed with "_S_cnt", three separated modules are created, positixed with: "_S_n" for newtype declarations, "_S_t" for type aliases, and "_S_d" for declarations of structure member selector datatypes (V_..., X_..., D_...).
  • A separate module for each structure/union defined in the header file and files included within: name is postfixed with "_S_" and followed by the structure type identifier derived by hsffig, e. g. DB_H_S_S{{{___}}}db_dbt for {{{struct __db_dbt}}}, or module DB_H_S_S_1142 for a structure declared anonymously (hsffig assigns unique numeric identifiers to such structures).

However, application developers do not need to know all these module names: all they still need to know is name of the "root" module (DB_H in our example) to be imported.

It is recommended to use ghc with --make option: this forces the compiler to chase all module dependencies automatically. Since all the modules created by splitter contain correct import statements, no module will be left behind.

[edit] 4 FFI import details

[edit] 4.1 Naming conventions

A significant difference between C and Haskell is that C does not force starting character of its variable, function, or type identifiers to be of certain case while Haskell has requirements of this kind. To resolve this problem, the following rules apply when an identifier declared in a C library is imported in a Haskell application.

All identifiers imported from a C library, or created to make import easier, are prefixed as shown in the table below:

Prefix Purpose Sort of Import
Constant Variable Function
Pointer
Function Structure
Type
Structure
Member
c_ Constant imported with #define Yes No No No No No
e_ Constant imported with enum Yes No No No No No
f_ Standalone function No No No Yes No No
p_ Pointer (memory address) of a variable No Yes Yes No No No
v_ Value of a variable No Yes Yes No No No
s_ Set a variable No Yes Yes No No No
x_ Function pointed to by a variable No No Yes No No No
w_ Wrapper for a function argument No No Yes Yes No No
S_, U_ Structure/Union type as imported No No No No Yes No
V_ Member of a structure (value) No No No No No Yes
D_ Member of a structure - arrays only (dimensions) No No No No No Yes
X_ Member of a structure (callable) No No No No No Yes
T_ C Type Alias (typedef) as imported No No No No No No

The table summarizes all prefixes hsffig adds to imported identifiers, and corresponding types of import. Flags Yes and No show whether given prefix and import purpose is suitable for each sort of imported entity.

[edit] 4.2 Typedefs

C types declared with the typedef statement are imported as type aliases prefixed with "T_". These type aliases may be more convenient to use in certain situations.

The following C declaration:

typedef union {
  struct {
    int a:1;
    int b:1;
    int c:1;
    int d:1;
  } x;
  int y;
} *PBF;

imports as:

type T_PBF = Ptr (U_9)

newtype S_5 = S_5 ()
newtype U_9 = U_9 ()

-- instances to access S_5 and U_9 members follow

In the following Haskell program:

module Main where

import BF_H

main = do
  putStrLn $ "Test of Bit Fields"

  alloca $ \(bfu :: T_PBF) -> do
    x <- (bfu --> V_x)
    (x, V_a) <-- fromIntegral 1
    (x, V_b) <-- fromIntegral 0
    (x, V_c) <-- fromIntegral 0
    (x, V_d) <-- fromIntegral 1
    y <- (bfu --> V_y)
    putStrLn $ "y = " ++ (show y)

it is necessary to declare the type of bfu because there is no foreign import declaration that would drive the type inference mechanism. Since the union is declared anonymously, the type derived by hsffig contains a numeric identifier that is hard to remember. Imported typedef helps because it may be used instead, and is controlled by a developer (Ptr U_9 couls be used with the same effect).

Per FFI Addendum, the following typedefs are imported as they are (the Addendum defines names of Haskell types equivalent to them):

* ptrdiff_t
* size_t
* wchar_t
* sig_atomic_t
* clock_t
* time_t
* FILE
* fpos_t
* jmp_buf

[edit] 4.3 Constants (#define)

For each constant declared with #define, generated hsc code looks as follows:

#define BIG_ENDIAN      __BIG_ENDIAN
#define PDP_ENDIAN      __PDP_ENDIAN
#define BYTE_ORDER      __BYTE_ORDER
#define S_IREAD       S_IRUSR
#define S_IRGRP (S_IRUSR >> 3)
#define S_IROTH (S_IRGRP >> 3)

becomes

c_BIG_ENDIAN = #const BIG_ENDIAN
c_PDP_ENDIAN = #const PDP_ENDIAN
c_BYTE_ORDER = #const BYTE_ORDER
c_S_IREAD = #const S_IREAD
c_S_IRGRP = #const S_IRGRP
c_S_IROTH = #const S_IROTH

that is, hsffig does not deal with actual value of the constant: it is left to hsc2hs.

However, arbitrary #define cannot be imported this way. Some preprocessor macros have parameters, or contain C statements, or otherwise cannot be assignment expression's RHS. To rule such preprocessor macros out, for each #define hsffig encounters on its input, a short C program is created, like this:

#include "header.h"
static int a = BIG_ENDIAN;

for #define BIG_ENDIAN __BIG_ENDIAN. The command line for gcc is:

gcc -pipe -x c -q -fsyntax-only - 2>/dev/null

If this program passes syntax check, the constant is imported, otherwise not imported.

This explains why without being able to determine the header file name from its input, hsffig is unable to import #define 'd constants. This also shows that processing time for a header will be rougly linear to the number of #define preprocessor statements in the header.

Constants are not in the IO monad, and may be used in any function. Bit operations (from Foreign.C.Bits) may be applied to them:

dbperm = c_S_IRUSR .|. c_S_IWUSR .|. c_S_IRGRP .|. c_S_IWGRP .|. c_S_IROTH

may be used to specify file permissions {{{rw-rw-r--}}}

[edit] 4.4 Constants (enum)

Enumerations are imported similarly to preprocessor-defined constants, but identifiers are prefixed with "e_".

An example showing usage of both #define and enumeration constants:

  dbop <- dbp --> V_open
  ret <- withCString "access.db" $ \dbname -> do
    r <- dbop dbp
              nullPtr
              dbname
              nullPtr
              (fromIntegral e_DB_BTREE)
              (fromIntegral c_DB_CREATE)
              (fromIntegral dbperm)
    return r

fromIntegral may be necessary to apply to constants when calling foreign functions because the Haskell compiler infers the type for the following sort of expressions:

c_DB_CREATE = 1
e_DB_BTREE = 1

as Integer while the function referred to by dbop in the example above requires CInt, CUInt, CInt for its last three arguments.

[edit] 4.5 Variables

[edit] 4.5.1 Regular variables

Regular variables contain either values or pointers to other variables (not functions). For this sort of imports, the following identifiers are created:

  • a pointer to the variable itself: prefix is "p_"
  • a nullary function returning variable's value in the IO monad: prefix is "v_"
  • a unary function taking a value and assigning it to the variable (in the IO monad): prefix is "s_"

So, for a variable declared as

extern int a;

FFI import declarations will be as follows:

foreign import ccall "header.h &a"
  ___2___ :: Ptr (CInt)
p_a = ___2___
v_a = peek ___2___
s_a = poke ___2___

where ___2___ is one of internal unique names that hsffig assigns to all identifiers it encounters in the header file.

So, to retrieve value from a variable, v_a is used:

x <- v_a -- equivalent to peek p_a

To update the variable value, s_a is used:

s_a 4 -- equivalent to poke p_a 4

If a variable contains pointer to a value, e. g

extern int *ap;

then it is imported as follows:

foreign import ccall "header.h &ap"
  ___3___ :: Ptr (Ptr (CInt))
p_ap = ___3___
v_ap = peek ___3___
s_ap = poke ___3___

so to dereference the pointer, additional peek is needed:

c <- v_ap >>= peek

and s_ap will update the variable itself with new address, rather than the value it points to.

[edit] 4.5.2 Function pointers

Function pointers contain pointers to executable code.

For this sort of imports, the following identifiers are created:

  • pointer to the variable itself: prefix is "p_"
  • a function the variable points to (with proper arity and type signature): prefix is "x_"
  • a nullary function returning the function pointed to by the variable (in the IO monad): prefix is "v_"
  • a unary function updating the variable with new function reference: prefix is "s_"
  • optionally: wrapper functions for arguments taking function references: prefix is "w_"

A variable declared as:

int (*c)(long,int);

is imported as follows:

foreign import ccall "header.h &c"
  ___8___ :: Ptr (FunPtr (CLong -> CInt -> IO CInt))
p_c = ___8___
foreign import ccall "dynamic"
  ___8___mk___ :: FunPtr (CLong -> CInt -> IO CInt) -> (CLong -> CInt -> IO CInt)
foreign import ccall "wrapper"
  ___8___wr___ :: (CLong -> CInt -> IO CInt) -> IO (FunPtr (CLong -> CInt -> IO CInt))
x_c _1 _2 = peek ___8___ >>= \s -> ___8___mk___ s _1 _2
v_c = peek ___8___ >>= (return . ___8___mk___)
s_c = \s -> ___8___wr___ s >>= poke ___8___

Note that the function referred to by c returns IO CInt. It is assumed that all functions imported from a C library are unsafe, and therefore forced to be monadic. Based on the same assumption, all calls are considered "safe": this also allows calling Haskell functions from C code.

A variable declared as a function pointer in C , from the Haskell point of view contains a FunPtr i. e. virtual address of some foreign function's entry point. This cannot be used by a Haskell application directly, so a special dynamic import function needs to be declared. Such a pseudo-function (only its name and type signature need to be declared: the Haskell compiler does the rest) converts a FunPtr into a function usable by Haskell applications. Another pseudo-function, wrapper import does the opposite: converts a Haskell function into a monadic value containing a FunPtr i. e. a virtual address of an entry point usable by foreign functions (useful for callbacks).

For the variable - function pointer c, p_c returns pointer to the variable, just like for a regular variable. However, v_c returns the variable value passed through the dynamic import pseudo-function, so it may be used directly in Haskell applications. The setter function, s_c takes a Haskell function with proper type signature and updates the c variable value with new function address. The function x_c is the foreign function c points to.

The difference between v_c and x_c is illustrated below:

pc1 <- v_c
r1 <- pc1 33::CLong 21::CInt

-- but

r2 <- x_c 33::CLong 21::CInt

v_c is a monadic value; x_c is not.

Optionally additional wrapper imports are created if a function pointed to by a variable takes other functions as arguments.

Consider the following:

typedef int (*F2) (long);
typedef float (*X2) (double);
F2 (*ee) (X2);

The variable ee is a pointer to a function taking a function from double to float, and returning a function from long to int.

The import looks as follows:

foreign import ccall "header.h &ee"
  ___40___ :: Ptr (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
p_ee = ___40___
foreign import ccall "dynamic"
  ___40___mk___ :: FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))) ->
    (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt)))
foreign import ccall "wrapper"
  ___40___wr___ :: (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))) ->
    IO (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
x_ee _1 = peek ___40___ >>= \s -> ___40___mk___ s _1
v_ee = peek ___40___ >>= (return . ___40___mk___)
s_ee = \s -> ___40___wr___ s >>= poke ___40___
foreign import ccall "wrapper"
  w_ee_1 :: (CDouble -> IO CFloat) -> IO (FunPtr (CDouble -> IO CFloat))

An additional import declaration, w_ee_1 has been created. It may be used if ee is called, and some Haskell function is passed as a parameter:

mydoublefloat :: CDouble -> IO CFloat
mydoublefloat = {-- function body here --}

mydoublefloat_wrapped <- w_ee_1 mydoublefloat

foo <- x_ee mydoublefloat_wrapped

-- or

foo <- w_ee_1 mydoublefloat >>= x_ee

Note that in the current version, hsffig does not create a dynamic import when a function imported returns a FunPtr.

Name of autogenerated wrapper import for function arguments ends with "_1". If a function takes more than one argument - function then more than one wrapper import will be created (one per function type). For example:

void (*psetpdf)(double (*df)(double,double),int,long,void(*)(int));

imports as:

foreign import ccall "header.h &psetpdf"
  ___6___ :: Ptr (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
p_psetpdf = ___6___
foreign import ccall "dynamic"
  ___6___mk___ :: FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong ->
    FunPtr (CInt -> IO ()) -> IO ()) ->
    (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ())
foreign import ccall "wrapper"
  ___6___wr___ :: (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()) ->
    IO (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
x_psetpdf _1 _2 _3 _4 = peek ___6___ >>= \s -> ___6___mk___ s _1 _2 _3 _4
v_psetpdf = peek ___6___ >>= (return . ___6___mk___)
s_psetpdf = \s -> ___6___wr___ s >>= poke ___6___
foreign import ccall "wrapper"
  w_psetpdf_1 :: (CDouble -> CDouble -> IO CDouble) -> IO (FunPtr (CDouble -> CDouble -> IO CDouble))
foreign import ccall "wrapper"
  w_psetpdf_2 :: (CInt -> IO ()) -> IO (FunPtr (CInt -> IO ()))

The function pointed to by setpdf takes two arguments-functions of different types, therefore two wrapper imports have been created.

Enumeration of wrappers created for arguments does not correspond to positional numbers of arguments.

[edit] 4.6 Functions

Functions are imported under their own names prefixed by "f_". Similarly to the situation described above (variables holding function pointers), additional wrappers will be created for arguments representing other functions. Examples of import:

double dplus(double,double);

imports as:

foreign import ccall "static header.h dplus"
  f_dplus :: CDouble -> CDouble -> IO CDouble

A function taking another function as argument:

int db_env_set_func_yield  (int (*)(void))  ;

imports as:

foreign import ccall "static header.h db_env_set_func_yield"
  f_db_env_set_func_yield :: FunPtr (IO CInt) -> IO CInt
foreign import ccall "wrapper"
  w_db_env_set_func_yield_1 :: (IO CInt) -> IO (FunPtr (IO CInt))

Rules for creation and naming of wrapper imports are the same as defined for variables containing function pointers.

[edit] 4.7 Structures and unions

[edit] 4.7.1 Basic information

C structures/unions are different from Haskell records with labeled fields: different structure types may contain members with same names. Therefore direct mapping of C structures to Haskell records is not possible.

Some structures are declared with struct statement:

struct foo {
  int bar;
  long baz;
};

Some structures are declared with typedef statement:

typedef struct foo {
  int bar;
  long baz;
} foo_t;

In some cases, structure name in the typedef statement may be omitted:

typedef struct {
  int bar;
  long baz;
} foo_t;

There also may be structures/unions declared within another structures/unions without explicit structure type name declaration:

typedef struct {
  int bar;
  long baz;
  struct {
    float faa;
    size_t taa;
  } str_anon;
} foo_t;

[edit] 4.7.2 Datatypes and combinators for structure members access

Structure types are imported using the newtype Haskell statement. Type name is prefixed with "S_" for structures and "U_" for unions. If a structure was declared as struct foo, Haskell type name will be S_foo. If a structure was declared anonymously (struct without identifier), a unique numeric identifier will be used as structure type name, e. g. S_65.

A imported structure type declaration looks like: newtype S_foo = S_foo (). Indeed, this type name serves only identification purpose.

In practice, Haskell application developers using hsffig do not need to memorize these clumsy type identifiers: Haskell type inference system carries most of the burden.

To access imported structure members, two algebraic data types are declared for each member identifier hsffig encounters in its input:

  • V_member_id: to get/set value of a structure member member_id
  • X_member_id: if a member holds function pointer, to invoke the function member_id points to

Unlike variables, there is no "S_" prefixed item.

A multi-parameter class with functional dependencies is declared to enable access to structures/unions members. Because of this, Haskell-98 extensions must be enabled in the compiler processing output of hsffig. The definition of this class is placed in the HSFFIG.FieldAccess module of the Haskell library coming with the package.

class FieldAccess a b c | a c -> b where
  (==>) :: Ptr a -> c -> b
  (-->) :: Ptr a -> c -> IO b
  (<--) :: (Ptr a, c) -> b -> IO ()
  (==>) _ _ = error " illegal context for ==>"
  (-->) _ _ = error " illegal context for -->"
  (<--) _ _ = error " illegal context for <--"

The functional dependencies a b c | a c -> b specify that type of b depends entirely on the types a and c, and there may be only one type of b for every possible combination of a and c.

Indeed, a is the type of an imported structure/union. The type c is the data type associated with a structure/union member identifier. The type b is type of the member as represented to a Haskell program. In a C structure/union cannot be more than one member with same name (and different types). That's what the functional dependencies mean in this case. This information is sufficient for the Haskell compiler to infer the type of a value retrieved from a structure/union member.

The multiparameter class mentioned above is not used by Haskell applications directly. All imported types of structures/unions are instances of this class.

So, for a simple structure (note that the structure type and one of its members have the same name):

struct abc {
  int abc;
};

the datatypes are:

newtype S_abc = S_abc () -- for the structure itself
data V_abc = V_abc deriving (Show) -- for the member `abc'
data X_abc = X_abc deriving (Show) -- not used

To get/set the member abc of the structure abc, the following instance is defined:

instance FieldAccess S_abc (CInt) V_abc where
  z --> V_abc = (#peek __quote__(struct abc), abc) z
  (z, V_abc) <-- v = (#poke __quote__(struct abc), abc) z v

So, for this instance, a = S_abc, b = CInt, and c = V_abc. Which is semantically equivalent to the above declaration of the member abc of the struct abc.

The --> combinator retrieves value of the given member from the structure pointed to by the given pointer. Thus, to access the value of abc, a Haskell application contains:

r <- z --> V_abc

Because of the multiparameter class and the instance declaration, the compiler is able to determine the type of the {{{z --> V_abc}}} expression correctly.

Assume another structure declaration:

struct def {
  void (*abc)(char *,int);
};

perfectly legal from the C standpoint. Not only do the two different structures have a member with the same name abc, but these members have different types: one is a scalar integer value; another is a function pointer.

The import for struct def is (its part of the interest):

newtype S_def = S_def ()
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
  z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___60___S_def___mk)
  (z, V_abc) <-- v = (___60___S_def___wr v) >>= (#poke __quote__(struct def), abc) z

For this instance, a = S_def, b = Ptr (CChar) -> CInt -> IO (), and c = V_abc i. e. the same as for abc in struct abc.

So, in

r <- z --> abc
t <- x --> abc

if x and z point to structures abc and def respectively, types of values bound to r and t will be different.

The <-- combinator is used for updating structure/union members with new values. This is a "real" update, not simulated by Haskell state mechanisms.

The code

(z,V_abc) <-- 4 -- fixme: is fromIntegral needed for a constant?

is equivalent to C code:

z -> abc = 4;

The ==> combinator is used to call functions pointed to by structre/union members. While --> may be applied only to "V_" prefixed identifiers, the ==> combinator may be applied only to "X_" prefixed identifiers.

The difference between "V_" and "X_" prefixed structure member identifiers is similar to that for imported variables holding function pointers:

  fclose <- dbp --> V_close
  ret <- fclose dbp 0

-- but

  ret <- (dbp ==> X_close) dbp 0

The following is complete set of import statements that hsffig generates for struct def:

instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
  z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___62___S_def___mk)
  (z, V_abc) <-- v = (___62___S_def___wr v) >>= (#poke __quote__(struct def), abc) z
foreign import ccall "dynamic"
  ___62___S_def___mk :: (FunPtr (Ptr (CChar) -> CInt -> IO ())) -> ((Ptr (CChar) -> CInt -> IO ()))
foreign import ccall "wrapper"
  ___62___S_def___wr :: ((Ptr (CChar) -> CInt -> IO ())) -> IO (FunPtr (Ptr (CChar) -> CInt -> IO ())

instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) X_abc where
  z ==> X_abc = \_1 _2 -> do
    x <- z --> V_abc
    r <- x _1 _2
    return r

So, z ==> X_abc> returns a function with type Ptr (CChar) -> CInt -> IO (). Because instance with V_abc is defined with only --> combinator, and instance with X_abc is defined with only ==> combinator, and instance with X_abc is declared only for members holding pointers to functions, attempt to use wrong combinator-member combination will result in compilation or runtime error.

Members of structures/unions which are structures/unions themselves are imported so that --> combinator returns a Ptr to the member, and <-- combinator is not allowed. Example (from bfd.h):

typedef struct lineno_cache_entry {
  unsigned int line_number;     /* Linenumber from start of function*/
  union {
    struct symbol_cache_entry *sym; /* Function name */
    unsigned long offset;       /* Offset into section */
  } u;
} alent;

imports as:

newtype S_lineno_cache_entry = S_lineno_cache_entry ()
newtype U_40 = U_40 ()

instance FieldAccess S_lineno_cache_entry (Ptr U_40) V_u where
  z --> V_u = return $ (#ptr __quote__(struct lineno_cache_entry), u) z
  (z, V_u) <-- v = error $ "field u is a structure: cannot be set"

instance FieldAccess S_lineno_cache_entry (CUInt) V_line_number where
  z --> V_line_number = (#peek __quote__(struct lineno_cache_entry), line_number) z
  (z, V_line_number) <-- v = (#poke __quote__(struct lineno_cache_entry), line_number) z v

instance FieldAccess S_lineno_cache_entry (CInt) V_sizeof where
  z --> V_sizeof = return $ (#size __quote__(struct lineno_cache_entry))

--

instance FieldAccess U_40 (CULong) V_offset where
  z --> V_offset = (#peek __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), offset) z
  (z, V_offset) <-- v = (#poke __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), offset) z v

instance FieldAccess U_40 (Ptr (S_symbol_cache_entry)) V_sym where
  z --> V_sym = (#peek __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), sym) z
  (z, V_sym) <-- v = (#poke __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), sym) z v

instance FieldAccess U_40 (CInt) V_sizeof where
  z --> V_sizeof = return $ (#size __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }))

So, to achieve the same as C code below does:

alent *pal;

/* ... */

off = pal -> u.offset

a Haskell program would contain:

pal <- -- whatever returns IO (Ptr S_lineno_cache_entry)

off <- (pal --> V_u) >>= \u -> (u --> V_offset)

[edit] 4.7.3 Bit fields

Bit fields (declared as int somefield:n) are supported by hsffig, but there are some additional things that need attention.

There is no such thing as offsetof for bit fields. To implement the --> and <-- combinators, special inline C functions are embedded in the hsc code hsffig generates. This is achieved with the #def macro that hsc2hs recognizes.

These functions are placed in a special C file. This file must be compiled and linked to the final executable. Consider the following C header (bf.h):

union bitfields {
  struct {
    int a:1;
    int b:1;
    int c:1;
    int d:1;
  } x;
  int y;
};

and the Haskell program (bftest.hs):

-- Test for bit fields.

module Main where

import BF_H

main = do
  putStrLn $ "Test of Bit Fields"

  alloca $ \(bfu :: Ptr U_bitfields) -> do
    x <- (bfu --> V_x)
    (x, V_a) <-- fromIntegral 1
    (x, V_b) <-- fromIntegral 0
    (x, V_c) <-- fromIntegral 0
    (x, V_d) <-- fromIntegral 1
    y <- (bfu --> V_y)
    putStrLn $ "y = " ++ (show y)

When hsffig processes bf.h it creates BF_H.hsc. When hsc2hs processes BF_H.hsc it creates BF_H.hs, and additionally BF_H_hsc.h and BF_H_hsc.c. The latter is recommended to compile with ghc:

ghc -c BF_H_hsc.c

which provides correct compilation environment, so necessary paths for the runtime header files will be passed to gcc properly.

After all, the executable may be compiled and linked:

ghc -ffi -fglasgow-exts -package HSFFIG --make bftest.hs BF_H_hsc.o -o bftest

If bftest is run, it prints "9", as one might expect (at least on the x86 architecture).

[edit] 4.7.4 Foreign structures/unions as storables and alloca

It is often necessary to pre-allocate a foreign (C) structure/union, so a foreign function can use it. The alloca and malloc functions are defined in Haskell FFI for this purpose. They require an argument belonging to the class Storable.

So, for every structure type imported by hsffig, iinstance of Storable is declared (for struct abc used in the example above):

instance Storable S_abc where
  sizeOf _ = (#size __quote__(struct abc))
  alignment _ = 1
  peek _ = error $ "peek and poke cannot be used with struct abc"
  poke _ = error $ "peek and poke cannot be used with struct abc"

These four methods are minimally required implementation of Storable per the FFI Addendum.

Actually, only sizeOf plays a role here, returning the same value as C expression sizeof (struct abc) would return.

An example below (retrieval of data from Berkeley DB) shows the use of everything discussed above:

  (ret,
   ks,
   vs) <- alloca $ \dbkey ->                                        -- dbkey = alloca (sizeof (DBT))
          alloca $ \dbdata ->                                       -- dbdata = alloca (sizeof (DBT))
          withCStringLen "fruit" $ \fruit -> do                     -- (address, length)
            (dbkey,V_data) <-- fst fruit                            -- dbkey -> data set to the string address
            (dbkey,V_size) <-- (fromIntegral $ snd fruit)           -- dbkey -> size set to the string length
            r <- (dbp ==> X_get) dbp nullPtr dbkey dbdata 0         -- r = dbp -> get (dbp, NULL, dbkey, dbdata, 0)
            ksc <- (dbkey --> V_data) :: IO CString                 -- ksc = dbkey -> data
            kss <- dbkey --> V_size                                 -- kss = dbkey -> size
            if (r == 0)
              then do dsc <- (dbdata --> V_data) :: IO CString      -- dsc = dbdata -> data
                      dss <- dbdata --> V_size                      -- dss = dbdata -> size
                      ks <- peekCStringLen (ksc, fromIntegral kss)
                      vs <- peekCStringLen (dsc, fromIntegral dss)
                      return (r, ks, vs)
              else return (r, undefined, undefined)

It may be noted again that a Haskell application developer does not need to remember type identifiers of foreign structures involved. The compiler, using FFI import statements and the multiparameter class' functional dependencies, correctly infers all necessary type information. In fact, type of dbkey and dbdata in the example above should be Ptr S_'_'_db_dbt (as defined in the header: typedef struct __db_dbt { ... } DBT;). But this type is not mentioned explicitly in the code.

[edit] 5 Compilation of applications using bindings created with HSFFIG

Applications using bindings created with hsffig must be compiled/linked with the option -package HSFFIG passed to ghc. In the example below, the application consists of a single module containing the main funstion: main.hs, and the C library header file is header.h. The library file name is library.a.

The main.hs file must contain an import statement:

  import HEADER_H

When hsffig processes header.h it creates HEADER_H.hsc. When hsc2hs processes HEADER_H.hsc it creates HEADER_H.hs, and additionally HEADER_H_hsc.h and HEADER_H_hsc.c. The latter is recommended to compile with ghc:

ghc -c HEADER_H_hsc.c

which provides correct compilation environment, so necessary paths for the runtime header files will be passed to gcc properly.

After all, the executable may be compiled and linked:

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_H_hsc.o library.a -o main

The library of the package HSFFIG contains definition of the multiparameter class (HSFFIG.Field'Access.Field'Access) and the combinators necessary to access members of C structures/unions (see above). It is not necessary to directly import the hsffig modules in the application modules because the binding module re-exports them. It is however necessary to use the -fglasgow-exts options of ghc.

So, in the simplest case, the following steps are necessary:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs

ghc -c HEADER_H_hsc.c

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main

In some cases, size of the autogenerated Haskell code of the binding module is too large so it cannot be compiled due to lack of system resources. In this case, the binding module has to be split (see above, Splitting Large Modules). Steps to compile and build an executable now are:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs_unsplit

splitter HEADER_H.hs_unsplit

ghc -c HEADER_H_hsc.c

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main

The only one step added is running splitter over the Haskell source code generated by hsc2hs. The way the final executable is linked does not change.

[edit] 6 Known problems and workarounds

[edit] 6.1 Variadic functions

C functions whose argument list ends with an ellipsis (...) in other words taking variable number of arguments cannot be imported by hsffig. A common example is printf. If hsffig encounters such a function, it places a comment in its output explaining why the function could not be imported. However if a variable or a structure/union member is defined as a pointer to a variadic function is imported as a pointer to a function with type IO () i. e. void (*p)(void). This preserves the ability to query/set function pointer from a Haskell program.

So:

int (*pvard)(int,int,...);

int vard (int,int,...);

imports as:

foreign import ccall "header.h &pvard"
  ___58___ :: Ptr (FunPtr (IO ()))
p_pvard = ___58___
foreign import ccall "dynamic"
  ___58___mk___ :: FunPtr (IO ()) -> (IO ())
foreign import ccall "wrapper"
  ___58___wr___ :: (IO ()) -> IO (FunPtr (IO ()))
x_pvard  = peek ___58___ >>= \s -> ___58___mk___ s
v_pvard = peek ___58___ >>= (return . ___58___mk___)
s_pvard = \s -> ___58___wr___ s >>= poke ___58___

--

--
-- import of function/variable/structure member(s) vard :: WrongVariadicFunction is not possible
-- because of the following reason(s):
-- function is variadic

[edit] 6.2 Functions/variables of structure type

If a function or a variable has a structure (not structure pointer) type, hsffig cannot import it. If hsffig encounters such a function, it places a comment in its output explaining why the function could not be imported. However, if a structure/union member is itself a structure/union, it is imported so that the --> combinator returns a Ptr to that member. However the <-- combinator is not defined for such members.

So:

struct {int a,b,c,d;} *allocs(int,int,int,int);

struct {int a,b,c,d;} directs(int,int,int,int);

imports as:

foreign import ccall "static header.h allocs"
  f_allocs :: CInt -> CInt -> CInt -> CInt -> IO (Ptr (S_56))

--

--
-- import of function/variable/structure member(s) directs :: CInt -> CInt -> CInt -> CInt -> @S_58 is not possible
-- because of the following reason(s):
-- function takes/returns structure(s) directly
--

Note that while these two functions have return types looking the same, they are not the same from the C compiler standpoint, nor are they the same from the hsffig standpoint.

[edit] 6.3 Unresolved symbols When linking

As mentioned above, hsffig generates FFI import declarations for every function it finds a prototype for. It may rarely happen, the header file has prototypes for functions not defined by the library the header is related to. As an example, consider the following short program printing "Hello World" using the low-level write function of the standard C POSIX library:

module Main where

import UNISTD_H

main = withCStringLen "Hello World\n" $ \hello -> do
  f_write (fromIntegral 0) (fst hello) (fromIntegral $ snd hello)

The prototype for write is contained in the unistd.h include file usually located in /usr/include. However if the program above is compiled and linked the "usual" way, two symbols remain unresolved by the linker: __ftruncate, and pthread_atfork *. As a result, the linker fails to create the executable file.

The possible workarounds are:

  • To find the library where missing symbols are defined, and specify it on the ghc command line to link the executable.
  • To tell the linker to ignore the unresolved symbols and build the executable no matter what.
  • To use the -split-objs option that ghc provides (Note: See also HsffigLinkageOptimization for more detailed discussion of object file splitting possibilities).

The first option provides the best consistency, but it may be hard sometimes to find the correct library, especially for such low-level functions perhaps not assumed to be called directly.

The second option is a "short-cut" indeed: it leaves the problem open. If the developer is cofident that none of the unresolved symbols will be used for a whole life cycle of the apllication being developed, this may be acceptable, but this should not be considered an universal solution.

The third option is kind of a compromise: it leads to creation of consistent executable (and even the file size may be reduced significantly), but it also complicates the compliation/linkage process.

When given an option, -split-objs, ghc invokes a special script on the assembly files generated during compilation which cuts them into large number of small pieces. For this example program, UNISTD_H.hs was split into more than 700 object files.

These object files are placed in the special subdirectory whose name is derived from the module name: e. g. UNISTD_H_split for UNISTD_H. This directory is not autocreated: explicit mkdir command is necessary. The next step is to create a static library containing all those object files (ar -cq). The library must be specified later on the ghc command line to link the executable.

To tell more, this applies to the situation when splitter is not invoked, and there remains only one Haskell module with FFI declarations. If the module is split (this is "source split", not "object split, one not to be confused with another), then it makes sense to apply -split-objs only to the submodule containing FFI import declarations for functions (i. e. whose name is postfixed with "_F", UNISTD_H_F in this example): structures/unions, constants, and enumerations are compile-time items not dealt with by the linker.

The bottom line: uniform processing of object files with ghc --make is no longer available. Even worse, --make does not recognize libraries as compilation targets: if the module Main imports UNISTD_H, and there is UNISTD_H.hs, and libUNISTD_H.a, but no UNISTD_H.o, ghc --make will try to compile UNISTD_H to UNISTD_H.o and will ignore existence of the library.

The Makefile for this example program is shown below. It features the second and the third ways to work around the unresolved symbols problem.

GCC = gcc
GHC = ghc
HSFFIGDIR = /path/to/installed/hsffig-1.0
HSFFIG = $(HSFFIGDIR)/bin/hsffig

# The command line to invoke hsc2hs:

HSC2HS = hsc2hs -t /dev/null

all: syscall

UNISTD_H.hsc: /usr/include/unistd.h
        $(GCC) -E -dD /usr/include/unistd.h | $(HSFFIG) > UNISTD_H.hsc

UNISTD_H.hs: UNISTD_H.hsc
        $(HSC2HS) UNISTD_H.hsc -o UNISTD_H.hs

# The target to build the split-objects library:

libUNISTD_H.a: UNISTD_H.hs
        -mkdir UNISTD_H_split
        $(GHC) -c -split-objs -package HSFFIG UNISTD_H.hs
        ar cq libUNISTD_H.a UNISTD_H_split/*.o

syscall: syscall.hs libUNISTD_H.a UNISTD_H_hsc.o
        $(GHC) -c UNISTD_H_hsc.c

# The second workaround: ignore unresolved symbols. The linker option to do this may depend on the version
# of binutils used. In this example, it is --noinhibit-exec. Note how -optl and -Wl are combined.

        $(GHC) -ffi -package HSFFIG -fglasgow-exts --make syscall.hs UNISTD_H_hsc.o \
                -optl -Wl,--warn-once -optl -Wl,--noinhibit-exec -o syscall-1

# The third workaround: link against the library. The --make option of ghc is not available.

        $(GHC) -package HSFFIG -fglasgow-exts syscall.hs UNISTD_H_hsc.o libUNISTD_H.a -o syscall-2

clean:
        rm -rf syscall-1 syscall-2 libUNISTD_H.a UNISTD_H* UNISTD_H_split *.o *.hi

  • Note: This is of course OS-dependent situation: the situation described was observed on Linux with glibc 2.1.x. This example is provided here to illustrate how unresolved symbols problem may be worked around.

DimitryGolubovsky