[Haskell] ANN: HDBC (Haskell Database Connectivity)

Wed Jan 4 02:55:51 EST 2006

2006/1/4, John Goerzen <jgoerzen at complete.org>:
> The final thing that prompted me to do this was that the PostgreSQL --
> and possibly the Sqlite -- module for HSQL was segfaulting.  I spent
> quite a bit of time with gdb and the HSQL code, and even with Simon
> Marlow's assistance, was unable to track down the precise cause.  To
> make matters worse, the problem was intermittent.  The Haskell program
> in question was pure Haskell, and switching it to HDBC solved this
> issue.

Can you send me a short code that causes the segfaulting?

> I also had extremely high memory usage when dealing with large result
> sets -- somewhere on the order of 700MB; the same consumes about 12MB
> with HDBC.  My guess from looking briefly at the code is that the entire
> result set is being read into memory up front.

I can't understand this. The result set isn't read in memory unless
you want do to it. If you are using collectRows then yes, you will end
with the entire result set read in memory, but you can use fetch to
read the set row by row as well. The forEachRow function is also
helpfull in this case.

> There were a number of other problems as well:
>
>  * No support for prepared queries or for supplying
>   replacable parameters.  (Supported everywhere in HDBC, which removes
>   the need to have escaping.)  That's really my #1 complaint
>   (well, aside from the segfaulting <g>).
>
>  * Escaping function was global, rather than per-DB, which caused some
>   trouble with Sqlite3 at least.  (See SF bug 1324873 that I submitted
>   on Oct. 12 with no replies since then)

This is true.

>  * No way to retrieve result data by column index instead of column name

This is already available in CVS.

>  * HSQL provided no way to see the result set as a lazy list, and
>   the public API provided no way to implement that.  (There is 'fetch',
>   but it seems that the entire result set was read into memory in
>   advance anyway.)

This is intentional. The trouble is that you can close the database
connection before to evaluate the entire lazy list. This is different
from the lazy file reading where you have only file handle. In any
case the lazy reading can be implemented on top of the existing API.

>  * The code wasn't very easy to understand.  (This may be just me
>   though.)
>
>  * Unclear semantics in multithreaded programs.

Each call to any Connection and/or Statement is guarded from one MVar
so I think it is thread safe.

>  * No testsuite.

Ok.

> I knew I couldn't fix it the right way in HSQL (since I had trouble
> following the code), and it also seemed like these weren't high-priority
> issues for you.  (No blame here; it's the same way for me with the code
> I maintain.  I can't expect you to fix my bugs in something that's
> free.)
>
> In hindsight, I should have contacted you first, and I apologize for not
> doing that.  I just sorta sat down to design a DB API that I'd like, and
> pretty soon had a working prototype, and then some drivers...  I'm
> dangerous when I'm on vacation ;-)
>
> I'm not quite sure where to go from here.  Both packages have features
> that the other lack.  I don't think that it's possible to merge all the
> HDBC features into HSQL without a major API and architecture
> refactoring.  The HSQL features that HDBC lacks are mostly in progress
> already, and I've tried to design the HDBC API with them in mind.
>
> So, I'd invite you to take a look at the HDBC API at
>
>  http://darcs.complete.org/hdbc/doc/Database-HDBC.html

I saw the API and some of the code. It is quite similar to HSQL in
some cases but also differs in other. I saw that you are always
fetching the entire row in list of SqlValue. I decided to provide
function to get the values one by one because usually I preffer to
represent each row as application specific data type. In your case you
will build an intermediate list which I have to transform to specific
data type. In the same time it makes sense to have SqlValue type in
HSQL too. Then I can provide SqlBind instance for SqlValue. In this
way the user can fetch the values without the need to know the value
types in advance.

Cheers,
   Krasimir