[Haskell-cafe] relational data representation in memory using haskell?

Thu May 22 03:37:02 EDT 2008

On Thu, May 22, 2008 at 08:16:54AM +0200, Salvatore Insalaco wrote:
> 2008/5/22 Marc Weber <marco-oweber at gmx.de>:
> > I'd like to illustrate two different ideas using a small example:
> > (A)
> >        data CD = CD { title :: String, tracks :: [ Track ] }
> >        data Track = Track { track :: String, cd :: CD }
> >        data PDB = PDB { cds :: Set CD, tracks :: Set Track }
> >
> > because it's not using foreign ids but kind of pointers I'll call this
> > the pointer method
> 
> This doesn't look like a relational structure at all in Haskell.
> Let's take the CD and Track relations. In a relational database you
> have something like:
> CD (1, 'Querying about you')
> Track (1, 'Inserting some love', 1)
> Track (2, 'Updating my feelings', 1)
> Track (3, 'Deleting my hopes', 1)
> 
> In an imperative language you can do something similar in memory using
> objects (you can in haskell to with IORefs and so on, but let's stay
> on "data"). You get something like:
> 
> 0x000 CD('Querying about you')
> 0x004 Track('Inserting some love, 0x004)
You are right. But ghc does represent those things as pointers.. :)
So it is indeed unless you ask ghc to not use (eg by using strict fields
etc), correct? IORefs are not that good because you can't read them
within STM. But you are right: Using IORefs was my first idea.

> similar to a CD, you are storing a *value* (low level you probably
> have a pointer, but you have not pointer semantics). As you noticed,
> you cannot "update" the CD title without changing each Track. That's a
> way to store information, and a good way too, but it's not a
> relational structure by any extent.
I agree it's not in general and higly implementation dependant
Maybe I'm totally wrong. I imagine ghc having some internal
representation of data types which come close to 

struct CD {
  Track * tracks;
  title ** char;
} cd;

struct Track {
  cd * CD;
  title ** char;
} track;

(does'nt compile but you get the idea. My C knowldge is good enough for
reading only).

So if you start seeing the whole database as list of connected structs
vio pointers adding / deleting/ inserting is quite the same as adding
some nodes to a Data.Map. You replace the nodes you have to replace and
finally get a poiter pointing to te new database state. As long as you
don't loose the original "pointer" you can easily rollpack.
Consider 

let x = Cd ...
forkIO $ ( do something with x } -- (1)
print x -- (2) 

How can ghc know when running line (2) that (1) hasen't changed the
record? I see two solutions:
a) give the forked process a copy (Then my design will collapse)
   but this is expensive to copy data without knowing you ned to
b) use pointers and replace x ony on updating. Thus if (1) changes the
   title a new struct wil be created poiting to the old list but a new
   title String. line (2) doesn't have to care at all.
I'm guessing that analyzing a) when values have to be copied is
complicated - thus b) is implemented ? quicksilver has told me so as
well - But I'm not sure that's why I'm asking.

> If you want to use "normal" Haskell ADT, are you sure that a
> relational storage is what you want? Keeping that in memory doesn't
> give you some advantages of relational databases (e.g. uniform
> representation), and the impedance between the functional and the
> relational world is not easy to manage.
> 
> Maybe I misunderstood what you are trying to accomplish, and you only
> want to do a generic data structure with fast lookups on the content
> of the items?
Exactly.. Of course I only want a generic data structure with fast
lookups and content of items.. But I don't want to write the insert
delete and update functions for each table again and again..
Does this already exist?

Marc Weber