<div dir="ltr">This is a bit advanced for the beginners list. You would probably have better luck on Stack Overflow.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 11, 2014 at 7:24 AM, Jan Snajder <span dir="ltr"><<a href="mailto:jan.snajder@fer.hr" target="_blank">jan.snajder@fer.hr</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
<br>
I'm trying to implement a simple file-based database. I apparently have<br>
a space leak, but I have no clue where it comes from.<br>
<br>
Here's the file-based database implementation:<br>
<a href="http://pastebin.com/QqiqcXFw" target="_blank">http://pastebin.com/QqiqcXFw</a><br>
<br>
The idea is to have a database table in a single text file. One line<br>
equals one table row. The fields within a row are whitespace separated.<br>
The first field is the key. Because I'd like to work with large files, I<br>
don't want to load the whole file into memory. Instead, I'd like to be<br>
able to fetch the rows on demand, by keys. Thus I first create an index<br>
that maps keys to file offsets (seek positions). I use ReaderT to make<br>
the index available on top of the IO monad.<br>
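<br>
A minimal sketch of this indexing idea, not the code behind the pastebin link. It assumes strict ByteString I/O instead of Data.Text and passes the index explicitly instead of through ReaderT. The names Index, buildIndex and fetchRow are hypothetical and chosen only for illustration:<br>
<br>
```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import System.IO

-- Hypothetical type: maps each key to the byte offset where its row starts.
type Index = M.Map B.ByteString Integer

-- Scan the file once, recording the offset of every non-empty row.
buildIndex :: Handle -> IO Index
buildIndex h = go M.empty
  where
    go !ix = do
      eof <- hIsEOF h
      if eof
        then return ix
        else do
          pos  <- hTell h          -- offset of the row about to be read
          line <- B.hGetLine h
          case B.words line of
            -- B.copy gives the key its own small buffer, so the index
            -- does not share (and keep alive) the buffer of the whole row.
            (k:_) -> go (M.insert (B.copy k) pos ix)
            []    -> go ix

-- Seek to the stored offset and re-read just that one row on demand.
fetchRow :: Handle -> Index -> B.ByteString -> IO (Maybe [B.ByteString])
fetchRow h ix k = case M.lookup k ix of
  Nothing  -> return Nothing
  Just pos -> do
    hSeek h AbsoluteSeek pos
    fmap (Just . B.words) (B.hGetLine h)
```
<br>
With a handle from openBinaryFile, buildIndex is run once and fetchRow is then called per key. Data.Text offers an analogous T.copy for keys that are slices of a larger Text value.<br>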
<br>
For testing, I use a dummy table produced as follows:<br>
<br>
import System.IO<br>
import Text.Printf<br>
import Control.Monad<br>
<br>
row = unwords [printf "field%03d" (i::Int) | i <- [1..999]]<br>
<br>
main = do<br>
  forM_ [1..250000] $ \i -><br>
    putStrLn $ printf "row%06d %s" (i::Int) row<br>
<br>
This generates a 2.1GB text file, which I store on my disk.<br>
<br>
The testing code:<br>
<br>
import FileDB<br>
import qualified Data.Text as T<br>
import Text.Printf<br>
import Control.Applicative<br>
import Control.Monad<br>
import Control.Monad.Trans<br>
import System.IO<br>
import System.Environment<br>
<br>
main = do<br>
  (f:_) <- getArgs<br>
  t <- openTable f<br>
  runDB t $ do<br>
    ks <- getKeys<br>
    liftIO $ do<br>
      putStrLn . printf "%d keys read" $ length ks<br>
      putStrLn "Press any key to continue..."<br>
      getChar<br>
    forM_ ks $ \k -> do<br>
      Just r <- getRow k<br>
      liftIO . putStrLn $ printf "Row \"%s\" has %d fields"<br>
        (T.unpack k) (length r)<br>
<br>
When I run the test on the 2.1GB file, the whole program consumes 10GB.<br>
<br>
6GB seem to be allocated after the index is built (just before entering<br>
the forM_ function). The remaining 4GB are allocated while fetching all<br>
the rows.<br>
<br>
I find both things difficult to explain.<br>
<br>
6GB seems too much for the index. Each key is 9 characters (stored as<br>
Data.Text), and I have 250K such keys in a Data.Map. Should this really<br>
add up to 6GB?<br>
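<br>
As a rough back-of-envelope check, the expected index size can be estimated as below. Every constant is an assumption: 64-bit GHC, a text version that stores UTF-16 internally (text before 2.0), one freshly allocated Text per key, and a small boxed Integer offset as the map value. The last figure covers a purely hypothetical scenario in which each key is a slice that keeps its entire decoded row alive; whether that applies depends on the linked code, which is not shown here:<br>
<br>
```haskell
-- All sizes below are assumptions about a 64-bit GHC heap layout.
wordBytes :: Int
wordBytes = 8

-- Round a payload size up to a whole number of machine words.
roundUpToWord :: Int -> Int
roundUpToWord n = ((n + wordBytes - 1) `div` wordBytes) * wordBytes

-- Text constructor (header, array pointer, offset, length: 4 words)
-- plus its byte array (header and size: 2 words, then 2 bytes per char).
textBytes :: Int -> Int
textBytes chars = 4 * wordBytes + 2 * wordBytes + roundUpToWord (2 * chars)

-- Data.Map node: header, size, key, value, left, right (6 words).
mapNodeBytes :: Int
mapNodeBytes = 6 * wordBytes

-- Small boxed Integer offset: header plus payload (2 words).
offsetBytes :: Int
offsetBytes = 2 * wordBytes

-- One index entry for a 9-character key.
entryBytes :: Int
entryBytes = textBytes 9 + mapNodeBytes + offsetBytes

-- Whole index for 250,000 keys.
indexBytes :: Int
indexBytes = 250000 * entryBytes

-- Characters per generated row: 9-char key, 1 space, 999 fields of
-- 8 chars each, and 998 separating spaces.
rowChars :: Int
rowChars = 9 + 1 + 999 * 8 + 998

-- Hypothetical scenario only: every key retains its whole UTF-16 row.
retainedRowBytes :: Int
retainedRowBytes = 250000 * roundUpToWord (2 * rowChars)
```
<br>
Under these assumptions indexBytes comes to 34,000,000 (about 34 MB), while retainedRowBytes comes to 4,500,000,000 (about 4.5 GB). The estimate is only as good as its assumptions and is not a measurement of the actual program.<br>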
<br>
Also, I have no idea why fetching all the rows, one by one, should<br>
consume any additional memory. Each row is fetched and its length is<br>
computed and printed out. I see no reason for the rows to be retained in<br>
the memory.<br>
<br>
Here's the memory allocation summary:<br>
<br>
> 1,093,931,338,632 bytes allocated in the heap<br>
> 2,225,144,704 bytes copied during GC<br>
> 4,533,898,000 bytes maximum residency (26 sample(s))<br>
> 3,080,926,336 bytes maximum slop<br>
> 10004 MB total memory in use (0 MB lost due to fragmentation)<br>
><br>
> Tot time (elapsed) Avg pause Max pause<br>
> Gen 0 2171739 colls, 0 par 45.29s 45.26s 0.0000s 0.0030s<br>
> Gen 1 26 colls, 0 par 1.50s 1.53s 0.0589s 0.7087s<br>
><br>
> INIT time 0.00s ( 0.00s elapsed)<br>
> MUT time 279.92s (284.85s elapsed)<br>
> GC time 46.80s ( 46.79s elapsed)<br>
> EXIT time 0.68s ( 0.71s elapsed)<br>
> Total time 327.40s (332.35s elapsed)<br>
><br>
> %GC time 14.3% (14.1% elapsed)<br>
><br>
> Alloc rate 3,908,073,170 bytes per MUT second<br>
><br>
> Productivity 85.7% of total user, 84.4% of total elapsed<br>
<br>
<br>
Btw., I don't get the "bytes allocated in the heap" figure, which is<br>
approx. 1000 GB (?).<br>
<br>
I'm obviously doing something wrong here. I'd be thankful for any help.<br>
<br>
Best,<br>
Jan<br>
</blockquote></div><br></div>