<div>I'm in the process of adding a Data.Sequence.Unboxed to unboxed-containers. I hope to have it in hackage today or tomorrow, should its performance work out as well as Data.Set.Unboxed.</div>
<div> </div>
<div>Be warned the API will likely be shifting on you for a little while, while I figure out a better way to manage all of the instances.</div>
<div> </div>
<div>-Edward Kmett<br></div>
<div class="gmail_quote">On Tue, Apr 7, 2009 at 7:49 AM, Manlio Perillo <span dir="ltr"><<a href="mailto:manlio_perillo@libero.it">manlio_perillo@libero.it</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">Hi.<br><br>I'm still working on my Netflix Prize project.<br><br>For a function I wrote, I really need a data structure that is both space efficient (unboxed elements) and with an efficient snoc operation.<br>
<br>I have pasted a self contained module with the definition of the function I'm using:<br><a href="http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453" target="_blank">http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453</a><br>
<br><br>The movie ratings are loaded from serialized data, and the result is serialized again, using the binary package:<br><br>transcodeIO :: IO ()<br>transcodeIO = do<br> input <- L.hGetContents stdin<br> let output = encodeZ $ transcode $ decodeZ input<br>
L.hPut stdout output<br><br>(here encodeZ and decodeZ are wrappers around Data.Binary.encode and Data.Binary.decode, with support to gzip compression/decompression)<br><br><br>This function (transcodeIO, not transcode) takes, on my<br>
Core2 CPU T7200 @ 2.00GHz:<br><br>real 30m8.794s<br>user 29m30.659s<br>sys 0m10.313s<br><br>1068 Mb total memory in use<br><br><br>The problem here is with snocU, that requires a lot of copying.<br><br>I rewrote the transcode function so that the input data set is split in N parts:<br>
<a href="http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456" target="_blank">http://hpaste.org/fastcgi/hpaste.fcgi/view?id=3453#a3456</a><br><br>The mapReduce function is the one defined in the Real World Haskell.<br>
<br><br>The new function takes (using only one thread):<br><br>real 18m48.039s<br>user 18m30.901s<br>sys 0m6.520s<br><br>1351 Mb total memory in use<br><br><br>The additional required memory is probably caused by unionsWith, that is not strict.<br>
The function takes less time, since array copying is optimized.<br>I still use snocU, but on small arrays.<br><br>GC time is very high: 54.4%<br><br><br>Unfortunately I can not test with more then one thread, since I get segmentation faults (probably a bug with uvector packages).<br>
<br>I also got two strange errors (but this may be just the result of the segmentation fault, I'm not able to reproduce them):<br><br>tpp.c:63: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= __sched_fifo_min_prio && new_prio <= __sched_fifo_max_prio)' failed.<br>
<br><br>internal error: removeThreadFromQueue: not found<br> (GHC version 6.8.2 for i386_unknown_linux)<br> Please report this as a GHC bug: <a href="http://www.haskell.org/ghc/reportabug" target="_blank">http://www.haskell.org/ghc/reportabug</a><br>
<br><br><br>Now the question is: what data structure should I use to optimize the transcode function?<br><br>IMHO there are two solutions:<br><br>1) Use a "lazy" array.<br> Something like ByteString.Lazy, and what is available in<br>
storablevector package.<br><br> Using this data structure, I can avoid the use of appendU.<br><br>2) Use an "unboxed" list.<br><br> Something like:<br> <a href="http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h" target="_blank">http://mdounin.ru/hg/nginx-vendor-current/file/tip/src/core/ngx_list.h</a><br>
<br> That is: a linked list of unboxed arrays, but unlike the lazy array<br> solution, a snoc operation avoid copying if there is space in the<br> current array.<br><br> I don't know if this is easy/efficient to implement in Haskell.<br>
<br><br>Any other suggestions?<br><br><br>Thanks Manlio Perillo<br>_______________________________________________<br>Haskell-Cafe mailing list<br><a href="mailto:Haskell-Cafe@haskell.org" target="_blank">Haskell-Cafe@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/haskell-cafe" target="_blank">http://www.haskell.org/mailman/listinfo/haskell-cafe</a><br></blockquote></div><br>