[Haskell-beginners] Red-black tree performance

Lorenzo Bolla lbolla at gmail.com
Wed Mar 21 11:08:06 CET 2012


Ops, wrong copy and paste!
See correct script below:

import Data.Foldable (foldl')

data Color = Red | Black deriving (Show)

data Tree a = Empty | Node Color (Tree a) a (Tree a)
              deriving (Show)

insert :: Ord a => a -> Tree a -> Tree a
insert x t = makeBlack (ins t)
             where
               ins Empty = Node Red Empty x Empty
               ins (Node color a y b) | x < y  = ins a `seq` balance color
(ins a) y b
                                      | x == y = Node color a y b
                                      | x > y  = ins b `seq` balance color
a y (ins b)

makeBlack :: Tree a -> Tree a
makeBlack (Node _ a y b) = Node Black a y b
makeBlack Empty = Empty

balance :: Color -> Tree a -> a -> Tree a -> Tree a
balance Black (Node Red (Node Red a x b) y c) z d = Node Red (Node Black a
x b) y (Node Black c z d)
balance Black (Node Red a x (Node Red b y c)) z d = Node Red (Node Black a
x b) y (Node Black c z d)
balance Black a x (Node Red (Node Red b y c) z d) = Node Red (Node Black a
x b) y (Node Black c z d)
balance Black a x (Node Red b y (Node Red c z d)) = Node Red (Node Black a
x b) y (Node Black c z d)
balance color a x b = Node color a x b

maxTree :: Ord a => Tree a -> a
maxTree (Node _ Empty n Empty) = n
maxTree (Node _ _ _ t) = maxTree t

toInsert :: [Int]
--  toInsert = [1..1000000]
toInsert = map (`mod` 100) [1..10000000]

main :: IO ()
main = putStrLn $ show $ maxTree $ foldl' (flip insert) Empty toInsert


Sorry for the noise,
L.




On Wed, Mar 21, 2012 at 10:02 AM, Lorenzo Bolla <lbolla at gmail.com> wrote:

>
>
> On Wed, Mar 21, 2012 at 9:27 AM, Heinrich Apfelmus <
> apfelmus at quantentunnel.de> wrote:
>
>> Adrien Haxaire wrote:
>>
>>> I tried with foldl'. I modified the code at several places to match
>>> the argument pattern, and now I see why flip is so useful :) The
>>> conclusion is also interesting: the productivity climbs up to 92%,
>>> while the calculation time raises to 6.3s. I guess that the choice is
>>> space or time, as often.
>>>
>>
>> 92% productivity seems right for me. In contrast, 20% garbage collection
>> may be a sign that something went wrong.
>
> I think that this is likely due to laziness: in the very end, you only
>> query the rightmost element. After a while, the program simply won't
>> evaluate the balancing on the left side of the tree, as you're not asking
>> it to evaluate anything there.
>>
>>
>> So, you're not necessarily comparing apples and apples here. But on the
>> other hand, maybe that's a performance disadvantage of the C++ version. In
>> Haskell, performance depends a lot on usage patterns.
>>
>>
> This is very true.
> In fact, after some tweaking, I found that the best solution is using
> foldl', lazy type and force some strictness in "insert" using "seq". See
> below:
>
> import Data.Foldable (foldl', foldr')
>
> data Color = Red | Black deriving (Show)
>
> data Tree a = Empty | Node Color (Tree a) a (Tree a)
>               deriving (Show)
>
> insert :: Ord a => a -> Tree a -> Tree a
> insert x t = makeBlack (ins t)
>              where
>                ins Empty = Node Red Empty x Empty
>                --  ins (Node color a y b) | x < y  = ins a `seq` balance
> color (ins a) y b
>                --                         | x == y = Node color a y b
>                --                         | x > y  = ins b `seq` balance
> color a y (ins b)
>                ins (Node color a y b) | x < y  = balance color (ins a) y b
>                                       | x == y = Node color a y b
>                                       | x > y  = balance color a y (ins b)
>
> makeBlack :: Tree a -> Tree a
> makeBlack (Node _ a y b) = Node Black a y b
> makeBlack Empty = Empty
>
> balance :: Color -> Tree a -> a -> Tree a -> Tree a
> balance Black (Node Red (Node Red a x b) y c) z d = Node Red (Node Black a
> x b) y (Node Black c z d)
> balance Black (Node Red a x (Node Red b y c)) z d = Node Red (Node Black a
> x b) y (Node Black c z d)
> balance Black a x (Node Red (Node Red b y c) z d) = Node Red (Node Black a
> x b) y (Node Black c z d)
> balance Black a x (Node Red b y (Node Red c z d)) = Node Red (Node Black a
> x b) y (Node Black c z d)
> balance color a x b = Node color a x b
>
> maxTree :: Ord a => Tree a -> a
> maxTree (Node _ Empty n Empty) = n
> maxTree (Node _ _ _ t) = maxTree t
>
> toInsert :: [Int]
> --  toInsert = [1..1000000]
> toInsert = map (`mod` 100) [1..10000000]
>
> main :: IO ()
> main = putStrLn $ show $ maxTree $ foldl' (flip insert) Empty toInsert
>
>
> Note that if the improvement is around 10% for "toInsert" being a
> monotonic sequence of integers, the improvement is much bigger (>2x for me)
> for a more "random" "toInsert" sequence.
>
> L.
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20120321/e3259421/attachment-0001.htm>


More information about the Beginners mailing list