powerset

Graham Klyne GK@ninebynine.org
Wed, 04 Jun 2003 21:04:35 +0100


At 14:00 04/06/03 +0100, Keith Wansbrough wrote:
>Looks fine, but you're right - the use of "combinations" is
>inefficient.  It's better to iterate over the elements, rather than
>the length.  Then as you add each element, you consider all the sets
>you have so far, and either add the new element to each or not.  This
>doubles the number of sets.  Hence:
>
>powerset :: [a] -> [[a]]
>powerset [] = [[]]
>powerset (x:xs) = xss ++ map (x:) xss
>                 where xss = powerset xs

Neat!

It happens that for the application I have in mind, it is important to 
generate shorter subsets first, as the required results are almost always 
going to come from smaller subsets of a possibly large base set, and I'm 
aiming to exploit lazy evaluation to keep things tractable.

Looking at your code, I'm wondering if it would be possible to use some 
alternate form of composition instead of ++, and some auxiliary functions 
to pull the results out in the short-to-long sequence.

I'm thinking in terms of building a list of trees..

[[
data NTree a = NTree { nthead::a, ntbody::[NTree a] }

instance Functor NTree where
     fmap f (NTree h ts) = NTree (f h) (map (fmap f) ts)

powerset1 :: [a] -> [NTree [a]]
powerset1 (x:xs) = (NTree [x] (map (fmap (x:)) xss)) : xss
                   where xss = powerset1 xs
powerset1 [] = []

listPowerset :: [NTree [a]] -> [[a]]
listPowerset [] = []
listPowerset ts = (map nthead ts) ++
     listPowerset bodylist
     where
         bodylist = concat $ filter (not . null) $ map ntbody ts

testN1 = listPowerset $ powerset1 [1,2,3,4]
testN2 = listPowerset $ powerset1 "abcdefgh"
]]

The list/tree structure looks something like this:

    [1]
    [2] [2,1]
    [3] [3,1]
        [3,2] [3,2,1]
    [4] [4,1]
        [4,2] [4,2,1]
        [4,3] [4,3,1]
              [4,3,2] [4,3,2,1]

etc.

The list function picks off the members by columns (w.r.t. to above diag)

>Notice how important it is to include the empty set in the set of
>subsets - it won't work at all if you omit it.

Yes, I noticed something similar in my original version.

I've chosen not include the empty subset in my results, but that's easily 
adjusted.

>This formulation is particularly nice because in memory, you *share*
>all of the lists from the previous iteration, rather than making
>copies.

I *think* my revised formulation achieves this.

[...]

>My solution isn't perfect, though - the use of append (++) is
>inefficient; if this could be avoided, it would be faster.

I didn't see any easy way to avoid (++) in my list readout, but I think I 
can claim that the length of the leading list if never more than O(log N) 
the tree size.

If I'm evaluating and using the list lazily, using a typical recursive 
traversal pattern (like the powerset function itself), is there any cause 
for the "++" to actually be evaluated?  I suspect not, but can't be sure.

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E