Ground Up

Hal Daume III hdaume@ISI.EDU
Thu, 28 Feb 2002 09:28:31 -0800 (PST)


On Thu, 28 Feb 2002, Jerzy Karczmarczuk wrote:

> I didn't follow that discussion, but let's be serious. Really.
> Your second version constructs and destroys plenty of tuples, of
> ephemeric data structures which live one step only, and this is
> obviously costly. "No way to know this?" Are you sure?

And yet there's no reason I shouldn't get update-in-place on those
tuples.  What's more, if I create my own data type for tuples which is
strict in both of its arguments and use a function which is strict in the
tuple (yes, even if i use proper tail recursion), it's still slower. 

> WHAT is a problem?
> I see only one: a Haskell user should master the essentials of 
> a reasonably professional functional style.
> 
> sumProdTo n = spt n 1 1 where
>  spt 1 s p = (s,p)
>  spt n s p = spt (n-1) (n+s) (n*p)

This is obviously the preferred solution, but it seems there should be no
performance difference between this and:

sumProdTo n = spt n (1,1) where
  spt 1 acc   = acc
  spt n (s,p) = spt (n-1) (n+s, n*p)

but there is a huge difference, even if i put seqs in to force evaluation
of the sum and product before creating the tuple.

but this is obviously a made-up situation.  (more down further)

> > ... it's just that i don't think it's
> > fair to say you don't have to understand what the compiler is doing to
> > write code.
> 
> This is not the question of this or that *compiler*, but of understanding
> the basics of data processing independent of the language. I am abhorred by
> the idea of putting down: "this looks like it would speed up your program..."
> in cases where it is rather clear that it might not. Please do the same 
> experience in C with dynamically allocated tuples.

so constructing and tearing apart tuples, you should say then, displays a
lack of understanding of the basics of data processing in haskell.  let's
look at the definition of the standard library function mapAccumL:

> mapAccumL               :: (a -> b -> (a, c)) -> a -> [b] -> (a, [c])
> mapAccumL f s []         = (s, [])
> mapAccumL f s (x:xs)     = (s'',y:ys)
>                          where (s', y ) = f s x
>                                (s'',ys) = mapAccumL f s' xs

this is clearly the same problem.  it's constantly creating and tearing
apart tuples.

or unfoldr:

> unfoldr f b              = case f b of Nothing    -> []
>                                        Just (a,b) -> a : unfoldr f b

the list goes on.

clearly creating small intermediate structures (like tuples) is very
central to haskell.  in fact, for speed reasons i have frequently written
my own versions of the above functions which remove the tuple creation
because it simply makes it too slow.  this is *not* at all what haskell is
about.  it's about writing functions which are small and modular and have
good reuse.  that's why this functions are in the standard libraries.

you can also observe the frequent use of functions which take a state and
return a (state,value) pair.  using functions like these pushes the
creation and destruction of tuples very far.

given the importance tuples and other small data structures have in
haskell, i found it hard to believe that using them would cause me to
suffer such a severe performance penalty.  i *assumed* that since they
were so widely used and so integral to the language, that the compilers
would do a much better job that they do with being intelligent about using
them.  i found this assumption to be incorrect, and that's the primary
reason i think you need to know more about what's going on in the compiler
to write fast programs.

 - Hal