# laziness in `length'

Denys Rtveliashvili rtvd at mac.com
Tue Jun 15 10:52:04 EDT 2010

```
Hi Daniel,

Thank you very much for the explanation of this issue.

While I understand the parts about rewrite rules and the big thunk, it
is still not clear why it is the way it is.

Please could you explain which Nums are not strict? The ones I am aware
of are all strict.

Also, why doesn't it require building the full thunk for non-strict
Nums? Even if they are not strict, an addition requires both parts to be
evaluated. This means the thunk will have to be pre-built, doesn't it?

With kind regards,
Denys
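
-- Illustration only, not part of the original message: a minimal sketch of
-- one possible non-strict Num, using a hypothetical lazy Peano type `Nat`
-- (the names `Nat` and `longerThan` are assumptions made for this example).
-- Its (+) inspects only the outermost constructor of the left argument, so
-- `1 + genericLength rest` can yield an `S` before `rest` is traversed.
import Data.List (genericLength)

data Nat = Z | S Nat

instance Num Nat where
  Z   + n = n
  S m + n = S (m + n)          -- returns S at once; n is not forced
  Z   * _ = Z
  S m * n = n + m * n
  fromInteger 0 = Z
  fromInteger k
    | k > 0     = S (fromInteger (k - 1))
    | otherwise = error "Nat: negative literal"
  abs      = id
  signum Z = Z
  signum _ = S Z
  negate   = error "Nat: negate is undefined"

-- True once n has more than k successors; forces at most k + 1 of them.
longerThan :: Int -> Nat -> Bool
longerThan _ Z     = False
longerThan 0 (S _) = True
longerThan k (S n) = longerThan (k - 1) n

-- Terminates even though the list is infinite, because only the first
-- six constructors of the length are ever demanded.
main :: IO ()
main = print (longerThan 5 (genericLength [1 :: Integer ..]))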

> On Monday 14 June 2010 16:25:06, Serge D. Mechveliani wrote:
> > Dear people and GHC team,
> >
> > I have a naive question about the compiler and library of  ghc-6.12.3.
> > Consider the program
> >
> >   import List (genericLength)
> >   main = putStr $ shows (genericLength [1 .. n]) "\n"
> >          where
> >          n = -- 10^6, 10^7, 10^8 ...
> >
> > (1) When it is compiled under  -O,  it runs in a small constant space
> >     in  n  and in a time approximately proportional to  n.
> > (2) When it is compiled without -O,  it takes at run time a stack
> >     proportional to  n,  and it takes an enormously long time
> >     for  n >= 10^7.
> > (3) In the interpreter mode  ghci,   `genericLength [1 .. n]'
> >     takes as much resource as (2).
> >
> > Are the points (2) and (3) natural for a Haskell implementation?
> >
> > Independently of whether  lng  is inlined or not, its lazy evaluation
> > is, probably, like this:
> >  lng [1 .. n] =
> >  lng (1 : (list 2 n)) =  1 + (lng $ list 2 n) =
> >  1 + (lng (2: (list 3 n))) = 1 + 1 + (lng $ list 3 n) =
> >  2 + (lng (3: (list 4 n)))   -- because this "+" is of Integer
> >  = 2 + 1 + (lng $ list 4 n) =
> >  3 + (lng $ list 4 n)
> >  ...
> > And this takes a small constant space.
>
> Unfortunately, it would be
>
> lng [1 .. n]
> ~> 1 + (lng [2 .. n])
> ~> 1 + (1 + (lng [3 .. n]))
> ~> 1 + (1 + (1 + (lng [4 .. n])))
> ~>
>
> and that builds a thunk of size O(n).
>
> The thing is, genericLength is written so that for lazy number types, the
> construction of the result can begin before the entire list has been
> traversed. This means however, that for strict number types, like Int or
> Integer, it is woefully inefficient.
>
> In the code above, the result type of genericLength (and the type of list
> elements) is defaulted to Integer.
> When you compile with optimisations, a rewrite-rule fires:
>
> -- | The 'genericLength' function is an overloaded version of 'length'.  In
> -- particular, instead of returning an 'Int', it returns any type which is
> -- an instance of 'Num'.  It is, however, less efficient than 'length'.
> genericLength           :: (Num i) => [b] -> i
> genericLength []        =  0
> genericLength (_:l)     =  1 + genericLength l
>
> {-# RULES
>   "genericLengthInt"     genericLength = (strictGenericLength :: [a] -> Int);
>   "genericLengthInteger" genericLength = (strictGenericLength :: [a] -> Integer);
>  #-}
>
> strictGenericLength     :: (Num i) => [b] -> i
> strictGenericLength l   =  gl l 0
>               where
>                 gl [] a     = a
>                 gl (_:xs) a = let a' = a + 1 in a' `seq` gl xs a'
>
> which gives a reasonably efficient constant space calculation.
>
> Without optimisations and in ghci, you get the generic code, which is slow
> and takes O(n) space.
>
> >
> > -----------------
> > Serge Mechveliani
> > mechvel at botik.ru
>