profiling and strictness (was: Re: [Haskell-cafe] An interesting paper from Google)

Fri Oct 22 17:18:04 EDT 2010

> Any time you see something "inexplicable" like lots of time being attributed
> to something simple like "get", it means that something isn't strict enough
> and "get" is having to force a bunch of lazy evaluations to do its job.
> Since you're using State.Strict but lift-ing to get there, I'd first look at
> the strictness of the monad you're lift-ing from.  (I'm assuming
> State.Strict does what the label says, but it's possible that it's not
> strict in the way you need; strictness is kinda tricky.)

Ahh, thanks, this makes sense to me.  The monads involved are an
ErrorT and one other strict StateT.  When the monads say they are
strict, they seem to mean that >>= will force the evaluation of the
monadic mechanics of the first argument when the second is demanded.
By "monadic mechanics" I mean the work done by >>= itself, so for
StateT it binds the result of the first argument with either 'let' or
'~(a, s') <-' instead of 'case' or '(a, s') <-'.  So in my
understanding, a lazy State will build thunks on every >>= and they
won't ever be forced until you pull the final value and pass it to a
strict function in IO like print or an FFI call, and then they are
forced recursively, at which point there is a stack overflow unless
it's a short sequence.

Talking about strictness seems complicated to me because it's not
really about what is actually forced, it's about who is forced by who
(i.e., when 'a' is forced then 'b' will be) and whether it's forced in
sequence or nested.  So a strict State is also not forced until the
strict IO function, but it turns into a sequential series of thunks
rather than a nested one.

If this is accurate, why would anyone want to use the lazy State?

Anyway, State doesn't provide the really critical bit of strictness,
which is in the actual state being modified.  So 'State.modify' will
inevitably lead to stack overflow if you don't intersperse 'get's in
there.  What is needed is both $! on State.put (or define your own
strict modify that puts $! on put), and strictness annotation of the
fields of the record that is being modified (provided it's a record
and not a scalar type).

In any case, ErrorT has a strict >>= by necessity, both States have a
strict >>=, and all modifys and puts (I think) are strict, as are all
the record fields.  In any case, a simple 'get' should only have to
force the state to whnf, so it shouldn't matter if the fields are huge
thunks or not, so really all that should matter is if the state is
already in whnf or is a huge nest of thunks.  Which a strict >>=
should prevent, right?  This is right next to a 'lookup' function
which is called twice as many times and should be doing more work
since it actually looks in a Map, but is credited with less cpu and
alloc.

A lift . get should turn into

-- ErrorT
a <- do
  a <- StateT (\s -> return (s, s)) -- 'get' of the outer StateT
  return (a, s) -- >>= belongs to inner StateT
return (Right a)

Anyway, it's sort of a mishmash of semi inlined functions since this
is especially nested and hard to understand with transformers, but it
looks like a lot of applications of (\a -> (a, s)) type functions, so
(,) constructors and a 'Right' constructor, inside of 'case's to make
sure this are evaluations and not allocations (and I remember from the
STG paper that a constructor inside a case need not even be
constructed).  So I don't totally understand where the forcing is
happening to make credit the 'get' with so much work.

BTW, I have a theory about what the '0' might mean... perhaps if a
function is inlined away, it still appears in the profile list, but
with 0 entries.

> Moral of the story:  time is accounted to the function that forces
> evaluation of lazy thunks, not to the thunks themselves or the function that
> created the lazy thunks.  (I think the latter is impossible without passing
> around a lot of expensive baggage, and in any case doesn't tell you anything
> useful; unexpected functions taking a lot of time, on the other hand, tells
> you right away that there's excessive laziness in the invocation somewhere
> and gives you a starting point to track it down.)

Indeed, this is a good point that I hadn't realized in quite that way
before, but tracking it down is still the tricky part.