real-time heap profiling?

Jan-Willem Maessen jmaessen@alum.mit.edu
Sun, 29 Dec 2002 15:23:35 -0500


John Meacham <john@repetae.net> writes:
> I have been playing with the heap profiling graphs from ghc and find
> them quite useful and was wondering if there was a tool to display them
> in real-time?

Oddly, I found myself looking at heap profiles in real time this past
week.  It turns out to be pretty easy to generate a complete heap
profile for a running program using a slightly tricky Unix command
line:

  head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
    | hp2ps > FOO.ps

This takes the FOO.hp file of your running program and turns it into a
heap profile FOO.ps containing every complete sample written so far.  I
wrote a little script which reruns this command every few seconds.  By
running gv in "watch file" mode I can see the heap profile of my entire
program run to date.
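The script itself isn't shown here.  A minimal sketch of one might look
like the following, assuming a POSIX shell and that hp2ps works as a
filter (stdin to stdout) exactly as in the command above.  The function
names complete_prefix and watch_hp, and the 5-second default interval,
are hypothetical.  It uses grep -n '^END_SAMPLE' (anchored to the start
of the line) in place of fgrep -n, and head -n N in place of head -N.

```shell
# Hypothetical helpers sketching the "rerun every few seconds" script.

# complete_prefix FILE: print FILE up to and including its last
# END_SAMPLE line, dropping any partially written trailing sample.
# Returns non-zero if no complete sample has been written yet.
complete_prefix() {
    last=$(grep -n '^END_SAMPLE' "$1" | tail -n 1 | cut -d : -f 1)
    [ -n "$last" ] || return 1
    head -n "$last" "$1"
}

# watch_hp FILE.hp [SECONDS]: regenerate FILE.ps from the complete part
# of FILE.hp every SECONDS seconds (default 5, an arbitrary choice)
# until interrupted.  The PostScript goes to a temporary file that is
# then renamed, so a viewer never sees a half-written FILE.ps.
watch_hp() {
    hp=$1
    ps=${hp%.hp}.ps
    interval=${2:-5}
    while :; do
        if complete_prefix "$hp" > "$ps.hp.tmp" &&
           hp2ps < "$ps.hp.tmp" > "$ps.tmp"; then
            mv "$ps.tmp" "$ps"
        fi
        sleep "$interval"
    done
}
```

Something like "watch_hp FOO.hp 5 &" running alongside gv pointed at
FOO.ps in watch-file mode should reproduce the setup described above.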

It's not hard to cook up similar scripts (for example, to display a
sliding window of the last n profiling samples) if you load FOO.hp
into your favorite editor and take a look at the format.  Here, for
instance, are parts of the file phc.hp, generated by running the pH/EH
compiler on a problematic program:

JOB "phc -hC"
DATE "Thu Dec 26 18:17 2002"
SAMPLE_UNIT "seconds"
VALUE_UNIT "bytes"
BEGIN_SAMPLE 0.00
END_SAMPLE 0.00
BEGIN_SAMPLE 15.07
  ... sample data ...
END_SAMPLE 15.07
BEGIN_SAMPLE 30.23
  ... sample data ...
END_SAMPLE 30.23
... etc.
BEGIN_SAMPLE 11695.47
END_SAMPLE 11695.47

By gluing the header (JOB, DATE, SAMPLE_UNIT, VALUE_UNIT) onto a bunch
of samples extracted from the .hp file, you can run hp2ps on arbitrary
sample data.  Indeed, I use an editor to select and extract
interesting chunks of profile data and look at those chunks in greater
detail.
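Gluing the header onto the last n samples is easy to automate.  Here is
a minimal sh/awk sketch (the name last_samples is hypothetical),
assuming, as in the excerpt above, that the header is everything before
the first BEGIN_SAMPLE line and that each sample runs from a
BEGIN_SAMPLE line through the following END_SAMPLE line:

```shell
# last_samples FILE N (hypothetical helper): print the header of a .hp
# file followed by its last N complete samples, ready to pipe into
# hp2ps.  A partially written trailing sample is ignored.
last_samples() {
    [ "$2" -ge 1 ] || return 1
    awk -v n="$2" '
        # Header: every line before the first BEGIN_SAMPLE.
        !started && /^BEGIN_SAMPLE/ { started = 1 }
        !started { header = header $0 "\n"; next }
        # Accumulate the current sample line by line.
        /^BEGIN_SAMPLE/ { cur = "" }
        { cur = cur $0 "\n" }
        # On END_SAMPLE, store the finished sample in a ring buffer of
        # size n, so only the last n samples are held in memory.
        /^END_SAMPLE/ { k++; buf[k % n] = cur; cur = "" }
        END {
            printf "%s", header
            first = (k > n) ? k - n + 1 : 1
            for (i = first; i <= k; i++) printf "%s", buf[i % n]
        }
    ' "$1"
}
```

For example, "last_samples FOO.hp 20 | hp2ps > FOO-recent.ps" (output
name hypothetical) would plot only the 20 most recent samples.  The
ring buffer keeps memory proportional to n rather than to the whole
file, which matters for .hp files running to a million lines.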

Using fgrep -n, we find the line number of every END_SAMPLE line, that
is, the end of each complete sample:
% fgrep -n END_SAMPLE phc.hp
... lots of output ...
1004960:END_SAMPLE 11621.34
1006610:END_SAMPLE 11636.61
1008257:END_SAMPLE 11651.85
1009926:END_SAMPLE 11667.08
1011524:END_SAMPLE 11682.33
1011526:END_SAMPLE 11695.47

We select the line number of the last complete sample (while the
program is running, the end of a .hp file usually contains a partially
written sample, which causes hp2ps to choke):

% fgrep -n END_SAMPLE phc.hp | tail -1 | cut -d : -f 1
1011526

This tells us that the first 1011526 lines of phc.hp will form a
properly-formatted heap profile, which we can feed to hp2ps:
  head -1011526 phc.hp | hp2ps > phc.ps
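The fgrep/head pipeline reads the file twice; assuming the .hp file is
only ever appended to, the first N lines don't change in between, so
that's harmless.  As an alternative single-pass sketch, a small awk
filter can buffer lines and flush them only when it reaches an
END_SAMPLE line, so whatever follows the last complete sample is simply
dropped.  The function name complete_samples is hypothetical:

```shell
# complete_samples FILE (hypothetical): single-pass equivalent of
#   head -n N FILE   where N is the line number of the last END_SAMPLE.
# Lines are buffered and written out only when an END_SAMPLE line is
# seen; anything after the last END_SAMPLE is discarded at end of input.
complete_samples() {
    awk '
        { buf = buf $0 "\n" }
        /^END_SAMPLE/ { printf "%s", buf; buf = "" }
    ' "$1"
}
```

Used as "complete_samples phc.hp | hp2ps > phc.ps", this buffers at most
one sample at a time, and if no sample has been completed yet it prints
nothing rather than handing head an empty line count.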

The "watch file" mode of gv is a godsend for this sort of application.
It just wouldn't have been worth the effort for me to cook up a GUI for
the sole purpose of displaying an up-to-date profile.

I suspect, by the way, that if .hp files were XML-formatted I would
have had a far less pleasant time of things.  There's such a thing as
TOO much structure.  A lesson for us tool-builders, perhaps?

-Jan-Willem Maessen