[Haskell-cafe] ANNOUNCE: fast-tags-0.0.1

Christopher Done chrisdone at googlemail.com
Sun Apr 1 18:51:17 CEST 2012


On 1 April 2012 00:23, Evan Laforge <qdunkan at gmail.com> wrote:
> Two of them use haskell-src which means they can't parse my code.  Two
> more use haskell-src-exts, which is slow and fragile, breaks on
> partially edited source, and doesn't understand hsc.

For what it's worth:

* As you say below, HSC is easily dealt with by ignoring # lines.

* haskell-src-exts is not slow. It can parse a 769-module codebase adding
  up to 100k lines of code in just over a second on my machine. That's
  good. Also, I don't think the speed on an individual file matters, for
  reasons I state below.

* Broken source is not a big issue to me. Code is written with a GHCi session
  on-hand; syntactic issues are the least of my worries. I realise it
  will be for others.
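
To make the first point concrete, here is a minimal sketch of the "ignore
# lines" idea. The name stripHscLines is made up for illustration and is
not taken from fast-tags or hasktags. It assumes a directive is any line
whose first non-space character is '#', which would also blank a line
that happens to start with an operator beginning with '#'.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- | Blank out lines whose first non-space character is '#'
-- (assumed here to be hsc2hs directives). The lines are emptied
-- rather than dropped so that line numbers in the generated tags
-- still match the original .hsc file.
stripHscLines :: String -> String
stripHscLines = unlines . map blank . lines
  where
    blank l
      | "#" `isPrefixOf` dropWhile isSpace l = ""
      | otherwise = l
```

The result can then be handed to whatever tag extractor is in use, with
no need to run hsc2hs first.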

The problem with haskell-src-exts is that it refuses to parse expressions for
which it cannot reduce the operator precedence, meaning it can't parse any
module that uses a freshly defined operator.
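
As an illustration of the kind of module I mean (the operator +++ is made
up for this example), the following defines an operator with its own
fixity and uses it in the same file. A parser that resolves precedence
while parsing has to know the fixity of +++ before it can build the tree
for "example". I have not checked this exact snippet against any
particular haskell-src-exts release, so treat it as a sketch of the
situation rather than a reproduction.

```haskell
-- A freshly defined operator with an explicit fixity declaration.
infixr 5 +++

-- | List concatenation under a new name.
(+++) :: [a] -> [a] -> [a]
xs +++ ys = foldr (:) ys xs

-- | Parsing this chain correctly depends on the fixity of (+++):
-- with infixr it groups as [1] +++ ([2] +++ [3]).
example :: [Int]
example = [1] +++ [2] +++ [3]
```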

The reason I don't think individual file performance matters is that
the output can be cached. There's also the fact that if I modify a
file and regenerate tags, I'm probably still editing that file, so
I'm unlikely to need the jumping around that tags provide.

> Then there's the venerable hasktags, but it's buggy and the source
> is a mess. I fixed a bug where it doesn't actually strip comments
> so it makes tags to things inside comments, but then decided it
> would be easier to just write my own.

Hasktags is hardly buggy in my experience. The comments bug is minor. But I
agree that the codebase is messy and would be better handled as
Text. But again, speed on an individual-file basis isn't a massive issue here.

> fast-tags is fast because it has a parser that's just smart enough to
> pick out the tags.  It can tagify my entire 300 module program in
> about a second.

Unfortunately there appears to be a horrific problem with it, as the
log below shows:

$ time (find . -name '*.hs' | xargs hasktags -e)

real	0m1.573s
user	0m1.536s
sys	0m0.032s
$ cabal install fast-tags --reinstall --ghc-options=-O2
Resolving dependencies...
Configuring fast-tags-0.0.2...
Preprocessing executables for fast-tags-0.0.2...
Building fast-tags-0.0.2...
[1 of 1] Compiling Main             ( src/Main.hs,
dist/build/fast-tags/fast-tags-tmp/Main.o )
Linking dist/build/fast-tags/fast-tags ...
Installing executable(s) in /home/chris/.cabal/bin
$ time (find . -name '*.hs' | xargs fast-tags)
^C
real	10m39.184s
user	0m0.016s
sys	0m0.016s
$

I cancelled the program after ten minutes. The CPU was at 100% and
memory usage was climbing, but only slowly. It's not an infinite
loop, however. If I delete the "tags" file and restrict the search to
only the src directory, it completes quickly, but gets slower on each
subsequent run:

$ time (find src -name '*.hs' | xargs hasktags -e)

real	0m0.113s
user	0m0.112s
sys	0m0.008s
$ time (find src -name '*.hs' | xargs fast-tags)

real	0m0.136s
user	0m0.120s
sys	0m0.020s
$ time (find src -name '*.hs' | xargs fast-tags)

real	0m0.250s
user	0m0.244s
sys	0m0.012s

So there appears to be an exponential component to the program. E.g.

$ time (find . -name '*.hs' | xargs fast-tags)
./lib/text-0.11.1.5/tests/benchmarks/src/Data/Text/Benchmarks/Pure.hs:435:
unexpected end of block after data * =
./lib/split-0.1.2.3/Data/List/Split/Internals.hs:68: unexpected end of
block after data * =
./lib/QuickCheck-2.4.1.1/Test/QuickCheck/Function.hs:51: unexpected
end of block after data * =

real	0m26.993s
user	0m26.590s
sys	0m0.324s

If I try to run again it hangs again. I expect it's somewhere around
sort/merge/removeDups. This is on GHC 7.2.1.
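
If that guess is right, the merge step is where the time goes. For
comparison, here is a minimal sketch of merging new tags into an existing
tags list in one linear pass. This is not the fast-tags code; the names
mergeSorted, dedupSorted and mergeTags are hypothetical, and it assumes
the existing list is already sorted, so only the new tags need sorting.

```haskell
import Data.List (sort)

-- | Merge two sorted lists into one sorted list: O(n + m).
mergeSorted :: Ord a => [a] -> [a] -> [a]
mergeSorted [] ys = ys
mergeSorted xs [] = xs
mergeSorted (x : xs) (y : ys)
  | x <= y    = x : mergeSorted xs (y : ys)
  | otherwise = y : mergeSorted (x : xs) ys

-- | Drop adjacent duplicates from a sorted list: O(n).
dedupSorted :: Eq a => [a] -> [a]
dedupSorted (x : y : rest)
  | x == y    = dedupSorted (y : rest)
  | otherwise = x : dedupSorted (y : rest)
dedupSorted xs = xs

-- | Merge freshly generated tag lines into an already sorted list
-- of old tag lines, removing duplicates.
mergeTags :: [String] -> [String] -> [String]
mergeTags old new = dedupSorted (mergeSorted old (sort new))
```

With this shape the cost of a rerun is dominated by sorting the new tags,
and the size of the existing tags file only contributes linearly.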

> But it's also incremental, so it only needs to do that the first
> time.

For what it's worth to anybody using hasktags, I've added this to
hasktags: https://github.com/chrisdone/hasktags/commits/master

I save the file data as JSON. I tried using aeson but that's buggy:
https://github.com/bos/aeson/issues/75 At any rate, it should cache
the generated tags rather than the file data, but I'd have to
restructure the hasktags program a bit and I didn't feel like that
yet.
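
Caching the generated tags rather than the file data might look roughly
like the following. This is only a sketch and not what my hasktags branch
does: the types Timestamp, Tag and Cache and the function
partitionByCache are made up, and a plain Integer stands in for a file
modification time.

```haskell
type Timestamp = Integer   -- stand-in for a file modification time
type Tag       = String    -- stand-in for one generated tag line

-- | Cached tags per file, together with the modification time the
-- file had when those tags were generated.
type Cache = [(FilePath, (Timestamp, [Tag]))]

-- | Split the current files into those whose cached tags can be
-- reused (modification time unchanged) and those that need their
-- tags regenerated (changed, or not in the cache at all).
partitionByCache
  :: Cache
  -> [(FilePath, Timestamp)]
  -> ([(FilePath, [Tag])], [FilePath])
partitionByCache cache = foldr step ([], [])
  where
    step (path, mtime) (fresh, stale) =
      case lookup path cache of
        Just (cachedTime, tags)
          | cachedTime == mtime -> ((path, tags) : fresh, stale)
        _ -> (fresh, path : stale)
```

Only the files in the second component would need to be parsed again;
the rest contribute their cached tags directly.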

hasktags has no problem with this codebase:

$ time (find . -name '*.hs' | xargs hasktags --cache)

real	0m1.512s
user	0m1.420s
sys	0m0.088s

and with the cache generated, it's half the time:

$ time (find . -name '*.hs' | xargs hasktags --cache)

real	0m0.780s
user	0m0.712s
sys	0m0.072s

> I have vim's BufWrite autocommand bound to
> updating the tags every time a file is written, and it's fast enough
> that I've never noticed the delay.  It understands hsc directly
> (that's trivial, just ignore the # lines) so there's no need to run
> hsc2hs before tagifying.  The result is tags which are automatically
> up to date all the time, which is nice.

This is the use-case I (and the users who have notified me of it) have
with Emacs in haskell-mode.

> If people care about lhs and emacs tags then it wouldn't be hard to
> support those too, and at that point I could replace hasktags and we'd
> be back down to 5 again.  But I'm not even sure anyone uses hasktags,
> since surely someone would have noticed that comment bug.

I like the fast-tags codebase so it would be nice to start using it,
but I hope you can test it on either a more substantial codebase or
just a different codebase. Or just grab some packages from Hackage and
test. Emacs support would be nice; I might add it myself if you can
fix the performance explosion. Right now hasktags is OK for me. I won't be
hacking on it in the future for more features because…

While we're on the topic I think haskell-src-exts is worth investing
time in, as it has semantic knowledge about our code. I am trying to
work on it so that it can preserve comments and output them, so that
we can start using it to pretty print our code, refactor our code,
etc. It could also be patched to handle operators as Operators [Exp]
rather than OpApp x (OpApp y), etc. I think.
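
To sketch what I mean by a flat operator representation: the parser would
emit an unresolved sequence of operands and operators, and a later pass
would nest it once fixities are known. The Exp type below is a toy, not
the haskell-src-exts AST, and the Fixities table and resolve function are
made up. For brevity every operator is treated as left-associative, with
a default precedence of 9 when it is not in the table.

```haskell
-- | Toy expression type (hypothetical, not the haskell-src-exts AST).
data Exp
  = Var String
  | OpApp Exp String Exp            -- resolved binary application
  | Operators Exp [(String, Exp)]   -- unresolved: first operand, then
                                    -- (operator, operand) pairs
  deriving (Eq, Show)

-- | Operator name to precedence; all operators left-associative here.
type Fixities = [(String, Int)]

-- | Turn a flat operator sequence into nested OpApp nodes by
-- precedence climbing. Operands are left untouched.
resolve :: Fixities -> Exp -> Exp
resolve fx (Operators e0 rest) = fst (climb 0 e0 rest)
  where
    prec op = maybe 9 id (lookup op fx)

    climb :: Int -> Exp -> [(String, Exp)] -> (Exp, [(String, Exp)])
    climb minP lhs ((op, rhs) : more)
      | prec op >= minP =
          let (rhs', more') = climb (prec op + 1) rhs more
          in  climb minP (OpApp lhs op rhs') more'
    climb _ lhs more = (lhs, more)
resolve _ e = e
```

The point of the split is that parsing never fails for lack of fixity
information; a tool that only needs tags can ignore the resolve pass
altogether.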
