Proposal #3271: new methods for Data.Sequence

Louis Wasserman wasserman.louis at gmail.com
Thu Jul 16 19:48:33 EDT 2009


In response to Ross's useful suggestions (which I had not actually noticed
until recently), I have made considerable changes to my Data.Sequence patch
here<http://hackage.haskell.org/trac/ghc/attachment/ticket/3271/new-methods-for-data_sequence.3.dpatch>
.

Here are the relevant points:

   - I completely rewrote the sorting method.  The sort is unstable, but it
   is 40 lines (much more maintainable than the several-hundred line
   implementation from earlier) and runs *twice as fast as* (fromList .
   Data.List.sort . toList) for n=50000.  (For sorted lists, it clocks in at
   about 4x faster.)
   - tails and inits are considerably more sophisticated, running to about
   50 lines apiece (although some of that code is shared between them), but
   - This implementation is genuinely asymptotically faster than the
      previous implementation: evaluating any tail from the sequence
returned by
      tails takes O(log (min (i, n-i))), as opposed to O(n) in scanr (<|) empty
      xs, but evaluating every tail from the sequence takes O(n) overall, as
      opposed to O(n log n) in fromList [drop n xs | n <- [0..length xs]].
      - Without direct access to the internals of the sequence it is
      impossible to match the asymptotic performance of this implementation.
      - This implementation is also a hair faster in practice.
   - partition is now in fact implemented via a simple foldl', which is
   actually faster than my old, several-dozen-line implementation as well as
   obviously simpler.
   - filter has been added to the list of methods available in
   Data.Sequence.
   - iterate has been renamed to iterateN to reinforce the different type,
   which requires a finite bound on the sequence length.
   - On the back end, replicate, iterateN, and sortBy do not use fromList,
   but instead use a common framework that wraps construction of a sequence in
   an Applicative functor.  This automatically induces the node sharing that
   gives replicate performance O(log n), but behaves exactly correctly for all
   its other requirements.
   - As a result, there is a faster alternative to fromList if the length of
   the list is known.  The name and type of this method seems like it might
   become genuinely contentious, so I'm not inclined to expose that faster
   method at the moment.

Louis Wasserman
wasserman.louis at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/libraries/attachments/20090716/4c59c67a/attachment.html


More information about the Libraries mailing list