[Haskell-cafe] The difficulty of designing a sequence class

Robert Dockins robdockins at fastmail.fm
Mon Jul 31 09:16:37 EDT 2006


On Jul 30, 2006, at 5:28 PM, Brian Hulley wrote:
> Robert Dockins wrote:
>> On Sunday 30 July 2006 07:47, Brian Hulley wrote:
>>> Another option, is the Edison library which uses:
>>>
>>>      class (Functor s, MonadPlus s) => Sequence s where
>>>
>>> so here MonadPlus is used instead of Monoid to provide empty and
>>> append. So I've got three main questions:
>>
>>> 1) Did Edison choose MonadPlus just because this fitted in with the
>>> lack of multi-parameter typeclasses in H98?
>> Edison's design hails from a time when MPTCs were not only
>> non-standard (as they still are), but also not widely used, and
>> before fundeps were avaliable (I think).  So the answer to this one
>> is pretty much "yes".
> [snip]
>
> Hi - Thanks for the answers to this and my other questions. One  
> thing I just realised is that there doesn't seem to be any instance  
> declarations anywhere in the standard libs relating Monoid to  
> MonadPlus so it's a bit unsettling to have to make a "random"  
> choice on the question of what kind of object a Sequence is...
>
> I tried:
>
>    class (forall a. Monoid s a) => Sequence s where ...
>
> but of course that doesn't work, so I suppose MonadPlus is the only  
> option when 'a' doesn't appear as a type variable arg of the class  
> being defined.
>
>> BTW, for what purpose are you desiging a new sequence class?  You are
>> clearly aware of other efforts in this area; in what ways to they not
>> meet your needs?
>
> The existing sequence and collection classes I've looked at don't  
> do enough.
>
> For example, when I tried to represent the text in an edit widget,  
> I realised I needed a sequence of characters that could also be  
> considered to be a sequence of lines, and it is necessary to be  
> able to index the sequence by character position as well as by line  
> position, as well as keeping track of the total number of  
> characters, the total number of lines, and the maximum number of  
> characters on any one line (so as to be able to calculate the x,y  
> extents when laying out the widget, assuming a fixed width font  
> (tabs ignored!)), with very efficient split and append operations.

So, what you want is a sequence of sequences that can be  
transparently converted to a "flattened" sequence and vice versa? And  
you also want to keep track of the total number of lines and  
characters within each line.  Additionally, you want to keep track of  
the maximum number of characters in any one line.

> I managed to get a good representation by using a FingerTree of  
> lines where each line uses a ByteString.
> I made my own FingerTree class based on the one referenced in the  
> paper at http://www.soi.city.ac.uk/~ross/papers/FingerTree.html but  
> without the symbolic names which I find totally unreadable and  
> confusing, and also so I could get full control of the strictness  
> of the implementation, and also as a way of understanding them  
> since I'd never come across such a complicated data structure  
> before. (I highly recommend this paper to anyone who wants to learn  
> about FingerTrees, Monoids and other very useful concepts.)
>
> So one thing existing sequence classes don't have (apart from  
> FingerTree) is the concept of measurement which is essential when  
> you want efficient updates. Eg in my text buffer, the measurement  
> maintained for a sequence is the number of chars and number of  
> lines and maximum line length.

Edison has support for transparently keeping track of the size of a  
sequence.

http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison-Seq- 
SizedSeq.html

It may well be possible to create a slightly generalized wrapper that  
keeps track of arbitrary "measures".  (If they can be computed by a  
function which is associative, commutative and has a unit).
Humm, sort of an incremental fold.... I like it.

> Then I needed a structure for a Trie widget a bit like (details  
> omitted):
>
>      data Node = Expanded Value T | Collapsed Value T | Leaf Value
>      newtype T = T (FingerTree (Key, Node))
>
> where objects of type T could be regarded as a finite map (eg from  
> hierarchical module names to modules) as well as a flattened linear  
> sequence indexed by line number (for display on the screen in a  
> widget given the current scroll bar position), and which also  
> needed to keep track of the total horizontal and vertical extent of  
> the Trie as it would appear in the widget's font.
>
> There are several different kinds of measurement going on in this  
> data structure, as well as the complexity of the extra recursion  
> through the leaf to a new level. Existing sequence abstractions  
> don't seem to provide the operations needed to treat a nested data  
> structure as a single sequence.
>
> In summary:
>
> 1) Often a complex data structure must be able to be simultaneously  
> regarded as a single flattened sequence
> 2) Measurements are needed for efficient updates (may need to keep  
> track of several at once)
> 3) Indexing and size are sometimes needed relative to the flattened  
> sequence not just the top level
> 4) It is useful to have a finite map that can also be regarded as a  
> linear sequence
> 5) Such finite maps may also be nested (when the keys are  
> hierarchical) but this nesting should be hidden from the user...
> 6) I want a design that can allow complex data structures to be  
> built up easily and instanced to the appropriate interfaces
> 7) Also naming conventions in the existing libs are a bit irregular  
> and burdened with old fashioned lisp-isms eg in Data.Edison.Seq  
> there are functions "lview" and "reducel" but I'd argue that there  
> must be one and only one way of forming any identifier in any  
> program namely that the function should appear first followed by  
> qualifiers (so that related functionality always appears together  
> in a lexicographical listing of functions) and it must use camel  
> case with no exceptions at all, thus "viewL" and "reduceL" (and  
> "foldL").

OK.  Point taken.  I'm not sure I agree with the no-exceptions camel- 
case, but the lexicographical-listing-groups-functionality holds  
strong appeal for me.

> 8) More factoring needs to be done since not all sequences need to  
> be indexed or measured or to be "flattened through the leaf" (eg  
> the FingerTree paper already has a separate class for Reduce and I  
> believe their implementation also referred to a class for Foldable)  
> rather than bundling everything in a single Sequence class.
>
> Anyway apologies for my very rambling answer - I'm still a long way  
> from finding a good set of classes to address the above issues :-)


Well, I guess I'd suggest you attempt to identify specific problems  
with already existing packages and attempt to work with those who  
maintain such packages before reinventing something as basic (and  
difficult to get right!) as data structure abstractions.

Such maintainers may be willing to accept patches and/or implement  
requested features in order to reduce fragmentation in this space  
*hint, hint*  :-)


<soapbox type="Edison plug">
I personally think that Edison is a great piece of work, and I took  
up maintainership because I felt it was a great shame that no one was  
using it.  My ultimate goal is to make Edison the package that  
everyone thinks of first when they discover they need a Haskell  
datastructure for some purpose.  Even if Edison does not fill that  
need, I want every Haskeller to compare his needs against what Edison  
provides before striking out on his own, and I want that to be a  
decision made with some hesitation.  Over time I hope to make the  
cases where Edison doesn't cut the mustard fewer and further between.

So, if you've ever looked at Edison, or ever do so in the future, and  
decide it isn't what you need, please let me know why so I can make  
it better for the next time.  After all, squeaky wheels get the  
grease, but only if I can hear the squeaking!
</soapbox>



Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG





More information about the Haskell-Cafe mailing list