Difference between revisions of "User:Benmachine/Overqualified modules"

Revision as of 02:20, 5 September 2012

Overqualified modules

The hierarchical module system was originally proposed as an extension to the Haskell 98 standard and was formally adopted in Haskell 2010. It is typically regarded as one of the less controversial extensions, because more or less everyone agreed that single-token module names were liable to become a huge tangled mess, with everyone stepping on each other's toes.

Data.Data.Data

I lack a little historical context here, since the extension was widespread before I was introduced to Haskell, but I think that the current layout of the module hierarchy is unsatisfactory. Having been given hierarchical modules, Haskellers seem to feel obliged to use them: single-component names are virtually unheard of. Yet in many cases, the additional categorisation seems to add no semantic content whatsoever. What does the name Data.Bool tell us about a module that Bool alone would not? Why is the Functor type class a piece of Data but the closely related Applicative type class a Control structure? Why do we have Data.Monoid but Control.Category?
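To make the split concrete, here is a small sketch that uses four closely related abstractions from base, each imported from its customary home. The value names (incremented, paired, total, pipeline) are invented for this illustration, and the <&> operator assumes base 4.11 or later.

```haskell
-- Four related abstractions, filed under two different top-level prefixes.
import Control.Applicative (liftA2)
import Control.Category ((>>>))
import Data.Functor ((<&>))
import Data.Monoid (Sum (..))

-- Functor lives under Data: map a function over a Maybe.
incremented :: Maybe Int
incremented = Just 1 <&> (+ 1)

-- Applicative lives under Control: combine two Maybe values.
paired :: Maybe (Int, Int)
paired = liftA2 (,) (Just 1) (Just 2)

-- Monoid lives under Data: combine two sums.
total :: Int
total = getSum (Sum 2 <> Sum 3)

-- Category lives under Control: left-to-right composition of functions.
pipeline :: Int -> Int
pipeline = (+ 1) >>> (* 2)
```

Nothing in the code itself suggests why two of these imports begin with Data and the other two with Control.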

Redundant specification

There are certainly cases where the additional qualification adds meaning. Writing import Haskell at the top of your file seems meaningless, whereas with import Language.Haskell you have a slightly better idea of what is being requested. However, minimalism is desirable: when adding a component to your module name, ask yourself whether it resolves any confusion or prevents any ambiguity. I would argue that in Codec.Binary.UTF8.Generic, for example, nearly all of the name is redundant. There is no UTF-8 that is not a binary codec, and arguably the Generic component of the name is equally unenlightening. Just name the module UTF8, the shortest unambiguous description of its purpose.

Redundant disambiguation

One could argue that keeping module names long reduces the risk of collision. It's true that specifying more information in the module name might reduce the chance of some other module clashing with it, but often people confuse “information content” with “textual length”: clearly, grouping all monad-related modules under Control.Monad instead of just Monad is not going to stop two implementations of Reader from interfering with each other. So keep just the meaningful component of the name: what, after all, could possibly be named Monad except for a module housing the Monad class and related utility functions? Likewise Applicative, List, Exception, IO: all sorts of concepts are clearly going to exist only once in Haskell. Those that don't are no better served being Control.Monad.Reader than Monad.Reader.

If you really want to avoid name collisions, take a leaf from syb's book. Its modules previously lived under Data.Generics, a name that not only suffered from Data-itis but would also describe any generic programming mechanism equally well; syb is now starting to move over to the new, more specific Generics.SYB hierarchy. This drops the useless Data prefix and instead uses a component – the name of the package – that is very likely to be unique to this particular design and implementation. We appear to lose some "generality", but in reality the knowledge that you were using SYB in particular was probably already encoded in your program, since other generics libraries will have made different design decisions. The new name also emphasises the position of syb as a generics library, not the generics library – on an equal footing with Uniplate and other similar tools.

Internal package politics

Hierarchical modules do make internal structuring of a project easier; one only needs to look at something like Haskore's module list to see that its modules clearly could not all be dumped in a single source directory. So that is a legitimate use, but there is not necessarily any reason why the internal structure of your project has to be reflected in the external API you provide. If you want twenty helper modules in various tidy subdirectories, fine, but you can probably re-export everything relevant (and it is good design not to export too much) from just a few root modules at the base of your hierarchy. Don't confuse what makes life easy for the library author with what makes things easy for the library user – and don't assume you need to trade one off against the other.

Some syntactical digressions

In addition to the above practical concerns, I also somewhat object to the overuse of the poor little . character. For example, one should in principle be able to write a list of all weekdays as [Monday..], but Monday.. is actually read as a single token: the operator . qualified by a module named Monday. You'll need to use the marginally uglier [Monday ..] instead. This also demonstrates how the syntax for qualified operators is just plain ugly. It's hard to write and equally hard to read 7 Prelude.+ 8 or, to really rub it in, f Control.Category.. g.
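A minimal sketch of these quirks, assuming a hypothetical Day type (defined here, not taken from any library) and using only modules from base:

```haskell
import qualified Control.Category

-- Hypothetical enumeration, defined only for this illustration.
data Day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
  deriving (Show, Eq, Enum, Bounded)

-- The space matters: without it, "Monday.." is lexed as the operator (.)
-- qualified by a module called Monday, and the definition is rejected.
allDays :: [Day]
allDays = [Monday ..]

-- A qualified operator: Prelude's (+), written with its module name.
fifteen :: Int
fifteen = 7 Prelude.+ 8

-- Composition via Control.Category's (.); note the two dots in a row.
composed :: Int -> Int
composed = (+ 1) Control.Category.. (* 2)
```

Loaded into GHCi, allDays lists all seven constructors in order, and composed doubles its argument before adding one.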

Conclusion

Hierarchical modules added some much-needed structure to Haskell's module namespace, but should be used sparingly and responsibly to avoid tragic keyboard wear every time I want to import qualified Text.ParserCombinators.Parsec.Combinator as PCPC. The policy on how best to name your modules has historically been loose, and the coherence of the module landscape has suffered for it.