[Haskell-cafe] Re: [Haskell] URLs in haskell module namespace

Dimitry Golubovsky golubovsky at gmail.com
Thu Mar 24 09:00:02 EST 2005


Dear list members,

I'd like to share some sketchy ideas I have on the subject to address
some of the issues raised.

At 12:14 22/03/05 +0000, Malcolm Wallace wrote:

>I cannot see any of the Haskell compilers ever implementing this idea
>as presented.  It would introduce an enormous raft of requirements
>(networking client, database mapping, caching, etc) that do not belong
>in a compiler - they belong in separate (preprocessing/packaging)
>tools.  Furthermore, these tools already exist, albeit they are young
>and have a long maturation process still ahead of them.

An external program acting as a "URI streamer" might be the solution.
Such a program (identified via an environment variable or a compiler
command-line option, much like Hugs' external editor) would take a URI
as its command-line argument and write the streamed contents of that
URI to its stdout, e.g. curl or wget -O -. All the compiler has to do
is popen() that program whenever an import statement contains a URI.
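To make this concrete, here is a rough sketch (in Haskell, not taken
from any existing compiler) of how the compiler side could invoke such
a streamer; the HASKELL_URI_STREAMER variable name is just an
assumption for illustration:

import System.Environment (lookupEnv)
import System.Process (readProcess)

-- Hypothetical sketch: find the streamer via the (made-up)
-- HASKELL_URI_STREAMER environment variable, run it with the URI as
-- its only argument, and read the module source from its stdout.
fetchModuleSource :: String -> IO String
fetchModuleSource uri = do
  mStreamer <- lookupEnv "HASKELL_URI_STREAMER"
  case mStreamer of
    Just prog -> readProcess prog [uri] ""          -- e.g. a wrapper around curl/wget
    Nothing   -> readProcess "curl" ["-s", uri] ""  -- no streamer configured: call curl directly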

Using curl/wget helps get around various issues with proxies,
encryption, etc., since those programs are specifically designed for
that. I do not believe this would add significant overhead compared to
the regular fopen() the compiler uses for opening source files.

On a non-networked system, such a program could be a fairly simple
shell script that pretends to download from a URI but actually reads
from local disk (flash, or any other kind of storage) instead.
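For illustration, a minimal sketch of such an offline streamer
(written in Haskell here rather than shell; the
/usr/local/lib/haskell-mirror directory is a made-up location):

import System.Environment (getArgs)
import Data.List (stripPrefix)

-- Hypothetical offline streamer: take the URI from the command line,
-- map it onto a file under a made-up local mirror directory, and
-- stream that file to stdout.
main :: IO ()
main = do
  [uri] <- getArgs
  let path = case stripPrefix "http://" uri of
               Just rest -> "/usr/local/lib/haskell-mirror/" ++ rest
               Nothing   -> uri   -- not an http URI: treat it as a local path
  readFile path >>= putStr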

To address the problem of module/package URIs changing over time, the
following may be suggested. Either purl.org is used (and then it is the
responsibility of the package maintainer to keep its URL pointer
valid), or some kind of PURL server may be set up somewhere (at
haskell.org, for example) which also supports mirroring. This means
that for each package/module registered with this server, multiple
locations are known (willing people could probably contribute their
computing resources for that; at least I would not object, as long as I
have my own web server). The hypothetical PURL server serves redirects
as usual, but shifts randomly to another mirror location for each new
request for a module/package. So if an attempt to retrieve a
module/package fails, it may be repeated and another mirror location
will be tried (a sketch of such a retry loop follows below). Mirrors
would be synchronized behind the scenes.
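A rough sketch of that client-side retry over mirrors (the fetch
action, e.g. a popen of curl, is passed in; none of this is from an
existing tool):

import Control.Exception (IOException, try)

-- Given a fetch action and the mirror URIs known for a package, try
-- each mirror in turn until one of them succeeds.
fetchFromMirrors :: (String -> IO String) -> [String] -> IO (Maybe String)
fetchFromMirrors _     []           = return Nothing
fetchFromMirrors fetch (uri : rest) = do
  result <- try (fetch uri) :: IO (Either IOException String)
  case result of
    Right src -> return (Just src)            -- first working mirror wins
    Left  _   -> fetchFromMirrors fetch rest  -- that mirror failed, try the next one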

Would such a centralized PURL server be a bottleneck or a single point
of failure? Probably no more so than a centralized Hackage database (or
is that planned to be distributed?)
 
Also, a resolver that maps module names to URIs might be part of the
URI streamer. For example, the Prelude will most likely be stored
locally, but some other module will not. The resolver would consult the
local package database (Cabal) and its own cache, and either stream a
local file or popen curl with a URI constructed from the desired module
name. Once downloaded, a module may be cached with some TTL, so further
recompilations do not involve curl at all (until the TTL expires).
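Roughly, that resolver logic could look like the sketch below (all
helper names, paths and URIs are hypothetical; a real resolver would
talk to the actual Cabal package database and a configurable cache):

import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)
import System.Directory (doesFileExist, getModificationTime)
import System.Process (readProcess)

-- How long a downloaded module stays valid in the cache (one day here).
ttl :: NominalDiffTime
ttl = 24 * 60 * 60

-- Hypothetical stand-ins for the package database, cache and fetcher.
lookupLocalPackage :: String -> IO (Maybe FilePath)
lookupLocalPackage _ = return Nothing

cachePath :: String -> FilePath
cachePath m = "/var/cache/haskell-modules/" ++ m ++ ".hs"

moduleUri :: String -> String
moduleUri m = "http://purl.example.org/haskell/" ++ m

fetchViaCurl :: String -> IO String
fetchViaCurl uri = readProcess "curl" ["-s", uri] ""

-- Resolve a module: prefer a locally installed copy, then a fresh
-- cache entry, and only as a last resort invoke curl and refresh the cache.
resolveModule :: String -> IO String
resolveModule modName = do
  local <- lookupLocalPackage modName
  case local of
    Just path -> readFile path                 -- e.g. the Prelude lives here
    Nothing   -> do
      let cached = cachePath modName
      fresh <- isFresh cached
      if fresh
        then readFile cached                   -- within TTL: curl is not involved
        else do
          src <- fetchViaCurl (moduleUri modName)
          writeFile cached src                 -- refresh the cache
          return src

-- A cache file is fresh if it exists and was modified within the TTL.
isFresh :: FilePath -> IO Bool
isFresh path = do
  exists <- doesFileExist path
  if not exists
    then return False
    else do
      mtime <- getModificationTime path
      now   <- getCurrentTime
      return (diffUTCTime now mtime < ttl)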

PS: All of this is written under the assumption that Haskell source
files are served. Binary distributions would, of course, require
different techniques.

-- 
Dimitry Golubovsky

Anywhere on the Web
