Gut Build System

John D. Earle JohnDEarle at cox.net
Fri Mar 5 12:31:50 EST 2010


Simon Marlow wrote "I suggest that the way to start would be to design and build the 
infrastructure first, and then think about replacing GHC's build system."

Simon Marlow wrote "But if someone else were to do the work, and the result was 
maintainable and has at least the same functionality and performance, 
then it's a possibility." Since all software needs to be maintained, maintainability is broadly desirable, and it would be a good design goal.

I'm going to address maintainability for a moment, and then comment on performance and then functionality. What factors influence whether something is maintained? What comes to my mind is that things that are not vital tend to go unmaintained for lack of interest. That the build system works, by whatever means necessary, is a vital interest. If it is adopted and does not prove to be worse than the previous way of doing things, it should stick. Good design and documentation help, and so does a satisfactory feature set. Shell scripts and make files were once state-of-the-art, cutting-edge stuff. Today, they are no longer the cutting edge. There is no reason to believe that a Haskell build system would not be better, with one possible exception: convenience, and the learning curve involved for both the maintainers of the software and its consumers.

It is my impression that Haskell has a steep learning curve, but that is not altogether relevant when appealing to individuals who have already made the investment in Haskell. It might be a sticking point if we were to try to sell the idea to people who have little interest in investing heavily in yet another language, especially one with a steep learning curve. There are benefits to standardization regardless of the learning curve, however: to having one language instead of many. The Ada language is one such example.

I believe that Haskell is competitive as far as economy of keystrokes is concerned. It just needs the same sort of convenient functionality that shell scripts possess, and this seems a matter of interest and motivation rather than of whether it can be done. You can have a conventional, C-like, low-level interface that is flexible but a chore to use, alongside a higher-level interface that takes string arguments in a form similar to what would appear on a command line. You might ask whether such a thing would be efficient. That is where Template Haskell (TH) comes to the rescue: the string argument can be parsed at compile time, so it incurs no run-time penalty. As far as the build system is concerned, I don't believe this is where the real performance gains will be achieved, for reasons I will discuss shortly. There are other reasons to use TH, however.
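As a minimal sketch of the higher-level interface, assuming a hypothetical quasiquoter named `cmd`, the command-line-like string is split into an argument list while the module is being compiled, so the running program only sees a ready-made `[String]`. The quoting rules in `splitArgs` are deliberately simplified (no escapes, no variable expansion), and all names here are illustrative rather than part of any existing library.

```haskell
import Data.Char (isSpace)
import Language.Haskell.TH (Exp, Q, listE, litE, stringL)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- Split a command-line-like string into arguments. Unquoted whitespace
-- separates arguments; single or double quotes group text that contains
-- spaces. Simplified on purpose: no escapes, no variable expansion.
splitArgs :: String -> [String]
splitArgs s = case dropWhile isSpace s of
  ""   -> []
  rest -> let (arg, rest') = oneArg rest in arg : splitArgs rest'

-- Read one argument, stopping at the first unquoted whitespace character.
oneArg :: String -> (String, String)
oneArg "" = ("", "")
oneArg (c : cs)
  | isSpace c = ("", cs)
  | c == '"' || c == '\'' =
      -- Take everything up to the matching quote, then keep reading,
      -- so that  a"b c"d  becomes the single argument  ab cd.
      let (quoted, after) = break (== c) cs
          (more, rest) = oneArg (drop 1 after)
      in (quoted ++ more, rest)
  | otherwise =
      let (more, rest) = oneArg cs
      in (c : more, rest)

-- Hypothetical quasiquoter: [cmd|ghc -O2 Main.hs|] is replaced, at compile
-- time, by the list literal ["ghc", "-O2", "Main.hs"].
cmd :: QuasiQuoter
cmd = QuasiQuoter
  { quoteExp  = argsExp
  , quotePat  = unsupported "patterns"
  , quoteType = unsupported "types"
  , quoteDec  = unsupported "declarations"
  }
  where
    argsExp :: String -> Q Exp
    argsExp str = listE (map (litE . stringL) (splitArgs str))

    unsupported :: String -> String -> Q a
    unsupported what _ = fail ("cmd: not usable in " ++ what)
```

Because of Template Haskell's stage restriction, `cmd` has to live in a different module from the code that uses it. A client module with `{-# LANGUAGE QuasiQuotes #-}` could then write `[cmd|ghc -O2 -o "my prog" Main.hs|]`, which compiles to `["ghc", "-O2", "-o", "my prog", "Main.hs"]` and could be handed to the low-level `System.Process.rawSystem`.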

It may be possible to use TH to emit standard-compliant Haskell, that is, Haskell without language extensions, which could make the source code accessible to Haskell compilers that do not implement the language extensions that GHC does.
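One way to see what such emitted code might look like: a TH splice can be run in `IO` with `runQ`, and the resulting syntax tree rendered as source text with `pprint` (GHC's `-ddump-splices` flag shows similar output during compilation). The sketch below uses hypothetical helpers `stringConstant` and `renderDecls` to build a declaration from plain Haskell constructs and print it. It only illustrates the idea; it is not a tool for producing extension-free modules.

```haskell
import Language.Haskell.TH

-- Hypothetical helper: build the declarations
--   name :: String
--   name = "value"
-- as a TH syntax tree, using only plain Haskell constructs.
stringConstant :: String -> String -> Q [Dec]
stringConstant name value = do
  let n = mkName name
  sig <- sigD n (conT (mkName "String"))
  def <- valD (varP n) (normalB (litE (stringL value))) []
  pure [sig, def]

-- Run a declaration splice outside the compiler and render it as source
-- text. IO is an instance of Quasi, so runQ works for splices like this
-- one that do not need to look up (reify) existing definitions.
renderDecls :: Q [Dec] -> IO String
renderDecls q = pprint <$> runQ q
```

In GHCi, `renderDecls (stringConstant "buildTarget" "ghc-stage2") >>= putStrLn` prints a type signature and a binding for `buildTarget` as ordinary Haskell text.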

To discuss performance: it is doubtful that performance would be worse. It is likely, however, that there is a reason why few have bothered to speed up the execution of shell scripts and make files. The time isn't spent executing the script; it is spent waiting for the file system to carry out the requested tasks. Consequently, it would not be shocking if no real gains were seen, but that presupposes that we kept to the old ways of doing things.

I am proposing a new way of doing things. Shell scripts and make files were designed to make heavy use of the file system. Why? If you only had 4 kilobytes of RAM at your disposal, you worked within that budget whether you liked it or not. Instead of storing the data in a block of memory, what do you do? You ship it out to a file. Allegedly, the machine's file cache makes all of this irrelevant; in practice, it is merely an improvement. After running a few benchmarks, you will discover that it is best to access the file system as rarely as possible rather than as frequently as possible. This is easily demonstrated.

How long will it take to copy a 100 megabyte file? Compare that with how long it will take to copy 100,000 files that are one kilobyte each. A build system will copy 100,000 files and then do it 100 times. Once you realize this, it begins to make sense why they are called nightly builds. If you assume that your RAM budget is tight, this approach makes sense; it is the way to do it. It works just like CPU registers, which have to be pushed onto and popped off of the stack repeatedly. Traditional build systems do the same thing, except that instead of pushing values onto a stack, they write them out to files. This is what makes it nonlinear, and the root cause is the assumption the build system software makes about how much RAM is available. Today, we are far removed from this constraint, and we can afford to begin thinking differently.
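The comparison can be scripted. The following is a rough sketch, assuming the `directory`, `filepath` and `bytestring` packages that ship with GHC; `runBenchmark` and its parameters are hypothetical. It prepares one large file and many small files of the same total size, then times copying each set using the monotonic clock from `GHC.Clock`. Results depend heavily on the operating system, the file system and the state of the file cache, so any numbers it produces are indicative only.

```haskell
import Control.Monad (forM_)
import qualified Data.ByteString as B
import GHC.Clock (getMonotonicTime)
import System.Directory (copyFile, createDirectoryIfMissing)
import System.FilePath ((</>))

-- Wall-clock seconds taken by an IO action, from a monotonic clock.
timed :: IO () -> IO Double
timed act = do
  t0 <- getMonotonicTime
  act
  t1 <- getMonotonicTime
  pure (t1 - t0)

-- Hypothetical benchmark: copy one file of (count * size) bytes, then copy
-- `count` files of `size` bytes each. Returns (one file, many files) in seconds.
runBenchmark :: FilePath -> Int -> Int -> IO (Double, Double)
runBenchmark dir count size = do
  let src = dir </> "src"
      dst = dir </> "dst"
      chunk = B.replicate size 120  -- `size` bytes of 'x'
      names = map show [1 .. count]
  mapM_ (createDirectoryIfMissing True) [src, dst]
  -- Untimed setup: the same total number of bytes, arranged two ways.
  B.writeFile (src </> "big.bin") (B.concat (replicate count chunk))
  forM_ names $ \n -> B.writeFile (src </> n) chunk
  tOne <- timed (copyFile (src </> "big.bin") (dst </> "big.bin"))
  tMany <- timed (forM_ names $ \n -> copyFile (src </> n) (dst </> n))
  pure (tOne, tMany)
```

Calling `runBenchmark "bench" 100000 1024` corresponds to the 100,000 one-kilobyte files above (about 100 MB in total) and returns the two copy times in seconds.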

How the GHC and C compiler executables work is consistent with the model used by shell scripts and make files; consequently, I anticipate that some tweaking may be necessary in exactly how they work, but I am not entirely certain of this. One of my thoughts is to concatenate all the modules and feed them to the GHC or C compiler executable as one large file or stream. This should result in dramatic improvements in speed as well as in the quality of the compiler optimizations; that is generally what is observed when it is done. The SQLite project (http://www.sqlite.org/) does this. It may be convenient and desirable to ensure that GHC can accept a stream on standard input, for example. It may further be useful to modify the GHC source code so that GHC can remain resident in memory, so that no time is wasted invoking it repeatedly like a CGI script.
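For C sources, the concatenation step can be sketched as follows, in the spirit of SQLite's single-file amalgamation. The helper names `amalgamate` and `amalgamateFiles` are hypothetical. Each file is preceded by a standard `#line` directive so that compiler diagnostics still point at the original file and line. Plain concatenation only works when the files do not clash on `static` names or macros, and Haskell modules cannot simply be concatenated this way, since each carries its own `module` header and import list.

```haskell
-- Concatenate several C source files into one translation unit. A #line
-- directive before each file keeps diagnostics pointing at the original.
amalgamate :: [(FilePath, String)] -> String
amalgamate = concatMap section
  where
    section (path, body) =
      "#line 1 " ++ show path ++ "\n" ++ body ++ terminator body
    -- Ensure each file ends with a newline before the next directive.
    terminator body
      | null body || last body == '\n' = ""
      | otherwise = "\n"

-- Read the input files and write the combined source to `out`.
-- `out` must not be one of the inputs.
amalgamateFiles :: [FilePath] -> FilePath -> IO ()
amalgamateFiles inputs out = do
  bodies <- mapM readStrict inputs
  writeFile out (amalgamate (zip inputs bodies))
  where
    -- Force each file's contents so its handle is closed before the next opens.
    readStrict path = do
      body <- readFile path
      length body `seq` pure body
```

The resulting single file can then be passed to the C compiler in one invocation, which is the effect described above.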

As far as first things first is concerned, I want to do exactly as Simon Marlow suggested and begin with the library, but I want to do more than this, because I feel that stopping there would ultimately prove to be a mistake: such an approach would encourage wrong thinking. We have to make something that solves the same problems as a shell script or make file, does so with competitive convenience, but also employs a different paradigm. What I intend to do first is create a shell script and make file interpreter implemented in Haskell. That's phase one. We then stop using the sh and make executables and use the Haskell replacements instead. At this point the chief benefits of all this work will be type safety and program analysis. For example, you will know that a variable is unquoted and, consequently, that the expression it appears in cannot cope correctly with a value containing a space. This alone will be useful. Many more errors will be caught early with less effort. It will also mean that projects that have nothing to do with Haskell could benefit from it. It would be evangelical.
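To illustrate the kind of check meant here, the sketch below assumes a hypothetical, much-simplified syntax tree for parsed shell words (`Part`, `ShellWord`) and flags variable expansions that occur outside double quotes. In POSIX shells such expansions undergo field splitting, so a value containing a space turns into several arguments. A real interpreter would also have to model globbing, `IFS` and the other shell expansions.

```haskell
-- Hypothetical, simplified syntax for one shell word (one argument).
data Part
  = Lit String        -- literal text, e.g. -o
  | Var String        -- unquoted expansion, e.g. $out
  | Quoted [Part]     -- double-quoted segment, e.g. "$out"
  deriving (Eq, Show)

type ShellWord = [Part]

-- Variables expanded outside double quotes. Their values are subject to
-- field splitting, so a value containing a space becomes several arguments.
unquotedVars :: ShellWord -> [String]
unquotedVars = concatMap go
  where
    go (Var v)    = [v]
    go (Lit _)    = []
    go (Quoted _) = []  -- expansions inside double quotes are not split

-- One warning per unquoted expansion among a command's arguments.
lintCommand :: [ShellWord] -> [String]
lintCommand args =
  [ "warning: $" ++ v ++ " is unquoted and may be split at whitespace"
  | w <- args, v <- unquotedVars w ]
```

For `cp $src "$dst"`, represented as `[[Lit "cp"], [Var "src"], [Quoted [Var "dst"]]]`, only `$src` is reported.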

Phase two will be to supplant the shell scripts and make files altogether with Haskell. Here we will decide what the Haskell way is going to be.
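Nothing about the Haskell way is settled yet, but as one possible shape, here is a sketch of make's core staleness rule expressed over in-memory maps rather than repeated trips to the file system. The types and function names are hypothetical; the modification times could be gathered once, for example with `System.Directory.getModificationTime`, and converted to numbers. The recursion re-examines shared dependencies and assumes the dependency graph has no cycles; a real tool would memoize results and detect cycles.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical rule table: each target maps to the files it depends on.
type Rules = M.Map FilePath [FilePath]

-- Hypothetical modification times; a missing key means the file does not exist.
type MTimes = M.Map FilePath Integer

-- Roughly make's rule: a target is stale if it is missing, if any dependency
-- is stale, or if any dependency is missing or newer than the target.
isStale :: Rules -> MTimes -> FilePath -> Bool
isStale rules mtimes target =
  case M.lookup target mtimes of
    Nothing -> True
    Just t  -> any (forces t) (M.findWithDefault [] target rules)
  where
    forces t dep =
      isStale rules mtimes dep || maybe True (> t) (M.lookup dep mtimes)

-- All targets that have rules and would need rebuilding, in key order.
staleTargets :: Rules -> MTimes -> [FilePath]
staleTargets rules mtimes = filter (isStale rules mtimes) (M.keys rules)
```

With rules `prog <- main.o <- main.c`, touching `main.c` so that it is newer than `main.o` marks both `main.o` and `prog` as stale, while an untouched tree reports nothing.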

Though I believe it will require a lot of work, I believe the investment is a wise one in that it will pay dividends. Furthermore, I am enthusiastic about it and have the time. I believe that this has intrinsic worth and is something worth spending time on.


More information about the Cvs-ghc mailing list