[Haskell-cafe] GSoC proposal: Haskell AST-based refactoring and API upgrading tool

Davorak patrick.john.wheeler at gmail.com
Mon Apr 29 21:01:17 CEST 2013


The latest updates on HaRe with GHC API project seem to be posted on the 
google+ community page:
https://plus.google.com/communities/116266567145785623821



On Monday, April 29, 2013 5:09:56 AM UTC-5, Simon Hengel wrote:
>
> Hi Niklas,
> I haven't read the whole proposal as I'm short of time.  But Alan
> Zimmerman is doing a lot of work on integrating HaRe with the GHC API
> [1].  He is alanz on freenode and a regular in #hspec.
>
> I haven't looked at the code, but maybe it's of interest to you.
>
> Cheers,
> Simon
>
> [1] https://github.com/alanz/HaRe/tree/ghc-api
>
> On Mon, Apr 29, 2013 at 02:00:23PM +0800, Niklas Hambüchen wrote:
> > I would like to propose the development of source code refactoring tool
> > that operates on Haskell source code ASTs and lets you formulate rewrite
> > rules written in Haskell.
> > 
> > Objective
> > ---------
> > 
> > The goal is to make refactorings easier and allow global code changes
> > that might be incredibly tedious to do in a non-automated way.
> > By making these transformations convenient, we can make it easier to
> > maintain clean code, add new features or clean up leftovers faster, and
> > reduce the fear and effort to upgrade to newer versions of packages and
> > APIs.
> > 
> > 
> > Transformations
> > ---------------
> > 
> > First, here are a few operations you would use this tool for. Some of
> > them are common operations you would also do in other programming
> > languages, some are more specific to Haskell.
> > 
> > * Changing all occurrences of "import Prelude hiding (catch)" to "import
> > qualified Control.Exception as E"
> > 
> > * Replacing all uses of a function with that function being imported
> > qualified or the other way around
> > 
> > * Adding a field to data constructor a record, setting user-supplied
> > defaults for construction and destruction:
> > 
> >     -- Suppose you want to change one of these
> >     data User = User { name :: String, age :: Int }
> >     data User = User String Int
> > 
> >     -- into one of these
> >     data User = User { name :: String, age :: Int, active :: Bool }
> >     data User = User String Int Bool
> > 
> >     -- the refactoring tool could perform, in all relevant locations:
> >     show (User name age) = ...
> >     show (User name age _) = ...
> > 
> >     -- and also this transformation:
> >     ... u { name = "deleted" } ...
> >     ... u { name = "deleted", active = False } ...
> > 
> >     -- or equivalently with records.
> > 
> >     -- Special cases could be taken care of as specified, such as
> >     --   "whenever an object of [this User type] has
> >     --    of its records passed into some function 'email', do this
> >     --    now only if the user is active, so modify all relevant code
> >     --        email (name u)
> >     --    to
> >     --        if (active u) then email (name u) else return ()
> > 
> >     -- Other examples include adding a position counter to attoparsec.
> > 
> > * Adding a type parameter to a type
> > 
> >     -- This happens a lot on monad transformer stacks, e.g.
> >     newtype MyMonad a b c = MyMonad (ReaderT a (WriterT b ...
> > 
> >     -- and as you would probably agree on, this is not the most
> >     -- comfortable change to make; in big project this can mean
> >     -- hour-long grinding.
> > 
> >     -- It has also recently happened in the basic underlying types
> >     -- of packages like conduit and pipes.
> > 
> > * Adding a new transformer around a monad
> > 
> > * Addressing problems like mentioned in
> > http://blog.ezyang.com/2012/01/modelling-io/:
> >   "There is one last problem with this approach: once the primitives
> > have been selected, huge swaths of the standard library have to be
> > redefined by “copy pasting” their definitions ..."
> > 
> > * Extracting a value into a let or where clause
> > 
> > * Renaming a variable, and all its occurrences that are semantically
> > same variable (based on its scope)
> > 
> > * Changing the way things are done, such as:
> > 
> >     * Replacing uses of fmap with <$>, also taking care of the
> >       corresponding import, and such cases were partial application
> >       is involved
> > 
> >     * Replacing uses of "when (isJust)" to "forM_"
> > 
> > * Making imports clearer by adding all functions used to the file to the
> > import list of the module that gets them in scope
> > 
> > * Finding all places where an exported function does not have all its
> > arguments haddock-documented.
> > 
> > * Performing whole-project refactorings instead of operating on single
> > files only, allowing operations like
> > 
> >     "Find me all functions of this type, e.g.
> >          Maybe a -> (a -> m a) -> m a
> >      in the project and extract them into this new module,
> >      with the name 'onJust'."
> > 
> > 
> > Some of the problems above can be tried to address using regex-based
> > search and replace, but this already fails in the simplest case of
> > "import Prelude hiding (catch)" in case there is more than that imported
> > from Prelude or newlines involved in the import list.
> > 
> > Transformation on the AST are much more powerful, and can guarantee that
> > the result is, at least syntactically, valid. No text base tool can do 
> that.
> > 
> > 
> > Other uses
> > ----------
> > 
> > In addition to being able to perform transformations as mentioned above,
> > the refactoring tool as a library can be leveraged to:
> > 
> > * Support or be the base of code formatting tools such as
> > haskell-stylish, linters, style/convention checkers, static analyzers,
> > test coverage tools etc.
> > 
> > * Implement automatic API upgrades.
> > 
> >   Imagine the author of a library you use deprecates some functions,
> > introduces replacements, adds type parameters. In these cases, it is
> > very clear and often well-documented which code has to be replaced by
> > what. The library author could, along with the new release, publish
> > "upgrade transformations" that you can apply to your code base to save
> > most of the manual work.
> > 
> >   These upgrade transformations could be either parts of the packages
> > themselves or be separately maintained and refined on the feedback of
> > users applying them to their code bases.
> >   This could allow the Haskell community to keep up their fast pace
> > while making API breakage a non-problem.
> > 
> >   Automation is what makes us deal well with test suites, source
> > control, error checking. Taking the pain out of this will increase
> > people's incentive to upgrade their code an to keep it well-maintained
> > by easy refactorings.
> > 
> >   This concept of automatic code upgrades would, to my knowledge, be
> > quite unique in the programming language world, and the possibility to
> > have this is, as many things, the result of Haskell's excellent type 
> system.
> >   In comparison to dynamic and "unsafe" languages, we actually have a
> > lot of information around that we can use to write powerful and
> > expressive tools.
> > 
> > 
> > Splitting into GSoC tasks
> > -------------------------
> > 
> > I think that the project is too large for one summer and should be split
> > into two parts:
> > 
> > 1. Implementation of a "full-source" transformation-enabling AST
> > 2. The transformation engine and API / DSL
> > 
> > 
> > About 1: Creating a full-source AST
> > 
> > Existing parsers for Haskell tend to throw away information that is not
> > necessary for compiling the code. Of course this is not good for a
> > code-to-code transformation tool, stripping comments or whitespace the
> > programmer cares about is not an option.
> > 
> > haskell-src-exts has gained support for comments around version 1.1 (as
> > another GSoC project?), but still comments are treated as foreigners,
> > given to you in a list detached from the AST; their positions are
> > specified with integers, which means that when you modify the AST, you
> > have to take care to adjust the comments as well.
> > 
> > The problem seems to be that haskell-source-exts is made for parsing,
> > its AST is made for transformations. I have yet to see a tool that
> > actually *modifies* haskell-src-exts's AST - most of them use it for
> > parsing, and then apply some form of pretty-printing to get back to code.
> > 
> > We need an AST that contains *all* information about the original source
> > code, which means render . parse = id; this is why I called it "full
> > source" AST.
> > Also, this AST should have all elements available in a way that
> > encourages modification and does not discriminate against operations
> > like re-indentation or alignment.
> > 
> > I like to think of this as something that could eventually be part of
> > ghc, in a pipeline like
> > 
> >     GHC's parser
> >  -> full-source AST
> >  -> reduced AST that strips parts irrelevant for compilation
> >  -> usual compilation pipeline
> > 
> > I am thinking of this being in a GHC environment because it would
> > guarantee that it is up-to-date with the currently supported language
> > extensions etc.
> > However, I also see that this might slow down development and make
> > things harder, so at least for the beginning it might be better to do it
> > as as a normal library; I would love to get feedback on this.
> > 
> > It might be a good idea to start with GHC's or haskell-source-exts
> > parser and modify them to obtain the full-source AST. I believe though
> > that it would be beneficial if the AST would be to a certain extent be
> > decoupled from the parser, such that other people could write other
> > parsers (e.g. using other parsing libraries like Parsec, attoparsec,
> > uu-parsinglib, trifecta) that produce the same compatible AST type.
> > 
> > I believe that creating this AST makes a GSoC project for itself, and
> > that it would be beneficial for all efforts processing Haskell as a
> > source language.
> > 
> > 
> > About 2: Transformation engine and how to write transformations
> > 
> > Transformations should be Haskell code that operates on the AST.
> > 
> > They would be similar to how you write TemplateHaskell in a way, yet
> > much easier and more intuitive to use in most cases. I suggest the
> > creation of a monadic DSL that allows you to select and match those
> > parts of the code you are interested in, perform some case analysis and
> > computation to determine how you want to transform it, and then give you
> > convenient ways to express your changes in that DSL.
> > 
> > Functions in this DSL would roughly belong to one of the kinds
> >     * finding/matching
> >     * transforming/rewriting.
> > 
> > I have not thought about how this DSL would exactly look like, but I
> > could imagine myself writing a high-level transformation like this:
> > 
> >     is           <- getImports
> >     let matching = filter (importsFunction "catch")
> >                  . filter (hasModuleName "Prelude")
> >     mapM_ (rewrite . removeImportFunction "catch") matching
> > 
> > One of the priority goals of this GSoC project is making this API
> > convenient; transformations should read more naturally and be much
> > shorter than equivalent TemplateHaskell.
> > 
> > This extensive, convenience oriented monadic DSL should aim to have a
> > reasonably large set of functions, and "do things for you" even if you
> > could build them yourself with three or four combinators.
> > It should base itself on a minimal set of core transformations.
> > I think that the Haskell community has a lot of expertise, especially
> > from parsing libraries, in creating this core API + convenience
> > combinators duo, that could aid the GSoC student to get this right.
> > 
> > Along with this way to specify transformations, a transformation engine
> > has to be constructed which can perform them somewhat efficiently over
> > large code-bases.
> > 
> > While speed should should not be a main target for this project and
> > functionality and expressiveness are the main goals, the transformation
> > engine should be designed at least with speed in mind, which can be
> > worked on at a later stage outside of the GSoC project.
> > 
> > This second part of the proposal should result in a library to perform
> > transformations and an executable that can apply them to a folder of
> > source files.
> > 
> > 
> > Possible extensions and follow-up projects
> > ------------------------------------------
> > 
> > * Interactive refactoring tool
> > 
> > The refactoring tool could be driven by an interactive application that
> > lets you write some transformations, apply them to your code base, view
> > the diffs and build the project; on build failures or otherwise
> > not-complete transformations it would allow to refine your
> > transformations, or write custom special cases for selected files, line
> > ranges or functions.
> > This way you could quickly upgrade your code base in a fast
> > write-check-repeat cycle.
> > 
> > * An API upgrade infrastructure
> > 
> > As mentioned above, transformations could be published along with new
> > software packages that allow their dependants to easily upgrade their
> > code to the latest API version with minimal manual effort.
> > An infrastructure or standard way of way of doing this could be 
> established.
> > For now, I think that package maintainers would publish their upgrade
> > transformations in their projects' or separate code repositories, and
> > users would apply them with the refactoring tool as needed.
> > I can also imagine a more dedicated infrastructure though, where API
> > upgrades are stored on Hackage or a similar database.
> > 
> > 
> > Summary
> > -------
> > 
> > This proposal contains two GSoC projects that I believe to be reasonably
> > sized for one summer each.
> > 
> > I think that they make good projects according to GSoC standards as they
> > are focused, feasible, and aimed at creating real-world code that bring
> > a clearly visible benefit.
> > 
> > The entry barrier is not too high since knowledge of compiler or runtime
> > internals is not required. However, applicants should probably have a
> > fairly good understanding of parsers and how they are dealt with in
> > Haskell, and be somewhat familiar with the Haskell community in order to
> > find sources of good feedback and for being able to judge whether the
> > tools being created would be convenient for the community to use.
> > 
> > 
> > Discussion
> > ----------
> > 
> > Now I would be glad to get some responses on this proposal; I have
> > written a bit of text, but it is still a very rough idea and I would
> > love to hear your thoughts about it.
> > 
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskel... at haskell.org <javascript:>
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskel... at haskell.org <javascript:>
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130429/efca6fb9/attachment.htm>


More information about the Haskell-Cafe mailing list