lines/unlines and "inverse"

Lars Henrik Mathiesen thorinn@diku.dk
21 Jul 2002 11:47:37 -0000


> From: Ian Lynagh <igloo@earth.li>
> Date: Sat, 20 Jul 2002 15:03:22 +0100
> 
> [The Revised Haskell 98 report] says
> 
> -- lines breaks a string up into a list of strings at newline
> -- characters. The resulting strings do not contain newlines.
> -- Similary, words breaks a string up into a list of words, which
> -- were delimited by white space.  unlines and unwords are the
> -- inverse operations. unlines joins lines with terminating
> -- newlines, and unwords joins words with separating spaces.
> 
> I think the use of "inverse" is potentially confusing given,
> well, they aren't inverses (or even left or right inverses).
> 
> Ian, who thinks (unlines . lines == id) would have been useful. Oh well.

Well, you do have

      lines . unlines == id       (if no list element contains a newline)
      unlines . lines . unlines == unlines
      words . unwords . words == words
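
To make these identities concrete, here is a small sketch that checks
them on a couple of sample values. The sample data and the names law1,
law2 and law3 are mine, chosen only for illustration; a spot check like
this is of course not a proof.

```haskell
-- Sample inputs are arbitrary; no element of sampleLines contains '\n'.
sampleLines :: [String]
sampleLines = ["first line", "", "  indented", "last"]

sampleText :: String
sampleText = "  some   words\twith\nodd  spacing "

-- lines . unlines == id (relies on the elements being newline-free)
law1 :: Bool
law1 = lines (unlines sampleLines) == sampleLines

-- unlines . lines . unlines == unlines
law2 :: Bool
law2 = unlines (lines (unlines sampleLines)) == unlines sampleLines

-- words . unwords . words == words
law3 :: Bool
law3 = words (unwords (words sampleText)) == words sampleText
```

All three should evaluate to True in GHCi.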

(unwords . words . unwords) cannot be simplified to unwords, though;
the results will differ on input that contains 'words' with leading,
trailing, or multiple consecutive spaces.
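
A quick counterexample, as a sketch; the list "messy" and the other
names are made up for the purpose:

```haskell
-- Elements with a leading space, a double space, and a trailing space.
messy :: [String]
messy = [" a", "b  c", "d "]

-- unwords alone keeps every embedded space.
once :: String
once = unwords messy                          -- " a b  c d "

-- Going through words in between normalises the spacing.
roundTrip :: String
roundTrip = unwords (words (unwords messy))   -- "a b c d"

differs :: Bool
differs = once /= roundTrip
```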

However, if you observe a few reasonable constraints on the input to
any of the functions, you can get it back by feeding the output to the
'inverse' function:

      words: input must have no leading or trailing blanks, and
             words must be separated by exactly one space
      lines: input must end in a newline (or be empty)
      unwords: input list elements must be non-empty and must not
               contain blanks
      unlines: input list elements must not contain newlines
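
The constraints can be written down as predicates, which makes it easy
to check the round trips. This is only a sketch: the okFor* names are
hypothetical, and I have spelled out the condition for words as also
requiring single spaces between words, since words collapses runs of
white space.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf)

-- words: the only white space is single ' ' separators,
-- with none leading or trailing.
okForWords :: String -> Bool
okForWords s =
  all (\c -> c == ' ' || not (isSpace c)) s
    && not ("  " `isInfixOf` s)
    && take 1 s /= " "
    && take 1 (reverse s) /= " "

-- lines: the text ends in a newline (the empty string also round-trips).
okForLines :: String -> Bool
okForLines s = null s || last s == '\n'

-- unwords: every element is non-empty and free of white space.
okForUnwords :: [String] -> Bool
okForUnwords = all (\w -> not (null w) && not (any isSpace w))

-- unlines: no element contains a newline.
okForUnlines :: [String] -> Bool
okForUnlines = all (notElem '\n')

-- Under each constraint, the 'inverse' function gives the input back.
roundTrips :: Bool
roundTrips = and
  [ okForWords "a b c"    && unwords (words "a b c") == "a b c"
  , okForLines "x\ny\n"   && unlines (lines "x\ny\n") == "x\ny\n"
  , okForUnwords ["a", "b"]
      && words (unwords ["a", "b"]) == ["a", "b"]
  , okForUnlines ["x", "", "y"]
      && lines (unlines ["x", "", "y"]) == ["x", "", "y"]
  ]
```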

Some of the restrictions on words and unwords could be removed if
words were changed to split on every space in the input, generating
empty words if necessary. But since the purpose of these functions is
text processing, most people would probably just wrap it in a helper
function to drop the empty strings.
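
A minimal sketch of such a variant, assuming it splits on the space
character only (the Prelude's words splits on any white space). Both
splitOnSpaces and wordsNoEmpty are hypothetical names, not library
functions.

```haskell
-- Split on every single ' ', keeping empty words.
splitOnSpaces :: String -> [String]
splitOnSpaces s = case break (== ' ') s of
  (w, [])       -> [w]
  (w, _ : rest) -> w : splitOnSpaces rest

-- The helper most text-processing code would want: drop the empties.
wordsNoEmpty :: String -> [String]
wordsNoEmpty = filter (not . null) . splitOnSpaces
```

With this definition unwords (splitOnSpaces s) == s for every s, at the
price of results like ["", "a", "", "b", ""] for " a  b ". One wrinkle
remains: splitOnSpaces "" is [""], not [], so the empty list still does
not survive a round trip through unwords.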

Beyond that, as long as the underlying idea is one of strings
being composed of words/lines that are also strings, unwords/unlines
cannot be injective functions, unless you want to take a leaf from
Tcl and let unwords/unlines add quoting of embedded space/newline that
is then removed by the corresponding words/lines function.
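
For the record, a sketch of what the quoting idea could look like for
lines. The backslash escaping here is my own assumption, chosen for
brevity; it is in the spirit of the idea, not a copy of Tcl's actual
quoting rules, and all four names are hypothetical.

```haskell
-- Escape backslash and newline so that a quoted element never
-- contains a raw newline.
quote :: String -> String
quote = concatMap esc
  where
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc c    = [c]

-- Undo quote; input not produced by quote is passed through as-is.
unquote :: String -> String
unquote ('\\' : '\\' : rest) = '\\' : unquote rest
unquote ('\\' : 'n' : rest)  = '\n' : unquote rest
unquote (c : rest)           = c : unquote rest
unquote []                   = []

-- Quoting variants of unlines and lines.
unlinesQ :: [String] -> String
unlinesQ = unlines . map quote

linesQ :: String -> [String]
linesQ = map unquote . lines
```

Now linesQ . unlinesQ == id holds for every list of strings, so
unlinesQ is injective: unlinesQ ["a\nb"] and unlinesQ ["a", "b"] are
different strings, where plain unlines maps both to "a\nb\n".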

Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@diku.dk> (Humour NOT marked)