Personal tools

Talk:Toy compression implementations

From HaskellWiki

(Difference between revisions)
Jump to: navigation, search
m
(I'm impressed...)
 
Line 18: Line 18:
 
(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.
 
(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.
   
The where (tok,rst) definition can be read right to left:
+
The <hask>where (tok,rst)</hask> definition can be read right to left:
* The zip (inits lst) (tails lst) computes every possible way to split lst input into a prefix and suffix, in increasing length of prefix.
+
* The <hask>zip (inits lst) (tails lst)</hask> computes every possible way to split <hask>lst</hask> input into a prefix and suffix, in increasing length of prefix.
* The tails function just drops the head because it doesn't want to consider the length 0 prefix
+
* The <hask>tail</hask> function just drops the head because it doesn't want to consider the length 0 prefix
* takeWhile applies the predicate (`elem` table) to the prefix.
+
* <hask>takeWhile</hask> applies the predicate <hask>(`elem` table)</hask> to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table.
This will always succeed on the length 1 prefix, and may find longer prefixes in the table.
+
* The <hask>last</hask> function take the last prefix in the table, which will always be the longest such prefix
* The last function take the last prefix in the table, which will always be the longest such prefix
+
* <hask>tok</hask> is this prefix, and <hask>rest</hask> is the remaining suffix to process.
* 'tok' is this prefix, and 'rest' is the remaining suffix to process
 
   
The (work table lst) is (:) applied to a head and tail. The head of the list is the position of the longest 'tok' prefix in the table, thus (table !! index) == tok. The tail of the list is a recursive call to work with an updated table.
 
   
The updated table' is the old table with string 'tok1' appended. And tok1 is one character longer than the longest matching prefix tok. Thus the table is expanded with longer and longer subsequences of the compressed string. When any of these are found again they are represented as their index in the table.
+
Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. [[User:MathematicalOrchid|MathematicalOrchid]] 13:34, 9 March 2007 (UTC)
 
"hello world, hello world" goes from 24 characters to 19 numbers since "he" "ll" "o " "wo" and "rl" are encoded as single numbers. The table grows from the initial 95 entries to 95+19-1 = 113.
 

Latest revision as of 13:34, 9 March 2007

Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O MathematicalOrchid 11:36, 9 March 2007 (UTC)

To understand it I rewrote it a bit:

encode_LZW :: (Eq t) => [t] -> [t] -> [Int]
encode_LZW alphabet = work (map (:[]) alphabet) where
  work table []  = []
  work table lst = index : work table' rest
    where (tok, rest) = last . takeWhile ((`elem` table) . fst) . tail $ zip (inits lst) (tails lst)
          index = fromJust (elemIndex tok table)
          table' = table ++ [tok']
          tok' = tok ++ [head rest]

The idea of the the table, which is the 1st argument to 'work', is that some prefix of the input is already in the table.

(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.

The
where (tok,rst)
definition can be read right to left:
  • The
    zip (inits lst) (tails lst)
    computes every possible way to split
    lst
    input into a prefix and suffix, in increasing length of prefix.
  • The
    tail
    function just drops the head because it doesn't want to consider the length 0 prefix
  • takeWhile
    applies the predicate
    (`elem` table)
    to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table.
  • The
    last
    function take the last prefix in the table, which will always be the longest such prefix
  • tok
    is this prefix, and
    rest
    is the remaining suffix to process.


Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. MathematicalOrchid 13:34, 9 March 2007 (UTC)