<div dir="ltr"><div>I&#39;m learning about the Map Reduce computation frequently used with big data.  For the fun of it, I decided to write a very high-level spec of Map Reduce.  Here is what I came up with.  Enjoy.<br><br>

</div>John<br><div><br>&gt; module MapReduce where<br>&gt; import Data.List (nub)<br><br>A high-level specification of Map Reduce as a Haskell program.  The<br>program uses lists to represent multisets.  As multisets have no<br>

implied ordering, the ordering implied by lists in this specification<br>should be ignored.<br><br>The database is a multiset of key-value pairs.<br><br>&gt; type Key = String<br>&gt; type Value = String<br>&gt; type Datum = (Key, Value)<br>

&gt; type Data = [Datum]<br><br>A mapper maps a datum to a finite multiset of key-value pairs.<br><br>&gt; type Mapper = Datum -&gt; Data<br><br>A reducer takes a key and a multiset of values and produces a finite<br>multiset of values.<br>

<br>&gt; type Reducer = (Key, [Value]) -&gt; [Value]<br><br>A step is a mapper followed by a reducer<br><br>&gt; type Step = (Mapper, Reducer)<br><br>A program is a finite sequence of steps<br><br>&gt; type Program = [Step]<br>

<br>The semantics of a program is provided by the run function.<br><br>&gt; run :: Program -&gt; Data -&gt; Data<br>&gt; run [] d = d<br>&gt; run (s : p) d =<br>&gt;   run p (step s d)<br><br>The three parts of a step are mapping, shuffling, and reducing.<br>

<br>&gt; step :: Step -&gt; Data -&gt; Data<br>&gt; step (m, r) d =<br>&gt;   let mapped = transform m d<br>&gt;       shuffled = shuffle mapped in<br>&gt;   reduce r shuffled<br><br>The first part of a step is to transform the data by applying the<br>

mapper to each datum and collecting the results.<br><br>&gt; transform :: Mapper -&gt; Data -&gt; Data<br>&gt; transform m d =<br>&gt;   [p | u &lt;- d, p &lt;- m u]<br><br>Next, values with common keys are collected.  Keys are unique after<br>

shuffling.<br><br>&gt; shuffle :: Data -&gt; [(Key, [Value])]<br>&gt; shuffle d =<br>&gt;   [(k, vs) | k &lt;- nub (map fst d), -- nub eliminates duplicate keys<br>&gt;              let vs = [v | (k&#39;, v) &lt;- d, k&#39; == k]]<br>

<br>A reducer is applied to the data associated with one key, and always<br>produces data with that key.<br><br>&gt; reduce :: Reducer -&gt; [(Key, [Value])] -&gt; Data<br>&gt; reduce r rs = <br>&gt;   [(k, v) | (k, vs) &lt;- rs, v &lt;- r (k, vs)]<br>

<br></div></div>