What about something as simple as this?<div><br></div><div><br></div><div><div>import Control.Monad (forM)</div><div>import System.Directory (doesDirectoryExist, getDirectoryContents)</div><div>import System.FilePath ((</>))</div>
<div>import qualified Data.ByteString as B</div><div>import Data.Digest.OpenSSL.MD5 (md5sum)</div><div>import qualified Data.Map as M</div><div><br></div><div>getRecursiveContents :: FilePath -> IO [FilePath]</div>
<div>getRecursiveContents topdir = do</div><div>&nbsp;&nbsp;names <- getDirectoryContents topdir</div><div>&nbsp;&nbsp;let properNames = filter (`notElem` [".", ".."]) names</div><div>&nbsp;&nbsp;paths <- forM properNames $ \name -> do</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;let path = topdir </> name</div><div>&nbsp;&nbsp;&nbsp;&nbsp;isDirectory <- doesDirectoryExist path</div><div>&nbsp;&nbsp;&nbsp;&nbsp;if isDirectory</div><div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then getRecursiveContents path</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else return [path]</div><div>&nbsp;&nbsp;return (concat paths)</div><div><br></div><div>getMD5 :: FilePath -> IO String</div><div>getMD5 file = md5sum `fmap` B.readFile file</div><div><br></div>
<div>main :: IO ()</div><div>main = do</div><div>&nbsp;&nbsp;files <- getRecursiveContents "."</div><div>&nbsp;&nbsp;md5s <- mapM getMD5 files</div><div>&nbsp;&nbsp;let m = M.fromListWith (++) $ zip md5s [[f] | f <- files]</div>
<div>&nbsp;&nbsp;putStrLn $ M.showTree m</div></div><div><br></div><div><br></div><div>The biggest part is the "getRecursiveContents" function, shamelessly stolen from Real World Haskell (RWH).</div><div><br></div><div>L.</div><div><br></div><div>
<br></div><div><br><br><div class="gmail_quote">On Sun, Mar 18, 2012 at 5:43 PM, Yawar Amin <span dir="ltr"><<a href="mailto:yawar.amin@gmail.com">yawar.amin@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Michael,<br>
<br>
Michael Schober <Micha-Schober <at> <a href="http://web.de" target="_blank">web.de</a>> writes:<br>
<br>
> [...]<br>
<div class="im">> I took the liberty to modify the output a little bit to my needs - maybe<br>
> a future reader will find it helpful, too. It's attached below.<br>
<br>
</div>I kind of played around with your example a little bit and wondered if it<br>
could be implemented in terms of just the basic Haskell Platform<br>
modules and functions. So as an exercise I rolled my own directory<br>
traversal and duplicate finder functions. This is what I came up with:<br>
<br>
- walkDirWith: walks a given directory, applying a given function of type<br>
Handle -> IO r to every file it finds, and returns an association list<br>
pairing each result (of the caller-chosen type r) with the file's path.<br>
<br>
- filePathMap: I think roughly analogous to your duplicates function.<br>
<br>
- main: In the third line of the main function, I use hFileSize as an<br>
example of a function of type Handle -> IO r, in this case with r =<br>
Integer. A hash function could easily be put in here. The last line<br>
pretty-prints the Map in a tree-like format.<br>
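<br>
The last bullet notes that a hash function could take the place of hFileSize.<br>
A minimal sketch of what that might look like follows. The names fnv1a and<br>
hashHandle are hypothetical (they are not in the original message), and FNV-1a<br>
is only a simple non-cryptographic stand-in for a real digest such as MD5. It<br>
assumes the bytestring package, which ships with GHC.<br>
<br>
```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.Word (Word64)
import System.IO (Handle)

-- | 64-bit FNV-1a over a strict ByteString. A simple non-cryptographic
-- checksum, used here only as a stand-in for a real digest.
fnv1a :: B.ByteString -> Word64
fnv1a = B.foldl' step 14695981039346656037
  where
    -- XOR in the next byte, then multiply by the FNV prime (wraps mod 2^64).
    step h byte = (h `xor` fromIntegral byte) * 1099511628211

-- | Hypothetical helper with the shape walkDirWith expects (Handle -> IO r).
-- B.hGetContents reads strictly, so the whole file is consumed before
-- withFile closes the handle.
hashHandle :: Handle -> IO Word64
hashHandle h = fmap fnv1a (B.hGetContents h)
```
<br>
Under these assumptions it would be passed in the same position as hFileSize,<br>
for example: walkDirWith dir hashHandle $ return []. Word64 has an Ord<br>
instance, so the result can be fed to filePathMap unchanged.<br>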
<br>
import System.IO<br>
import System.Environment (getArgs)<br>
import System.Directory (doesDirectoryExist, getDirectoryContents)<br>
import Control.Monad (mapM)<br>
import Control.Applicative ((<$>))<br>
import System.FilePath ((</>))<br>
<div class="im">import qualified Data.Map as M<br>
<br>
</div>walkDirWith :: FilePath -> (Handle -> IO r) -> IO [(r, FilePath)] -><br>
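&nbsp;&nbsp;IO [(r, FilePath)]<br>
walkDirWith path f walkList = do<br>
&nbsp;&nbsp;isDir <- doesDirectoryExist path<br>
&nbsp;&nbsp;if isDir<br>
&nbsp;&nbsp;&nbsp;&nbsp;then do<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;paths <- getDirectoryContents path<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;concat <$> mapM (\p -> walkDirWith (path </> p) f walkList)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[p | p <- paths, p /= ".", p /= ".."]<br>
&nbsp;&nbsp;&nbsp;&nbsp;else do<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;rValue <- withFile path ReadMode f<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;((:) (rValue, path)) <$> walkList<br>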
<br>
filePathMap :: Ord r => [(r, FilePath)] -> M.Map r [FilePath]<br>
filePathMap pathPairs =<br>
&nbsp;&nbsp;foldl (\theMap (r, path) -> M.insertWith' (++) r [path] theMap)<br>
&nbsp;&nbsp;&nbsp;&nbsp;M.empty<br>
&nbsp;&nbsp;&nbsp;&nbsp;pathPairs<br>
<br>
main :: IO ()<br>
<div class="im">main = do<br>
&nbsp;&nbsp;[dir] <- getArgs<br>
</div>&nbsp;&nbsp;fileSizes <- walkDirWith dir hFileSize $ return []<br>
&nbsp;&nbsp;putStr . M.showTree $ filePathMap fileSizes<br>
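<br>
In case M.insertWith' is not available in the installed version of the<br>
containers package, the same grouping can be sketched with a strict left fold<br>
and plain M.insertWith. The names groupPaths and duplicatesOnly are hypothetical<br>
and not part of the original message. The second function is an assumed extra<br>
step that keeps only keys shared by two or more files, which is what a<br>
duplicate finder would report.<br>
<br>
```haskell
import Data.List (foldl')
import qualified Data.Map as M

-- | Same grouping as filePathMap, written with a strict left fold and
-- M.insertWith. Later paths are prepended to the list for their key.
groupPaths :: Ord r => [(r, FilePath)] -> M.Map r [FilePath]
groupPaths = foldl' (\m (r, path) -> M.insertWith (++) r [path] m) M.empty

-- | Keep only the keys that are shared by two or more paths.
duplicatesOnly :: M.Map r [FilePath] -> M.Map r [FilePath]
duplicatesOnly = M.filter ((> 1) . length)
```
<br>
With these definitions, the last line of main could become something like<br>
print . M.toList . duplicatesOnly $ groupPaths fileSizes.<br>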
<br>
Obviously there's no right or wrong way to do it, but I'm wondering<br>
what you think.<br>
<br>
Regards,<br>
<br>
Yawar<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
_______________________________________________<br>
Beginners mailing list<br>
<a href="mailto:Beginners@haskell.org">Beginners@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/beginners" target="_blank">http://www.haskell.org/mailman/listinfo/beginners</a><br>
</div></div></blockquote></div><br></div>