Personal tools

Haskell Quiz/IP to Country/Solution Dolio

From HaskellWiki

< Haskell Quiz | IP to Country
Revision as of 04:04, 19 September 2007 by Dolio (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search


Searching a big CSV file seemed like an ideal use of the famed ByteString library, so I hacked up a quick solution. It uses lazy chunked input for hopefully cache-efficient processing, but deals with the chunks in terms of their strict byte string implementations to also avoid as much overhead as possible (not a particularly hard scheme to set up once you've seen it).

It's fast. Looking up the IP on the quiz page takes roughly 0.04 seconds (as opposed to 0.30 on the reference implementation), and about half a second on an IP that isn't in the database (forcing the entire file to be processed), which seems not too shabby. However, no specs were given for the machine the reference implementation was run on, so the comparisons above are rather worthless. :)

This just processes the raw file downloaded from the website linked in the quiz, and processes it linearly. One could probably devise an optimized version of the database, or a more efficient searching scheme and gain performance, but the naive solution is still plenty fast.

{-# LANGUAGE PatternGuards #-}
 
module Main(main) where
 
import Data.Maybe
import System.Environment
 
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
 
-- Process a file by line. For each line in the file denoted by
-- the FilePath, the function is called. If the result of the
-- computation is True, processing is cut off early.
--
-- This uses lazy chunked reading, but operates on the chunks
-- one by one for (hopefully) maximum speed.
processFile :: FilePath -> (B.ByteString -> IO Bool) -> IO ()
processFile path op = proc . L.toChunks =<< L.readFile path
 where
 proc []     = return ()
 proc [c]    = proc' (B.lines c) >> return ()
 proc (c:cc:cs) = do b <- proc' (B.lines c')
                     if b then return () else proc cs'
  where (c', t) = B.breakEnd (=='\n') c
        cs' = B.append t cc : cs
 proc' []     = return False
 proc' (x:xs) = do b <- op x
                   if b then return True else proc' xs
 
-- Given an ip, represented as a 4-tuple, and a line expected to come
-- from the ip database, determines whether the ip matches. If it does,
-- the corresponding country is printed, and an exit is signaled.
ipSearch :: (Int, Int, Int, Int) -> B.ByteString -> IO Bool
ipSearch (a,b,c,d) s
    | Just (from, to, country) <- parse s,
      from <= ip,
      ip <= to                             = B.putStrLn country >> return True
    | otherwise                            = return False
 where
 ip = d + 256*c + 256*256*b + 256*256*256*a
 
parse s = case B.split ',' s of
               [f,t,_,_,_,_,c] -> do (from,_) <- B.readInt (B.tail f)
                                     (to,  _) <- B.readInt (B.tail t)
                                     return (from, to, B.tail (B.init c))
               _               -> Nothing
 
main = do (ips:_) <- getArgs
          processFile "IpToCountry.csv" (ipSearch $ ipParse ips)
 where
 ipParse = convert . B.split '.' . B.pack
 convert [a,b,c,d] = fromJust $ do (a',_) <- B.readInt a
                                   (b',_) <- B.readInt b
                                   (c',_) <- B.readInt c
                                   (d',_) <- B.readInt d
                                   return (a',b',c',d')