[Haskell-beginners] please critique my first stab at systems programming

Thu Apr 14 15:02:44 CEST 2011

searchForPattern' (and therefore searchForPattern) will read the entire
file, indeed. To check only the first 'n' chunks in a file you'd have to

1) receive a lazy bytestring, check first n*chuckSize bytes.
2) receive the handle, loop over n chunks.

El jue, 14-04-2011 a las 14:50 +0200, Daniel Fischer escribió:
> On Thursday 14 April 2011 14:30:59, Felipe Almeida Lessa wrote:
> > On Thu, Apr 14, 2011 at 8:50 AM, Daniel Fischer
> > 
> > <daniel.is.fischer at googlemail.com> wrote:
> > > $ cabal install stringsearch
> > > -- faster searches for a pattern than provided by bytestring
> > > 
> > > import qualified Data.ByteString.Char8 as BC
> > > import qualified Data.ByteString.Lazy as L
> > > import qualified Data.ByteString.Lazy.Search as L
> > > 
> > > blockSize = 512
> > > 
> > > main = do
> > >   stuff <- L.readFile "blocks"
> > >   case L.indices (BC.pack "PART") stuff of
> > >     [] -> putStrLn "Not Found."
> > >     (i:_) -> putStrLn $ "Found at " ++ show (i `quot` blockSize) ++
> > > "." putStrLn "Done."
> > 
> > This solves a different problem.  The original looked for PART only on
> > the beginning of each block,
> 
> Well, I would have expected that, but
> 
> > searchForPattern' index handle pat = do
> > 
> >     eof <- hIsEOF handle
> >     if eof then return Nothing
> >     
> >       else do
> >       
> >         bytes <- B.hGet handle chunkSize
> >         case BC.breakSubstring pat bytes of
> >         
> >           (x, y) | BC.null y -> searchForPattern' (index + 1) handle pat
> >           
> >                  | otherwise -> return (Just index)
> 
> searches the entire block and checks whether the second component is empty 
> to see whether there's any match at all. So I thought that'd be the desired 
> behaviour.
> 
> If one wants to only look at the start of the blocks, isPrefixOf  would be 
> a much more efficient way than breakSubstring. It would probably be still 
> more efficient to read larger chunks than 512 bytes and step through them 
> with 512-byte steps to check for a "PART" there (unless the "PART" appears 
> in the first few blocks).
> 
> > while your solution will find everything
> > that has PART in it.  Because this is I/O bound, probably the real
> > time taken won't be very different, but the program will surely use
> > more CPU time and report more false positives.
> > 
> > Cheers,
> 
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners