[Haskell-beginners] Defining custom parser using Parsec

Jimmy Wylie jwylie at uno.edu
Sun Oct 17 17:59:22 EDT 2010


Hi everyone,

I'm working on a digital forensics application that will take a file 
with lines of the following format:

"MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime"

This string represents the metadata associated with a particular file in 
the filesystem.

I created a data type to represent the information that I will need to 
perform my analysis:

data Event = Event {
      fn     :: B.ByteString,
      mftNum :: B.ByteString,
      ft     :: B.ByteString,
      fs     :: Integer,
      time   :: Integer,
      at     :: AccessType,
      mt     :: AccessType,
      ct     :: AccessType,
      crt    :: AccessType
      } deriving (Show)

data AccessType = ATime | MTime | CTime | CrTime
                   deriving (Show)

I would like to create a function that takes the ByteString representing
the file and returns a list of Events:

createEvents :: B.ByteString -> [Event]

(For now I'm creating a list, but depending on the type of analysis I
decide to do, I may change this data structure.)

I understand that I can use the Parsec library to do this. I read RWH
(Real World Haskell) and noticed the endBy and sepBy combinators, but my
issue with these functions is that they put the data through too many
intermediate transformations.
endBy splits the input into lines and sepBy splits each line into
fields, so the result is a [[ByteString]], which I then have to
iterate through to create the final [Event].
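
To make that concrete, the two-pass approach I mean looks roughly like
the sketch below. It is only an illustration under a few assumptions: it
parses a plain String rather than a ByteString, and the names bodyFile,
record, field, rowToNameSize and parseNameSizes are made up for this
example. The second pass keeps just the name and size fields as a
stand-in for building a real Event.

```haskell
import Data.Maybe (mapMaybe)
import Text.Parsec (ParseError, char, endBy, many, noneOf, parse, sepBy)
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

-- Pass 1: every line becomes a list of '|'-separated fields.
bodyFile :: Parser [[String]]
bodyFile = record `endBy` char '\n'

record :: Parser [String]
record = field `sepBy` char '|'

field :: Parser String
field = many (noneOf "|\n")

-- Pass 2 (hypothetical stand-in for building an Event): walk the
-- intermediate rows again and keep only the name and size of each
-- well-formed 11-field row. Rows with a different shape or a
-- non-numeric size are silently dropped.
rowToNameSize :: [String] -> Maybe (String, Integer)
rowToNameSize [_md5, name, _inode, _mode, _uid, _gid, size, _atime, _mtime, _ctime, _crtime] =
  (,) name <$> readMaybe size
rowToNameSize _ = Nothing

-- Both passes chained together; the [[String]] in the middle is the
-- intermediate structure I would like to avoid.
parseNameSizes :: String -> Either ParseError [(String, Integer)]
parseNameSizes input = mapMaybe rowToNameSize <$> parse bodyFile "(input)" input
```

The [[String]] built by bodyFile exists only to be traversed once more
by the second pass, which is the overhead I am worried about.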

What I would like to do is define a custom parser that will go from the
ByteString to the [Event] without the overhead of those intermediate
steps. This function needs to be as fast as possible, as these files can
be rather large, and I will be performing many different tests and
analyses on the data.  I don't want the parsing to be a bottleneck.

I'm under the impression that the Parsec library will allow me to define
a custom parser to do this, but I'm having trouble understanding the
library and its documentation.

A gentle shove in the right direction would be greatly appreciated.

Thanks for your help,
Jimmy





