Monday, March 07, 2011

Notes on Mail header (and MIME) parsers...

I'm trying to resurrect my old gawk-based blogging system, BLOGnBOX. It (ab)uses gawk to do everything from POP3 mail retrieval (you email your blog entry...) to FTP-based posting of the blog (it is a static HTML blog).

I intend to clean it up by doing away with the gawk abuses. I am either going to make it (Plan 9) rc based (with Plan 9 awk and some C for the networking) or perhaps Haskell. That is quite a choice, eh?

I've done a bit of Haskell over the past few months and feel strong enough to do the next generation BLOGnBOX, but the main problem is actually getting the thing going. (This is a nighttime CFT and, well, I have to get into a Haskell frame of thinking.)

The first task up is a parser for MIME-encoded email. I plan on using regular expressions (yes, I know -- use Parsec or something more Haskell-ish). Awk is somewhat of a natural for this, but Gawk has a little more "oomph". I can visualize how I would do it in Awk, but the Haskell is not coming naturally.

Well, it isn't all that difficult to get started in Haskell:


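module MailParser where

import Text.Regex (Regex, mkRegex, matchRegex)  -- from the regex-compat package
import qualified Data.Map as Map

-- Maps a header name to the value(s) captured for it.
type Header = Map.Map String [String]

-- Matches a From, To, or Subject line; captures the name and the value.
header_regex :: Regex
header_regex = mkRegex "^(From|To|Subject)[ ]*:[ ]*(.+)"

-- Add the header found on line s (if any) to the map h.
parseHeader :: String -> Header -> Header
parseHeader s h = case matchRegex header_regex s of
    Just (k:v) -> Map.insert k v h
    _          -> h  -- not a header line we care about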

Well, that is a beginning. Of course, I should be using ByteStrings for efficiency... and, yes... I know... I know... I should be using Parsec.
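To go from a single line to a whole message, a fold over the header lines is probably enough. Here is a minimal sketch, not a finished parser. It repeats the definitions so it stands alone, and it assumes the regex-compat package is installed and that line endings have already been normalized to "\n". The names parseHeaders and sampleMessage are made up for illustration, as are the addresses.

```haskell
import Text.Regex (Regex, mkRegex, matchRegex)  -- from the regex-compat package
import qualified Data.Map as Map

type Header = Map.Map String [String]

header_regex :: Regex
header_regex = mkRegex "^(From|To|Subject)[ ]*:[ ]*(.+)"

-- Add the header found on line s (if any) to the map h.
parseHeader :: String -> Header -> Header
parseHeader s h = case matchRegex header_regex s of
    Just (k:v) -> Map.insert k v h
    _          -> h

-- Hypothetical helper: the header block ends at the first empty line,
-- so only the lines before it are folded into the map.
-- foldr inserts the earliest line last, so the first occurrence of a
-- repeated header is the one that is kept.
parseHeaders :: String -> Header
parseHeaders = foldr parseHeader Map.empty . takeWhile (not . null) . lines

-- Hypothetical sample message (assumes "\n" line endings, no "\r").
sampleMessage :: String
sampleMessage = unlines
    [ "From: todd@example.com"
    , "To: blog@example.com"
    , "Subject: Hello"
    , ""
    , "Subject: this line is in the body, so it is never examined"
    ]
```

In ghci, Map.lookup "Subject" (parseHeaders sampleMessage) pulls out the subject. Folded (continuation) header lines and MIME parts are not handled here; that is where Parsec would start to earn its keep.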

/todd
