Difference between revisions of "Libraries and tools/MIME Strike Force"

From HaskellWiki
(Added more sections.)
(Additional notes)
Line 3: Line 3:
 
The goal of the MIME Strike Force is to create the one, true MIME library for Haskell. Currently, there are a lot of partial MIME libraries, but nothing really complete.
 
The goal of the MIME Strike Force is to create the one, true MIME library for Haskell. Currently, there are a lot of partial MIME libraries, but nothing really complete.
   
In this document MIME includes basic RFC2822 messages.
+
In this document the term MIME includes basic RFC2822 messages.
   
 
= Use Cases =
 
= Use Cases =
   
This section describes different tasks the MIME library will be used for, and any special requirements of each usage.
+
This section describes different tasks the MIME library will be used for, and any special requirements for each usage.
   
 
== Composing MIME Messages ==
 
== Composing MIME Messages ==
Line 26: Line 26:
   
 
* Only parse as much semantic information as is needed. For example, the MTA does not need to decode all the attachments, etc. In fact, a MTA might only care about RFC2822, so it should not have to be forced to deal with the rest of the layers added by additional RFCs.
 
* Only parse as much semantic information as is needed. For example, the MTA does not need to decode all the attachments, etc. In fact, a MTA might only care about RFC2822, so it should not have to be forced to deal with the rest of the layers added by additional RFCs.
* Add new headers without modifying any of the formatting of exist headers.
+
* Add new headers without modifying any of the formatting of existing headers.
 
* Modify an existing header, perserving the comments and formatting as much as is sensible
 
* Modify an existing header, perserving the comments and formatting as much as is sensible
* Process mail messages that contain syntax errors that don't directly interfer with what the program is trying to do. For example, an invalidly formated Date field should not cause an error, if the program does not examine the Date field.
+
* Process mail messages that contain syntax errors that don't directly interfer with what the program is trying to do. For example, an invalidly formatted Date field should not cause an error, if the program does not examine the Date field.
   
Most existing Haskell MIME parsers have the following properties:
+
Most current Haskell MIME parsers have some or all of the following properties:
   
* They whole message must be parsed, even if most of the information is not used
+
* The whole message must be parsed, even if most of the information is not used
 
* They store only the semantic information, so (showMessage . parseMessage) does not produce an output with the same MD5SUM as the input.
 
* They store only the semantic information, so (showMessage . parseMessage) does not produce an output with the same MD5SUM as the input.
 
* They are too strict about rejecting invalid messages
 
* They are too strict about rejecting invalid messages
   
In addition to an MTA, another program to consider is a email virus checker which will decode the attachments to check for viruses. It will always add a header to indicate the message has been scanned, and occasionally remove an attachment that contained a virus.
+
In addition to an MTA, another program to consider is a email virus checker which will decode the attachments to check for viruses. It will always add a header to indicate the message has been scanned, and occasionally remove an attachment that contained a virus. But, otherwise, it should leave the message unmodified.
   
 
== Mail User Agent ==
 
== Mail User Agent ==
Line 44: Line 44:
 
A MUA would probably want the following features:
 
A MUA would probably want the following features:
   
* Ability to only download and decode large attachments at a users request. (i.e. a dial-up user using IMAP).
+
* Ability to only download and decode large attachments at a user's request. (i.e. a dial-up user using IMAP).
 
* Ability to display invalid emails. For example, the client should be able to display a message with an invalid Date field. Although this might interfer with sorting by date, the user will be more upset about not being able to read the email at all.
 
* Ability to display invalid emails. For example, the client should be able to display a message with an invalid Date field. Although this might interfer with sorting by date, the user will be more upset about not being able to read the email at all.
 
* Ability to read messages that use LF instead of CRLF as the line terminator
 
* Ability to read messages that use LF instead of CRLF as the line terminator
Line 65: Line 65:
 
* Support different string encodings (unicode, etc).
 
* Support different string encodings (unicode, etc).
 
* Have a good test suite
 
* Have a good test suite
* Be extensible. It would be nice if support for additional RFCs could be implemented with out having to modify the existing libraries.
+
* Be extensible. It would be nice if support for additional RFCs could be implemented without having to modify the existing libraries.
  +
* Support non-email usage of MIME. For example, the web uses MIME for things like form encoding, etc.
  +
* Be modular. It should be possible use portions of the library, such as a Base64 encoder/decoder, or a x-www-form-encoded encoder/decoder in a CGI program without having to jump through lots of hoops.
  +
  +
= Other Things to Consider =
  +
  +
== Code Reuse ==
  +
  +
There is already a bunch of code around for things like Base64 encoding/decoding. We should attempt to reuse these libraries, rather than reinvent them, if possible. However, I expect that in a number of cases, the existing code will use String, and not be easily extended to use ByteString. So, a rewrite might be easier.
  +
  +
== License ==
  +
  +
This code will be licensed under the BSD3 license (without the advertising clause).

Revision as of 20:08, 18 March 2007

MIME Strike Force

The goal of the MIME Strike Force is to create the one, true MIME library for Haskell. Currently, there are a lot of partial MIME libraries, but nothing really complete.

In this document the term MIME includes basic RFC2822 messages.

Use Cases

This section describes different tasks the MIME library will be used for, and any special requirements for each usage.

Composing MIME Messages

The MIME library must provide a set of combinators for creating valid MIME messages. The combinators should allow the user to compose any valid MIME message, but prevent the user from creating invalid MIME messages.

Error conditions, such as missing required header fields (orig-date, originator, etc), should ideally be checked via the type-system at compile time.

Formatting issues, like line-length limitations, string encoding, etc, should be handled transparently at run-time.

The code that shows the final, formatted message should be able to terminate the lines with LF or CRLF.

It might also be nice if the combinators did not require a monadic interface.
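As a rough illustration of the requirements above, here is a minimal sketch of what pure (non-monadic) composition combinators could look like. All names (`Message`, `message`, `withHeader`, `render`, `LineEnd`) are hypothetical and do not come from any existing library. The required fields are arguments to the only constructor function, so forgetting one is a compile-time error, and the line terminator is chosen only when rendering. Line-length folding and string encoding are deliberately left out.

```haskell
-- Hypothetical sketch; none of these names exist in a real library.

-- Required RFC 2822 fields get their own types so they cannot be
-- confused with each other or silently omitted.
newtype OrigDate = OrigDate String
newtype From     = From String

-- An optional header field: name and value.
data Header = Header String String

data Message = Message
  { msgDate  :: OrigDate
  , msgFrom  :: From
  , msgExtra :: [Header]
  , msgBody  :: [String]   -- body lines, without terminators
  }

data LineEnd = LF | CRLF

-- The only way to build a Message: both required fields are arguments.
message :: OrigDate -> From -> [String] -> Message
message d f body = Message d f [] body

-- Pure combinator: append an optional header after the existing ones.
withHeader :: String -> String -> Message -> Message
withHeader name value m =
  m { msgExtra = msgExtra m ++ [Header name value] }

-- Render the message, terminating every line with the chosen line end.
render :: LineEnd -> Message -> String
render le (Message (OrigDate d) (From f) extra body) =
  concatMap (++ eol) (headerLines ++ [""] ++ body)
  where
    eol = case le of
      LF   -> "\n"
      CRLF -> "\r\n"
    headerLines =
      ("Date: " ++ d) : ("From: " ++ f)
        : [ n ++ ": " ++ v | Header n v <- extra ]
```

A call such as `render CRLF (withHeader "Subject" "Hi" (message date from ["hello"]))` composes by plain function application, so no monad is needed. A real design would also have to validate the contents of each field, which this sketch does not attempt.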

Modifying MIME Messages

Some programs, such as a mail transfer agent (MTA), will want to look at only a few select headers, and add or modify a few headers. In this case the MIME library should allow the program to:

  • Only parse as much semantic information as is needed. For example, the MTA does not need to decode all the attachments, etc. In fact, an MTA might only care about RFC2822, so it should not be forced to deal with the rest of the layers added by additional RFCs.
  • Add new headers without modifying any of the formatting of existing headers.
  • Modify an existing header, preserving the comments and formatting as much as is sensible
  • Process mail messages that contain syntax errors that don't directly interfere with what the program is trying to do. For example, an invalidly formatted Date field should not cause an error if the program does not examine the Date field.

Most current Haskell MIME parsers have some or all of the following properties:

  • They require the whole message to be parsed, even if most of the information is not used
  • They store only the semantic information, so (showMessage . parseMessage) does not produce an output with the same MD5SUM as the input.
  • They are too strict about rejecting invalid messages

In addition to an MTA, another program to consider is an email virus checker, which will decode the attachments to check for viruses. It will always add a header to indicate the message has been scanned, and occasionally remove an attachment that contained a virus. Otherwise, it should leave the message unmodified.
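One way to meet the requirements in this section is to keep the original text of every header field and interpret a field only when it is asked for. The sketch below assumes that approach; `RawMessage`, `addField` and `lookupField` are hypothetical names, and `parseMessage` and `showMessage` merely reuse the names from the round-trip property above. It only splits the header block into fields (joining folded continuation lines) and leaves the body untouched, so `showMessage . parseMessage` is the identity and an invalid Date field causes no error unless someone inspects it.

```haskell
import Data.Char (toLower)

-- Hypothetical sketch; not the API of an existing library.
data RawMessage = RawMessage
  { rawFields :: [String]  -- each field verbatim, with folding and terminators
  , rawRest   :: String    -- blank separator line and body, untouched
  }

-- Split off one line, keeping its terminator ("\n" or "\r\n").
lineWithEnd :: String -> (String, String)
lineWithEnd s = case break (== '\n') s of
  (l, '\n' : rest) -> (l ++ "\n", rest)
  (l, rest)        -> (l, rest)

isBlank :: String -> Bool
isBlank l = l `elem` ["\n", "\r\n"]

-- Folded header lines start with a space or a tab.
isContinuation :: String -> Bool
isContinuation (c : _) = c == ' ' || c == '\t'
isContinuation []      = False

-- Split the header block into raw fields; nothing is interpreted.
parseMessage :: String -> RawMessage
parseMessage = go []
  where
    go acc s
      | null s    = RawMessage (reverse acc) ""
      | isBlank l = RawMessage (reverse acc) s
      | isContinuation l, (prev : more) <- acc
                  = go ((prev ++ l) : more) rest
      | otherwise = go (l : acc) rest
      where (l, rest) = lineWithEnd s

-- Reassemble the message byte for byte.
showMessage :: RawMessage -> String
showMessage (RawMessage fs rest) = concat fs ++ rest

-- Prepend a new field; existing fields are not re-rendered.
addField :: String -> String -> RawMessage -> RawMessage
addField name value m =
  m { rawFields = (name ++ ": " ++ value ++ "\r\n") : rawFields m }

-- Case-insensitive lookup returning the raw, uninterpreted value.
lookupField :: String -> RawMessage -> Maybe String
lookupField name m =
  case [ f | f <- rawFields m, lower (takeWhile (/= ':') f) == lower name ] of
    (f : _) -> Just (drop 1 (dropWhile (/= ':') f))
    []      -> Nothing
  where lower = map toLower
```

A virus checker could then add its marker with something like `addField "X-Virus-Scanned" "yes"` (the header name is only an example) and leave every other byte of the header block alone. Removing an attachment would need a similar raw representation for MIME parts, which is not shown here.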

Mail User Agent

A mail user agent (MUA), such as mutt, pine, thunderbird, etc, will need to parse a message and display it to the user. It will also need a mechanism to recognize MIME content and display it to the user or save it to disk, etc.

An MUA would probably want the following features:

  • Ability to download and decode large attachments only at the user's request (e.g. for a dial-up user using IMAP).
  • Ability to display invalid emails. For example, the client should be able to display a message with an invalid Date field. Although this might interfere with sorting by date, the user will be more upset about not being able to read the email at all.
  • Ability to read messages that use LF instead of CRLF as the line terminator
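The last two points can be sketched as follows. The names (`splitLines`, `Parsed`, `parseDate`, `sortByDate`) are hypothetical, and `parseDate` is only a stand-in that accepts a bare integer timestamp, not a real RFC2822 date parser. The idea is that a field which fails to parse keeps its raw text and an error, so the message can still be displayed, and sorting simply places such messages last.

```haskell
import Data.List (sortOn)
import Text.Read (readMaybe)

-- Split text into lines, accepting either LF or CRLF as the terminator.
splitLines :: String -> [String]
splitLines "" = []
splitLines s  = stripCR l : splitLines (drop 1 rest)
  where
    (l, rest) = break (== '\n') s
    stripCR x
      | not (null x) && last x == '\r' = init x
      | otherwise                      = x

-- A field value that remembers its raw text even when parsing fails.
data Parsed a = Parsed
  { raw   :: String
  , value :: Either String a
  }

-- Stand-in for a real date parser (assumption): only a bare integer
-- timestamp is accepted, anything else is recorded as an error.
parseDate :: String -> Parsed Int
parseDate s =
  Parsed s (maybe (Left ("unparseable date: " ++ s)) Right (readMaybe s))

-- Sort by date, placing entries with invalid dates last instead of
-- rejecting them.
sortByDate :: [Parsed Int] -> [Parsed Int]
sortByDate = sortOn key
  where
    key p = case value p of
      Right t -> (0 :: Int, t)
      Left _  -> (1, 0)
```

Because `sortOn` is stable, messages with invalid dates keep their original relative order at the end of the list, and the MUA can still show their `raw` Date text to the user.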

Summary of Features

So, the library needs to:

  • Provide an API for creating MIME messages that does not require the user to have read any RFCs
  • Provide the ability to decode the message lazily
  • Have a permissive parser that attempts to parse invalid email
  • Provide an API for modifying messages that modifies the message as little as possible
  • Allow applications to use as much or little of the MIME stack as they want.

Other Desired Features

The library should also:

  • Support Strings, ByteStrings, etc
  • Support different string encodings (unicode, etc).
  • Have a good test suite
  • Be extensible. It would be nice if support for additional RFCs could be implemented without having to modify the existing libraries.
  • Support non-email usage of MIME. For example, the web uses MIME for things like form encoding, etc.
  • Be modular. It should be possible to use portions of the library, such as a Base64 encoder/decoder, or an x-www-form-urlencoded encoder/decoder, in a CGI program without having to jump through lots of hoops.
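To make the modularity point concrete, here is a small sketch of a form decoder that depends on nothing else in the library, so a CGI program could import it on its own. The function names are hypothetical. It works on String and treats each percent-escape as a single character, so it does not handle multi-byte encodings such as UTF-8; a real module would probably operate on ByteString.

```haskell
import Data.Char (chr, digitToInt, isHexDigit)

-- Decode one key or value: '+' means space, %XX is a hex escape.
-- Malformed escapes are passed through unchanged rather than rejected.
decodeComponent :: String -> String
decodeComponent ('+' : cs) = ' ' : decodeComponent cs
decodeComponent ('%' : a : b : cs)
  | isHexDigit a && isHexDigit b =
      chr (16 * digitToInt a + digitToInt b) : decodeComponent cs
decodeComponent (c : cs) = c : decodeComponent cs
decodeComponent []       = []

-- Split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (x, _ : rest) -> x : splitOn d rest
  (x, [])       -> [x]

-- Decode a whole form body into key/value pairs, skipping empty pairs.
decodeForm :: String -> [(String, String)]
decodeForm s =
  [ (decodeComponent k, decodeComponent (drop 1 v))
  | pair <- splitOn '&' s
  , not (null pair)
  , let (k, v) = break (== '=') pair
  ]
```

For example, `decodeForm "name=John+Doe&q=a%26b"` yields `[("name","John Doe"),("q","a&b")]`.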

Other Things to Consider

Code Reuse

There is already a bunch of code around for things like Base64 encoding/decoding. We should attempt to reuse these libraries, rather than reinvent them, if possible. However, I expect that in a number of cases, the existing code will use String, and not be easily extended to use ByteString. So, a rewrite might be easier.

License

This code will be licensed under the BSD3 license (without the advertising clause).