
Libraries and tools/MIME Strike Force


Latest revision as of 16:45, 20 March 2007

The goal of the MIME Strike Force is to create the one true MIME library for Haskell. Several partial MIME libraries exist, but none is complete.

In this document the term MIME includes basic RFC2822 messages.


1 Use cases

This section describes different tasks the MIME library will be used for, and any special requirements for each usage.

1.1 Composing MIME messages

The MIME library must provide a set of combinators for creating valid MIME messages. The combinators should allow the user to compose any valid MIME message, but prevent the user from creating invalid ones.

Error conditions, such as missing required header fields (orig-date, originator, etc.), should ideally be caught by the type system at compile time.
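One way to get this compile-time check is a smart constructor that takes the required fields as arguments. The following is only a minimal sketch: the names Address, OrigDate, Message, message and withField are hypothetical and do not come from any existing library, and a real design would use proper date and address types rather than strings.

```haskell
-- Hypothetical types for illustration; not an existing API.
newtype Address  = Address String  deriving (Eq, Show)
newtype OrigDate = OrigDate String deriving (Eq, Show)

data Message = Message
  { msgDate  :: OrigDate            -- required: orig-date
  , msgFrom  :: Address             -- required: originator
  , msgOther :: [(String, String)]  -- optional fields, in order
  , msgBody  :: String
  } deriving (Eq, Show)

-- Both required fields are arguments, so omitting one is a
-- compile-time type error rather than a run-time failure.
message :: OrigDate -> Address -> String -> Message
message date from body = Message date from [] body

-- Optional fields are added afterwards with an ordinary function.
withField :: String -> String -> Message -> Message
withField name value m = m { msgOther = msgOther m ++ [(name, value)] }

example :: Message
example =
  withField "Subject" "Hello"
    (message (OrigDate "Tue, 20 Mar 2007 16:45:00 +0000")
             (Address "alice@example.org")
             "Hi Bob.\n")
```

For the guarantee to hold, a module would export message and withField but not the Message data constructor, so that callers cannot bypass the smart constructor.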

Formatting issues, such as line-length limits and string encoding, should be handled transparently at run time.

The code that shows the final, formatted message should be able to terminate the lines with LF or CRLF.

It might also be nice if the combinators did not require a monadic interface.
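As a rough illustration of the last two points, the sketch below composes a message with plain functions and (.), without a monad, and lets the caller pick the line terminator at rendering time. All names (LineEnding, Header, Message, header, bodyLines, render) are assumptions made for this example. It deliberately ignores required-field checking, header folding and encoding.

```haskell
-- Hypothetical names throughout; this is not an existing API.
data LineEnding = LF | CRLF deriving (Eq, Show)

data Header = Header String String deriving (Eq, Show)

-- Header fields in order, followed by the body lines.
data Message = Message [Header] [String] deriving (Eq, Show)

emptyMessage :: Message
emptyMessage = Message [] []

-- Combinators are ordinary Message -> Message functions.
header :: String -> String -> Message -> Message
header name value (Message hs body) = Message (hs ++ [Header name value]) body

bodyLines :: [String] -> Message -> Message
bodyLines ls (Message hs _) = Message hs ls

eol :: LineEnding -> String
eol LF   = "\n"
eol CRLF = "\r\n"

-- Header fields, a blank separator line, then the body; every line
-- is terminated with the requested line ending.
render :: LineEnding -> Message -> String
render le (Message hs body) =
  concatMap (++ eol le) (map showHeader hs ++ [""] ++ body)
  where
    showHeader (Header n v) = n ++ ": " ++ v

-- Composition applies right to left, so From is added before Subject.
example :: Message
example =
  ( bodyLines ["Hi Bob."]
  . header "Subject" "Hello"
  . header "From" "alice@example.org"
  ) emptyMessage
```

Here render CRLF example yields "From: alice@example.org\r\nSubject: Hello\r\n\r\nHi Bob.\r\n", and render LF example yields the same text with "\n" terminators.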

1.2 Modifying MIME messages

Some programs, such as a mail transfer agent (MTA), will want to look at only a few select headers, and add or modify a few headers. In this case the MIME library should allow the program to:

  • Only parse as much semantic information as is needed. For example, the MTA does not need to decode all the attachments. In fact, an MTA might only care about RFC2822, so it should not be forced to deal with the layers added by additional RFCs.
  • Add new headers without modifying any of the formatting of existing headers.
  • Modify an existing header, preserving the comments and formatting as much as is sensible.
  • Process mail messages that contain syntax errors that don't directly interfere with what the program is trying to do. For example, an invalidly formatted Date field should not cause an error if the program does not examine the Date field.
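The requirements in this list can be sketched with a representation that keeps every header field as its exact original text and only looks inside a field when asked. This is an illustrative sketch, not a proposed API: RawField, splitFields, lookupField, addField and renderFields are hypothetical names, and the input is assumed to be the header block only (everything before the blank line that separates headers from the body).

```haskell
import Data.Char (toLower)

-- Hypothetical representation: each field keeps its original text,
-- including folding, comments and line terminators.
newtype RawField = RawField String deriving (Eq, Show)

-- Like 'lines', but keeps the terminating "\n" (and any "\r" before it).
rawLines :: String -> [String]
rawLines "" = []
rawLines s  = case break (== '\n') s of
  (l, '\n' : rest) -> (l ++ "\n") : rawLines rest
  (l, _)           -> [l]

-- A line starting with a space or tab continues the previous field
-- (RFC 2822 folding). Field contents are never validated.
splitFields :: String -> [RawField]
splitFields = map RawField . go . rawLines
  where
    go []       = []
    go (l : ls) = let (conts, rest) = span isContinuation ls
                  in concat (l : conts) : go rest
    isContinuation (c : _) = c == ' ' || c == '\t'
    isContinuation []      = False

fieldName :: RawField -> String
fieldName (RawField s) = map toLower (takeWhile (/= ':') s)

-- Only the requested field is examined; a malformed Date field is
-- irrelevant unless the caller asks for "Date".
lookupField :: String -> [RawField] -> Maybe String
lookupField name fs =
  case [s | f@(RawField s) <- fs, fieldName f == map toLower name] of
    (s : _) -> Just (drop 1 (dropWhile (/= ':') s))
    []      -> Nothing

-- Prepend a new field; the existing fields are left untouched.
addField :: String -> String -> [RawField] -> [RawField]
addField name value fs = RawField (name ++ ": " ++ value ++ "\r\n") : fs

renderFields :: [RawField] -> String
renderFields fs = concat [s | RawField s <- fs]
```

Because nothing is normalised, renderFields (splitFields raw) returns raw unchanged, and adding a field changes only the bytes of the new field.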

Most current Haskell MIME parsers have some or all of the following properties:

  • The whole message must be parsed, even if most of the information is not used.
  • They store only the semantic information, so (showMessage . parseMessage) does not reproduce the input byte for byte; the output's MD5 sum differs from the input's.
  • They are too strict about rejecting invalid messages.
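The round-trip problem can be stated as a property and checked against any parser/printer pair. The sketch below compares a semantic-only representation of a single header field with one that also keeps the original text. Every name here is made up for the illustration, and the parsers are toys that handle one unfolded field.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Generic round-trip check for any parser/printer pair.
roundTrips :: (String -> a) -> (a -> String) -> String -> Bool
roundTrips parse render input = render (parse input) == input

-- Semantic-only representation: field name and trimmed value.
parseSemantic :: String -> (String, String)
parseSemantic s = (name, trim (drop 1 rest))
  where
    (name, rest) = break (== ':') s
    trim = dropWhileEnd isSpace . dropWhile isSpace

-- The printer has to pick one spacing and one line terminator.
renderSemantic :: (String, String) -> String
renderSemantic (name, value) = name ++ ": " ++ value ++ "\r\n"

-- Format-preserving representation: semantic view plus original text.
data Field = Field
  { fieldRaw    :: String
  , fieldParsed :: (String, String)
  }

parseKeeping :: String -> Field
parseKeeping s = Field s (parseSemantic s)

renderKeeping :: Field -> String
renderKeeping = fieldRaw
```

For the input "Subject:   hi  \n", roundTrips parseSemantic renderSemantic is False, because the spacing and the line terminator are lost, while roundTrips parseKeeping renderKeeping is True.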

In addition to an MTA, another program to consider is an email virus checker, which decodes the attachments to check them for viruses. It will always add a header to indicate that the message has been scanned, and occasionally remove an attachment that contained a virus. Otherwise, it should leave the message unmodified.

1.3 Mail user agent

A mail user agent (MUA), such as mutt, pine, or Thunderbird, will need to parse a message and display it to the user. It will also need a mechanism to recognize MIME content and display it to the user, save it to disk, and so on.

An MUA would probably want the following features:

  • Ability to download and decode large attachments only at the user's request (e.g. for a dial-up user using IMAP).
  • Ability to display invalid emails. For example, the client should be able to display a message with an invalid Date field. Although this might interfere with sorting by date, the user will be more upset about not being able to read the email at all.
  • Ability to read messages that use LF instead of CRLF as the line terminator.
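Accepting either line terminator is a small amount of code at the lowest layer. A minimal sketch, where splitLines is a hypothetical helper name:

```haskell
-- Split text into lines, accepting "\n" or "\r\n" as the terminator.
-- A "\r" that is not directly followed by "\n" is left in place.
splitLines :: String -> [String]
splitLines "" = []
splitLines s  =
  case break (== '\n') s of
    (l, rest) -> stripCR l : splitLines (drop 1 rest)
  where
    stripCR l
      | not (null l) && last l == '\r' = init l
      | otherwise                      = l
```

For example, splitLines "a\r\nb\nc" gives ["a","b","c"], so the rest of the parser never sees which convention the sender used. A library that must also preserve the original bytes would record the terminator instead of discarding it.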

2 Summary of features

So, the library needs to:

  • Provide an API for creating MIME messages that does not require the user to have read any RFCs
  • Provide the ability to decode the message lazily
  • Have a permissive parser that attempts to parse invalid email
  • Provide an API for modifying messages that modifies the message as little as possible
  • Allow applications to use as much or little of the MIME stack as they want.
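Lazy decoding in particular fits Haskell's evaluation model well. In the sketch below the decoded content of a part is a non-strict field, so it is only computed if something demands it. Part, mkPart, fakeDecode and summary are hypothetical names, and fakeDecode merely stands in for a real transfer decoder such as Base64.

```haskell
import Data.Char (toUpper)

-- Hypothetical part type. The fields are non-strict, so 'partDecoded'
-- stays an unevaluated thunk until a caller actually demands it.
data Part = Part
  { partType    :: String  -- e.g. "image/png"
  , partRaw     :: String  -- encoded content as received
  , partDecoded :: String  -- decoded content, computed on demand
  }

mkPart :: (String -> String) -> String -> String -> Part
mkPart decode ty raw = Part ty raw (decode raw)

-- Stand-in for a real decoder; an assumption made for this example.
fakeDecode :: String -> String
fakeDecode = map toUpper

-- Listing the content types never runs the decoder.
summary :: [Part] -> [String]
summary = map partType
```

An MTA that only calls summary pays nothing for decoding, while an MUA that reads partDecoded for one part triggers the decoder for that part alone.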

3 Other desired features

The library should also:

  • Support Strings, ByteStrings, etc
  • Support different string encodings (Unicode, etc.).
  • Have a good test suite
  • Be extensible. It would be nice if support for additional RFCs could be implemented without having to modify the existing libraries.
  • Support non-email usage of MIME. For example, the web uses MIME for things like form encoding, etc.
  • Be modular. It should be possible to use portions of the library, such as a Base64 encoder/decoder or an x-www-form-urlencoded encoder/decoder, in a CGI program without having to jump through lots of hoops.
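To illustrate the modularity point, a Base64 encoder can live in a module of its own that knows nothing about messages or headers. The sketch below is an assumption about how such a module might look, not code from any of the existing libraries. It works on a list of bytes for simplicity, and leaves MIME's 76-character line wrapping to the caller.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- The standard Base64 alphabet (RFC 4648).
alphabet :: String
alphabet = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Encode bytes as Base64 with '=' padding. No line wrapping is done.
encodeBase64 :: [Word8] -> String
encodeBase64 (a : b : c : rest) = sextets 4 [a, b, c] ++ encodeBase64 rest
encodeBase64 [a, b]             = sextets 3 [a, b, 0] ++ "="
encodeBase64 [a]                = sextets 2 [a, 0, 0] ++ "=="
encodeBase64 []                 = ""

-- Pack three bytes into 24 bits and emit the first n six-bit groups.
sextets :: Int -> [Word8] -> String
sextets n bytes =
  take n [alphabet !! ((w `shiftR` s) .&. 63) | s <- [18, 12, 6, 0]]
  where
    w :: Int
    w = foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 bytes
```

A CGI program could import just this module. A production version would more likely operate on ByteString, as the first item in this list suggests.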

4 Other things to consider

4.1 Code re-use

There is already a fair amount of code around for things like Base64 encoding/decoding. We should attempt to reuse these libraries rather than reinvent them, if possible. However, I expect that in a number of cases the existing code will use String and will not be easily extended to use ByteString, so a rewrite might be easier.

4.1.1 Existing code

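  • Network.CGI.Multipart (http://darcs.haskell.org/packages/cgi/Network/CGI/Multipart.hs)
  • mime-string (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mime-string-0.1), strong in parsing, weak in flattening back to a string again
  • base64-string (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/base64-string-0.1)
  • iconv (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/iconv-0.2), for converting between different character sets
  • haskell-mime (http://www.n-heptane.com/nhlab/repos/haskell-mime), some Parsec parsers for various RFCs, and the beginnings of a combinator library for composing messages. Neither fully meets the goals of this project.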

4.2 License

This code will be licensed under the BSD3 license (without the advertising clause).

Note this implies the code should not be posted on this wiki as it would contravene the site copyright.