Discussion:
MIME handling
Alex Morega
2010-04-03 21:00:10 UTC
Hello,

I've been looking at the MIME parser class, adding tests and fixing some bugs. Here are some thoughts/ideas.

Taking inspiration from Python's email module, we could have three separate entities: a MIME parser, a MIME generator, and an in-memory representation of messages. The parser produces an event stream which can get turned into an in-memory message (with attachments sent directly to disk); the generator works in reverse.
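For reference, this is roughly how the three pieces look in Python's email module. It is only a sketch: `BytesFeedParser` is an incremental parser rather than a true event stream, and the sample message and the helper names `parse_in_chunks` and `serialize` are made up for illustration; they do not describe anything in LetterBox.

```python
import io
from email.generator import BytesGenerator
from email.parser import BytesFeedParser

# Hypothetical sample message, used only for this illustration.
RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: hello\r\n"
    b"\r\n"
    b"body text\r\n"
)


def parse_in_chunks(data, chunk_size=7):
    """Parser: feed the input piece by piece, as if it arrived from a socket."""
    parser = BytesFeedParser()
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
    # close() returns the in-memory representation (an email.message.Message).
    return parser.close()


def serialize(message):
    """Generator: turn the in-memory message back into wire bytes."""
    buffer = io.BytesIO()
    BytesGenerator(buffer, mangle_from_=False).flatten(message, linesep="\r\n")
    return buffer.getvalue()


message = parse_in_chunks(RAW)
wire_bytes = serialize(message)
```

The point of the split is that the message object in the middle knows nothing about parsing or serialization, so each side can be tested on its own.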

I've noticed some other parts of the code that claim to work with MIME documents. There are also many classes that are not included in the LetterBox framework (but are in the repository anyway). Are they used anywhere?

Finally, the MIME parsing will need to be bullet-proof, since it handles non-trusted data. This means at least having well-defined error paths. Some fuzz testing would also be useful.
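To make "well-defined error paths" and fuzz testing concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: `parse_headers` is a toy header parser, not the LetterBox parser, and `MIMEParseError` and `fuzz` are hypothetical names. The idea is that the parser may fail in exactly one documented way, and the fuzz loop treats any other exception as a bug.

```python
import random


class MIMEParseError(ValueError):
    """The single documented failure mode of parse_headers (hypothetical name)."""


def parse_headers(data):
    """Toy parser: return (name, value) pairs from a CRLF-separated header block."""
    headers = []
    # latin-1 maps every byte to a code point, so this decode cannot fail.
    text = data.decode("latin-1")
    for line in text.split("\r\n"):
        if line == "":
            break  # a blank line ends the header block
        if line[0] in " \t":
            # Folded continuation line: append it to the previous header.
            if not headers:
                raise MIMEParseError("continuation line before any header")
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
            continue
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise MIMEParseError("malformed header line: %r" % line)
        headers.append((name.strip(), value.strip()))
    return headers


def fuzz(rounds=2000, seed=0):
    """Throw random bytes at the parser; any exception other than
    MIMEParseError propagates and fails the run."""
    rng = random.Random(seed)
    outcomes = {"ok": 0, "rejected": 0}
    for _ in range(rounds):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            parse_headers(blob)
            outcomes["ok"] += 1
        except MIMEParseError:
            outcomes["rejected"] += 1
    return outcomes


outcomes = fuzz()
```

Purely random bytes only exercise the shallow error paths; mutating real messages (flipping bytes, truncating, duplicating boundaries) would probably find more interesting cases.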

Cheers,
-- Alex
Guy English
2010-04-03 23:35:42 UTC
Post by Alex Morega
Taking inspiration from Python's email module, we could have three separate entities: a MIME parser, a MIME generator, and an in-memory representation of messages. The parser produces an event stream which can get turned into an in-memory message (with attachments sent directly to disk); the generator works in reverse.
I wrote the code that’s there now. It’s basically just at a simple parser stage. Yes - we need a MIME generator. The class and file should be renamed to be more parser-specific.
Post by Alex Morega
I've noticed some other parts of the code that claim to work with MIME documents. There are also many classes that are not included in the LetterBox framework (but are in the repository anyway). Are they used anywhere?
There is older code, not included in LetterBox, that was grandfathered in from the code base Gus started with. None of the older MIME code is used, and it’s not included in the project build. Those classes and files should ultimately be removed and the newer stuff renamed to make more sense. It hasn’t been done yet because there were already annoying merge issues getting the newer MIME stuff into the main branch, and it wasn’t clear what we wanted to do or where, so keeping the older stuff around as a guide for a bit seemed sensible.
Post by Alex Morega
Finally, the MIME parsing will need to be bullet-proof, since it handles non-trusted data. This means at least having well-defined error paths. Some fuzz testing would also be useful.
I agree completely.

- Guy
Alex Morega
2010-04-04 18:30:45 UTC
Post by Guy English
Post by Alex Morega
Taking inspiration from Python's email module, we could have three separate entities: a MIME parser, a MIME generator, and an in-memory representation of messages. The parser produces an event stream which can get turned into an in-memory message (with attachments sent directly to disk); the generator works in reverse.
I wrote the code that’s there now. It’s basically just at a simple parser stage. Yes - we need a MIME generator. The class and file should be renamed to be more parser-specific.
I've split out a MIME message class from the parser, and merged LBMIMEPart with LBMIMEMultipartMessage to create LBMIMEMessage.

http://github.com/alex-morega/letters/commit/52754746dcf93417503e60541bd8e07e4e11bac1

Next I'll add more tests and make the parser more resilient to errors. What do you think?

Cheers,
-- Alex
Alex Morega
2010-04-10 19:08:32 UTC
Post by Alex Morega
Next I'll add more tests and make the parser more resilient to errors.
Done that, and also removed the call to "LBMIMEStringByDecodingEncodedWord" - it seems that this should be called later, when figuring out what to do with a parsed message. I've sent a pull request if nobody has any objections.
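As an illustration of decoding encoded words later rather than inside the parser, this is how Python's email module behaves with its default (compat32) policy. The sketch assumes that "LBMIMEStringByDecodingEncodedWord" handles RFC 2047 encoded-words, which its name suggests but the thread does not confirm; the sample message is made up.

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# Hypothetical sample message with an RFC 2047 encoded-word in the subject.
RAW = (
    b"Subject: =?iso-8859-1?q?caf=E9?=\r\n"
    b"\r\n"
    b"body\r\n"
)

message = message_from_bytes(RAW)

# The parser leaves the header value exactly as it appeared on the wire.
raw_subject = message["Subject"]

# Decoding is a separate, explicit step taken by whoever consumes the message.
decoded_subject = str(make_header(decode_header(raw_subject)))
```

Keeping the raw value around means a consumer that only needs to forward or store the message never has to touch charset handling at all.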

Cheers,
-- Alex
Alex Morega
2010-04-11 22:44:09 UTC
Post by Alex Morega
Post by Alex Morega
Next I'll add more tests and make the parser more resilient to errors.
Done that, and also removed the call to "LBMIMEStringByDecodingEncodedWord" - it seems that this should be called later, when figuring out what to do with a parsed message. I've sent a pull request if nobody has any objections.
Since nobody is complaining, I've messed with the MIME code some more. :) Mostly cleanup, some more tests, a bit of documentation. The parser no longer attempts to decode payloads, and a message will only perform content-transfer decoding, returning an NSData object. Charset decoding is up to the user.
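The same two-step split exists in Python's email module, shown here as a sketch with a made-up message: `get_payload(decode=True)` only undoes the content-transfer encoding and returns bytes (the analogue of an NSData object), and applying the charset is left to the caller.

```python
from email import message_from_bytes

# Hypothetical sample: the bytes b"caf\xe9" (Latin-1), base64-encoded.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Y2Fm6Q==\r\n"
)

message = message_from_bytes(RAW)

# Step 1: content-transfer decoding only; the result is raw bytes.
payload = message.get_payload(decode=True)

# Step 2: charset decoding, done by the caller if and when text is needed.
charset = message.get_content_charset() or "us-ascii"
text = payload.decode(charset)
```

A caller that wants to save an attachment to disk can stop after step 1 and never deal with charsets.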

Next I'll have a go at writing the MIME generator (the reverse of LBMIMEParser). Afterwards, some tests for encoded header values (RFC 2231) - there seems to be some code for that in LBMIMEParser, but we'll see.
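For the RFC 2231 tests, a case like the following might be a useful starting point. It is a sketch of how Python's email module treats an RFC 2231 extended parameter, not a description of what LBMIMEParser currently does; the header and filename are invented for the example.

```python
from email import message_from_string

# Hypothetical sample: a filename parameter in RFC 2231 extended form
# (charset'language'percent-encoded-value).
RAW = (
    "Content-Type: application/octet-stream\n"
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
    "data\n"
)

message = message_from_string(RAW)

# With the default policy, get_param() returns a (charset, language, value)
# tuple for RFC 2231 parameters instead of a plain string.
raw_param = message.get_param("filename", header="content-disposition")

# get_filename() collapses that tuple into a single decoded string.
filename = message.get_filename()
```

Continuation parameters (`filename*0*`, `filename*1*`, ...) and an unknown charset would be natural follow-up test cases.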

So far I've been using the Python email module as a reference, except when it does weird things, presumably for backwards compatibility.

Cheers,
-- Alex
