Discussion:
LetterBox testing framework
Duncan Wilcox
2010-01-25 22:27:07 UTC
What we have is really two projects. The main app and UI is very much in flux, might feature unicorns and/or ponies, while the IMAP library, LetterBox, has known functionality and less known complexity (but is hard to do).

The good thing about this is that LetterBox is already available in source and hackable (http://github.com/ccgus/letters). It needs to be made rock solid and, given the nature of an open source project, resilient to changes from people who are just jumping in for a quick fix.

LetterBox needs to talk to wild IMAP implementations, but still fulfill a well defined contract, not unlike what WebKit does with funky web servers and HTML.

Having made a couple of minor contributions to WebKit, I believe a strong aspect of their development process is their reliance on a large regression testing suite.

I last touched it a couple years ago, but what they basically have is a wrapper around the rendering engine that reads a funky HTML/CSS/JS snippet and dumps the full rendering tree with coordinates and contents, or optionally a bitmap. A matching rendering tree or bitmap (they do image diffs) is a pass, otherwise a fail.

Constraining checkins to passing tests, and writing tests for every bug and every feature added, is a reasonable way of ensuring you only ever make progress. I'd say that for WebKit it shows.

So this got me thinking what could be the equivalent of the WebKit tests for an IMAP client. We can't run a local instance of every version of every IMAP server, if nothing else because gmail or mobileme's IMAP servers aren't available.

What we can do though is capture a session that completed successfully and turn it into a dummy server that replies to the requests with the recorded replies.

What the test would do is exercise the semantics offered by LetterBox, so stuff like:

- login
- enumerate folders
- scan folders and get message sizes and flags
- download select headers, all headers, body
- create folder
- store message
- copy messages
- move messages

Running this against a new real IMAP server can, and at first probably will, fail. Once it finally runs fine (and doesn't run other tests), the session can be captured and turned into a fake server. This will work because the fake replies will always report the same information, and the test will always perform the same commands. A radical change in how the client works will change the command patterns and will require a re-capture.
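
As a rough illustration of the replay idea, here is a minimal sketch of a fake server that answers a fixed list of commands with canned replies. The session contents, the greeting, and the function names are all made up for the example, and it assumes every command fits on one line (no IMAP literals).

```python
import socket
import threading

# Hypothetical recorded session: (expected client line, canned server reply).
GREETING = b"* OK fake IMAP server ready\r\n"
SESSION = [
    (b"a1 LOGIN user pass\r\n", b"a1 OK LOGIN completed\r\n"),
    (b'a2 LIST "" "*"\r\n',
     b'* LIST (\\HasNoChildren) "/" "INBOX"\r\na2 OK LIST completed\r\n'),
    (b"a3 LOGOUT\r\n", b"* BYE logging out\r\na3 OK LOGOUT completed\r\n"),
]


def serve_once(listener, session, greeting=GREETING):
    """Accept one connection and replay the recorded replies in order."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rb") as reader:
        conn.sendall(greeting)
        for expected, reply in session:
            line = reader.readline()
            if line != expected:
                # The client diverged from the recording: stop replaying.
                conn.sendall(b"* BAD unexpected command\r\n")
                return
            conn.sendall(reply)


def start_fake_server(session):
    """Listen on an ephemeral localhost port; return (port, thread)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def run():
        try:
            serve_once(listener, session)
        finally:
            listener.close()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return port, thread
```

Because the replies are fixed, the same test always sees the same server behaviour; any change in what the client sends shows up as a BAD reply and a dropped connection.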

Capturing needs to not only cover the basics but also push the limits of both client and server, so it would include things like:

- creating very long folder names
- creating folder names with unicode characters in them
- creating deeply nested folders
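
A small sketch of how such edge-case names could be generated. RFC 3501 specifies a modified UTF-7 encoding for non-ASCII mailbox names, so the helper below encodes names that way; the lengths, depth, and "/" delimiter are arbitrary assumptions, and real servers report their own hierarchy delimiter.

```python
import base64


def encode_mutf7(name):
    """Encode a folder name with IMAP's modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    pending = []  # run of characters that must be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no '=' padding, ',' instead of '/'.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def edge_case_folders(delimiter="/", depth=10, long_length=200):
    """Return a few stress-test folder names (all values are assumptions)."""
    return [
        "x" * long_length,                                    # very long name
        encode_mutf7("Entwürfe"),                             # Latin-1 character
        encode_mutf7("日本語 & more"),                         # CJK plus a literal '&'
        delimiter.join("level%d" % i for i in range(depth)),  # deep nesting
    ]
```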

Perhaps capturing could be entirely automatic, it creates its own test folders and message structure within an empty IMAP account, or perhaps it could require manually setting up the account contents.

libEtPan has the ability to trace a connection (mailstream_debug = 0 in mailstream_log.c will create a libetpan-stream-debug.log), a quick and dirty python script can turn that into a dummy IMAP server python app.

There are other things that need to be tested, like memory use with a 100k message folder or concurrent access through multiple connections, but that could be for 1.1 of the test suite.

And this is just the IMAP part (though arguably SMTP is a bit simpler).

Thoughts?

Duncan
Gus Mueller
2010-01-25 22:39:12 UTC
On Jan 25, 2010, at 2:27 PM, Duncan Wilcox wrote:
<snip>
Post by Duncan Wilcox
Perhaps capturing could be entirely automatic, it creates its own test folders and message structure within an empty IMAP account, or perhaps it could require manually setting up the account contents.
libEtPan has the ability to trace a connection (mailstream_debug = 0 in mailstream_log.c will create a libetpan-stream-debug.log), a quick and dirty python script can turn that into a dummy IMAP server python app.
<snip>

Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.

I'd imagine a test would go something like (pretend this is objc code):

test start
pick a python script (which basically opens up a port and replies a set of IMAP instructions)
connect to that server, do stuff, test, do stuff, test.
kill python script (and it shuts down the port) from the objc side.
tada.
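
In Python rather than objc, that lifecycle might look roughly like the sketch below. The inline server script is a hypothetical stand-in for a generated replay script, and a raw socket stands in for the LetterBox client.

```python
import contextlib
import socket
import subprocess
import sys

# Hypothetical stand-in for a generated replay script: it prints the port it
# listens on, sends a greeting, answers one command, and exits.
SERVER_SOURCE = r'''
import socket
s = socket.socket()
s.bind(("127.0.0.1", 0))
s.listen(1)
print(s.getsockname()[1], flush=True)
conn, _ = s.accept()
conn.sendall(b"* OK ready\r\n")
conn.makefile("rb").readline()
conn.sendall(b"* BYE\r\nt1 OK LOGOUT completed\r\n")
conn.close()
'''


@contextlib.contextmanager
def fake_server(source=SERVER_SOURCE):
    """Start the replay script as a child process; always kill it afterwards."""
    proc = subprocess.Popen([sys.executable, "-c", source],
                            stdout=subprocess.PIPE)
    try:
        port = int(proc.stdout.readline())
        yield port
    finally:
        proc.kill()   # shuts the port down even if the test failed
        proc.wait()
        proc.stdout.close()


def run_logout_test():
    """Connect, do stuff, and return what the server said."""
    with fake_server() as port:
        with socket.create_connection(("127.0.0.1", port)) as client:
            with client.makefile("rb") as reader:
                greeting = reader.readline()
                client.sendall(b"t1 LOGOUT\r\n")
                replies = [reader.readline(), reader.readline()]
    return greeting, replies
```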

-gus

--

August 'Gus' Mueller
Flying Meat Inc.
http://flyingmeat.com/
Duncan Wilcox
2010-01-25 22:48:35 UTC
Post by Gus Mueller
Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.
Does unicode stuff travel unencoded over IMAP? Anyway the script would only really need to play back the exact bytes it recorded, without really needing to know what they are. The test client would check consistency/sanity, the first time around (on capture) to make sure everything is working properly, after that to make sure something hasn't changed inside LetterBox that breaks things.
Yup.

Duncan
Gus Mueller
2010-01-25 22:54:15 UTC
Post by Duncan Wilcox
Post by Gus Mueller
Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.
Does unicode stuff travel unencoded over IMAP? Anyway the script would only really need to play back the exact bytes it recorded, without really needing to know what they are. The test client would check consistency/sanity, the first time around (on capture) to make sure everything is working properly, after that to make sure something hasn't changed inside LetterBox that breaks things.
Actually, it would probably be really simple for an objc class to do this instead of python. Just a simple script with stuff like <waitforreply> in there or something…

Of course, how do we test that we're sending the right commands? Maybe that could be part of the script too…
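
One way to check the outgoing side, sketched under the assumption that the recorded client commands are kept next to the replies: compare what the client actually sent against the recording, ignoring the client-chosen tag at the start of each command so that a change in tag numbering alone does not fail the test. The function names are made up.

```python
def strip_tag(line):
    """Drop the leading client-chosen tag (e.g. b'a001') from a command line."""
    _tag, _, rest = line.partition(b" ")
    return rest


def first_mismatch(expected, actual):
    """Return (index, expected, actual) for the first differing command, else None."""
    want = [strip_tag(line) for line in expected]
    got = [strip_tag(line) for line in actual]
    for i in range(max(len(want), len(got))):
        w = want[i] if i < len(want) else None
        g = got[i] if i < len(got) else None
        if w != g:
            return i, w, g
    return None
```

A missing or extra command shows up as `None` on one side of the returned tuple.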

-gus

--

August 'Gus' Mueller
Flying Meat Inc.
http://flyingmeat.com/
bear
2010-01-25 22:59:05 UTC
Post by Duncan Wilcox
Post by Gus Mueller
Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.
Does unicode stuff travel unencoded over IMAP? Anyway the script would only really need to play back the exact bytes it recorded, without really needing to know what they are. The test client would check consistency/sanity, the first time around (on capture) to make sure everything is working properly, after that to make sure something hasn't changed inside LetterBox that breaks things.
Unicode is not an issue from the testing point of view for Python 2.5+
as most likely you are going to be sending/receiving data pulled in
from disk and only updated with some dynamic information. Unicode in
Python 2.5 just requires some thought, it's not rocket science.
Actually, it would probably be really simple for an objc class to do this instead of python.  Just a simple script with stuff like <waitforreply> in there or something…
Of course, how do we test that we're sending the right commands?  Maybe that could be part of the script too…
The reason Python is a great choice is that it can be run from the
command line on any computer and doesn't require a Mac.

The collecting of what is considered good and bad responses/behaviour
and coding a proxy to emulate that is the larger consideration,
regardless of the implementation language.

It may be easier to create VM images of known servers and code some
Python to "drive" them in a scripted manner and record how the client
library responds.
--
Bear

***@gmail.com (xmpp, email)
***@code-bear.com (xmpp, email)
http://code-bear.com/bearlog (weblog)

PGP Fingerprint = 9996 719F 973D B11B E111 D770 9331 E822 40B3 CD29
Duncan Wilcox
2010-01-25 23:02:24 UTC
Post by bear
It may be easier to create VM images of known servers and code some
Python to "drive" them in a scripted manner and record how the client
library responds.
You can easily save the transcripts of tens or hundreds of IMAP servers, and their configuration variants, and their different versions.

It's not as practical for VMs, and it's impossible for popular but funky IMAP implementations, like gmail, mobileme, yahoo, hotmail, etc.

Duncan
bear
2010-01-25 23:08:29 UTC
Post by Duncan Wilcox
Post by bear
It may be easier to create VM images of known servers and code some
Python to "drive" them in a scripted manner and record how the client
library responds.
You can easily save the transcripts of tens or hundreds of IMAP servers, and their configuration variants, and their different versions.
It's not as practical for VMs, and it's impossible for popular but funky IMAP implementations, like gmail, mobileme, yahoo, hotmail, etc.
Then you're talking about creating a Python version of netcat to allow
for scripted playback of call-response.

Sounds doable, and you're right - VMs would be overkill - just a
directory of call-response data per server really. (good and bad)
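
A sketch of what loading such a directory could look like. The layout (one sub-directory per server, one .txt file per scenario) and the "C: " / "S: " line prefixes are assumptions for the example, not libEtPan's actual log format.

```python
import pathlib


def parse_transcript(text):
    """Split a transcript into the greeting and (client_line, [server_lines]) pairs."""
    greeting = []
    exchanges = []
    for line in text.splitlines():
        if line.startswith("C: "):
            exchanges.append((line[3:], []))
        elif line.startswith("S: "):
            # Server lines before the first command belong to the greeting.
            target = exchanges[-1][1] if exchanges else greeting
            target.append(line[3:])
    return greeting, exchanges


def load_transcripts(root):
    """Map 'server/scenario' to its parsed transcript for each .txt under root."""
    result = {}
    for path in sorted(pathlib.Path(root).glob("*/*.txt")):
        key = "%s/%s" % (path.parent.name, path.stem)
        result[key] = parse_transcript(path.read_text(encoding="utf-8"))
    return result
```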
--
Bear

bear42-***@public.gmane.org (xmpp, email)
bear-***@public.gmane.org (xmpp, email)
http://code-bear.com/bearlog (weblog)

PGP Fingerprint = 9996 719F 973D B11B E111 D770 9331 E822 40B3 CD29
John C. Welch
2010-01-25 23:12:02 UTC
Post by bear
It may be easier to create VM images of known servers and code some
Python to "drive" them in a scripted manner and record how the client
library responds.

YES + ELEVENTYBILLION!

Having a raft of Server VMs is veryveryvery VERY important here. IMAP is a
really strange protocol.
--
John C. Welch Writer/Analyst
Bynkii.com Mac and other opinions
jwelch-PGqpvYsgDA3QT0dZR+***@public.gmane.org
Caio Chassot
2010-01-26 07:17:15 UTC
Post by bear
It may be easier to create VM images of known servers and code some
Python to "drive" them in a scripted manner and record how the client
library responds.
YES + ELEVENTYBILLION!
Having a raft of Server VMs is veryveryvery VERY important here. IMAP is a
really strange protocol.
I'm glad we have full agreement on something.
Duncan Wilcox
2010-01-25 23:00:04 UTC
Post by Gus Mueller
Actually, it would probably be really simple for an objc class to do this instead of python. Just a simple script with stuff like <waitforreply> in there or something…
Yeah well I guess my idea is the log turns into a python IMAP server. Generating an ObjC fragment can also be done but what would you do then? Shoehorn it on the back of libEtPan?
Post by Gus Mueller
Of course, how do we test that we're sending the right commands? Maybe that could be part of the script too…
Well that's part of the initial capture phase. The test app, a LetterBox client, reads/writes to the IMAP server and makes sure everything is consistent. Human checked, but saved as an IMAP server script for posterity.

Duncan
Jonathan Wight
2010-01-25 23:04:26 UTC
Post by Gus Mueller
Post by Duncan Wilcox
Post by Gus Mueller
Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.
Does unicode stuff travel unencoded over IMAP? Anyway the script would only really need to play back the exact bytes it recorded, without really needing to know what they are. The test client would check consistency/sanity, the first time around (on capture) to make sure everything is working properly, after that to make sure something hasn't changed inside LetterBox that breaks things.
Actually, it would probably be really simple for an objc class to do this instead of python. Just a simple script with stuff like <waitforreply> in there or something…
Of course, how do we test that we're sending the right commands? Maybe that could be part of the script too…
You've really just invented netcat | expect. :-)

What I'd think might be useful is a menu option in the app itself to "Save Anonymised IMAP Log" that could be then turned into test scripts.

Letters.app is going to have to work with a LOT of weird servers (as Duncan points out) and there's no way the developers will be able to write test scripts for all of them. Getting the users (beta-testers) to submit their own test scripts (via anonymised log files) might be one option.

Also handy to send with crash reports too?

Jon.
Gus Mueller
2010-01-25 23:38:45 UTC
Post by Jonathan Wight
You've really just invented netcat | expect. :-)
Yea, I was actually thinking that…
Post by Jonathan Wight
Letters.app is going to have to work with a LOT of weird servers (as Duncan points out) and there's no way the developers will be able to write test scripts for all of them. Getting the users (beta-testers) to submit their own test scripts (via anonymised log files) might be one option.
This will probably be the best way.
Post by Jonathan Wight
Also handy to send with crash reports too?
I'll make sure they all go to JG.

-gus

--

August 'Gus' Mueller
Flying Meat Inc.
http://flyingmeat.com/
bear
2010-01-25 23:49:34 UTC
Post by Gus Mueller
Post by Jonathan Wight
You've really just invented netcat | expect. :-)
Yea, I was actually thinking that…
Post by Jonathan Wight
Letters.app is going to have to work with a LOT of weird servers (as Duncan points out) and there's no way the developers will be able to write test scripts for all of them. Getting the users (beta-testers) to submit their own test scripts (via anonymised log files) might be one option.
This will probably be the best way.
Anonymizing mail (and other pim related data) was something we tried
to handle well during my time at OSAF for the Chandler Project. IIRC
we created some code to strip out anything that looked personal and
replace it with random data but yet left the required linkages in
place.

Let me look in my backups to see if I can find that code.

Worst case you could replace email ids with a lookup-table driven
random result and all message bodies with Lorem-Ipsum generated text.
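
A minimal sketch of that fallback, not the Chandler code. It swaps the random result for sequential placeholder addresses so runs are reproducible, keeps the lookup table so the same real address always maps to the same fake one, and pads the Lorem Ipsum filler to the original body length (in characters) so recorded sizes still line up. Display names and other personal data are left untouched here.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LOREM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."


class Anonymizer:
    """Replace addresses consistently and message bodies with filler text."""

    def __init__(self):
        self.table = {}  # real address (lowercased) -> fake address

    def fake_address(self, real):
        key = real.lower()
        if key not in self.table:
            self.table[key] = "user%d@example.invalid" % (len(self.table) + 1)
        return self.table[key]

    def scrub_addresses(self, text):
        return EMAIL_RE.sub(lambda m: self.fake_address(m.group(0)), text)

    def scrub_message(self, headers, body):
        """Return (scrubbed headers, filler body of the same length)."""
        filler = (LOREM + " ") * (len(body) // len(LOREM) + 1)
        return self.scrub_addresses(headers), filler[:len(body)]
```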
--
Bear

***@gmail.com (xmpp, email)
***@code-bear.com (xmpp, email)
http://code-bear.com/bearlog (weblog)

PGP Fingerprint = 9996 719F 973D B11B E111 D770 9331 E822 40B3 CD29
Caio Chassot
2010-01-26 07:19:41 UTC
Post by Gus Mueller
Post by Jonathan Wight
Letters.app is going to have to work with a LOT of weird servers (as Duncan points out) and there's no way the developers will be able to write test scripts for all of them. Getting the users (beta-testers) to submit their own test scripts (via anonymised log files) might be one option.
This will probably be the best way.
I guess we're already planning to have an Activity Viewer which in at least one mode just dumps the plain text IMAP session for inspection, right? If only because we'll need it during development. (Yes, there's always `tail`.)
From that window it would make sense to provide an anonymized export option.
Dan Callahan
2010-01-25 23:03:40 UTC
Post by Gus Mueller
Going with python would be a good idea, the only problem I think we'd run into is probably the funky unicode stuff… too bad python3k isn't a default install.
I wouldn't be surprised if py3k made it into 10.7, and until then, I
don't think it's too unreasonable to ask developers to install
utilities to aid in testing. Python.org even provides a nice installer
in a disk image.

The real barrier, I think, would be language familiarity amongst
Letters' dev community. I can certainly do Python, but I've yet to
write Hello World in ObjC.

-Dan