
This became ridiculously handy for things like signing up for websites that send you one confirmation email, then save or sell or spam your email address forever. And of course, it *is* very handy for users. But think about it from Mailinator's side. It's basically signing up to receive spam for that address forever. That's a tall order and one that seems to have the possibility of a terrible demise. Someday, enough email could come in that it will simply smush Mailinator. But, as of this writing, that day isn't today.
In those 3.5 years I have received hundreds of "thank you" emails, a pile of "it doesn't work" emails, a radio interview, articles in the Washington Post, New York Times, and Delta Skymiles magazine, one call each from Scotland Yard and the LAPD, and a total of 4 subpoenas (1 of those being a Federal Grand Jury subpoena issued by the FBI).
At this point, Mailinator averages approximately 2.5 million emails per day. I have seen hourly spikes that would result in about 5 million in a day. (Edit: Feb 2007 - One month later we're averaging 4.5 million emails a day with spikes over 6 million.) In addition, the system also services several thousand web users and several thousand RSS users per day.

In the world of email services, this probably isn't all that much. The most interesting part to me is that the complete set of hardware that Mailinator uses is one little server. Just one. A very modest machine with a 2 GHz AMD Athlon processor, 1 GB of RAM (although it really doesn't need that much), and a boring 80 GB IDE hard drive (check ServerBeach's Category 1 Powerline 2100 for the exact specs). And honestly, it's really not very busy at all. I've read the blogs of some copycat services of Mailinator where their owners were upgrading their servers to some big iron. This was really the impetus for me writing down this document - to share a different point of view.
Mailinator easily handling a few million emails a day wasn't always the case. The initial Mailinator system was quite busy and, in fact, got overwhelmed about a year ago when email traffic started topping 800,000 a day (that's my recollection anyway). In an effort to squeeze life out of the server
If you don't know what Mailinator is, take a small tour through the (rather funny) FAQ.
Lossy lossie lossee
There is a very important point to note about the Mailinator service. And that is, that it is indeed - free. Although it might not seem like it, it has an immense impact on the design (as you'll see). This allowed me to favor performance across the board of the design. This fact influenced decisions from how I dealt with detecting spam all the way down to how I synchronized some code blocks. No kidding.
The basic tenet is that I do not have to provide perfect service. In order to do that, my hardware requirements would be much higher. Now that would all be fine and dandy if people were paying for the service. I could then provide support and guarantees. But given it's free, I instead went for, in order, these two design decisions:
1) Design a system that values survival above all else, even users (as of course, if it's down, users aren't really getting much out of it)
2) Provide 99.99% uptime and accuracy for users.
If you wonder what I mean about "survival" in the first line, it basically means that Mailinator is attacked on literally a daily basis. I wanted to make a system that could survive the large majority of those attacks. Note - I'm not interested in it surviving all of them. Because again, if some zombie network decided to Denial-of-service me - I really have no chance of thwarting it without some serious hardware. The good news is that if someone goes to all the trouble of smashing Mailinator (again referencing the fact we're lossy), I really don't lose much sleep over it. It sucks for my users - but there really isn't anything I can do anyway. I'm not trying to be cavalier about this - I went to great lengths to handle attacks, I'm just saying it's a cold reality that I simply cannot stop them all. Thus I accept them as part of the game.
The platform
The original Mailinator used a relatively standard unix
Sendmail --> disk --> Mailinator <-- Tomcat Servlet Engine
The Java-based Mailinator app then grabbed the emails using IMAP and/or POP (it changed over time) and deleted them. I should have used an mbox interface but I never got around to implementing that. The system then loaded all emails into memory and let them sit there. Mailinator only allowed about 20000 emails to reside in memory at once. So when a new one came, the oldest one got pushed out.
The FAQ advertises that emails stick around for "a couple of hours." And that was true, but exactly how long depended on the rate of incoming emails. You'll also note an interesting side effect: since all emails lived in memory, if the server came down - all emails were lost! Talk about exploiting the fact that my service was free, huh? This may seem dubious but the code was really quite stable and ran for weeks and months without downtime.
I thought about saving emails into a database of course but honestly, all this bought me was emails that stuck around longer. And that, in and of itself, sort of went against my intent for Mailinator. The idea was: sign up for something, go to Mailinator, click the link, and forget about it. If you want a mailbox where emails last a few days, that's fine, but there are many other alternatives out there - that's not what Mailinator is about. I forgot the database idea and now shoot for mails that last somewhere around 3-4 hours.
This all worked fabulously for a while. It pretty much filled up all 1 GB of RAM on the server. Finally when the incoming email rate started surpassing 800,000 a day, the system started to break down. I believe it was primarily the disk contention between unix mail apps and the Java app locking mailboxes. Regardless, there were many issues with that system that bugged me for a long time. The root of most of those problems really boiled down to one thing - the disk. The disk activity of sendmail, procmail, logging and whatever else was a silly bottleneck. And it needed to go.
More than a year ago now I did a full rewrite. Much of the anti-spam code that I'll describe later was already in this code-base but was improved and extended for the new system.
Synchronous vs. Asynchronous I/O
I've read a fair number of articles on the wonders of asynchronous I/O (Java's NIO library). I don't doubt them but I decided against using it. Primarily, again, because I had done a great deal of work in multithreaded environments and knew that area well. I figured if I had performance issues later, I could always switch over to NIO as a learning experience.
The biggest thing I knew I needed to do with Mailinator was to remove the unix application components. Mailinator needed to stop outsourcing its email receipt and do it itself. This basically meant I needed to write my own SMTP server. Or at least, a subset of one. Firstly, Mailinator has never had the ability to send email so I didn't need to code that part up. Second, I had really different needs for receiving email. I wanted to get it as fast as possible -or- refuse it as fast as possible.
SMTP has a rich dialog for errors but I chose to only support one error message. And that error is, appropriately enough - "User Unknown". That's a touch ironic since Mailinator accepts any user at all. Simply said, if you do anything that the Mailinator server doesn't like - you'll get a user unknown error. Even if you haven't sent it the username yet.
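To make the "one error message" idea concrete, here is a minimal sketch of a receive-only SMTP dialog. It is not Mailinator's actual code: the class name `TinySmtpSession`, the `senderBanned` flag, and the choice of reply code 550 for "User Unknown" are assumptions made for illustration, and lines are assumed to arrive with the trailing CRLF already stripped.

```java
import java.util.Locale;

/** Hypothetical sketch of a receive-only SMTP dialog with a single error reply. */
final class TinySmtpSession {
    // 550 is the standard "mailbox unavailable" code; the exact text is an assumption.
    static final String USER_UNKNOWN = "550 User Unknown";

    private enum State { EXPECT_HELO, EXPECT_MAIL, EXPECT_RCPT, EXPECT_DATA, RECEIVING, CLOSED }

    private final boolean senderBanned;
    private State state = State.EXPECT_HELO;

    TinySmtpSession(boolean senderBanned) {
        this.senderBanned = senderBanned;
    }

    /** Returns the reply for one input line, or null when no reply is due. */
    String reply(String line) {
        if (state == State.RECEIVING) {
            if (line.equals(".")) {          // a lone dot ends the message body
                state = State.EXPECT_MAIL;
                return "250 OK";
            }
            return null;                     // body line; storing it is out of scope here
        }
        String cmd = line.trim().toUpperCase(Locale.ROOT);
        if (state == State.EXPECT_HELO && !senderBanned
                && (cmd.startsWith("HELO ") || cmd.startsWith("EHLO "))) {
            state = State.EXPECT_MAIL;
            return "250 Hello";
        }
        if (state == State.EXPECT_MAIL && cmd.startsWith("MAIL FROM:")) {
            state = State.EXPECT_RCPT;
            return "250 OK";
        }
        if (state == State.EXPECT_RCPT && cmd.startsWith("RCPT TO:")) {
            state = State.EXPECT_DATA;
            return "250 OK";
        }
        if (state == State.EXPECT_DATA && cmd.equals("DATA")) {
            state = State.RECEIVING;
            return "354 End data with <CR><LF>.<CR><LF>";
        }
        // Anything else, at any point, gets the one and only error.
        state = State.CLOSED;
        return USER_UNKNOWN;
    }

    boolean isClosed() {
        return state == State.CLOSED;
    }
}
```

A caller would write each non-null reply back to the socket and hang up as soon as `isClosed()` returns true. Note that a banned sender gets "User Unknown" in response to its very first `HELO`, before any username has been sent.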
I looked at Apache James as a base, which is a pure Java SMTP server, but it was way too comprehensive for my needs. I really just found some code examples and the SMTP specs and wrote things basically from scratch. From there, I was able to get an email, parse it, and put it right into memory. This bypassed the old system's step of writing it to disk all the way. From wire to user, Mailinator mail never touches the disk. In fact, the Mailinator server's disk
Now to address persistence concerns right away - Mailinator doesn't run diskless, but it does run very asynchronously with regards to the disk. Emails are not written to disk EVER unless the system is coming down and is instructed to write them first (so it can reload them upon reboot). This little fact has been very handy when I've been subpoenaed. I simply do not have access to any emails that were sent to Mailinator in the past. If it is possible that I can get an email - so can you just by checking that inbox. If you can't get it then that means it's long deleted from memory and nothing is going to get it back.
Mailinator also used to do logging (again, shut off because of pesky subpoenas). But it did it in a very "batchy" way. It wrote several thousand log lines to memory before doing one disk write. In effect, we never want to have contention based on the incredibly slow disk.
Now if this all sounds a bit shaky, as in we might just lose an email now and then - you're right. But remember, our goal is 99.99% accuracy. Not 100%. That's an important distinction. The latest incarnation of Mailinator literally runs for months unattended. We do lose emails once in a while - but it's rare and usually involves a server crash. We accept the loss and by far most users never encounter it.
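The "batchy" logging idea can be sketched in a few lines. This is an illustration only, not the original code: the class name `BatchedLog` and the `batchSize` parameter are made up here, and the destination is any `java.io.Writer` (a `FileWriter` in real use).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical sketch: collect log lines in memory, write them out in one go. */
final class BatchedLog {
    private final Writer out;
    private final int batchSize;                 // e.g. several thousand lines
    private final StringBuilder pending = new StringBuilder();
    private int pendingLines = 0;

    BatchedLog(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    /** Buffers one line; touches the underlying writer only when the batch is full. */
    synchronized void log(String line) {
        pending.append(line).append('\n');
        pendingLines++;
        if (pendingLines >= batchSize) {
            flush();
        }
    }

    /** Writes everything buffered so far with a single write call. */
    synchronized void flush() {
        if (pendingLines == 0) {
            return;
        }
        try {
            out.write(pending.toString());
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pending.setLength(0);
        pendingLines = 0;
    }
}
```

The trade-off matches the rest of the design: a crash loses whatever is still sitting in the buffer, which is acceptable when the goal is 99.99% rather than 100%.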
Emails
The system now is one unit. The web application, the email server, and all email storage run in one JVM.

The system uses under 300 threads. I can increase that number but haven't seen a need as of yet. When an email arrives (or attempts to arrive) it must pass a strong set of filters that are described below. If it gets past those filters it is then stored in memory - however, it is first compressed to save in-memory space. Over 99% of emails that arrive are never looked at, so we only ever decompress an email if someone actually "looks" at it.
Because of this, I am able to store many more emails than the original system's 20000. The current Mailinator stores about 80000 emails and uses under 300 MB of RAM. I probably should increase this number as plenty of RAM is just sitting around. The average email lifespan is about 3-4 hours with this pool. The amount of incoming email has gone way up, so even by increasing this pool, we're largely staying steady as far as email lifespan. I could probably kick that up to 200,000 or so and increase the lifespan accordingly but I haven't seen a great need yet.
Another inherent limit that the system imposes is on mailboxes themselves. Popular mailboxes such as joe@mailinator.com and bob@mailinator.com get much more email than average. Every inbox is limited to only 10 emails. Thus, popular boxes inherently limit themselves on the amount of email they can occupy in the pool. Use of popular inboxes is discouraged anyway, and they generally become the creme de la cesspool of spam.
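Compress-on-arrival with decompress-on-read might look roughly like the sketch below. The text does not say which compression scheme Mailinator uses, so gzip from `java.util.zip` is an assumption, as is the class name `CompressedEmail`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: keep only the gzipped bytes, inflate when someone reads. */
final class CompressedEmail {
    private final byte[] gzipped;

    CompressedEmail(String rawEmail) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(rawEmail.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.gzipped = buf.toByteArray();   // stream is closed, so the gzip trailer is included
    }

    /** Decompresses on demand; most emails never reach this method. */
    String read() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    int storedBytes() {
        return gzipped.length;
    }
}
```

The CPU cost of compression is paid once per accepted email, while the cost of decompression is paid only for the small fraction of emails that anyone opens.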
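The two limits described above (a fixed-size pool where the oldest email is pushed out, plus a cap per inbox) can be sketched together. The class and method names are hypothetical and this is not the original implementation; emails are plain strings here to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a bounded email pool with a per-inbox cap. */
final class EmailPool {
    /** One stored message; identity equality is intentional. */
    private static final class Email {
        final String inbox;
        final String body;
        Email(String inbox, String body) { this.inbox = inbox; this.body = body; }
    }

    private final int maxTotal;       // about 80000 in the text
    private final int maxPerInbox;    // 10 in the text
    private final LinkedHashSet<Email> arrivalOrder = new LinkedHashSet<>(); // oldest first
    private final Map<String, Deque<Email>> inboxes = new HashMap<>();

    EmailPool(int maxTotal, int maxPerInbox) {
        this.maxTotal = maxTotal;
        this.maxPerInbox = maxPerInbox;
    }

    synchronized void add(String inbox, String body) {
        Deque<Email> existing = inboxes.get(inbox);
        if (existing != null && existing.size() >= maxPerInbox) {
            evict(existing.peekFirst());              // inbox full: drop its oldest
        }
        if (arrivalOrder.size() >= maxTotal) {
            evict(arrivalOrder.iterator().next());    // pool full: drop the global oldest
        }
        Email email = new Email(inbox, body);
        inboxes.computeIfAbsent(inbox, k -> new ArrayDeque<>()).addLast(email);
        arrivalOrder.add(email);
    }

    private void evict(Email email) {
        arrivalOrder.remove(email);
        Deque<Email> box = inboxes.get(email.inbox);
        box.remove(email);                            // always the head of its inbox
        if (box.isEmpty()) {
            inboxes.remove(email.inbox);
        }
    }

    /** Returns the bodies currently held for an inbox, oldest first. */
    synchronized List<String> read(String inbox) {
        List<String> bodies = new ArrayList<>();
        Deque<Email> box = inboxes.get(inbox);
        if (box != null) {
            for (Email e : box) {
                bodies.add(e.body);
            }
        }
        return bodies;
    }

    synchronized int size() {
        return arrivalOrder.size();
    }
}
```

With this shape, email lifespan is not configured anywhere. It simply falls out of the pool size divided by the rate at which accepted email arrives.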
Two more memory
Spam and Survival
I'd like to emphasize here that Mailinator's mission is NOT to filter spam. If you want penis enlargement or sheep-of-the-month club emails, that's pretty much what Mailinator is good for. We are clear in the FAQ. Mailinator provides pretty good anonymity - but we do NOT guarantee it. We also do NOT guarantee ANY privacy. It's really easier that way for us. Still, it does a pretty damn good job even so. We might log you (used to and it might get turned on again someday, never know) and we DO respond to subpoenas (that whole "jail" thing is a strong motivator).
So, in essence I have no real interest in filtering out spam. I do however, have a great deal of interest in keeping Mailinator alive. And spammers have this nasty habit of sending Mailinator so much crap that this can be an issue. So - Mailinator has a simple rule. If you do anything (spammer or not) that starts affecting the system - your emails will be refused and you may be locked out.
In the new system I created a data structure I call an AgingHashmap. It is, as the name indicates, a hashmap (String->int) whose elements "age".
The first type of spammer I encountered was one machine blasting me with thousands of emails. So, now, every time an email arrives, its sender's IP is put into an AgingHashmap with a counter of 1. If that IP does not send us any more email for (let's say) a minute, then that entry automatically leaves the AgingHashmap. But, let's say that IP address sends us another email 2 seconds later. We then find the first entry in the AgingHashmap and increase that counter to 2. If we see another email from that IP, it goes to 3 and so on. Eventually, when that counter reaches some threshold we ban all emails from that IP for some amount of time.
We can put this in words as so (values are examples):
If any IP address sends us 20 emails in 2 minutes, we will ban all email from that IP address for 5 minutes. Or more precisely, we will ban all email from that IP until it stops trying to send to us for at least 5 minutes.
This is really what the AgingHashmap is good for. We can set up some parameters and detect frequency of some input, then cause a ban on that input. If some IP address sends us email every second for 100 days straight, we'll ban (or throw away) every last email after the first 20.
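Here is one way an AgingHashmap and the IP rule built on it could be written. It is a sketch under assumptions, not the original code: entries age out after a period of silence (as the description suggests), the current time is passed in explicitly so the behavior is easy to test, and the `IpBanFilter` wrapper with its parameter names is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a String->int map whose entries vanish after a period of silence. */
final class AgingHashmap {
    private static final class Entry {
        int count;
        long lastSeen;
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxIdleMillis;

    AgingHashmap(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Bumps the counter for key, restarting at 1 if the old entry has aged out. */
    synchronized int increment(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e == null || nowMillis - e.lastSeen > maxIdleMillis) {
            e = new Entry();
            entries.put(key, e);
        }
        e.count++;
        e.lastSeen = nowMillis;
        return e.count;
    }

    synchronized boolean contains(String key, long nowMillis) {
        Entry e = entries.get(key);
        return e != null && nowMillis - e.lastSeen <= maxIdleMillis;
    }

    /** Drops aged-out entries; call periodically so the map cannot grow forever. */
    synchronized void purge(long nowMillis) {
        entries.values().removeIf(e -> nowMillis - e.lastSeen > maxIdleMillis);
    }
}

/** Hypothetical wrapper applying the example rule: 20 emails, 2 minutes, 5 minute ban. */
final class IpBanFilter {
    private final int maxEmails;
    private final AgingHashmap recent;   // per-IP counters
    private final AgingHashmap banned;   // IPs currently refused

    IpBanFilter(int maxEmails, long windowMillis, long banMillis) {
        this.maxEmails = maxEmails;
        this.recent = new AgingHashmap(windowMillis);
        this.banned = new AgingHashmap(banMillis);
    }

    /** Returns true if an email from this IP should be accepted right now. */
    synchronized boolean accept(String ip, long nowMillis) {
        if (banned.contains(ip, nowMillis)) {
            banned.increment(ip, nowMillis);   // still trying, so the ban clock restarts
            return false;
        }
        if (recent.increment(ip, nowMillis) > maxEmails) {
            banned.increment(ip, nowMillis);
            return false;
        }
        return true;
    }
}
```

A caller would construct `new IpBanFilter(20, 120_000L, 300_000L)` and call `accept(ip, System.currentTimeMillis())` for each incoming connection. Because a refused attempt refreshes the ban entry, an IP that never stops sending is never unbanned, which is the "until it stops trying for at least 5 minutes" behavior.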
Here's a graph of an average 24 hours of banned IP address emails. Notice at 10am and 11am some joker (i.e., some single IP address) sent us over 19000 emails per hour.

I do have some code that has Java talk back to unix's iptables system to do very hard blocking of IP addresses but it's not on right now. Partially because there's no need (yet) and partially because I like to see the stats.
The funny part of this is the error Mailinator gives. Remember the "User Unknown"? Once an IP address is banned and then it tries to open a new connection it will send the SMTP greeting of "HELO". Mailinator will then reply "User Unknown" and close the connection. Of course, it didn't even get the username yet.
Zombies
The next problem came from zombie networks
Here's a similar graph. Keep in mind these emails got past the IP filter - so basically they are "same subject" emails from many disparate sources.

You could argue we should ban them forever, but then we'd have to keep track of them and the Mailinator system is inherently transient. Forgetting is core to what it does. This blocking is more expensive than IPs as comparing subjects can be costly. And of course, we have to have enough of a conversation with the sending server to actually get the subject.
Pottymouth!
Finally, we ran into some issues on emails that just weren't cool. As I said, I'm far more interested in keeping Mailinator alive than blocking out your favorite porn newsletter. But, some unhappy people used Mailinator for some really not happy purposes. Simply put, as a last layer, subjects are searched for words that indicate hate or crimes or just downright nastiness.
Boing
Another major influx that happened early on was a plethora of bounce messages. Now that's sort of odd, isn't it? I mean Mailinator doesn't send email. In fact, it CAN'T send email so how could it get bounce messages? Well, some spammy type folks thought it'd be neat to send out spam from their servers using forged Mailinator addresses as a return address. Thus when those emails bounced, the bounce came here.
What's worse, is I still get email from people who think Mailinator sent them spam
The good news is that bounces are very easy to detect, and are really the first line of our defense. Bouncing SMTP servers aren't particularly evil, they're just doing their job so when I say "user unknown" they believe me and go away.
On an abstract level, here is what happens to an email as it enters the system.

(and to be fair, there might just be another layer or two that's not on that diagram!)
Anti-Spam revolt
There are 2 more, somewhat conflicting features of the Mailinator server that should be noted. For one, it's a clear fact that when we're busy, we're busy. An easy DoS against us would be to open a socket to our server and leave it open. This is an inherent vulnerability in any server (maybe especially multithreaded servers). So, as a basic idea Mailinator closes all connections if they are silent for more than a second or two. Actually, the amount of time is variable (read below). Clearly, we are DoS'able by sending us many many connections, but this blocks at least one trivial way of bringing us down.
Secondly, although we demand that servers talking to us be very speedy, we reserve the right to be very NOT speedy. Here's the logic. When Mailinator is not terribly busy, we still demand responses quickly, but we give responses slowly. In fact, the less busy we are, the slower we give responses. It is possible that sending an email into the Mailinator SMTP server could take a very long time (like 10 or 20 or 30 seconds) even for a very small amount of data.
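With blocking sockets, dropping silent peers comes down to a read timeout. The sketch below uses the standard `Socket.setSoTimeout` call; the class and method names are hypothetical, and the small loopback demo exists only to show the behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: hang up on peers that connect and then say nothing. */
final class IdleTimeout {
    /**
     * Reads the first line from the peer. Returns null, after closing the
     * socket, if the peer stays silent past the timeout or disconnects.
     */
    static String readFirstLineOrDrop(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);   // blocking reads give up after this long
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String line = in.readLine();          // throws SocketTimeoutException on silence
            if (line == null) {
                socket.close();
            }
            return line;
        } catch (IOException timedOutOrBroken) {  // SocketTimeoutException is an IOException
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
            return null;
        }
    }

    /** Loopback demo: a client connects, sends nothing, and gets dropped. */
    static boolean silentClientIsDropped(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            return readFirstLineOrDrop(accepted, timeoutMillis) == null && accepted.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The timeout bounds how long one silent connection can tie up a worker thread, which is why it does nothing against an attacker who simply opens a great many connections.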
Why? Well.. think about it. Let's say you're spamming. You want to send out a zillion emails as fast as possible. You want every receiving SMTP server to get your email, deliver it to the poor sod who wants (or doesn't want) weener enlargement and then close the connection so you can go on to the next. If you encounter some darn SMTP server that takes 20 seconds to receive your email, the speed at which you can send out your emails diminishes. You might just even think about avoiding such SMTP servers.
It might be a pipe dream to think this is slowing down any spammers, but this does tend to keep my quieter times lasting longer. And it doesn't really hurt me - or my users. And if we eventually get terribly busy, those delays are scaled down to make sure we don't lose any emails.
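The "slower when idle" behavior can be reduced to one small function. The text only says that delays shrink as load grows, so the linear scaling, the use of active connections as the load measure, and all names below are assumptions for illustration.

```java
/** Hypothetical sketch: the quieter the server, the longer it waits before replying. */
final class ReplyThrottle {
    private final long maxDelayMillis;   // delay when completely idle, e.g. 30000
    private final int maxConnections;    // load at which the delay drops to zero

    ReplyThrottle(long maxDelayMillis, int maxConnections) {
        this.maxDelayMillis = maxDelayMillis;
        this.maxConnections = maxConnections;
    }

    /** Linear scale-down: full delay at zero load, no delay at or above full load. */
    long delayMillis(int activeConnections) {
        int busy = Math.max(0, Math.min(activeConnections, maxConnections));
        double idleFraction = 1.0 - (double) busy / maxConnections;
        return Math.round(maxDelayMillis * idleFraction);
    }
}
```

A worker thread would call `Thread.sleep(throttle.delayMillis(active))` before writing each reply. With `new ReplyThrottle(30_000, 300)`, an idle server waits 30 seconds, a half-loaded one waits 15, and a fully loaded one does not wait at all.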
Sites will ban it
Every time I read some comment about Mailinator, someone always points out something like "Yeah, well sites will start banning any email from Mailinator and then it will be worthless". Guys. It's been 3 years. A handful of sites have indeed blocked email from Mailinator, but my user base and the number of read emails has only gone up. Clearly, people are finding Mailinator more useful than ever.
I have added at times additional domains (like sogetthis.com and fakeinformation.com) that point to mailinator. Often if a site bans mailinator.com proper, you can use one of those to same effect.
Overall
Many copycat sites have appeared over the years which is pretty reasonable. This idea itself is obvious. The only real hurdle was that it seemed impossible to do given the amount of useless email you'd get. But the copycats had the advantage of seeing that Mailinator actually does work, so they knew what to shoot for. Only a few post their daily email numbers but I've yet to see any that come close to mailinator's incoming email (not that this is necessarily a good thing). I also see that many are using an architecture similar to Mailinator's original which is just fine so long as they either don't get any massive increases in email or are happy to keep buying bigger hardware.
Overall, Mailinator has been a great experience. It was a terribly fun exercise in optimization, security, and generally making things work. Thousands of people use it every day and it's amazing how many people know about it when it comes up in conversation. I've thought many times about how to make a business around it, and there is always an angle, but I've just been too busy with other things.
My hope is that it's useful for you and that you tell your friends.
Note: Eternal thanks to Jack Lawrence of Syracuse, NY who, in a drunken stupor gave me the core idea (story here), Nicci Gabriel of www.sideofsauce.com for the seriously cool web design, and to Brian Pipa of www.candyaddict.com who, as a big fan of Mailinator, added the very cool "Spam Map" and the RSS feeds.
 
 
 
46 comments:
It's really cool reading how you managed to get a service up and running. I really appreciate your philosophy of implementing a free service. Not having to be perfect is great! But, then, being perfect is even greater (and it seems your application was very stable and reliable, indeed). Maybe has something to do with the Japanese concept Kaizen...
I can remember of a functionality that is hosted on a web server. I did it with Java, got it up and running and then did something different for three years or so. Never in this time there was any error or malfunction reported with the web server I setup, or the service I provided with it. This is the power of Java. It's smart and reliable (and at the end, readable, if you look at someone's source code --> what about Perl or the so-beloved Ruby?).
Love reading more from you!
Great article and a great service!! Keep up the great work.
Well, I'd like to add up to the "thank you" messages you talk about. Thank you for mailinator, which I have used regularly for 2 years or so.
Thank you for this post too. I always wondered how it worked and the reading's been really interesting
Fantastic article! I learned a lot about optimization, software development, and design decisions by reading this. Not having to be 100% correct was a great decision and I'm sure that it has made life a lot easier. It has also made me interested in creating a similar service, if only to be a learning experience.
Many thanks for your write-up and your service. You've been invaluable.
How do you pay for the bandwidth? It's gotta be insane! Even if it is text based.
Yet another proof that all genius things are simple...
You rock.
'nuff said.
When I saw the blurb at reddit I was already thinking: He must be doing the smart thing and skipping the disk entirely.
Though I submit, I'm impressed. Especially because I thought you wouldn't be able to pull this off without nio.
If you ever run into too many sockets or threads problems, nio can help there. It'll make your app capable of handling waaaaay more sockets, and will make it even faster, but the NIO library isn't pretty; there are tons of frustrating little details you have to worry about. I'll be more than happy to help out if you need it, I have some experience wrestling with raw NIO.
Thanks for mailinator, at any rate - I've been using it for years now, never let me down.
Very solid article. Kinda like a post-mortem, without the mort.
Only one question: given that this is a service running on a single machine, presumably with a single NIC and at most a handful of drives, why does it need 300 threads? What could those threads be blocking on other than other (competing) threads?
(I'm pretty clueless about SMTP; if these threads are spending most of their time waiting for responses from other SMTP servers I guess that may make sense, but my experiences with Java's thread-sludge make me think that a response queue might let you scale even further...)
absolutely brilliant
I loved your article. Very well written, easy to understand even for a layman, informative, and of course Mailinator is a great idea.
Keep on!
Only wonder what the hardware performance hit would be using a virtual machine that could be used in conjunction with a re-director to provide another layer of survivability.
great article. I love the slow download bit.
Just great. Beauty in simplicity.
I am glad you decided against storing emails for a few days as that is what mailinator is not about.
Keep up the great work.
Great article. I work at Yahoo and we use a lot of the same strategies to handle our global-scale load. One thing that we've found that you may be interested in trying is that, for Strings less than 200 or so characters, it generally takes up less memory and uses less CPU to store them as UTF-8 encoded byte arrays than to compress them.
If you ever decide to tackle NIO, I would be very interested to see the difference in performance. Mailinator seems like it would make an ideal case study.
Great technical article, yet simple. I loved your writing style :)
Thanks for the great item. I've already passed on the link to several people I know. I really like to see design decisions brought out into the open like this.
great article. great service. thankyou.
kevin.
I notice you mention that no emails will ever originate from mailinator.com, and that blocking it is acceptable. Since this is the case, I wanted to ask why you have not put an SPF record in place for mailinator.com?
Adding an SPF record of "v=spf1 -all" would let other domains know that no emails will originate from there, without the need for anybody to block you. Admittedly, not all mail servers check SPF records, but the majority now do, so it would be beneficial for those domains since they could instantly reject mail from mailinator.com without having to add you to any kind of block list.
Impressive throughput. I wish this approach was suitable for my purposes but I need to be as reliable as possible, which unfortunately means disk I/O. Why not use an SPF record on Mailinator.com? See the syntax page. Something like "v=spf1 -all" would be ideal, since Mailinator never sends mail.
I like your mail-to-RSS :)
Cheers,
Tim
That was a great read. Thank you for sharing, and for Mailinator, which I've been using for a few years now.
-sk
>"Now if this all sounds a bit shaky, as in we might just lose an email now and then - you're right. But remember, our goal is 99.99% accuracy. Not 100%."
Bob Cringely reported that some ISPs are not as reliable as your service:
http://www.pbs.org/cringely/pulpit/2006/pulpit_20061201_001274.html
>"Swimming upstream through Earthlink customer support, my buddy finally found a technical contact who freely acknowledged the problem. Since June, he was told, Earthlink's mail system has been so overloaded that some users have been missing up to 90 percent of their incoming e-mail. It isn't bounced back to senders; it just disappears. And Earthlink hasn't mentioned the problem to these affected customers unless they complain. The two groups affected are those who get their mail with an Earthlink-hosted domain and those with aliased e-mail addresses like my friend's Blackberry."
See:
http://www.pbs.org/cringely/pulpit/2006/pulpit_20061201_001274.html
Your FAQ link is incorrect - but a great article, thanks!
Just wonder why need to do this?
"The biggest thing I knew I needed to do with Mailinator was to remove the unix application components. Mailinator needed to stop outsourcing its email receipt and do it itself. This basically meant I needed to write my own SMTP server. Or at least, a subset of one."
I can contribute some domain names to the points-to-mailinator list, if it helps. Contact rickdothullatgeemailcotdom
I'd really like to read more about the legal issues and how you handled those.
I'm curious about those few sites that DO ban mailinator, do you have a list of them for the casual user to double check? I may be mistaken, but a few Gmails I sent today haven't yet appeared in my box, and it's been almost an hour :-/ I wouldn't think Google would be a bad guy!
Or could you say if there's any more intelligent way for us to check if they are banning than just trying test emails? (Like direct communication with the server, etc)
These comments have been invaluable to me as is this whole site. I thank you for your comment.
Great work! thanks for the nice post..
Thanks for the great article and service!
Great article. Informative, amusing, and quite enjoyable.
I am curious about the bandwidth issue. What is the volume and what are the associated costs?
Great stuff, use it a lot. Only I would appreciate a bit more on the RSS stuff in your FAQ. There is nothing, and you might want to keep this as an unofficial/non-supported feature for whatever reason. Fair enough. Otherwise, it would be nice to have some ideas on it in the article. I only get bits and pieces from searching on Google, like the feed address. I have no clue, though, at the moment how to "save" my emails in my feed. I thought the idea is to be able to get emails without logging on to mailinator.com every time. But as the emails only survive for 3-4 hours, how do I collect them in Thunderbird at night (no, I am not running a server, and I switch my computer off)?
Many thanks,
Jon from Iceland
Very interesting article! Thanks and keep up the good work!
Great paper. It gives me a good tour of Mailinator's internal architecture.
I'll tip my hat to you as well. Top work, fella!
That was incredible. I really admire you for making this and explaining to us mere mortals how it all works. Very interesting post and very well written. 10/10 for the service and for you :)
It's a great story. I like technical stories like this. It's good practice of sharing experience.
I've found this useful.
Thank you for the story.
Keep up the great work.
Michael
hey! really cool you explain how this works. I'm a server programmer too, and been using mailinator since day 1. I love the RSS feature so much! Keep up the good work!
If you ever decide to treat Nio, I would be very curious to see the performance difference. Mailinator seems that it would be an ideal case study, thanks for sharing your idea.
I enjoyed reading it. I'm supposed to be somewhere else in a minute but I stuck to reading the story. I like the quality of your blog :D
I love Mailinator. I click Ads on Mailinator everytime I visit to support the site. Thank you for creating such an awesome service.
When you detect a certain IP address sending too much email at too great a rate, instead of 'banning' it and rejecting further email as you currently do, you could instead teergrube it.
This would consume few resources on your part, and would cause problems and additional work for spammers.
I realise this isn't the primary goal of mailinator, but I think it would be a very nice additional feature.
Rather than just ban an IP sending too much mail (one per sec) why not create a honeytrap to slow the sender down a lot ?
Also I wonder how well the system works now given that a few years have passed since the original ideas were implemented.
Mailinator helps many people. Thank you again for the great project.
Wish i would have found this earlier!
cheers
Thank you very much for your work in providing this service! It is great!
You are a f..king HERO!