7

We run an e-mail server for a few customers, and we've recently run into a bit of a conundrum.

We had a user who sent an e-mail to an incorrect e-mail address. Unfortunately, the mistyped domain existed. It had no MX records, and its A record pointed to a server that did not speak SMTP. Our e-mail server therefore attempted delivery to that host and failed, because no e-mail server was running there.

For that reason, our e-mail server, entirely in accordance with the SMTP RFC, kept retrying delivery for five days before finally giving up and sending a non-delivery notice to the sender.

Section 4.5.4.1 of RFC5321 (Simple Mail Transfer Protocol) says:

Retries continue until the message is transmitted or the sender gives up; the give-up time generally needs to be at least 4-5 days.

So the mail server, in its default configuration, behaved in accordance with the RFC. The consequence is that a user who mistypes an e-mail address in this way is not told about it until five days later.

At this point, my boss has asked whether it would be possible to reduce the give-up time to something shorter, say 1 day. His reasoning is that it is better that the user be notified earlier of non-delivery, and that the user may attempt re-delivery at a later date, or delivery through an alternate channel. It sounds like a reasonable thing to do, but in general I'm wary of performing any kind of configuration changes which contradict what's in the RFC.

Is there any non-obvious reason why it would be a bad idea to reduce the give-up time to 24 hours, beyond just saying "the RFC says otherwise"?

Also, what do the bigger e-mail providers out there (the Googles, Microsofts, AOLs and Yahoos) do in this scenario?

Per von Zweigbergk
  • Which MTA are you using? – JayMcTee Nov 10 '15 at 15:23
  • @JayMcTee We're running Zimbra Collaboration Suite, which uses postfix under the hood. I deliberately chose *not* to specify the MTA in the question because the MTA itself isn't really relevant, the question is applicable regardless of the specific MTA software used. It might just as well be sendmail, qmail or MS Exchange and the same question would apply. – Per von Zweigbergk Nov 10 '15 at 15:25
  • 7
    Not really. `sendmail`, for example, also sends a warning that delivery has not *yet* been successful, at (I think) the four hour mark. Since those also say who it's from and to, the user should get warning that something's up well before the five-day failure mark. If your MTA doesn't generate those warnings, then choice of MTA probably *is* a factor. – MadHatter Nov 10 '15 at 16:07

3 Answers

11

Why shouldn't you give up delivering email after one day? One good reason is weekends.

Email is not now, and never was, particularly reliable. In the early days of the Internet, the 1980s, it was entirely possible for email to take a couple of days just to reach its destination. Some network links were not up 24x7, and mail travelled over expensive long-distance dialup calls (back then it cost per minute to call two towns away, never mind the cost of a call from Sydney to Los Angeles), or even over amateur radio. As a result, it could take a while to deliver email, and the protocols had to cope with unreliable and part-time connections. They do this very well, but even then, mail could get delayed or lost.

Certainly today, email has an illusion of reliability, if only because the underlying transports are more reliable, and many uninformed people (like most of our users) have an expectation that it is reliable, but that expectation does not match reality. Without a significant change to email delivery protocols, which will probably never happen, email, like anything built by humans, will always be less than 100% perfect.

Sometimes, we sysadmins take advantage of that.

For instance, in an office where everyone is only there Monday-Friday, I can have an email outage lasting all weekend if necessary. Of course, it virtually never is necessary to be out that long, but I have had to have email down for over 24 hours in rare cases.

In such a case, if you give up after 24 hours, email sent Friday afternoon may not reach its recipient. The sender won't find out until Monday morning, but if you had kept trying, the recipient would have had it by Monday morning.
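To make the weekend case concrete, here is a minimal sketch of the arithmetic. The dates, the outage window, and the `outcome` helper are all hypothetical, chosen only for illustration. It also assumes the sender retries the moment the recipient's server is back, so real retry intervals are ignored.

```python
from datetime import datetime, timedelta

def outcome(sent, server_back, give_up_after):
    """Return 'delivered' if the recipient's server is back before the
    sender's queue lifetime expires, otherwise 'bounced'.

    Simplification (an assumption of this sketch): a retry happens as soon
    as the server returns, so real retry intervals are ignored.
    """
    deadline = sent + give_up_after
    return "delivered" if server_back <= deadline else "bounced"

# Hypothetical example: mail sent Friday afternoon, recipient's server
# down for maintenance until Monday morning.
sent = datetime(2015, 11, 13, 16, 0)        # a Friday, 16:00
server_back = datetime(2015, 11, 16, 8, 0)  # the following Monday, 08:00

one_day = outcome(sent, server_back, timedelta(days=1))
five_days = outcome(sent, server_back, timedelta(days=5))
print(one_day, five_days)
```

With a one-day lifetime the message expires on Saturday afternoon, well before the server returns. With five days it is still queued when the server comes back on Monday.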

Further, it's very important to set user expectations appropriately. The fact that Internet email is not and never will be 100% reliable needs to be clearly understood, even as we like to think that it is.

The RFC says you should keep trying, precisely because things go wrong, and it's intended that the mail be delivered eventually, if possible, but at some point you do have to give up. It might be OK to reduce this to three days. I've always thought five days was too long to wait for delivery for most messages on a 24x7 Internet.


As for your given mail server:

Postfix can notify senders when an email message has been delayed, but this feature is turned off by default. This warning should be sufficient to let your users know that something might have gone wrong, such as a mistyped email address, and will arrive much sooner than the 24 hours your boss has proposed.

To enable it, set delay_warning_time to the desired value in main.cf.

delay_warning_time=4h

Beginning with version 3.0, Postfix can also notify those same senders when delayed messages are finally delivered. This is also off by default, as it can result in a lot of notifications. But if you want this, enable confirm_delay_cleared in main.cf.

confirm_delay_cleared=yes
Michael Hampton
1

I'm going to take the other side of this than most of the answers here.

The ISP I work for serves about 3000 customers, and uses Qmail as our MTA for those customers' mailboxes.

We have run our system with a 2 day queue lifetime for nearly 2 years, and have not received any complaints, nor had any issues with delivering mail. It has lowered the queue size, which has made it much easier to spot compromised accounts (rare, but they do occur) and clean them up.

Queue lifetimes over a day are just Cargo Cult system administration, and a holdover from when the internet was much less "always on". Good sysadmins follow "best practices", but even better ones understand why something was best practice, and change it to a better practice when the situation differs from the one the previous "best practice" was developed in.

Azendale
0

I recommend against changing the give-up time. Say the recipient's office (with on-site email) was wiped out by a tornado, earthquake, or fire over the weekend. If the company uses offsite tape backups for their DR plan, you'd better believe it's going to take longer than 24 hours to go from tornado to accepting email again. 5 days would be too long in this scenario, but that's not the root of the problem.

Whether it's 5 days, 48 hours, or 24 hours, all those time periods are too long of a delay to be alerted to unsent email, and all of them are too short to accommodate every possible reason for server failure. If not using sendmail, maybe look into sendmail as MadHatter suggested. At the very least, you should configure some alerts for yourself (and/or others) if anything sits in the queue longer than a few hours.
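One way such a queue-age alert might look is sketched below, for Postfix (which the asker's Zimbra setup uses). It assumes Postfix 3.1 or later, where `postqueue -j` prints one JSON object per queued message with `queue_id`, `arrival_time` (seconds since the Unix epoch) and `sender` fields; check postqueue(1) for your version. The `stale_messages` helper, the 4-hour threshold, and the sample data are made up for illustration.

```python
import json
import time

def stale_messages(postqueue_json_lines, max_age_hours=4, now=None):
    """Return (queue_id, sender, age_hours) for every queued message
    older than max_age_hours.

    Assumes each input line is one JSON object in the format printed by
    `postqueue -j` (Postfix 3.1+).
    """
    now = time.time() if now is None else now
    stale = []
    for line in postqueue_json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        age_hours = (now - entry["arrival_time"]) / 3600
        if age_hours > max_age_hours:
            stale.append((entry["queue_id"], entry.get("sender", ""), round(age_hours, 1)))
    return stale

# Made-up sample data standing in for real `postqueue -j` output.
fake_now = 1447200000
sample = [
    json.dumps({"queue_name": "deferred", "queue_id": "ABC123",
                "arrival_time": fake_now - 5 * 3600, "sender": "user@example.com"}),
    json.dumps({"queue_name": "active", "queue_id": "DEF456",
                "arrival_time": fake_now - 1 * 3600, "sender": "other@example.com"}),
]

alerts = stale_messages(sample, max_age_hours=4, now=fake_now)
print(alerts)
```

In practice the lines would come from something like `subprocess.run(["postqueue", "-j"], capture_output=True, text=True).stdout.splitlines()`, run from cron, with the result mailed or fed to whatever monitoring is already in place.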

Neil