4

We are sending mail using exim (configured through WHM). With the same exact configuration, seemingly most of the time exim sends the correct AUTH LOGIN username and password, however sometimes it is omitting this part and sendgrid is refusing with the following error:

host smtp.sendgrid.net [158.85.10.138]
SMTP error from remote mail server after MAIL FROM:<info@*******.net> SIZE=2048:
550 Unauthenticated senders not allowed

Digging into it, I can reproduce the problem (not every time, sometimes) with the following:

# exim -v **@****.com
From: **@****.com
To: **@****.com
Subject: Test exim

This is an exim test.

These are the exim configuration settings:

Section: AUTH

sendgrid_login:
  driver = plaintext
  public_name = LOGIN
  client_send = : ******** : **************

Section: ROUTERSTART

send_via_sendgrid:
  driver = manualroute
  domains = ! +local_domains
  transport = sendgrid_smtp
  route_list = "* smtp.sendgrid.net::587 byname"
  host_find_failed = defer
  no_more

Section: TRANSPORTSTART

sendgrid_smtp:
  driver = smtp
  hosts = smtp.sendgrid.net
  hosts_require_auth = smtp.sendgrid.net
  hosts_require_tls = smtp.sendgrid.net
  authenticated_sender_force = true

When it works, this is the log:

LOG: MAIN
  cwd=/home/** 3 args: exim -v **@****.com
From: **@****.com
To: **@****.com
Subject: Test exim

This is an exim test.
LOG: MAIN
  <= **@****-**.com U=jf P=local S=354 T="Test exim"
**@****-** [~]# LOG: MAIN
  cwd=/var/spool/exim 4 args: /usr/sbin/exim -v -Mc 1ZjWZy-0005re-S3
delivering 1ZjWZy-0005re-S3
Transport port=25 replaced by host-specific port=587
Connecting to smtp.sendgrid.net [198.37.144.212]:587 ... connected
  SMTP<< 220 ismtpd-064 ESMTP service ready
  SMTP>> EHLO **.****-**.com
  SMTP<< 250-**.***.26.157
         250-SIZE 20480000
         250-STARTTLS
         250-AUTH PLAIN LOGIN
         250-8BITMIME
         250-PIPELINING
         250 AUTH=PLAIN LOGIN
  SMTP>> STARTTLS
  SMTP<< 220 Begin TLS negotiation now
  SMTP>> EHLO **.****-**.com
  SMTP<< 250-**.***.26.157
         250-8BITMIME
         250-SIZE 20480000
         250-AUTH=PLAIN LOGIN
         250-AUTH PLAIN LOGIN
         250 PIPELINING
  SMTP>> AUTH LOGIN
  SMTP<< 334 VXNlcm5hbWU6
  SMTP>> ****
  SMTP<< 334 UGFzc3dvcmQ6
  SMTP>> ********************
  SMTP<< 235 Authentication successful.
  SMTP>> MAIL FROM:<**@****-**.com> SIZE=1388 AUTH=**@****-**.com
  SMTP>> RCPT TO:<**@****.com>
  SMTP>> DATA
  SMTP<< 250 Sender address accepted
  SMTP<< 250 Recipient address accepted
  SMTP<< 354 Continue
  SMTP>> writing message and terminating "."
  SMTP<< 250 Delivery in progress
  SMTP>> QUIT
LOG: MAIN
  => **@****.com R=send_via_sendgrid T=sendgrid_smtp H=smtp.sendgrid.net [198.37.144.212] X=TLSv1.2:AES128-GCM-SHA256:128 A=sendgrid_login C="250 Delivery in progress"
LOG: MAIN
  Completed

Pay special attention to the line SMTP>> AUTH LOGIN and the six right below, ending in the 235 Authentication successful. line.

Here is a failure:

LOG: MAIN
  cwd=/home/** 3 args: exim -v **@****.com
From: **@****.com
To: **@****.com
Subject: Test exim

This is an exim test.
LOG: MAIN
  <= **@****-**.com U=jf P=local S=340 T="Test exim"
**@****-** [~]# LOG: MAIN
  cwd=/var/spool/exim 4 args: /usr/sbin/exim -v -Mc 1ZjWjO-0006T8-Eq
delivering 1ZjWjO-0006T8-Eq
Transport port=25 replaced by host-specific port=587
Connecting to smtp.sendgrid.net [198.37.144.225]:587 ... connected
  SMTP<< 220 ismtpd-078 ESMTP service ready
  SMTP>> EHLO ****-**.com
  SMTP<< 250-**.***.157
         250-SIZE 20480000
         250-STARTTLS
         250-AUTH PLAIN LOGIN
         250-8BITMIME
         250-PIPELINING
         250 AUTH=PLAIN LOGIN
  SMTP>> STARTTLS
  SMTP<< 220 Begin TLS negotiation now
  SMTP>> EHLO ****-**.com
  SMTP<< 250-**.***.26.157
         250-8BITMIME
         250-SIZE 20480000
         250-AUTH=PLAIN LOGIN
         250-AUTH PLAIN LOGIN
         250 PIPELINING
  SMTP>> MAIL FROM:<**@****-**.com> SIZE=1374
  SMTP>> RCPT TO:<**@****.com>
  SMTP>> DATA
  SMTP<< 550 Cannot receive from specified address <**@****-**.com>: Unauthenticated senders not allowed
  SMTP<< 503 Must have sender before recipient
  SMTP<< 503 Must have valid receiver and originator
  SMTP>> QUIT
LOG: MAIN
  ** **@****.com R=send_via_sendgrid T=sendgrid_smtp H=smtp.sendgrid.net [198.37.144.225] X=TLSv1.2:AES128-GCM-SHA256:128: SMTP error from remote mail server after MAIL FROM:<**@****-**.com> SIZE=1374: 550 Cannot receive from specified address <**@****-**.com>: Unauthenticated senders not allowed
LOG: MAIN
  cwd=/var/spool/exim 8 args: /usr/sbin/exim -v -t -oem -oi -f <> -E1ZjWjO-0006T8-Eq
LOG: MAIN
  <= <> R=1ZjWjO-0006T8-Eq U=mailnull P=local S=1383 T="Mail delivery failed: returning message to sender"
LOG: MAIN
  cwd=/var/spool/exim 4 args: /usr/sbin/exim -v -Mc 1ZjWjn-0006TR-BX
delivering 1ZjWjn-0006TR-BX
LOG: MAIN
  Completed
LOG: MAIN
  => ** <**@****-**.com> R=localuser T=local_delivery
LOG: MAIN
  Completed

Note that SMTP>> AUTH LOGIN and it's following six lines are not called...?

The question is, why is exim omitting the actual SMTP>> AUTH LOGIN part of the request, but only sometimes? It's using the same exact connection configuration (you can see it referenced in both logs as R=send_via_sendgrid T=sendgrid_smtp).

Jared Farrish
  • 145
  • 1
  • 8
  • Perhaps the hostname resolves to different sets of IP addresses when `route_list` in the director is being evaluated and when `hosts` in the transport is being evaluated. Try hard-coding an IP address in both as a test. – wurtel Oct 07 '15 at 08:50
  • @wurtel - Sendgrid is load balancing, so it's possible to get different IP addresses. Wouldn't exim have to be looking up the related ip address for this to be an issue though? – Jared Farrish Oct 07 '15 at 13:50
  • exim *does* lookup IP addresses for host lists (unfortunately IMHO). I had a blacklist of hosts and after some time I started getting "address not found for xyz.example.com" for every email message coming in. The A record for the host had been removed, but exim was checking evry host in the blacklist for every incoming message. – wurtel Oct 07 '15 at 14:01
  • @wurtel - Huh. That's interesting. Would not have that the case. – Jared Farrish Oct 07 '15 at 14:19
  • @JaredFarrish Did you ever end up solving this? I think we may be experiencing the same thing. We get 550 Unauthenticated with Exim and Sendgrid and sometimes it works for a given sendgrid IP and sometimes it doesn't, so I don't think it's a specific Sendgrid server issue. – sa289 Oct 19 '15 at 18:53
  • @sa289 - My belief is that wurtel is on to the problem, with the IP address changing in the midst of exim running both calls, meaning it doesn't see the latter in the former, even though it's the same host. I wasn't able to test solutions, but I do notice that generally the IP's rotated by Sendgrid are "static", meaning you could plug in the IP addresses instead of `smtp.sendgrid.com` in both lists. Or reference only one IP address. Some combination of that. The other sendgrid-related solution is the pricing tier that guarantees the static IP address. Let me know what works. `:)` – Jared Farrish Oct 19 '15 at 19:01
  • Adding -d+auth to Exim command line might help to debug this issue. – kupson Oct 19 '15 at 19:35
  • @sa289 - Looks like `hosts_require_auth = *` might be the ticket. We set this on our server about six hours ago and haven't had another auth error since. I'll watch it a bit longer and if no more errors, I'll post it as the answer. – Jared Farrish Nov 05 '15 at 22:44
  • @JaredFarrish - any interest in trying out my answer below to see if you can use that to reproduce the issue under your new configuration? If I recall correctly when I have that script running I get the problem almost instantly. – sa289 Nov 05 '15 at 23:11
  • @sa289 - This is our production server, so probably not. – Jared Farrish Nov 05 '15 at 23:51
  • @JaredFarrish - I tested and your asterisk approach worked. That led me to some more searching online and I found what appears to be a more targeted solution that's used for Mandrill and for Gmail. I've updated my answer to include this solution. – sa289 Nov 09 '15 at 04:30

1 Answers1

4

EDIT Nov 8, 2015:

The solution is to use

hosts_require_auth = <; $host_address
hosts_require_tls = <; $host_address

Instead of

hosts_require_auth = smtp.sendgrid.net
hosts_require_tls = smtp.sendgrid.net

The unusual-looking <; syntax is in case the hostname resolves to an IPv6 address (since from what it sounds like the colons in an IPv6 address can break things otherwise). The $host_address variable is used to solve the case where the IP address that the hostname resolves to changes (such as in this case with smtp.sendgrid.net resolving to multiple IPs and sometimes a different one being resolved to mid-process) - I think it prevents it from needing to be looked up again (see also The smtp transport). Credit Configuring Exim to use Gmail as a Smarthost and also the OP and @wurtel.

PREVIOUS ANSWER / HOW TO TEST AND REPRODUCE::

I believe @wurtel's comment on the question about the IP address changing is right on because I can reliably reproduce this by using a script to rapidly change which IP smtp.sendgrid.net resolves. I also confirmed that without any entry in the hosts file, this issue happens at least every 2 hours when I'm sending an email every 5 seconds, but when I hard-coded an IP into /etc/hosts, I went for 8 hours without any errors at that frequency of sending, so that's the temporary workaround I'm using for now.

Here's how I reproduced this issue. NOTE: this is current as the end of October 2015 - if you're trying it after that the IPs may be different so run dig smtp.sendgrid.net and use two IPs it returns instead.

Add this to /etc/hosts

108.168.190.108 smtp.sendgrid.net

Save this to a PHP file, replacing test@example.com with an email address you control.

#!/usr/local/bin/php
<?php
while (true) {
    mail('test@example.com', 'Test email deletethiswithafilter', 'test ' . time());
    usleep(500000);
}

Save this to a PHP file and run it while the above is running also. Every 5ms - 15ms it will toggle the entry in the hosts file for smtp.sendgrid.net between the two IPs that it resolves to which are 158.85.10.138 and 108.168.190.108

#!/usr/local/bin/php
<?php
while (true) {
    passthru('new_hosts_file_contents=`cat /etc/hosts | sed \'s/108.168.190.108/ip108/g\' | sed \'s/158.85.10.138/ip158/g\' | sed \'s/ip108/158.85.10.138/g\' | sed \'s/ip158/108.168.190.108/g\'`; echo "$new_hosts_file_contents" > /etc/hosts 2>&1');
    usleep(10000 + rand(-5000,5000));
}
sa289
  • 1,308
  • 2
  • 17
  • 42
  • Please add the information from the blog entry explaining the use of `<;` and `$host_address`. – Jared Farrish Nov 09 '15 at 21:47
  • @JaredFarrish Added – sa289 Nov 09 '15 at 23:07
  • Cool, thanks. Just for documentation purposes, `hosts_require_auth = *` and `hosts_require_tls = *` also works if IPv6 support isn't needed. – Jared Farrish Nov 09 '15 at 23:27
  • @JaredFarrish I bet the wildcards would work with IPv6 as well (though I don't know for sure). I had wondered if there could be unexpected consequences though of using a catch-all approach like that rather than just using `$host_address` which targets exactly the host that's relevant to the current operation. – sa289 Nov 10 '15 at 00:39