Repeated inbound SMTP failure (timeout) from specific domains

From: Dane (danemartin_at_nospammyspamspam.atgmail.com)
Date: 12/15/04


Date: Tue, 14 Dec 2004 23:56:43 -0800

I've been trying to troubleshoot a very strange mail issue on an SBS2003
(Exchange 2003) server for about 6 weeks now and am desperately looking for
help.

Here is a sample of what I'm working with (failed inbound SMTP session):

date time c-ip cs-username cs-method cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken
2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<johnrabel@hotmail.com> 250 0 46 33 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<***@recipientdomain.com> 250 0 0 25 93
2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797
2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797

A small formatted Excel spreadsheet with more logs of both failed and
successful sessions can be quickly opened at:

http://pics.virtuality.org/linkto/email_troubles.xls

An Exchange server I manage has been processing inbound SMTP connections
that result in a 121 TIMEOUT. The messages never get delivered locally and
an NDR is not sent from my Exchange server, though the originating server
usually kicks back a failure notification to the sender after the retry
period expires.

Of the thousands of messages that the server processes daily, a fairly
steady group of sender domains consistently (but not always) have trouble
delivering email to my server while most messages come through fine. A log
review shows that the failed connections get "stuck" after my server
receives the RCPT command.

The sessions on my side look like:

EHLO remotemailserver.com
MAIL ...
RCPT ...
(10 minute wait)
TIMEOUT ...
QUIT ...
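
For anyone who wants to pull every session of this kind out of a full log, here is a minimal Python sketch. It assumes the log has one entry per line with the eleven fields shown in the sample above, and that sessions from the same client do not interleave; the function names are my own, not part of any Exchange tool.

```python
# Field order as it appears in the header of the sample log.
FIELDS = ("date time c-ip cs-username cs-method cs-uri-query "
          "sc-status sc-win32-status sc-bytes cs-bytes time-taken").split()


def parse_line(line):
    """Split one unwrapped log entry into a dict keyed by field name."""
    parts = line.split()
    if line.startswith("#") or len(parts) != len(FIELDS):
        return None  # comment line or malformed entry
    if not parts[6].isdigit():
        return None  # field-name header row, not a real entry
    return dict(zip(FIELDS, parts))


def stalled_sessions(lines):
    """Return (c-ip, cs-username, timestamp) for each TIMEOUT that directly follows RCPT."""
    last_verb = {}  # most recent verb seen per (c-ip, cs-username)
    stalled = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        key = (entry["c-ip"], entry["cs-username"])
        verb = entry["cs-method"].upper()
        if verb == "TIMEOUT" and last_verb.get(key) == "RCPT":
            stamp = entry["date"] + " " + entry["time"]
            stalled.append((entry["c-ip"], entry["cs-username"], stamp))
        last_verb[key] = verb
    return stalled


if __name__ == "__main__":
    sample = [
        "2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0",
        "2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<johnrabel@hotmail.com> 250 0 46 33 0",
        "2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<***@recipientdomain.com> 250 0 0 25 93",
        "2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797",
        "2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797",
    ]
    for ip, name, stamp in stalled_sessions(sample):
        print(ip, name, stamp)
```

Keying on client IP plus the EHLO name is a simplification. If a sender opens several connections from one address at once, the entries would need to be separated some other way before this grouping is reliable.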

At first I assumed it was a firewall issue on my side that was blocking the
BDAT verb, which can follow RCPT in place of DATA when the server advertises
CHUNKING. I removed the CHUNKING advertisement to prevent binary data formats
from being used for inbound SMTP and forced HELO for outbound SMTP, but the
problem persisted.
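
One way to confirm from an outside host that CHUNKING is really no longer advertised is to send EHLO and list the extensions that come back. The following is a rough sketch using Python's standard smtplib; the host name in the usage comment is a placeholder, and the helper names are hypothetical.

```python
import smtplib


def parse_ehlo_extensions(reply):
    """Return the set of upper-cased extension keywords from an EHLO reply body."""
    lines = reply.decode("ascii", errors="replace").splitlines()
    # The first line is the server greeting; each later line names one extension.
    keywords = set()
    for line in lines[1:]:
        # Treat old-style "AUTH=LOGIN" like "AUTH LOGIN" so the keyword is "AUTH".
        tokens = line.strip().replace("=", " ").split()
        if tokens:
            keywords.add(tokens[0].upper())
    return keywords


def advertised_extensions(host, port=25, timeout=30):
    """Connect to host, send EHLO, and return the advertised extension keywords."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        code, reply = server.ehlo()
        if code != 250:
            raise RuntimeError(f"EHLO rejected with code {code}")
        return parse_ehlo_extensions(reply)


# Usage (placeholder host name, run from outside the firewall):
#   "CHUNKING" in advertised_extensions("mail.recipientdomain.com")
```

Running it from outside the firewall matters, since an inspecting firewall may rewrite the EHLO reply and show a different list than the server itself sends.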

I then decided to replace the consumer model SMC firewall with a Cisco 2651
router with the firewall feature set (NBAR and CBAC). Unfortunately the
firewall upgrade didn't change or improve the mail symptoms one bit, so I
can't imagine it's still a firewall issue at this point.

One consistent anomaly in the logs has to do with the sc-win32-status result
on the connections that time out, though I don't know what the result means.
The TIMEOUT line for a failed connection has an sc-win32-status result with a
very large number such as 2175011793 (more examples in the .xls file linked
above).

So far I've seen failed inbound sessions from 6 legitimate businesses that
communicate with users on my Exchange server, and occasionally from domains
like hotmail.com, ebay.com and other very large mail domains, but never any
UCE or junk mail sessions. I've attempted to recreate the problem using my
own email accounts, both with and without attachments, but have not been
able to recreate the problem.

The failed messages never successfully get delivered after retries... they
permanently fail.

I've searched both the web and newsgroups and found similar symptoms from
folks going back to 2002 and using both Exch2k and 2k3, but never any
solutions that were documented.

Any suggestions would be greatly appreciated. I'm going nuts trying to
figure this out!

