Re: Repeated inbound SMTP failure (timeout) from specific domains
From: Carol Chisholm (carol.lists_at_smalldomain.ch)
Date: 02/05/05
- Previous message: Eddie: "routing using dns"
- In reply to: Carol Chisholm: "Re: Repeated inbound SMTP failure (timeout) from specific domains"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 05 Feb 2005 09:32:28 +0100
I have found that changing the NIC in the Exchange server from a
gigabit one to a very standard 10/100 can also resolve this problem.
This is not actually a solution, because the problem is domain
dependent, but it is a relatively easy workaround, which I imagine is
effective because the less sophisticated, older NIC will make smaller
packets, and the drivers for a cheap NIC are less likely to use any
sophisticated techniques which might get lost in misconfigured
routers.
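
As a rough illustration of why packet size matters here, the sketch below works through the arithmetic behind the MTU values tried in this thread. Python is used purely for illustration. It assumes minimum IPv4 and TCP headers with no options, and it assumes the ADSL line runs PPPoE, which the 1492 figure suggests but nobody in the thread has confirmed. The function names are made up for this example.

```python
# Sketch only: how a link MTU caps the TCP payload carried in one packet.
# Assumes minimum-size IPv4 and TCP headers (no options).
IPV4_HEADER = 20      # bytes
TCP_HEADER = 20       # bytes
PPPOE_OVERHEAD = 8    # bytes: 6-byte PPPoE header + 2-byte PPP protocol ID
ETHERNET_MTU = 1500   # bytes, standard Ethernet payload size


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """MTU left for IP once PPPoE encapsulation is added (assumed link type)."""
    return ethernet_mtu - PPPOE_OVERHEAD


if __name__ == "__main__":
    # MTU values mentioned in this thread, plus the Ethernet default.
    for mtu in (576, 1400, 1464, 1492, 1500):
        print(f"MTU {mtu:>4} -> TCP MSS {tcp_mss(mtu)}")
```

A sender that keeps using full-size 1500-byte packets across a path that can only carry smaller ones depends on fragmentation or on ICMP "fragmentation needed" messages getting back to it. If a router or firewall along the way drops those, large packets such as the message body can silently disappear while the small EHLO, MAIL and RCPT commands still get through, which would match the symptoms described here.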
There are also the Extended DNS (EDNS0) issues mentioned in MS article
828731 when you have Exchange 2003 SP1.
Carol
On Tue, 01 Feb 2005 19:50:46 +0100, Carol Chisholm
<carol.lists@smalldomain.ch> wrote:
>Here is my latest update on my particular case.
>
>I had changed the MTU down to 1400 but not lower...
>
>This concerns a new company where I set up a new Exchange 2003 server
>.....and e-mail started to arrive…
>
>Not all mail arrived, and after a considerable amount of reading
>logfiles I found that in fact mail from some mail hosts did not
>arrive.
>
>One ISP affected is bezquint.com. They have two blocks of servers, one
>of which we can receive from and one of which we can not receive from.
>
>Another affected ISP is VTX, where I do have some cooperative
>contacts to help with testing. There are many more.
>
>When I called these ISPs they all obligingly checked my DNS, reverse
>DNS, telnetted into my server and sent me a message saying all was
>well.
>
>I tested further with VTX and we found that when mail was sent
>"manually" with telnet, ehlo and so on, it was transferred, and when
>it was sent "batch" through the mail system, it sat in the queue at
>VTX until it timed out. No NDR, no nothing at my end.
>
>At my end I see a connection and an EHLO, but no DATA or BDAT or
>whatever… Here is a snip from a logfile. I had the timeout set to 20
>minutes at this stage so there is no immediate timeout, but it does
>come later.
>
>>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1
>>>NBINEUS001 10.1.1.1 0 EHLO - +smtp1.internet-fr.net 250 0 214 26 0
>>>SMTP - - - -
>>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1
>>>NBINEUS001 10.1.1.1 0 MAIL - +FROM 250
>>>0 54 51 31 SMTP - - - -
>>>2004-11-29 13:24:53 212.37.192.53 smtp1.internet-fr.net SMTPSVC1
>>>NBINEUS001 10.1.1.1 0 RCPT - +TO: 250 0 27 24 0 SMTP -
>
>Anyway, having identified the problem I changed the firewall, changed
>the ADSL router, built the second server as an Exchange server (it was
>supposed to be the terminal server for Access Accounts). I reduced all
>the MTUs (NIC, firewall, ADSL modems & routers) to 1400, 1464, 1492. I
>have no blacklists, I uninstalled the anti-virus software. Finally in
>desperation I took the second server to another building, with one of
>the firewalls I had been testing with and plugged it into a cable
>modem rather than an ADSL modem. And all the mail arrived!
>
>So now I have one server offsite, two firewalls, a firewall-firewall
>VPN and two Exchange servers.
>
>Today when you called I was doing more testing with VTX. I built a
>third server on some very old hardware I had lying around, and took it
>into the office. It has the same software as the other two (which do
>not get mail over the ADSL connection in the office).
>
>I set up a test domain (link216.ch) and had VTX send me mail. When
>they sent to my old hardware, mail came through instantaneously. When
>I changed the route to send incoming SMTP to the new Proliant server
>(same modem, same firewall, same switch, same LAN) mail did not
>arrive (well, some mail did, but not from certain hosts). As before
>it sat in the queue at VTX until it timed out.
>
>Of course I need to bring the server back so I can set up a terminal
>server. I'd also like to elucidate the problem, of course.
>
>My conclusion at this stage is that it has to be a network setting.
>The new servers are off-the-shelf Proliants with 10/100/1000 NICs.
>(NC7760 and NC7761 if I remember correctly). My old machine has
>probably got a 3C905 or 3C590 or some such in it.
>
>At this stage I'm wondering whether to add a very standard NIC to the
>new server and try again.
>
>Now I've read Johan's post I'll reduce the MTU size further as well.
>
>Carol Chisholm
>
>On 28 Jan 2005 13:54:06 -0800, parcival@gmail.com wrote:
>
>>I was having the same problem, also with a small business server.
>>However, I don't think the server was the problem.
>>
>>After spending about 3 days on the phone with Microsoft Support, and
>>analyzing many smtp conversations at packet level, *some* of the packets
>>seemed to become malformed before reaching the server (from the servers
>>that were causing the problem sessions).
>>
>>We did many tests with the MTU size, trying to get it as high as
>>possible without any success. Then I called Linksys support, and they
>>suggested to set the MTU size to the lowest value possible (576) for
>>troubleshooting purposes. Amazingly, I started to receive all the test
>>messages. Then I slowly moved the MTU value higher and higher and am
>>now receiving email from hotmail at an MTU size of 1400.
>>
>>My best guess of what is causing this problem is some router on the
>>backbone of the Internet that is not configured properly and is messing
>>with the packets that come through (i.e., the reason why only *some* of
>>the mailservers have problems). Setting the MTU lower on our
>>router/firewall makes the mail servers negotiate a smaller frame size
>>which doesn't end up being corrupted somewhere along the way.
>>
>>I hope this helps.
>>--Johan
>>
>>
>>
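
The procedure Johan describes (start at 576, then raise the MTU until mail stops arriving) can be shortened with a binary search. The sketch below is only an illustration under stated assumptions: `delivers` is a hypothetical stand-in for the manual step of setting the MTU on the router and having a problem sender deliver a test message, and it assumes that once a given MTU fails, every larger one fails too. Neither assumption has been verified in this thread.

```python
from typing import Callable


def highest_working_mtu(delivers: Callable[[int], bool],
                        low: int = 576, high: int = 1500) -> int:
    """Return the largest MTU in [low, high] for which delivers(mtu) is True.

    `delivers` is a hypothetical callback: in practice it means "set this
    MTU on the router/firewall, ask an affected sender for a test message,
    and report whether it arrived".
    Assumes delivery works at `low` and that failures are monotonic
    (if one MTU fails, all larger MTUs fail as well).
    """
    if not delivers(low):
        raise ValueError(f"mail does not arrive even at MTU {low}")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always shrinks
        if delivers(mid):
            low = mid                 # mid works, look higher
        else:
            high = mid - 1            # mid fails, look lower
    return low
```

With the default range this needs at most about ten test deliveries instead of stepping through every value by hand. For example, `highest_working_mtu(lambda mtu: mtu <= 1400)` simulates a path that breaks above 1400.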
>>Dane wrote:
>>> I've been trying to troubleshoot a very strange mail issue on an
>>> SBS2003 (Exchange 2003) server for about 6 weeks now and am
>>> desperately looking for help.
>>>
>>> Here is a sample of what I'm working with (failed inbound SMTP
>>> session):
>>>
>>> date time c-ip cs-username cs-method cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken
>>> 2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0
>>> 2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<johnrabel@hotmail.com> 250 0 46 33 0
>>> 2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<***@recipientdomain.com> 250 0 0 25 93
>>> 2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797
>>> 2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797
>>>
>>> A small formatted Excel spreadsheet with more logs of both failed and
>>> successful sessions can be quickly opened at:
>>>
>>> http://pics.virtuality.org/linkto/email_troubles.xls
>>>
>>> An Exchange server I manage has been processing inbound SMTP
>>> connections that result in a 121 TIMEOUT. The messages never get
>>> delivered locally and an NDR is not sent from my Exchange server,
>>> though the originating server usually kicks back a failure
>>> notification to the sender after the retry period expires.
>>>
>>> Of the thousands of messages that the server processes daily, a
>>> fairly steady group of sender domains consistently (but not always)
>>> have trouble delivering email to my server while most messages come
>>> through fine. A log review shows that the failed connections get
>>> "stuck" after my server receives the RCPT command.
>>>
>>> The sessions on my side look like:
>>>
>>> EHLO remotemailserver.com
>>> MAIL ...
>>> RCPT ...
>>> (10 minute wait)
>>> TIMEOUT ...
>>> QUIT ...
>>>
>>> At first I assumed it was a firewall issue on my side that was
>>> blocking the BDAT verb which would normally come after RCPT when
>>> advertising ESMTP. I removed the CHUNKING advertisement to prevent
>>> binary data formats from being used for inbound SMTP and forced HELO
>>> for outbound SMTP, but the problem persisted.
>>>
>>> I then decided to replace the consumer model SMC firewall with a
>>> Cisco 2651 router with the firewall feature set (NBAR and CBAC).
>>> Unfortunately the firewall upgrade didn't change or improve the mail
>>> symptoms one bit, so I can't imagine it's still a firewall issue at
>>> this point.
>>>
>>> One consistent anomaly in the logs has to do with the
>>> sc-win32-status result on the connections that time out, though I
>>> don't know what the result means. The TIMEOUT line for a failed
>>> connection has a sc-win32-status result with a very large number
>>> such as 2175011793. (more examples in the .xls file link above)
>>>
>>> So far I've seen failed inbound sessions from 6 legitimate
>>> businesses that communicate with users on my Exchange server, and
>>> occasionally from domains like hotmail.com, ebay.com and other very
>>> large mail domains, but never any UCE or junk mail sessions. I've
>>> attempted to recreate the problem using my own email accounts, both
>>> with and without attachments, but have not been able to recreate the
>>> problem.
>>>
>>> The failed messages never successfully get delivered after
>>> retries... they permanently fail.
>>>
>>> I've searched both the web and newsgroups and found similar symptoms
>>> from folks going back to 2002 and using both Exch2k and 2k3, but
>>> never any solutions that were documented.
>>>
>>> Any suggestions would be greatly appreciated. I'm going nuts trying
>>> to figure this out!
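
For anyone wading through large SMTP logs looking for this pattern (EHLO, MAIL, RCPT, then TIMEOUT with no DATA or BDAT), here is a minimal Python sketch. It assumes the whitespace-separated field order from Dane's header line (date, time, c-ip, cs-username, cs-method, ...), one record per line, and at most one session at a time per client IP. The function name `stalled_sessions` is invented for this example, and logs with a different field selection would need the indices adjusted.

```python
from typing import Iterable, List, Tuple


def stalled_sessions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (date, time, client_ip) for each TIMEOUT that followed RCPT
    without any DATA or BDAT in the same session.

    Assumed field order: date time c-ip cs-username cs-method ...
    Sessions are tracked per client IP, so overlapping sessions from the
    same IP are not told apart (a simplification).
    """
    verbs_by_client = {}
    stalled = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or line.startswith("#"):
            continue  # skip blank lines and W3C "#Fields:" style directives
        date, time, client_ip, _user, verb = fields[:5]
        verb = verb.upper()
        if verb in ("EHLO", "HELO"):
            verbs_by_client[client_ip] = []  # a greeting starts a new session
        seen = verbs_by_client.setdefault(client_ip, [])
        if verb == "TIMEOUT":
            if "RCPT" in seen and not {"DATA", "BDAT"} & set(seen):
                stalled.append((date, time, client_ip))
            verbs_by_client[client_ip] = []
        else:
            seen.append(verb)
    return stalled


# The failed session quoted above, rejoined to one record per line.
SAMPLE = """\
2004-12-14 16:09:42 65.54.187.77 hotmail.com EHLO +hotmail.com 250 0 171 16 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com MAIL +FROM:<johnrabel@hotmail.com> 250 0 46 33 0
2004-12-14 16:09:42 65.54.187.77 hotmail.com RCPT +TO:<***@recipientdomain.com> 250 0 0 25 93
2004-12-14 16:20:10 65.54.187.77 hotmail.com TIMEOUT hotmail.com 121 1304114753 84 4 627797
2004-12-14 16:20:10 65.54.187.77 hotmail.com QUIT hotmail.com 240 628063 84 4 627797
""".splitlines()

if __name__ == "__main__":
    for date, time, ip in stalled_sessions(SAMPLE):
        print(f"{date} {time} stalled after RCPT from {ip}")
```

Collecting the client IPs this reports gives the list of sending hosts to compare against the ones that deliver normally, which is how the "one block of servers works, the other does not" observation earlier in the thread can be checked.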