Re: How to terminate a socket in CLOSE_WAIT state

From: hector (nospam_at_nospam.com)
Date: 05/21/04


Date: Fri, 21 May 2004 03:57:46 -0400

Francois, long time no talk.

In my experience, when this happens it is simply a PE ("programming
error") somewhere in the code. CLOSE_WAIT should always be a controlled,
short-lived state, with the socket released within a few seconds of a
proper close on both ends.
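
To illustrate the kind of programming error meant here: a socket sits in
CLOSE_WAIT when the peer has already sent its FIN but the local application
has not yet closed its own handle. The following is only a minimal sketch,
written against POSIX sockets so it can be run anywhere (on Winsock,
closesocket() takes the place of close()). The names
serve_until_peer_closes and close_wait_demo are hypothetical, and
socketpair() merely stands in for a real TCP connection.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until the peer closes or an error occurs, then ALWAYS close our
 * own handle. Skipping the close() on the recv() == 0 path is what
 * leaves a TCP socket parked in CLOSE_WAIT.
 * Returns the total number of bytes read, or -1 on a receive error. */
static long serve_until_peer_closes(int fd)
{
    char buf[1024];
    long total = 0;

    assert(fd >= 0);

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {                /* normal data */
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (n < 0)
            total = -1;             /* hard error */
        break;                      /* n == 0: the peer sent its FIN */
    }

    close(fd);                      /* release the handle on every path */
    return total;
}

/* Self-check using a local socket pair (an assumption made for the demo;
 * a real server would be handling an accepted TCP socket). */
static int close_wait_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[1], "bye", 3, 0) != 3)
        return -1;
    close(sv[1]);                   /* the peer goes away */

    if (serve_until_peer_closes(sv[0]) != 3)
        return -1;

    /* Our descriptor must be gone now: fcntl() fails with EBADF. */
    if (fcntl(sv[0], F_GETFD) != -1 || errno != EBADF)
        return -1;
    return 0;
}
```

Calling close_wait_demo() from a small main() should return 0. The point
of the sketch is only that the close happens on every exit path of the
read loop, not any particular loop structure.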

Maybe this info will help:

Specifically with this CLOSE_WAIT, in one instance we found it to be
happening with our FTP server with WS_FTP clients. IPSwitch provided a
developer version to help solve the problem. Here are my product update
notes I have on it:

o Wildcat! FTP Server fixed for certain FTP clients that use both passive
and non-passive commands in their logic. Certain FTP clients, like WS_FTP,
will use the PASV command to start a download. If the download fails for
some reason, WS_FTP will retry using the non-passive PORT command. This
was causing the socket opened for passive mode to be left in a CLOSE_WAIT
state and, on occasion, left the FTP user hung in Wildcat. This is now
fixed.

In another instance, we found that IE updates released within the last year
or so were causing RSTs (resets) and many CLOSE_WAIT sockets. Search the net
for the various forms of this behavior: thousands of users report things
like "Page can't be displayed" and "Browser automatically refreshes...", etc.

After extensive research to hit this problem over the head, and without MS
confirmation, it is my belief that Microsoft changed the socket behavior to
solve TCP half close issues, and in doing so broke many existing servers out
there that were not ready for it. The problem showed up with IE and I could
only reproduce it over a DSL line.

The thing is, MS actually does it right. If you follow MSDN, it correctly
describes what needs to be done, but its examples use event-driven
operations. In synchronous mode, you may not catch it. Anyway, here are my
Jan/03 summary notes, shared with the customers and testers who helped test
the fixed version:

(The following was sent to Microsoft)

After all my analysis, I believe I have finally solved the "IE puzzle." I
had to read up and get a better understanding of the TCP/IP specifications.

In my previous messages, I indicated that adding a sleep (thus delaying the
close) helped solve the problem we were experiencing with IE. I was not
comfortable with this sleep solution and continued investigating.

Again, the MSDN documentation indicated specific steps for a graceful
shutdown to send the remaining data. I took this to mean it applied to a
"receiver application" (like a browser), since in this particular state it
is the server that is doing the sending, not the receiving. In addition, the
purpose of the SO_LINGER option is to block a close while there is data in
the send queue. So I didn't think these steps applied to the server
attempting to close the socket.

After analyzing TCP/IP packets and reading the book TCP/IP Illustrated
Volume 1, specifically the part about "TCP Half Close" operations, I
realized that the server calling recv() after a shutdown() was not to expect
data, but instead to expect the "half close" from the receiver.

So instead of using a sleep like so:

        Send HTTP response
        ...
        ...

        Sleep(300);
        closesocket(sock);

I changed the logic to follow what the MSDN documentation says to do:

        Send HTTP response
        ...
        ...

        // notify receiver we are about to close, no more data will be sent.
        // note: this does not close the socket

        shutdown(sock,SD_SEND);

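        // new - wait until receiver closes socket.
        // recv() returns 0 when the receiver has closed its side,
        // or SOCKET_ERROR on failure; either one ends the loop.
        // note: sanity check removed from loop

        char buf[8*1024];
        while (recv(sock, buf, sizeof(buf), 0) > 0)
            ;   // discard anything the receiver still sends

        // finally close the socket

        closesocket(sock);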

This fixed the problem! 100% with no sleeps and/or no send speed throttles.

This is the proper way to do it. I understand the TCP/IP specification
better now.

1) The shutdown() tells the receiver the server is done sending data. No
more data is going to be sent. More importantly, it doesn't close the
socket. At the TCP layer, this sends a FIN packet to the receiver once any
data still queued has gone out.

So when you send data, PSH (push data) packets are sent. The receiver
acknowledges them with ACK packets, shown here as one ACK per PSH:

            PSH 1 -->
                      <-- ACK 1
            PSH 2 -->
                      <-- ACK 2
            PSH 3 -->
                      <-- ACK 3
            PSH 4 -->
                      <-- ACK 4

etc. There is NO specific order to this. It can be:

            PSH 1 -->
            PSH 2 -->
                      <-- ACK 1
            PSH 3 -->
                      <-- ACK 2
            PSH 4 -->
                      <-- ACK 3
                      <-- ACK 4

This depends very much on the network transmission, e.g., ADSL connections
have a faster receive (download) and a slower send (upload).

When the shutdown() command is called, it will send a FIN signal. The FIN
can be in its own packet or part of the last PSH packet.

2) At this point, the server has to wait until the receiver has
acknowledged the FIN and has closed its own side of the connection. This is
done by calling recv() in a loop until it returns 0 (the receiver's FIN has
arrived) or less (an error). Once recv() returns 0 (or less), the receiver's
1/2 of the socket is closed.

3) Then you can release the socket by calling closesocket().
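
The three steps above, together with the kind of sanity check that was left
out of the loop shown earlier, might be sketched as follows. This is not the
actual Wildcat! code: it is a POSIX rendering (SHUT_WR for SD_SEND, close()
for closesocket()) so that it runs without Winsock, and the names
graceful_close, idle_timeout_ms and half_close_demo are hypothetical. The
timeout is an assumption about what a reasonable sanity check could be: it
stops a receiver that never closes from holding the socket forever.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Half-close, drain until the receiver's FIN, then close.
 * idle_timeout_ms bounds each individual wait, not the whole drain.
 * Returns 0 if the receiver closed cleanly, 1 on timeout, -1 on error.
 * The descriptor is closed in every case. */
static int graceful_close(int fd, int idle_timeout_ms)
{
    char buf[8 * 1024];
    int result;

    assert(fd >= 0 && idle_timeout_ms >= 0);

    /* Step 1: send our FIN; the socket stays open for reading. */
    if (shutdown(fd, SHUT_WR) != 0) {
        close(fd);
        return -1;
    }

    /* Step 2: discard whatever the receiver still sends until recv()
     * reports 0, which means its FIN has arrived. */
    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, idle_timeout_ms);

        if (ready < 0 && errno == EINTR)
            continue;
        if (ready <= 0) {                   /* timeout or poll error */
            result = (ready == 0) ? 1 : -1;
            break;
        }

        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0 || (n < 0 && errno == EINTR))
            continue;
        result = (n == 0) ? 0 : -1;
        break;
    }

    /* Step 3: release the socket. */
    close(fd);
    return result;
}

/* Self-check over a local socket pair standing in for a TCP connection. */
static int half_close_demo(void)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* "Browser" side: request already sent, so it half-closes. */
    if (shutdown(sv[1], SHUT_WR) != 0)
        return -1;

    /* "Server" side: send the response, then close gracefully. */
    if (send(sv[0], "response", 8, 0) != 8)
        return -1;
    if (graceful_close(sv[0], 1000) != 0)
        return -1;

    /* The browser side still gets the whole response, then end of stream. */
    if (recv(sv[1], buf, sizeof(buf), 0) != 8)
        return -1;
    if (recv(sv[1], buf, sizeof(buf), 0) != 0)
        return -1;
    close(sv[1]);
    return 0;
}
```

With a receiver that never closes its side, graceful_close() gives up after
the idle timeout and returns 1 instead of blocking indefinitely.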

According to TCP/IP Illustrated Volume 1, using Shutdown is not common in
applications (page 238, chapter 18.5): Most applications will terminate
the socket in both directions using the closesocket() command.

Summary:

If hosting applications do not support TCP Half Close, they might begin to
see problems with specific versions of Microsoft IE and/or combos of the
operating system.

Supporting TCP Half Close fixes and enhances our web server and solves this
problem for our customers who have IE users. So from our standpoint, the
problem is now solved.

Since our web server has been in existence since 1996, has been well
engineered, and has stood the test of time, the recent avalanche of IE
issues tells me that Microsoft has recently changed something in either
IE or the Winsock layer with regard to SO_LINGER and closesocket(). The
net effect is that many users are now experiencing this "Page Cannot be
Displayed" error, probably (this is unverified) on web sites not using the
Microsoft IIS web server. We are going to do a final test of this
presumption.

According to MSDN, using SO_LINGER is supposed to block the closing of the
socket until the data in the SEND queue has been exhausted (sent). That's
the purpose of the SO_LINGER option, and it had always worked for us, since
we never saw this IE issue before.

In other words, if there is still data in the send queue when
closesocket() is called, the SO_LINGER socket setting (see setsockopt() in
MSDN) is supposed to BLOCK the sending of the FIN packet to the receiver
until all the data has been sent.
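
For reference, setting the option looks roughly like this. It is a sketch
using the POSIX form of the call (on Winsock the option value is passed as
a char pointer and the socket is a SOCKET rather than an int). The helper
name enable_linger and the 5 second value in the usage note are
hypothetical and are not taken from our server.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on a lingering close: a later close of the socket may block for up
 * to `seconds` while queued data is still being sent.
 * Caution: l_onoff = 1 combined with l_linger = 0 requests an abortive
 * close (the connection is reset) rather than a graceful one.
 * Returns 0 on success, -1 on failure. */
static int enable_linger(int fd, int seconds)
{
    struct linger lg;

    assert(fd >= 0 && seconds >= 0);

    lg.l_onoff = 1;          /* enable lingering */
    lg.l_linger = seconds;   /* maximum time to wait, in seconds */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

A caller would typically invoke something like enable_linger(sock, 5) once,
right after accepting the connection. Note that the option only concerns
the local send queue at close time; it does not wait for the receiver to
close its side.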

However, this now brings up network transmission and packet sequence issues.

Although the socket send queue is now empty, depending on the reliability
and speed of the user's connection, it is my suspicion that not all the PSH
(push data) packets had been acknowledged by the receiver by the time it
received the FIN packet. So when a PSH packet arrived after the FIN, it was
for a closed socket, and according to RFC 793, a packet arriving for a
connection that no longer exists causes a RST (reset) to be sent back to the
server.

This explains why we were seeing multiple resends of the URL request by the
IE browser.

For some people, they got the PAGE CANNOT BE DISPLAYED error. I
personally never saw this. Only the resends. I should note one of our
testers indicated this might depend on the IE setting "Show Friendly URL
errors" options. When ON, he got the page error. When Off, he got the
resends.

However, I am not sure that is consistent. I believe it all depends on the
network transmission, i.e., a timing issue related to the user's internet
connectivity.

In any case, the problem is solved for us.

I hope the above is useful to Microsoft and to other host developers who
might run into the problem.

Thanks to All who tested this IE issue at our web site.

---
Hector Santos
WINSERVER "Wildcat! Interactive Net Server"
support: http://www.winserver.com
sales: http://www.santronics.com
"Francois PIETTE" <francois.piette@overbyte.be> wrote in message
news:40ac7920$0$8414$a0ced6e1@news.skynet.be...
> In a server application, sometimes there are a lot of sockets in
> CLOSE_WAIT state (shown by netstat utility). It could even consume all
> available sockets.
> The questions are:
> - How to close such socket so that they can be reused ?
> - How to reduce the time the system allow a socket to be in CLOSE_WAIT ?
> - How to completely avoid this state and still gracefully close tcp
>   sessions ?
>
> -- 
> francois.piette@overbyte.be
> The author for the freeware multi-tier middleware MidWare
> The author of the freeware Internet Component Suite (ICS)
> http://www.overbyte.be
>
>