Re: How to terminate a socket in CLOSE_WAIT state

From: Francois Piette (francois.piette_at_overbyte.be)
Date: 05/21/04


Date: Fri, 21 May 2004 11:27:01 +0200

Thank you for this long and interesting reply.

--
francois.piette@overbyte.be
Author of ICS (Internet Component Suite, freeware)
Author of MidWare (Multi-tier framework, freeware)
http://www.overbyte.be

"hector" <nospam@nospam.com> wrote in message
news:efNlmiwPEHA.832@TK2MSFTNGP09.phx.gbl...
> Francois, long time no talk.
>
> In my experience, when this happens it is simply a "programming error" (PE)
> somewhere in the code.  This should always be a controlled state, with the
> socket released within a few seconds of proper closure on both ends.
>
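A minimal sketch of the usual form of that programming error and its fix, assuming POSIX sockets rather than Winsock (on Winsock, close() would be closesocket()). CLOSE_WAIT means the peer has already sent its FIN and the local TCP is waiting for the application to close its own end, so the socket stays in that state until the close call is made. The function name serve_until_peer_closes is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read until the peer closes (or an error occurs),
 * then close our own end so the socket does not linger in CLOSE_WAIT.
 * Returns the total number of bytes read, or -1 on a receive error. */
long serve_until_peer_closes(int sock)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = recv(sock, buf, sizeof buf, 0)) > 0)
        total += n;            /* a real server would process buf here */

    /* n == 0: the peer's FIN has arrived, so a TCP socket is now in
     * CLOSE_WAIT.  It leaves that state only when we close it. */
    close(sock);

    return n < 0 ? -1 : total;
}
```

Forgetting the close() on the n == 0 path (or on an error path) is the kind of leak that shows up in netstat as a growing list of CLOSE_WAIT sockets.
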
> Maybe this info will help:
>
> Specifically with CLOSE_WAIT, in one instance we found it happening with
> our FTP server and WS_FTP clients.  IPSwitch provided a developer version
> to help solve the problem.  Here are my product update notes on it:
>
> o  Wildcat! FTP Server fixed for certain FTP clients that use both passive
> and non-passive commands in their logic.  Certain FTP clients, like WS_FTP,
> will use the PASV command to start a download.  If the download fails for
> some good reason, WS_FTP will try to use the non-passive PORT command to
> repeat the process.  This was causing a PASSIVE opened socket to be left in
> a CLOSE_WAIT state and, on occasion, left the FTP user hung in Wildcat.
> This is now fixed.
>
> In another instance, we found recent IE updates within the last year or so
> causing RSTs to occur and many CLOSE_WAIT sockets.  Search the net for the
> various forms of the behavior reported by thousands of users, such as
> "Page can't be displayed" and "Browser automatically refreshes...", etc.
>
> After extensive research to hit this problem over the head, and without MS
> confirmation, it is my belief that Microsoft changed the socket behavior to
> solve TCP half close issues, and in doing so broke many existing servers out
> there that were not ready for it.  The problem showed up with IE and I could
> only reproduce it over a DSL line.
>
> The thing is, MS actually does it right.  If you follow MSDN, it correctly
> describes what needs to be done but uses event-driven operations for its
> examples.  In synchronous mode, however, you may not catch it.  Anyway,
> here are my Jan/03 summary notes for the customers and testers who helped
> test the fixed version:
>
> (The following was sent to Microsoft)
>
> After all my analysis, I believe I finally solved the "IE puzzle."  I had to
> read up and get a better understanding of the TCP/IP specifications.
>
> In my previous messages, I indicated that adding a sleep (thus delaying the
> close) helped solve the problem we were experiencing with IE.  I was not
> comfortable with this sleep solution and continued with more investigation.
>
> Again, the MSDN documentation indicated specific steps for a graceful
> shutdown to send the remaining data.  I took this to mean it applied to a
> "receiver application" (like a browser), since it is the server that is
> doing the sending, not the receiving, in this particular state.  In
> addition, the purpose of the SO_LINGER setting is to block a close while
> there is data in the send queue.  So I didn't think these steps applied to
> the server attempting to close the socket.
>
> After analyzing TCP/IP packets and reading the material in the book TCP/IP
> Illustrated Volume 1, specifically about "TCP Half Close" operations, I
> realized that the server calling recv() after a shutdown() was not to
> expect data, but instead to expect the "half close" from the receiver.
>
> So instead of using a sleep like so:
>
>         Send HTTP response
>         ...
>         ...
>
>         Sleep(300);
>         closesocket(sock);
>
> I changed the logic to follow what the MSDN documentation says to do:
>
>         Send HTTP response
>         ...
>         ...
>
>         // notify receiver we are about to close, no more data will be sent.
>         // note: this does not close the socket
>
>         shutdown(sock, SD_SEND);
>
>         // new - wait until receiver closes socket.
>         // note: sanity check removed from loop
>
>         char buf[8*1024];
>         while (recv(sock, buf, sizeof(buf), 0) > 0);
>
>         // finally close the socket
>
>         closesocket(sock);
>
> This fixed the problem!  100%, with no sleeps and/or send speed throttles.
>
> This is the proper way to do it.   I understand the TCP/IP specification
> better now.
>
> 1) The shutdown() tells the receiver the server is done sending data.  No
> more data is going to be sent.  More importantly, it doesn't close the
> socket.  At the socket layer, this sends a TCP FIN packet to the
> receiver.
>
> So when you send data, PSH (push data) packets are sent.  The receiver
> sends an ACK packet for each PSH.
>
>             PSH  1 -->
>                       <--  ACK 1
>             PSH  2 -->
>                       <--  ACK 2
>             PSH  3 -->
>                       <--  ACK 3
>             PSH  4 -->
>                       <--  ACK 4
>
> etc..  There is NO specific order to this.  It can be:
>
>             PSH  1 -->
>             PSH  2 -->
>                       <--  ACK 1
>             PSH  3 -->
>                       <--  ACK 2
>             PSH  4 -->
>                       <--  ACK 3
>                       <--  ACK 4
>
> And this depends much on the network transmission, e.g., ADSL connections
> have a faster receive (download) and a slower send (upload).
>
> When the shutdown() command is called, it will send a FIN signal.  The FIN
> can be in its own packet or part of the last PSH packet.
>
> 2) At this point, the server has to wait until the receiver has seen the
> FIN and closed its own side.  This is done by calling recv() in a loop
> until it returns 0 or less.  A return of 0 means the receiver's own FIN
> has arrived; a negative value means an error.  Once recv() returns 0 (or
> less), the receiver is done with the connection.
>
> 3) Then you can close the second half of the socket by calling
> closesocket();
>
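The three steps above can be sketched as one helper. This is a sketch under stated assumptions, not the Wildcat! code: it uses POSIX names (SHUT_WR and close() where Winsock has SD_SEND and closesocket()), the function name graceful_close and its timeout_sec parameter are hypothetical, and the receive timeout stands in for the "sanity check" that the loop in the message leaves out.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: half-close, drain until the peer closes, then close.
 * timeout_sec bounds each idle wait in recv(); 0 means wait indefinitely.
 * Returns 0 if the peer closed cleanly, -1 on error or timeout. */
int graceful_close(int sock, int timeout_sec)
{
    char buf[8 * 1024];
    struct timeval tv;
    ssize_t n = 0;
    int rc = 0;

    /* Sanity check: do not wait forever on a peer that never closes.
     * Best effort; a failure here just means no timeout is applied. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    /* Step 1: send FIN after any queued data; the socket stays open. */
    if (shutdown(sock, SHUT_WR) < 0) {
        rc = -1;
    } else {
        /* Step 2: discard incoming data until the peer's FIN (0)
         * or an error/timeout (negative). */
        while ((n = recv(sock, buf, sizeof buf, 0)) > 0)
            ;
        if (n < 0)
            rc = -1;
    }

    /* Step 3: release the socket in every case. */
    if (close(sock) < 0)
        rc = -1;

    return rc;
}
```

A server would call graceful_close(sock, 5) after sending its last response byte, in place of a bare close. Note the timeout only limits how long each recv() sits idle; a peer that keeps sending data would need an additional overall limit.
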
> According to TCP/IP Illustrated Volume 1, using Shutdown is not common in
> applications (page 238, chapter 18.5):   Most applications will terminate
> the socket in both directions using the closesocket() command.
>
> Summary:
>
> If hosting applications do not support TCP Half Close, they might begin to
> see problems with specific versions of Microsoft IE and/or combos of the
> operating system.
>
> Supporting TCP Half Close fixes and enhances our web server and solves this
> problem for our customers who have IE users.  So from our standpoint, the
> problem is now solved.
>
> Since our web server has been in existence since 1996 and has been well
> engineered and put through the test of time, the recent avalanche of IE
> issues tells me that Microsoft has recently changed something in either
> IE or the Winsock layer in regards to SO_LINGER and closesocket().  The
> net effect is that many users are now experiencing this "Page Cannot be
> Displayed" error, probably (without verification) on web sites not using
> the Microsoft IIS web server.  We are going to do a final test on this
> presumption.
>
> According to MSDN, using SO_LINGER is supposed to block the closing of the
> socket until the data in the SEND queue has been exhausted (sent).  That's
> the purpose of the SO_LINGER option, and it had always worked for us since
> we never saw this IE issue before.
>
> In other words, if there is still data in the send queue when
> closesocket() is called, the SO_LINGER socket setting (see setsockopt() in
> MSDN) is supposed to BLOCK the sending of the FIN packet to the receiver
> until all the data has been sent.
>
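For reference, a minimal sketch of turning the option on with setsockopt(), again assuming POSIX sockets (Winsock uses the same l_onoff and l_linger fields but takes the option value as a char pointer). The helper name enable_linger is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: enable SO_LINGER so that closing the socket blocks
 * for up to `seconds` while unsent data remains in the send queue.
 * Returns 0 on success, -1 on error (same as setsockopt). */
int enable_linger(int sock, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;         /* turn lingering on */
    lg.l_linger = seconds;  /* maximum time to block in close */

    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}
```

It would be called once on the connected socket, for example enable_linger(sock, 5), some time before the close.
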
> However, this now brings up network transmission and packet sequence
> issues.
>
> Although the socket send queue is now empty, depending on the reliability
> and speed of the user's connection, it is my suspicion that not all the
> PSH (push data) packets were acknowledged by the receiver by the time it
> received a FIN packet.  Hence, when a PSH packet arrived after a FIN, it
> was for a closed socket.  According to RFC 793, a packet that arrives for
> a closed connection is answered with a RST (reset) sent back to the server.
>
> This explains why we were seeing multiple resends of the URL request by
> the IE browser.
>
> Some people got the PAGE CANNOT BE DISPLAYED error.  I personally never
> saw this, only the resends.  I should note one of our testers indicated
> this might depend on the IE "Show Friendly URL errors" option.  When ON,
> he got the page error.  When OFF, he got the resends.
>
> However, I am not sure that is consistent.  I believe it all depends on the
> network transmission, i.e., a timing issue related to the user's internet
> connectivity.
>
> In any case,  the problem is solved for us.
>
> I hope the above info provides some info to Microsoft and other host
> developers who might run into the problem.
>
> Thanks to All who tested this IE issue at our web site.
>
> ---
> Hector Santos
> WINSERVER "Wildcat! Interactive Net Server"
> support: http://www.winserver.com
> sales: http://www.santronics.com
>
>
>
> "Francois PIETTE" <francois.piette@overbyte.be> wrote in message
> news:40ac7920$0$8414$a0ced6e1@news.skynet.be...
> > In a server application, sometimes there are a lot of sockets in the
> > CLOSE_WAIT state (shown by the netstat utility).  It could even consume
> > all available sockets.
> > The questions are:
> > - How to close such sockets so that they can be reused?
> > - How to reduce the time the system allows a socket to be in CLOSE_WAIT?
> > - How to completely avoid this state and still gracefully close TCP
> >   sessions?
> >
> > -- 
> > francois.piette@overbyte.be
> > The author for the freeware multi-tier middleware MidWare
> > The author of the freeware Internet Component Suite (ICS)
> > http://www.overbyte.be
> >
> >
>