RE: BUG in Winsock on P4 HT CPU

From: Glambert (Glambert_at_discussions.microsoft.com)
Date: 09/07/04


Date: Tue, 7 Sep 2004 08:05:03 -0700

If you read the help on WSAAsyncSelect, it states:
-------------------------------------------------------------
For FD_READ, FD_OOB, and FD_ACCEPT events, message posting is
level-triggered. This means that if the reenabling routine is called and the
relevant condition is still met after the call, a WSAAsyncSelect message is
posted to the application. This allows an application to be event-driven and
not be concerned with the amount of data that arrives at any one time.
Consider the following sequence:

1. Network transport stack receives 100 bytes of data on socket s and causes
   Windows Sockets 2 to post an FD_READ message.
2. The application issues recv( s, buffptr, 50, 0) to read 50 bytes.
3. Another FD_READ message is posted since there is still data to be read.

With these semantics, an application need not read all available data in
response to an FD_READ message—a single recv in response to each FD_READ
message is appropriate. If an application issues multiple recv calls in
response to a single FD_READ, it can receive multiple FD_READ messages. Such
an application can need to disable FD_READ messages before starting the recv
calls by calling WSAAsyncSelect with the FD_READ event not set.

---------------------------------------------------------------
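
To make the message counting concrete, here is a toy model of the rule in the excerpt above. It is only a sketch and not Winsock code: ModelSocket and every member name are made up for illustration, and the asyncSelect() behaviour (re-registering posts a message right away if data is pending) is my assumption about how the real call behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy model of the FD_READ rule quoted above. NOT Winsock: every name here
// is hypothetical and exists only to make the message counting visible.
struct ModelSocket {
    std::size_t pending = 0;   // bytes waiting in the receive buffer
    bool readEnabled = true;   // is FD_READ in the registered event mask?
    bool armed = true;         // false from a post until the next re-enabling call
    int postedReads = 0;       // FD_READ messages posted so far

    // Post one FD_READ if enabled, armed, and data is still available.
    void maybePost() {
        if (readEnabled && armed && pending > 0) {
            ++postedReads;
            armed = false;
        }
    }

    // The transport delivers n bytes.
    void deliver(std::size_t n) { pending += n; maybePost(); }

    // recv is the re-enabling routine for FD_READ: it re-arms, then the
    // level-triggered check runs against whatever is left in the buffer.
    std::size_t recv(std::size_t want) {
        std::size_t got = std::min(want, pending);
        pending -= got;
        armed = true;
        maybePost();
        return got;
    }

    // Stand-in for WSAAsyncSelect with FD_READ set or cleared. Assumption:
    // re-registering re-arms and posts immediately if data is pending.
    void asyncSelect(bool enableRead) {
        readEnabled = enableRead;
        armed = true;
        maybePost();
    }
};

// One FD_READ handler that calls recv twice with FD_READ left enabled.
int messagesWithTwoRecvs() {
    ModelSocket s;
    s.deliver(100);        // message #1
    s.recv(50);            // 50 bytes remain, so message #2 is posted
    s.recv(50);            // buffer is now empty, nothing new is posted
    return s.postedReads;  // message #2 is still queued with no data behind it
}

// Same handler, but FD_READ is disabled around the recv calls.
int messagesWithReadDisabled() {
    ModelSocket s;
    s.deliver(100);        // message #1
    s.asyncSelect(false);  // disable FD_READ before reading
    s.recv(50);
    s.recv(50);
    s.asyncSelect(true);   // re-enable; buffer is empty, so nothing is posted
    return s.postedReads;
}
```

In this model the first handler ends up with two FD_READ messages for one batch of data, and the second one arrives after the buffer is already empty. The second handler sees exactly one.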

Based on the above, if you want to call recv multiple times within
OnReceive(), you should first disable FD_READ (via AsyncSelect() with FD_READ
left out of the mask), issue all of your recv calls (Receive()), and then
reenable FD_READ (via AsyncSelect() with FD_READ back in the mask). This will
work correctly. However, AsyncSelect() replaces the whole event mask each
time, so for CAsyncSocket-derived classes you need to keep track of the events
you have enabled yourself (I wish there were a way to query Winsock for the
events you are currently registered for).
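
Since the registered events cannot be queried back, one way to keep track of them is a small helper that remembers the mask and re-registers it on every change. This is only a sketch under my own assumptions: EventMaskTracker and the kRead/kWrite/kClose names are hypothetical (the constants carry the same values as FD_READ, FD_WRITE and FD_CLOSE), and the callback stands in for the real AsyncSelect() call so the snippet compiles without MFC.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Same bit values as the Winsock FD_READ, FD_WRITE and FD_CLOSE constants,
// redefined under hypothetical names so the sketch builds without Winsock.
constexpr long kRead = 0x01;
constexpr long kWrite = 0x02;
constexpr long kClose = 0x20;

// Remembers the registered event mask, since it cannot be queried back.
// `select` stands in for a call such as CAsyncSocket::AsyncSelect(mask).
class EventMaskTracker {
public:
    EventMaskTracker(long initial, std::function<void(long)> select)
        : mask_(initial), select_(std::move(select)) {
        select_(mask_);  // register the initial mask
    }

    // Remove events from the mask and re-register what is left.
    void disable(long events) {
        mask_ &= ~events;
        select_(mask_);
    }

    // Add events back and re-register the full mask.
    void enable(long events) {
        mask_ |= events;
        select_(mask_);
    }

    long mask() const { return mask_; }

private:
    long mask_;                         // events currently registered
    std::function<void(long)> select_;  // performs the actual registration
};
```

Inside OnReceive() the idea would be to call disable(kRead), do all of the Receive() calls, then call enable(kRead). The other events (write, close and so on) stay registered the whole time because the full remembered mask is passed on each call.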

The tough part is that I did not see any explicit documentation that states
"You cannot call Receive() multiple times from within OnReceive()". I had to
read the WSAAsyncSelect documentation to eventually figure it out (and I read
it about 10 times before realising what had to be done). I am not an MFC expert
by any means, so perhaps the OnReceive() documentation is out there
somewhere... but it would have been nice to see it explicitly stated.

"Greg Ennis" wrote:

> I have confirmed a bug in the Winsock API when running on fast Hyperthreaded
> P4 CPUs. If you look in the newsgroups for posts by Glambert and myself you
> will see other references to it, but here are the basic facts of the matter.
>
> Winsock has a problem with intermittently not posting the socket
> notification message when data is received on a socket. This only happens on
> fast HT enabled P4 CPUs.
>
> Here is how you can reproduce it. Write a simple application that creates a
> loopback CAsyncSocket socket and connects to itself. In each socket's
> OnReceive handler, read 2 bytes and send 2 bytes back over the socket. When
> the connection is established, send 2 bytes from the 'client' socket to kick
> it off.
>
> This app should run forever, just sending bits back and forth to itself. And
> indeed it does... on CPUs that do not support HT, or CPUs with HT disabled.
> Run it on an HT CPU and guess what? After only exchanging a few packets, it
> stops. It stops dead because one of the sockets did NOT receive the socket
> notification message on its window handle. I have debugged it thoroughly. I
> have also set it up using 2 computers over ethernet, one of which is a P4 HT,
> and watched using a TCP packet sniffer. The P4 HT CPU is ALWAYS the guilty
> party; it is the one that stops sending!
>
> Here is further proof of the bug, as GLambert has discovered. If you call
> AsyncSelect() after reading all the available data, it works fine. The socket
> notification message appears to always be posted correctly. However, I can't
> guarantee this on all platforms, since it is a workaround at best.
>
> Waiting on an answer from MS on this.
>


