Re: About zero-byte receive (IOCP server)



TCP over IP over Ethernet is a very common scenario for a TCP connection, but
by no means the only one. In that scenario, while an Ethernet frame can carry
up to 1500 bytes of payload (or more if jumbo frames are enabled), the TCP
payload within a frame is always smaller, because the IP and TCP headers share
that space, and may be zero. The network stack reassembles this payload into
the TCP stream and presents it to your application through the socket
interface, with no reliable correlation between Ethernet frame size or payload
and the amount of data returned by a call to WSARecv.
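
To make that last point concrete, here is a minimal sketch in C. It is a toy model of a byte stream, not the real network stack, and StreamModel, stream_deliver and stream_recv are names I made up for illustration. It only shows that the sizes handed back to the reader depend on the reader's buffer, not on how the payload arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a socket receive buffer: a flat byte queue. */
typedef struct {
    unsigned char data[4096];
    size_t len;               /* bytes currently queued */
} StreamModel;

/* Models the stack appending one segment's payload (which may be empty). */
void stream_deliver(StreamModel *s, const void *payload, size_t n)
{
    assert(s->len + n <= sizeof s->data);
    if (n > 0)
        memcpy(s->data + s->len, payload, n);
    s->len += n;
}

/* Models a receive call: hands back up to cap bytes and ignores the
   boundaries of the segments that originally carried them. */
size_t stream_recv(StreamModel *s, void *buf, size_t cap)
{
    size_t n = s->len < cap ? s->len : cap;
    if (n > 0)
        memcpy(buf, s->data, n);
    memmove(s->data, s->data + n, s->len - n);
    s->len -= n;
    return n;
}
```

In this model, delivering payloads of 3, 0 and 5 bytes and then reading with a 4-byte buffer hands back 4 bytes, and the next read hands back the remaining 4. Neither read lines up with a segment boundary.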



The reason why a zero-byte receive is inefficient has less to do with how
data is transferred over the network, and more to do with how data is
transferred from the network stack into your application.



If you post a zero-byte receive on a socket and, upon completion, read
synchronously, the network stack will assemble the incoming TCP stream data
in the socket receive buffer, then signal IO completion. When the IO
completion packet is processed by one of your IO worker threads, using
GetQueuedCompletionStatus, that thread will call WSARecv, and the new TCP
stream data will be copied from the socket receive buffer to your user
buffer and the call will return. This process has three user-mode/kernel-mode
(UM-KM) transitions (post zero-byte receive, GQCS, and WSARecv) and two memory
copies (from packets to socket receive buffer, and from socket receive buffer
to user buffer).



If you post a non-zero byte receive and, upon completion, simply process the
data that was returned, the network stack will assemble the incoming TCP
stream data into the user buffer for the pending receive and signal IO
completion. When the IO completion packet is processed by one of your IO
worker threads, it already has the next piece of TCP data to process and it
can just use it. This process has only two UM-KM transitions (post receive
& GQCS) and only one memory copy (from packets to user buffer), thus saving
substantial processing. This method has the additional advantage that the
TCP window is not reduced, because the socket receive buffer has not been
used, and the counterparty can continue sending data as fast as it wants.
If the data rate is really high, one could even go further by posting
multiple receives to each socket so that while the IO worker thread was
processing the result of one receive, there would still be another user
buffer to fill.



This may not seem like much of a saving, but even with sysenter / sysexit,
a UM-KM transition is at least several thousand instructions and has
system-wide effects (CPU cache lines etc.); and the advantage of avoiding a memory
copy is, obviously, directly proportional to the data rate. Unless you are
working on a memory starved system or servicing a large number of
connections that rarely send anything, the advantage is obvious.
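
The arithmetic can be written down as a small sketch in C. This is only a tally of the per-completion counts I gave above (three transitions and two copies versus two transitions and one copy), not a measurement, and RecvCost, zero_byte_cost and buffered_cost are hypothetical names used for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical tally of the costs counted in the text above. */
typedef struct {
    unsigned long long transitions;   /* UM-KM transitions */
    unsigned long long bytes_copied;  /* payload bytes moved by memory copies */
} RecvCost;

/* Zero-byte receive followed by a synchronous WSARecv:
   3 transitions per completion, every payload byte copied twice. */
RecvCost zero_byte_cost(unsigned completions, unsigned long long payload_bytes)
{
    RecvCost c;
    assert(payload_bytes <= ULLONG_MAX / 2);
    c.transitions = 3ULL * completions;
    c.bytes_copied = 2ULL * payload_bytes;
    return c;
}

/* Receive posted with a real buffer:
   2 transitions per completion, every payload byte copied once. */
RecvCost buffered_cost(unsigned completions, unsigned long long payload_bytes)
{
    RecvCost c;
    c.transitions = 2ULL * completions;
    c.bytes_copied = payload_bytes;
    return c;
}
```

For 1,000 completions carrying 1,000,000 bytes in total, this gives 3,000 transitions and 2,000,000 bytes copied for the zero-byte scheme against 2,000 transitions and 1,000,000 bytes copied for the buffered one. The difference in bytes copied grows linearly with the amount of data received.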



BTW: why would you think I work at MS? Most MS employees around here have
[MSFT] in their names while I like to remain very anonymous.



"George" <George@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:6866B934-289A-404F-A087-A532E253D9EA@xxxxxxxxxxxxxxxx
m,

in response to my question you said that a zero-byte receive will complete
as soon as a single byte is received and a WSARecv with buffer will complete
with whatever data is received. TCP data is sent in packets usually around
1500 bytes, shouldn't both functions complete with 1500 bytes of data?

also do you work at MS?



"m" wrote:

IMHO, while this approach will work, I would recommend against the use of
zero-byte receives in any IOCP server. This design is a signal-and-process
scheme that will perform badly for high data rates and be wastefully
inefficient in all circumstances except when hosting a very large number
(>10,000) of connections that only rarely send anything (mass broadcast
application).



Most modern hardware has at least 1 GB of RAM so using 10-50 MB of it to
dramatically improve the IO efficiency only makes sense.
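
As a rough check on that memory figure, here is a small sketch in C. The 4 KB buffer size and the connection counts are assumptions chosen for illustration, not numbers from this thread, and recv_buffer_budget is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of user buffer tied up by pending receives across all connections.
   All three inputs are assumptions supplied by the caller. */
unsigned long long recv_buffer_budget(unsigned connections,
                                      unsigned receives_per_conn,
                                      size_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    return (unsigned long long)connections * receives_per_conn * buffer_bytes;
}
```

With 5,000 connections, one pending receive each and 4,096-byte buffers, this comes to 20,480,000 bytes, roughly 20 MB. Posting two receives per connection doubles it to 40,960,000 bytes, which is still inside the 10-50 MB range mentioned above.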

