Re: CSocketFiles / CArchive vs Raw Buffer Manipulation
- From: Joseph M. Newcomer <newcomer@xxxxxxxxxxxx>
- Date: Sat, 20 Aug 2005 12:27:43 -0400
See below...
On 18 Aug 2005 07:00:43 -0700, "Josh McFarlane" <darsant@xxxxxxxxx> wrote:
>Joseph M. Newcomer wrote:
>> Since all you have is a stream of bytes, it is your responsibility to figure out what they
>> mean.
>> For example, to send an image, you would probably send a DWORD value (ideally in Network
>> Standard Byte Order) that indicated the total length of the message. Then you might send
>> the data as either raw image data or packaged up into groups of scan lines or whatever,
>> but until you had read all the bytes indicated by the leading byte count, you do not have
>> the entire image.
>> On the receiving side, you *must* know the subtype; otherwise it is impossible to
>> reconstruct the data. So, for example, you might send, as the second four-byte value (and
>> included in the byte count) an indicator of what type of package you are sending. At that
>> point, you would create an instance of the subclass and let it read the rest of the data
>> on its own. Since you already say that each type has its own serialization, it is
>> impossible for the superclass to read the data in a meaningful fashion.
>> joe
>
>So the best way to look at this is to transmit the size first, as a
>DWORD or other known value, followed by the type identifier, then use
>the type identifier to take the remaining bits and reconstruct them into
>the correct package class?
****
That's how I'd do it. It is the common problem of losing class identity across binary
boundaries, such as network packet transmission. Serialization does this same encoding
automatically, but there are a huge number of problems with it: not just the synchronous
transmission problem, but a little nasty called "schema evolution", which can be a real
nightmare. What happens when the older version of the program sends data to the newer
version, or vice-versa? So often the second word is a "version number" of the message; the
third word is the data type; and the first word of each data type is the "version number"
of the data structure. I've found self-tagged binary data to be a highly effective way of
handling this problem. These days, I often choose to use text and XML, because it is the
same idea.
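To make the layout concrete, here is a minimal sketch of such a self-tagged message and of
the receive-side bookkeeping. Everything in it is an assumption made for illustration: the
12-byte header, the names Message, Encode and Framer, and the choice to count the header in
the length word. It deliberately uses no MFC or Winsock calls; you would feed Framer::Append
whatever bytes your receive call hands you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical wire layout, every word big-endian (network byte order):
//   word 0: total message length in bytes, including this 12-byte header
//   word 1: version number of the message format
//   word 2: type tag that selects which class to reconstruct
//   then:   payload, whose own first word would be that type's schema version
constexpr std::size_t kHeaderSize = 12;

struct Message {
    std::uint32_t version;
    std::uint32_t type;
    std::vector<std::uint8_t> payload;
};

// Append a 32-bit value in network byte order, independent of host endianness.
inline void PutU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read a 32-bit value stored in network byte order.
inline std::uint32_t GetU32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

std::vector<std::uint8_t> Encode(const Message& m) {
    assert(m.payload.size() <= 0xFFFFFFFFu - kHeaderSize);  // must fit the length word
    std::vector<std::uint8_t> out;
    PutU32(out, static_cast<std::uint32_t>(kHeaderSize + m.payload.size()));
    PutU32(out, m.version);
    PutU32(out, m.type);
    out.insert(out.end(), m.payload.begin(), m.payload.end());
    return out;
}

// Accumulates whatever each receive delivers and hands back a message only
// when every byte promised by the leading length word has arrived.
class Framer {
public:
    void Append(const std::uint8_t* data, std::size_t n) {
        buf_.insert(buf_.end(), data, data + n);
    }

    std::optional<Message> Next() {
        if (buf_.size() < kHeaderSize) return std::nullopt;   // header incomplete
        const std::uint32_t total = GetU32(buf_.data());
        if (total < kHeaderSize) throw std::runtime_error("corrupt length word");
        if (buf_.size() < total) return std::nullopt;         // body incomplete
        Message m{GetU32(&buf_[4]), GetU32(&buf_[8]),
                  std::vector<std::uint8_t>(buf_.begin() + kHeaderSize,
                                            buf_.begin() + total)};
        buf_.erase(buf_.begin(), buf_.begin() + total);       // keep any following bytes
        return m;
    }

private:
    std::vector<std::uint8_t> buf_;
};
```

The receiver would switch on the type word of each message returned by Next() to create the
right subclass, then let that subclass interpret the payload, starting with its own version
word. A real receiver should also cap the length word at some sane maximum before buffering.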
*****
>
>I think I'm learning a lot more about it now, and it's turning out to be
>a much bigger ordeal than I imagined. Right now our UDP implementation
>pipes it directly to the other end, and how it does it without screwing
>up is beyond me (Out of maybe 100,000 data files we've transmitted,
>I've only seen 1 fail to send and cause a crash.)
****
UDP within a LAN often gives the effective illusion that it is reliable. But it is only
an illusion. It critically depends upon the fact that there is no router, that the
receiver is able to keep up with the data rate, and the transmitter does not overflow its
own network's internal buffers. So it sort of mostly works on good days, but is free to
fail without warning of any sort to either the sender or the receiver.
I even once saw a post from someone who was complaining that he could only send some 1400
bytes in a UDP packet, not realizing that an IPv4 host is only required to accept a
576-byte datagram (which is why 512 data bytes is the traditional safe UDP payload), and
that anything larger may be fragmented or dropped along the way without any notification.
There is a very sound set of reasons that UDP is formally defined as an "unreliable"
protocol. There is no guarantee that a message sent will be received; there is no
guarantee that a message sent will not arrive multiple times; there is no guarantee that
given the sender transmitted messages A, B, C that they will be seen by the receiver as A,
B, C; in fact, the receiver could see ACB, CAB, CAAB, BACA, ACBB, ABB, AC, BA, CC, or any
other possible permutation of these three symbols, including adding arbitrary duplication,
you could care to invent. Or the recipient might see nothing at all. All these are
permitted. Note that if a message is lost, truncated, or duplicated, neither the sender
nor the receiver will ever know, unless they implement a higher-level protocol such as
putting sequence numbers in each packet, having retransmission protocols and timeouts,
etc. Also, UDP has no flow control, so a rogue UDP sender can flood the downstream router
or receiving computer, either of which is free to simply discard packets without
notification to sender or receiver. TCP/IP has positive-acknowledgement flow control. So
if I send ABC using TCP, I will receive ABC. No duplicates; no out-of-order; no lost
packets; no erroneous consumption of network resources; and if there is any serious error
(not counting the implicit retransmissions that go on under the floor if packets are
lost), both the sender and the receiver will know that there was a problem.
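As an illustration of the sequence-number idea, here is a small sketch of the receiver-side
bookkeeping a UDP protocol would need just to notice duplicates, reordering and gaps. The
names SequenceTracker and Arrival are invented for this example, it assumes the sender
numbers its datagrams 0, 1, 2, ..., and it ignores counter wraparound. It only detects
problems; retransmission requests and timeouts would still have to be built on top of it.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class Arrival { InOrder, Duplicate, OutOfOrder };

// Tracks which sequence numbers have been seen so far.
class SequenceTracker {
public:
    Arrival Observe(std::uint32_t seq) {
        // Already delivered, or already parked as an early arrival.
        if (seq < next_ || pending_.count(seq) != 0) return Arrival::Duplicate;
        // Arrived ahead of a packet that is late or lost: remember it.
        if (seq != next_) {
            pending_.insert(seq);
            return Arrival::OutOfOrder;
        }
        ++next_;
        // Absorb early arrivals that are now contiguous with the stream.
        while (!pending_.empty() && *pending_.begin() == next_) {
            pending_.erase(pending_.begin());
            ++next_;
        }
        assert(pending_.empty() || *pending_.begin() > next_);
        return Arrival::InOrder;
    }

    // Lowest sequence number not yet received.
    std::uint32_t NextExpected() const { return next_; }

    // True while at least one earlier packet is still missing.
    bool HasGap() const { return !pending_.empty(); }

private:
    std::uint32_t next_ = 0;
    std::set<std::uint32_t> pending_;
};
```

With A, B, C numbered 0, 1, 2, the arrival order CAAB would be reported as OutOfOrder,
InOrder, Duplicate, InOrder, after which nothing is missing. If B never arrives, HasGap()
stays true, and it is up to a timeout to decide that B is lost rather than merely late.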
****
>
>All this discussion also makes me glad we're moving over to TCP
Joseph M. Newcomer [MVP]
email: newcomer@xxxxxxxxxxxx
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm