Re: getline and CR, LF, CR/LF, VS/Linux



On Tue, 12 Jul 2005 06:44:37 +1000, Ian Semmel
<isemmel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

>
>
>r norman wrote:
>>
>> The question isn't really whether STL is to blame or not. It is how
>> the operating system interprets what the C language calls '\n' as an
>> element in files that are specified to be "text" (whatever that
>> means). It isn't the STL that translates '\n' into either ASCII
>> crlf sequences or into the single lf character.
>>
>>
>
>I think a 'text' file is fairly obvious.
>
>I don't care one way or the other. The only reason I used STL was because the
>program is for Windows and Linux.

The notion of a binary file is, indeed, obvious. What goes into and
comes out of a binary file is the byte stream exactly as specified
(provided you are aware of packing density, byte order, internal
representation of floating point values, alphabet sets, etc.). But the
notion of a "text" file is far from obvious. It does NOT mean that
the file only contains ASCII printing characters. Some operating
systems put a special "end-of-file" marker at the end of a file
(ctrl-Z = 0x1A is a common one) and some do not. If a system does
and, for some weird reason, the "text" file includes that character
as data, a text-mode read will stop there even though most of the
file is still unread according to the byte count. Some systems
automatically replace an ASCII lf (0x0A) in the output stream with a
crlf pair (0x0D-0x0A), reversing that process when reading the file.
As a result, the number of bytes contained in the file is not the same
as the total number of characters in the strings you write to it. You
run into all sorts of difficulties seeking to a specified byte count
in such a file.
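
To see the byte-count mismatch for yourself, something like the following rough sketch should do. The helper name bytes_on_disk and the scratch file name are my own inventions for illustration, not anything from the standard library. It writes a string in either text or binary mode and then measures the file in binary mode, so nothing gets translated on the way back in.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (name made up for this post): write `text` to `path`
// in text or binary mode, then report how many bytes ended up on disk.
std::size_t bytes_on_disk(const std::string& path,
                          const std::string& text,
                          bool binary) {
    std::ios::openmode mode = std::ios::out | std::ios::trunc;
    if (binary) {
        mode |= std::ios::binary;   // suppress any '\n' translation
    }
    {
        std::ofstream out(path, mode);
        out << text;
    }   // the stream is flushed and closed here

    // Reopen in binary mode, positioned at the end, so the offset is the raw size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return 0;                   // could not reopen the file
    }
    std::streamoff size = in.tellg();
    return size < 0 ? 0 : static_cast<std::size_t>(size);
}
```

Called as bytes_on_disk("eol_demo.tmp", "a\nb\n", true), this reports 4 bytes everywhere. With binary set to false the answer depends on the platform: a system that expands lf to crlf on output will report 6, while a Unix-like system will still report 4.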

The interpretation of a raw cr (0x0D) varies by system. On many
devices (and this is the original concept), 0x0D was a simple return,
to the first column on the same line. Using 0x0D allowed you to
overprint old data. On those systems, 0x0A was a simple line feed, to
the same column on the next line without returning to the beginning of
the line. On other devices, ones that emulate a manual typewriter,
the "carriage return" function included both aspects: go to the first
column of the next line. For data entry, it is traditional to use the
"Enter" or "Return" key (which ordinarily generates the 0x0D
character) to mean "this is the end of data entry for this line".
That means executing a cr/lf sequence on the display. In C, '\r' is the
0x0D character in binary while '\n' is 0x0A. However, what they
"mean" in terms of executing "text" input and output is highly
variable.
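
For the getline case that started this thread, one common workaround is to let std::getline split on '\n' and then drop a trailing '\r' yourself, so a CR/LF file read on Linux gives the same strings as an LF file. Here is a minimal sketch of that idea. The function name read_lines_any_eol is hypothetical, and it assumes lines end in either lf or crlf (a file that uses a lone cr as its terminator would come back as one long line).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): read every line with
// std::getline and strip one trailing '\r', if present, from each.
std::vector<std::string> read_lines_any_eol(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {          // splits on '\n' only
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();                  // leftover cr from a crlf pair
        }
        lines.push_back(line);
    }
    return lines;
}
```

It works on any std::istream, so you can hand it a std::ifstream opened on the real file or a std::istringstream for testing. It needs C++11 or later for back() and pop_back() on std::string.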

This is an ancient problem and you can't blame Windows. Before
Windows, MS-DOS and UNIX treated "text" files differently. And you
can't even blame Microsoft, the authors of MS-DOS, because CP/M also
differed from Unix. And CP/M was, I believe, based on even earlier
models, but that is getting before my time.

Everybody who has ever moved "text" files between Unix-Linux-xxxIX
systems and CP/M - DOS - Windows systems has encountered this problem.
It used to be even worse. Not only did you have to worry about what
the operating system did, you had to also worry about what the display
and printer devices did. They often had internal switches that
controlled how they interpreted cr and lf bytes. It was not uncommon
to find nicely formatted text on the screen appearing double spaced on
the printer or, alternatively, with all the output scrolling out to
the far right side of the printer, never returning to the beginning of
the line.


