Re: fgets() equivalent?




Norman Diamond wrote:

In my experience fgets() wasn't even fgets() equivalent. In Windows a newline consists of a consecutive pair of characters CR-LF. When reading text, the CR-LF is read, the CR is discarded, and the LF is stored into the program's buffer, which is fine so far. When reading text, if a bare LF is read, it's not a newline, but fgets() stops reading anyway and stores the LF into the program's buffer. So the program can't detect if a newline was actually reached or not. I had to change a program to read each byte in binary.

This is the subject of RAW vs. COOK mode. Windows, and MS-DOS for that matter, in my opinion have a consistent logic for storing text files, one that follows the history and logic of a teletype or printer using the proper definition of control codes.

In other words, you can't just DUMP a Unix file to a printer without taking into account how the HEAD is moving.

The \r means Carriage Return; its ASCII mnemonic is <CR>. The carriage is the HEAD of the printer, get it? It moves the HEAD back to the beginning of the SAME line of the paper or display.

The \n means Line Feed; its ASCII mnemonic is <LF>. It moves the HEAD DOWN to the next line of the paper or display.

Unix traditionally uses <LF> alone in its files. That's good and great, you save a lot of bytes, but it's not reality when it comes to printing or displaying a "console" type concept.

MS-DOS and Windows use something called COOK mode, where a \n is translated on output to <CR><LF>, or "\r\n" when written Unix-style as escapes.

Why? Well, you have to go back into history, but I will venture one primary reason: what comes first with the NEW LINE concept?

Do you move the printer head DOWN first and then move to
the far LEFT? Same as \n\r

or
Do you move the printer head to the far LEFT first and
then DOWN? Same as \r\n.

If you have ever used a typewriter, it is normally carriage return, then line feed, or \r\n.

It doesn't matter right?

Well, you can get into trouble when you are dealing with text files. It has to be right: the EOL (end of line) is expected to be the <CR><LF> bytes in MS-DOS based text files. Unix expects <LF> (\n) as the EOL byte.

Try dumping both style files to a PRINTER:

type unix-style.txt > PRN
type msdos-style.txt > PRN

and see which way is better and which one will drive that PRINTER nuts. :-)
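
If you don't have a printer handy, the effect can be imitated in memory. This is only a rough sketch of an imaginary printer that obeys the control codes literally; the Page type, the print_raw() name and the tiny page size are all made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define ROWS 4
#define COLS 16

/* Hypothetical "paper": ROWS lines of COLS characters, NUL-terminated. */
typedef struct {
    char cell[ROWS][COLS + 1];
} Page;

/* Feed raw bytes to an imaginary printer that obeys control codes
   literally: <CR> moves the head to column 0, <LF> moves it down one
   row and leaves the column alone. Text that falls off the page is
   dropped, and trailing blanks are trimmed from each row. */
void print_raw(Page *page, const char *data)
{
    int row = 0, col = 0, r, c;

    assert(page != NULL && data != NULL);
    memset(page->cell, ' ', sizeof page->cell);

    for (; *data != '\0'; data++) {
        if (*data == '\r') {
            col = 0;                 /* head back to the far LEFT */
        } else if (*data == '\n') {
            row++;                   /* head DOWN, same column */
        } else if (row < ROWS && col < COLS) {
            page->cell[row][col++] = *data;
        }
    }
    for (r = 0; r < ROWS; r++) {
        c = COLS;
        while (c > 0 && page->cell[r][c - 1] == ' ')
            c--;
        page->cell[r][c] = '\0';
    }
}
```

Feeding it "ab\ncd\n" leaves "cd" indented under the end of "ab" (the staircase effect), while "ab\r\ncd\r\n" puts both lines at the left margin.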


Excel can write tab delimited text files, in which a newline consists of a consecutive pair of characters CR-LF, fields within a line are separated by tab characters, and sub-lines within a field are separated by bare LF characters. Apparently the maker of Excel and the maker of Visual C++ aren't on speaking terms with each other.

I don't think this applies. The TAB-delimited format is an application's own interpretation of an EOL, a rare concept and probably only seen under special circumstances.

DotNet is worse. Internal to the library, one level of reading _does_ continue until a CR-LF pair, but another level of reading stops at the first bare LF and returns that result to the program.

Well, I venture DotNet is trying to be friendly to both the MS-DOS and *nix worlds. This is normal. A lot of applications that have to deal with text files that come from both worlds MUST take that into account. Every FTP server has to deal with that all the time.

If the program tries to find the internal buffer pointer and resume the way that MSDN says to, then the library resumes from the wrong point and some input data get lost. In DotNet too I had to read each byte in binary.

Well, I am not sure what problem you are referring to, but whatever it is, more than likely it is an innocent PE (Programmer's Error). :-)

The bottom line is that you need to know what format your storage files are in, or what potential formats they can be in.

If you have an application that accepts text files from anyone, that could mean MS-DOS or UNIX world. Therefore, you have to pay attention to how your text file "Line" reader is going to behave. Look at the ReadLine() function I posted. That deals with both:

<LF> <--- Unix World
<CR><LF> <--- MS-DOS/Windows world

Anyway, in general, if you expect to be dealing with text files or input that may come from both worlds, then you need to read in RAW mode, not COOK mode. The applet needs to take into account the possible existence of one or both bytes.
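
To make that concrete, here is a rough sketch of a raw-mode line reader that accepts both endings and tells the caller which one it saw. It is not the ReadLine() function mentioned above; the read_line_raw() name and the EolKind enum are invented for this example, and it assumes the stream was opened with "rb".

```c
#include <assert.h>
#include <stdio.h>

/* How a line ended. These names are illustrative, not from any library. */
typedef enum { EOL_NONE, EOL_LF, EOL_CRLF } EolKind;

/* Read one line from a stream opened in binary ("rb") mode.
   Stores the text without its terminator in buf, reports the
   terminator in *eol, and returns the stored length, or -1 at end of
   file with nothing read. Characters that do not fit in buf are
   dropped. */
int read_line_raw(FILE *fp, char *buf, size_t cap, EolKind *eol)
{
    size_t n = 0;
    int ch, got = 0;

    assert(fp != NULL && buf != NULL && eol != NULL && cap > 0);
    *eol = EOL_NONE;

    while ((ch = fgetc(fp)) != EOF) {
        got = 1;
        if (ch == '\n') {            /* bare <LF>: Unix world */
            *eol = EOL_LF;
            break;
        }
        if (ch == '\r') {
            int next = fgetc(fp);
            if (next == '\n') {      /* <CR><LF>: MS-DOS/Windows world */
                *eol = EOL_CRLF;
                break;
            }
            if (next != EOF)
                ungetc(next, fp);    /* lone <CR>: keep it as data */
        }
        if (n + 1 < cap)
            buf[n++] = (char)ch;
    }
    buf[n] = '\0';
    return got ? (int)n : -1;
}
```

Because the terminator is reported separately, a caller that cares about the difference (for example, one that wants to treat a bare <LF> as something other than a real end of line) can decide for itself what to do.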

Here is a program to explore the difference between the two modes:

// File: linereader.c

#include <stdio.h>
#include <string.h>
#include <windows.h>

// Print a line, showing <CR> and <LF> as visible tags.
void DumpLine(const char *sz)
{
    size_t i, len = strlen(sz);
    for (i = 0; i < len; i++) {
        switch (sz[i]) {
        case '\n': printf("<LF>"); break;
        case '\r': printf("<CR>"); break;
        default:
            printf("%c", sz[i]);
        }
    }
    printf("\n");
}

int main(int argc, char *argv[])
{
    const char *szFileName;
    const char *szMode;
    FILE *fv;

    if (argc < 2) {
        printf("usage: linereader <file> [rb|rt]\n");
        return 1;
    }
    szFileName = argv[1];
    szMode = (argc > 2) ? argv[2] : "rb";   // default: raw (binary) mode

    fv = fopen(szFileName, szMode);
    if (fv) {
        char szLine[1024];
        int nTotal = 0;
        // fgets() already leaves room for the terminating NUL
        while (fgets(szLine, sizeof(szLine), fv)) {
            nTotal++;
            DumpLine(szLine);
        }
        fclose(fv);
        printf("-- Total Lines: %d\n", nTotal);
    } else {
        printf("oops error %lu opening file\n", (unsigned long)GetLastError());
        return 1;
    }
    return 0;
}

Run it like so:

linereader textfile.txt rb > raw.txt
linereader textfile.txt rt > cook.txt

and see the difference.

--
HLS


