Re: fgets() equivalent?
- From: Pops <dude@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 28 Nov 2007 06:15:44 -0500
Norman Diamond wrote:
In my experience fgets() wasn't even fgets() equivalent. In Windows a newline consists of a consecutive pair of characters CR-LF. When
reading text, the CR-LF is read, the CR is discarded, and the LF is stored into the program's buffer, which is fine so far. When reading text, if a bare LF is read, it's not a newline, but fgets() stops reading anyway and stores the LF into the program's buffer. So the program can't detect if a newline was actually reached or not. I had to change a program to read each byte in binary.
This is the subject of RAW vs COOK mode. Windows, and MS-DOS for that matter, in my opinion, have a consistent logic for the storage of text files, one that follows the history and logic of a teletype or printer using the proper definition of control codes.
In other words, you can't just DUMP a Unix file to a printer without taking into account how the HEAD is moving.
The \r means Carriage Return; its ASCII mnemonic is <CR>. The carriage is the HEAD of the printer, get it? It means the HEAD moves back to the beginning of the SAME line of the paper or display.
The \n means Line Feed; its ASCII mnemonic is <LF>. It means the HEAD moves DOWN to the next line of the paper or display.
Unix traditionally uses <LF> alone in its files. That's good and great, you save a lot of bytes, but it's not reality when it comes to printing or to a "console" type display.
MS-DOS and Windows use something called COOK mode, where \n is translated on output to <CR><LF>, or in C escape notation "\r\n".
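The output translation described above can be sketched in portable C. This is only an illustration, not the actual runtime code: lf_to_crlf is a hypothetical helper name, and the assumption is that every \n is expanded unconditionally, without checking whether a \r already precedes it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): copy src to dst,
   expanding each '\n' to "\r\n", roughly what a COOK (text) mode
   stream is assumed to do on output.
   Returns the number of bytes written, or (size_t)-1 if dst is too small. */
size_t lf_to_crlf(const char *src, size_t len, char *dst, size_t cap)
{
    size_t i, n = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] == '\n') {
            if (n + 2 > cap) return (size_t)-1;
            dst[n++] = '\r';          /* head back to the left margin */
            dst[n++] = '\n';          /* head down one line */
        } else {
            if (n + 1 > cap) return (size_t)-1;
            dst[n++] = src[i];        /* every other byte passes through */
        }
    }
    return n;
}
```

Feeding it the 4 bytes of "a\nb\n" yields 6 bytes, because each <LF> grows into a <CR><LF> pair. That growth is why a file size seen in RAW mode need not match the byte count a program wrote in COOK mode.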
Why? Well, you have to go back into history, but I will venture one primary reason: what comes first in the NEW LINE concept?
Do you move the printer head DOWN first and then move to
the far LEFT? Same as \n\r
or
Do you move the printer head to the far LEFT first and
then DOWN? Same as \r\n.
If you have ever used a typewriter, it is normally carriage return, then line feed, or \r\n.
It doesn't matter, right?
Well, you can get into trouble when you are dealing with text files. It has to be right: the EOL (end of line) is expected to be the <CR><LF> bytes in MS-DOS based text files. Unix expects <LF> (\n) as the EOL byte.
Try dumping both style files to a PRINTER:
type unix-style.txt > PRN
type msdos-style.txt > PRN
and see which way is better and which one will drive that PRINTER nuts. :-)
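Short of sending a file to a printer, one way to see which convention a file follows is to count the line-ending styles in its raw bytes. The sketch below is a minimal illustration; count_eols and struct eol_counts are hypothetical names, and the buffer is assumed to have been read in RAW (binary) mode so that no translation has already happened.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the three line-ending styles found in a buffer. */
struct eol_counts {
    size_t crlf;   /* <CR><LF> pairs (MS-DOS/Windows style) */
    size_t lf;     /* bare <LF> (Unix style) */
    size_t cr;     /* bare <CR> not followed by <LF> */
};

/* Scan len raw bytes and classify every line ending.
   A <CR> immediately followed by <LF> counts once, as a pair. */
struct eol_counts count_eols(const char *buf, size_t len)
{
    struct eol_counts c = {0, 0, 0};
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                c.crlf++;
                i++;               /* skip the LF half of the pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}
```

A file with only crlf counts is MS-DOS style, one with only lf counts is Unix style, and a mix (such as the tab-delimited export described in the quoted text) means the reader cannot trust a single convention.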
Excel can write tab delimited text files, in which a newline consists of a consecutive pair of characters CR-LF, fields within a line are separated by tab characters, and sub-lines within a field are separated by bare LF characters. Apparently the maker of Excel and the maker of Visual C++ aren't on speaking terms with each other.
I don't think this applies. TAB is an application interpretation of an EOL, a rare concept and probably only used under special circumstances.
DotNet is worse. Internal to the library, one level of reading _does_ continue until a CR-LF pair, but another level of reading stops at the first bare LF and returns that result to the program.
Well, I venture DotNet is trying to be friendly to both the MS-DOS and *nix worlds. This is normal. A lot of applications that have to deal with text files that come from both worlds MUST take that into account. Every FTP server has to deal with that all the time.
If the program tries to find the internal buffer pointer and resume the way that MSDN says to, then the library resumes from the wrong point and some input data get lost. In DotNet too I had to read each byte in binary.
Well, I am not sure what problem you are referring to, but whatever it is, more than likely it is an innocent PE (programmer's error). :-)
The bottom line is that you need to know what format your storage files are in, or what potential format they can be in.
If you have an application that accepts text files from anyone, that could mean MS-DOS or UNIX world. Therefore, you have to pay attention to how your text file "Line" reader is going to behave. Look at the ReadLine() function I posted. That deals with both:
<LF> <--- Unix World
<CR><LF> <--- MS-DOS/Windows world
Anyway, in general, if you expect to be dealing with text files or input that may come from both worlds, then you need to read in RAW mode, not COOK mode. The applet needs to take into account the possible existence of one or both bytes.
Here is a program to explore it:
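A minimal sketch of a RAW mode line reader that accepts both endings follows. It is not the ReadLine() function mentioned above, whose source is not shown here; read_line_any is a hypothetical name, and the choices to truncate over-long lines and to treat a bare <CR> as ordinary data are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from a stream opened in binary
   ("rb") mode. Both <LF> and <CR><LF> end a line; the terminator is not
   stored. Bytes beyond cap-1 are dropped. A bare <CR> is kept as data.
   Returns the stored length, or -1 at end of file with nothing read. */
int read_line_any(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int ch;
    int got_any = 0;         /* did we consume at least one byte? */
    int stored_cr = 0;       /* was the previous byte a CR that we stored? */

    assert(fp != NULL && buf != NULL && cap > 0);
    while ((ch = fgetc(fp)) != EOF) {
        got_any = 1;
        if (ch == '\n') {
            if (stored_cr) n--;      /* drop the CR of a <CR><LF> pair */
            buf[n] = '\0';
            return (int)n;
        }
        stored_cr = 0;
        if (n + 1 < cap) {
            buf[n++] = (char)ch;
            stored_cr = (ch == '\r');
        }
    }
    if (!got_any) return -1;         /* clean end of file */
    buf[n] = '\0';                   /* last line had no terminator */
    return (int)n;
}
```

Because the stream is read byte by byte in binary mode, the caller sees the same lines whether the file came from the Unix world or the MS-DOS/Windows world, and an empty line (return value 0) stays distinguishable from end of file (return value -1).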
// File: linereader.c
// Dumps each line returned by fgets(), showing CR and LF bytes explicitly.
#include <stdio.h>
#include <windows.h>   // GetLastError()

void DumpLine(const char *sz)
{
    for (; *sz; sz++) {
        switch (*sz) {
        case '\n': printf("<LF>"); break;
        case '\r': printf("<CR>"); break;
        default:   printf("%c", *sz);
        }
    }
    printf("\n");
}

int main(int argc, char *argv[])
{
    const char *szFileName;
    const char *szMode;
    FILE *fv;

    if (argc < 2) {
        printf("usage: linereader <file> [rb|rt]\n");
        return 1;
    }
    szFileName = argv[1];
    szMode = (argc > 2) ? argv[2] : "rb";   // default: RAW (binary) mode
    fv = fopen(szFileName, szMode);
    if (fv) {
        char szLine[1024];
        int nTotal = 0;
        // fgets() already leaves room for the terminating NUL
        while (fgets(szLine, sizeof(szLine), fv)) {
            nTotal++;
            DumpLine(szLine);
        }
        fclose(fv);
        printf("-- Total Lines: %d\n", nTotal);
    } else {
        printf("oops error %lu opening file\n", (unsigned long)GetLastError());
        return 1;
    }
    return 0;
}
Run it like so:
linereader textfile.txt rb > raw.txt
linereader textfile.txt rt > cook.txt
and see the difference.
--
HLS