Re: File-Compare "fc" falsely reports mismatch between identical files
- From: "Al Dunbar" <alandrub@xxxxxxxxxxx>
- Date: Sun, 14 Jun 2009 14:47:35 -0600
"billious" <billious_1954@xxxxxxxxxxx> wrote in message news:4a34a921$0$32367$5a62ac22@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"Al Dunbar" <alandrub@xxxxxxxxxxx> wrote in message news:uqXzYkH7JHA.1432@xxxxxxxxxxxxxxxxxxxxxxx
"billious" <billious_1954@xxxxxxxxxxx> wrote in message news:4a33dc2e$0$32366$5a62ac22@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"Rich Pasco" <richp1234@xxxxxxxxxxx> wrote in message news:O1ZzeNE7JHA.5780@xxxxxxxxxxxxxxxxxxxxxxx
Intriguing. I've seen FC do strange things in the past - comparing A against B giving different results from B against A, but didn't have time to investigate the problem.
The only difference I see in this case is the filename length - what happens if both filenames have the same length, both as 8.3 names and long filenames?
May be clutching at straws, but perhaps the buffer is shared between the filename and the data read; fine for real-world ASCII but not for binary being interpreted as line-oriented ASCII.
While this is indeed an interesting observation, it is illogical to insist/assume that a tool designed to compare text files should be able to consistently determine that two identical binary files are identical, because the flip side of that is that it should therefore always be able to determine that two non-identical binary files differ.
/Al
Hmm, I'd not completely agree.
No problem. I don't always fully agree with my own ideas...
Whereas the RESULT of the operation may not be correct, I'd suggest that the result should be CONSISTENT between runs; FC A B and FC B A for identical files being simply a particular case that /might/ point to a reason.
While it seems reasonable to say that the result should be consistent between runs, one could also say that, for consistency's sake, one should only use tools in the manner and for the purpose they were designed and intended.
Talking theory, FC/B must be the SIMPLEST (but not the most-frequently used and hence default) mode. In the FC/B case, all that is required is to read the data from the two files into a buffer, compare them byte-by-byte and report any differences. Nothing complicated there.
Agreed.
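For illustration only, here is a minimal Python sketch of the byte-by-byte comparison described for FC/B above. It is not FC's actual code; the name compare_binary and the chunk size are assumptions made up for this example.

```python
import io

def compare_binary(a, b, chunk_size=4096):
    """Compare two binary streams byte by byte.

    Returns (diffs, longer): diffs is a list of (offset, byte_a, byte_b)
    for every position where the streams differ; longer is "first" or
    "second" if one stream has extra data, else None.
    Assumes read() only returns a short chunk at end of data, which holds
    for regular files and BytesIO objects.
    """
    diffs = []
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a and not chunk_b:
            return diffs, None  # both streams ended together
        # Every byte value is compared as-is: NUL and ^Z get no special treatment.
        for i in range(min(len(chunk_a), len(chunk_b))):
            if chunk_a[i] != chunk_b[i]:
                diffs.append((offset + i, chunk_a[i], chunk_b[i]))
        if len(chunk_a) != len(chunk_b):
            return diffs, "first" if len(chunk_a) > len(chunk_b) else "second"
        offset += len(chunk_a)

same = b"ab\x00\x1acd"
print(compare_binary(io.BytesIO(same), io.BytesIO(same)))  # ([], None)
```

Because nothing here depends on filenames, line endings or terminators, swapping the two arguments can only swap the roles of the two streams in the report. It cannot turn "identical" into "different".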
FC/A is a completely different kettle of fish.
Right. On my system FC /? indicates that this means to display only the first and last lines of each set of differences, whereas /L is said to compare files as ascii text.
The data has to be read and assembled line-by-line using CRLF as an EOL, and stored to allow the line-buffer to be used. How is the line stored? No doubt C-style as an ASCIIZ - so how does the program react to a NUL read from its input data? How is ^Z treated? Is it just accepted as a "normal" character, or is it EOF since we're dealing with an ASCII-compare? COPY for instance appears to recognise ^Z as end-of-file and will terminate copying a binary file at the first one encountered - does FC follow this idea? By extension, should FC report two ASCII files as being identical if one is straight-ASCII and the other identical EXCEPT for an appended ^Z?
That depends on whether or not that is what everybody wants it to do. Since we, as a species, have been unable to provide a rock-solid definition of what a text file is (see http://www.google.ca/search?hl=en&q=define%3A+text+file&meta=&aq=f&oq=), we can hardly complain when this lack of clarity results in anomalies...
Then there's the /w problem - evidently implemented as a comparison between the data after it has been buffered.
What, in /A (default) mode is the result of reading a long-long-long line - is there a limit? And what about the last "line" of an ASCII file that ISN'T CRLF-terminated?
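To make the questions above concrete, here is a small Python sketch of how those design choices change the outcome of a line-oriented compare. It does not describe what FC really does internally; the helper names text_lines and c_string_view and the ctrl_z_is_eof switch are hypothetical, invented for this example.

```python
def text_lines(data, ctrl_z_is_eof=True):
    """Split raw bytes into lines on CRLF.

    If ctrl_z_is_eof is True, everything from the first ^Z (0x1A) onward
    is ignored, the way COPY appears to treat it in ASCII mode.
    """
    if ctrl_z_is_eof:
        data = data.split(b"\x1a", 1)[0]
    lines = data.split(b"\r\n")
    # A final CRLF terminates the last line rather than starting a new one,
    # so an unterminated last line yields the same list as a terminated one.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

def c_string_view(line):
    """What a NUL-terminated (ASCIIZ) buffer would appear to contain."""
    return line.split(b"\x00", 1)[0]

plain = b"one\r\ntwo\r\n"
with_ctrl_z = plain + b"\x1a"

# ^Z treated as EOF: the two files look identical.
print(text_lines(plain) == text_lines(with_ctrl_z))  # True
# ^Z treated as an ordinary character: they differ.
print(text_lines(plain, False) == text_lines(with_ctrl_z, False))  # False
# Two different lines look equal once truncated at the first NUL.
print(c_string_view(b"abc\x00def") == c_string_view(b"abc\x00xyz"))  # True
```

Each answer is defensible on its own. The trouble only starts when the choice is left implicit, which is where binary data fed to a text compare can produce surprises.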
I'd suggest that IDENTICAL input should produce IDENTICAL results, and any inconsistent behaviour indicates an unwarranted assumption has been made - possibly uninitialised variables, and I'd claim that if this behaviour depends on the filenames involved, then that evidence reinforces my suspicions.
I agree with you almost completely. But I come to the conclusion that, for the most part, the "assumptions" are valid for most of the files that we generally consider to be "text files". Show me a couple of "text files" that fc/a does not compare properly, and I would argue that they are so extreme in some way that I would not consider them "text files". One of the definitions found by google is this: "A file that contains characters organized into one or more lines. The lines must not contain null characters and none can exceed the maximum line length allowed by the implementation." Ah, the implementation. In this case FC would be the implementation, would it not?
But, that said, what is your definition of a text file, and is that the authoritative definition? I mean, if there is no general agreement on a definition, then how can it be said that the assumptions made were incorrect?
But I see no indication here that FC gives non-identical results on identical input. FC A B may give different results from FC B A; however, I would suggest that, by definition, and from the point of view of FC, the input is therefore NOT identical. I would also suggest that FC A B appears to ALWAYS give the same results as itself, and that the same goes for FC B A. Reading ahead, you seem to suggest that this may not be true. I'll address that further down-thread.
I'd challenge your "flip-side" notion, too.
That is the part where I expected the most challenges...
There is a substantial amount of software that has been written that appears to work, but actually it fails to fail; working by chance rather than design.
Oh, you've seen some of my work, then, have you? ;-)
This can sometimes be proved by applying carefully-selected data; often it's been discovered by users and they've created manual procedures to avoid the known problems with the tools that they've been supplied. Often also, to prove the point one has to try to get an office manager with no IT comprehension to understand.
OK, let's put this into context: which software packages would you propose as having absolutely no flaws?
In the case of the FC problem to which I referred, I recall now that I was executing FC A B where A and B were COBOL source files resident on remote machines. Repeatedly running FC A B would suddenly fail with FC claiming that one of the files couldn't be found, although DIR, COPY, EDIT, etc. could find it - FC stubbornly refused to find it once it had decided that it didn't exist.
You have basically demonstrated that, like most of the world's known software, FC is not completely flawless. But you err by anthropomorphizing its motives ;-)
I strongly suspect that this was actually something wrong with the network - but the insane network administrator had to be in a rare calm mood for such matters to be discussed, and she'd fly into a rage of screamed accusations of "living in the past" if you were to use anything that wasn't point-click-and-giggle. Nothing wrong with her set-ups could ever, EVER be due to her fundamental lack of appreciation of cause and effect.
If the writers of FC could have anticipated our expectations, perhaps they would have had its help text explain its limitations in terms of line length, number of lines, size, or whatever other assumptions might have been implicit at the time. I would also have suggested to them that if either or both of the files turned out to exceed the specifications of its definition of a "text file", it should return an indicator that the files are not identical text files.
/Al