Re: File-Compare "fc" falsely reports mismatch between identical files
- From: "Al Dunbar" <alandrub@xxxxxxxxxxx>
- Date: Mon, 15 Jun 2009 23:15:49 -0600
"billious" <billious_1954@xxxxxxxxxxx> wrote in message news:4a3604db$0$32349$5a62ac22@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"Al Dunbar" <alandrub@xxxxxxxxxxx> wrote in message news:uzMbbFT7JHA.4116@xxxxxxxxxxxxxxxxxxxxxxx
<snip>
Right. On my system FC /? indicates that this means to display only the first and last lines of each set of differences, whereas /L is said to compare files as ASCII text.
Frankly, the distinction escapes me. /A may or may not be ASCII - I believe that it's assumed to be ASCII on the grounds that /B is Binary. Perhaps it is not so.
To reproduce the line-before and the line-after (which appears to be the /A documented and the default behaviour) it would seem that FC interprets the file - let's assume it's a text file for simplicity's sake - in a line-oriented manner. So what is the distinction between /L and /A?
I suspect that /A may *imply* /L. But, according to FC /?, /A does not mean to do an ASCII comparison.
Also, the documentation and behaviour appear difficult to reconcile. What is really meant by "first and last lines for each set of differences?" For instance, if we have the sequence SSDDSDDDS (Same/Different lines) then FC appears to show SDDSDDDS, starting at the second "Same" line and ending with the fourth. It could be argued that SDDS is one "set" and SDDDS is a second "set" hence the third Same line should be reproduced both as the last line of the first "set of differences" and also as the first of the second "set of differences."
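The SSDDSDDDS example can be worked through mechanically. The Python sketch below is not FC's algorithm; it only assumes that a "set" is a run of differing lines plus one matching line of context on each side, and `difference_sets` is a made-up helper name.

```python
import re

def difference_sets(pattern):
    """Return inclusive (start, end) index pairs, one per run of 'D' lines,
    widened by one line of context on each side where such a line exists."""
    sets = []
    for run in re.finditer(r"D+", pattern):
        start = max(run.start() - 1, 0)           # the Same line before the run
        end = min(run.end(), len(pattern) - 1)    # the Same line after the run
        sets.append((start, end))
    return sets

pattern = "SSDDSDDDS"
for start, end in difference_sets(pattern):
    # Show each set with 1-based line numbers.
    print(start + 1, end + 1, pattern[start:end + 1])
```

Under that assumption the sets are SDDS (lines 2 to 5) and SDDDS (lines 5 to 9), so the third Same line belongs to both. Printing the union of the two ranges once gives SDDSDDDS.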
And if it is LINES that are being compared, what is the difference between /A and /L mode? Both are line-oriented.
apparently
[behaviour with "non-ASCII" files]
That depends on whether or not that is what everybody wants it to do. Since we, as a species, have been unable to provide a rock-solid definition of what a text file is (see http://www.google.ca/search?hl=en&q=define%3A+text+file&meta=&aq=f&oq=), we can hardly complain when this lack of clarity results in anomalies...
Much heartache is caused by the assumption that what is "standard" in Redmond is some variety of universal standard.
You've made this assumption? I haven't.
<snip>
I agree with you almost completely. But I come to the conclusion that, for the most part, the "assumptions" are valid for most of the files that we generally consider to be "text files". Show me a couple of "text files" that fc/a does not compare properly, and I would argue that they are so extreme in some way that I would not consider them "text files". One of the definitions found by google is this: "A file that contains characters organized into one or more lines. The lines must not contain null characters and none can exceed the maximum line length allowed by the implementation." Ah, the implementation. In this case FC would be the implementation, would it not?
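The quoted definition is concrete enough to test against. Here is a minimal Python sketch of it, assuming the "implementation" publishes a maximum line length; the function name and the limit passed in are illustrative only and say nothing about what FC actually enforces.

```python
def is_text_by_definition(data: bytes, max_line_length: int) -> bool:
    """Check the quoted definition: one or more lines, no null characters,
    and no line longer than the implementation's stated limit."""
    if not data or b"\x00" in data:
        return False
    # bytes.splitlines() splits on CR, LF and CRLF and drops the terminators.
    return all(len(line) <= max_line_length for line in data.splitlines())

print(is_text_by_definition(b"hello\r\nworld\r\n", 80))
print(is_text_by_definition(b"x" * 81 + b"\r\n", 80))
```

The answer changes with `max_line_length`, which is the point: the same file is or is not a "text file" depending on whose limit is applied.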
I'd suspect that the earliest FC implementations were assembler, oriented toward 80-column data. It would seem unreasonable in that environment to produce a report wider than 80 columns, given the peripherals commonly in use at the time. Even had the output of FC been sent to a file and typed, word-wrapping on an 80-column screen would have been tedious and difficult to interpret.
Also, in those days 7-bit ASCII was de rigueur. A few control characters were used - CR,LF,FF,TAB - but the others had little relevance to the printed document. Were the "high-ASCII" characters graphics or special characters used in non-English alphabets? Unicode was way in the future...
As techniques have moved away from these earlier ideas, so the definitions have become more fuzzy.
You are probably right.
But, that said, what is your definition of a text file, and is that the authoritative definition? I mean, if there is no general agreement on a definition, then how can it be said that the assumptions made were incorrect?
Aye, that's the nub of the problem. I believe text files were originally assumed to be 7-bit ASCII, organised as "lines" being terminated by a CRLF sequence. "Lines" could be up to 80 characters long.
But each of these "requirements" is rubbery. "80" characters could be 132 - the common printer width for 15" printers. Or 164 or so, in compressed-print, or more with proportional-print, or more if the "text" was data not meant to be printed. 7-bit could be expanded to take care of accented characters, etc.
In the end, it becomes a meaningless, yet surprisingly commonly-used term. If the line length is not limited, and the character-set processed is not limited, then what is the difference between "ASCII" and "Binary?" It becomes simply a binary-compare in blocks delimited by the arbitrary CRLF sequence. What "authority" is going to impose a line-length or character-set limit - and remember that there will always be the dissenters who want "just a few more characters" or "oh - and this character, too."
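To make the "binary-compare in blocks" idea concrete, here is a rough Python sketch. It compares blocks strictly by position and makes no attempt to resynchronise after a mismatch, so it is a simplification for illustration and not a description of how FC works. `compare_crlf_blocks` is a made-up name.

```python
def compare_crlf_blocks(a: bytes, b: bytes):
    """Compare two byte strings block by block, where blocks are delimited
    by CRLF. Returns the 1-based numbers of the blocks that differ."""
    blocks_a = a.split(b"\r\n")
    blocks_b = b.split(b"\r\n")
    differing = []
    for number in range(1, max(len(blocks_a), len(blocks_b)) + 1):
        # A block missing from the shorter input counts as a difference.
        block_a = blocks_a[number - 1] if number <= len(blocks_a) else None
        block_b = blocks_b[number - 1] if number <= len(blocks_b) else None
        if block_a != block_b:
            differing.append(number)
    return differing

# No limit on block length and no limit on byte values:
print(compare_crlf_blocks(b"one\r\n\xff\x00\r\nthree", b"one\r\n\xfe\x00\r\nthree"))
```

Nothing in it cares whether a block is printable or how long it is, which is why the "ASCII" label stops meaning much once both limits are dropped.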
The lack of an absolute definition of what a text file is does not mean that there is no benefit in making the distinction between ASCII or text and binary.
<snip>
But I see no indication here that FC gives non-identical results on identical input. FC A B may give different results from FC B A, however, I would suggest that, by definition, and from the point of view of FC, the input is therefore NOT identical. I would also suggest that FC A B appears to ALWAYS give the same results as itself, and that the same goes for FC B A. Reading ahead, you seem to suggest that this may not be true. I'll address that further down-thread.
Hmm. If A and B are two separate files with identical contents, then FC processes the files differently depending on their names. This indicates that FC's output does not depend WHOLLY on the contents of the files examined.
What guarantee is there then, that there are no other circumstances when the NAMES of the files will influence the outcome of FC's processing?
The names, or some other factor that we have not considered - now *that* is the question, especially if we are talking about the outcome when FC compares files that are well within the simpler definitions of text files.
Could it be that FC simply fails to fail? (and from there, why?)
Well, as I have tried to imply, I am not convinced that it is rational to suggest that a tool can be said to fail when it is used in a manner and for a purpose for which it was clearly not intended ;-)
<snip>
Again, the fail-to-fail scenario. At the company I've referred to on many occasions, their previous programmers had incorrectly implemented a price-loading formula (costing the company money) and had also incorrectly calculated tax payable - for a period of six years. Company management insisted on the first one being corrected, but that the second be ignored. A few months later, the tax department reacted to a customer's complaint and insisted that the faulty tax calculation be fixed. No pride from management in doing the right thing, leading to problems. Believe me, they couldn't AFFORD to have a proper tax investigation of their affairs....
Interesting analogy. Perhaps MS will be forced to fix FC when it has been demonstrated to be a serious problem, serious meaning something like lawsuits ;-)
<snip>
If the writers of FC could have anticipated our expectations, perhaps they would have had its help text explain its limitations in terms of line length, number of lines, size, or whatever other assumptions might have been implicit at the time. I would also have suggested to them that if either or both of the files turned out to exceed the specifications of its definition of a "text file", that it should return an indicator that the files are not identical text files.
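As a sketch of what that might look like, here is a three-way result in Python. Everything in it is hypothetical: the `Outcome` names, the `Limits` fields and their default values are placeholders for illustration, not FC's real limits or return codes.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    SAME = 0
    DIFFERENT = 1
    OUT_OF_SPEC = 2   # at least one input is outside the documented limits

@dataclass(frozen=True)
class Limits:
    # Placeholder values for illustration only.
    max_line_length: int = 128
    max_lines: int = 32767
    max_bytes: int = 1_000_000

def within_limits(data: bytes, limits: Limits) -> bool:
    lines = data.splitlines()
    return (len(data) <= limits.max_bytes
            and len(lines) <= limits.max_lines
            and all(len(line) <= limits.max_line_length for line in lines))

def compare(a: bytes, b: bytes, limits: Limits = Limits()) -> Outcome:
    # Refuse to call the inputs "same" unless both fit the stated definition.
    if not (within_limits(a, limits) and within_limits(b, limits)):
        return Outcome.OUT_OF_SPEC
    # Lines are compared without their terminators.
    return Outcome.SAME if a.splitlines() == b.splitlines() else Outcome.DIFFERENT

print(compare(b"x" * 200, b"x" * 200))
```

With these placeholder limits, two byte-for-byte identical files with a 200-character line come back as OUT_OF_SPEC rather than SAME, so the caller knows the tool was used outside its stated assumptions.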
You mean - meaningful documentation?
Yes. But I'm not holding my breath on this or on any of the many other things that could stand to be corrected. Like, for example, when you shut down a Windows 98 system, it is left with a message on the screen to the effect that: it is now safe to turn your computer off. When it was my employer's computer that I was shutting down, I wondered how the o/s could tell that it would actually be safe for me to power off my computer at home at that particular moment.
But as for FC - I'd like it to have a /Q option - to suppress output and simply set ERRORLEVEL, as a simple same/different condition is often all that is required. Now that brings up /W - should /W simply treat any sequence of 1 or more TABs as the same as 1 or more SPACES, or should the TABS be expanded out as spaces to - well, columns of 8, 5, 4 or 2 (I've seen all defined as "standard" in different environments) - or perhaps go back to the old typewriter standard of "wherever you want them this time." What about tabs/spaces appearing as trailing whitespace? In this case, one or more space/tabs would match zero or more space/tabs before CRLF. Perhaps we need a switch for this, too. Trailing whitespace can cause much grief on a batch "SET var=" line...
Possibly even a /i switch for a case-insensitive version...
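For what it's worth, the wish-list above is small enough to sketch in Python. This is an illustration of the proposed behaviour only, not of anything FC does; the option names `compress_ws`, `strip_trailing` and `ignore_case` are invented here, and the function works on strings rather than files to keep it short.

```python
import re

def normalise(line, compress_ws=False, strip_trailing=False, ignore_case=False):
    line = line.rstrip("\r")                 # treat CRLF and LF endings alike
    if strip_trailing:
        line = line.rstrip(" \t")            # one or more matches zero or more
    if compress_ws:
        line = re.sub(r"[ \t]+", " ", line)  # any run of tabs/spaces == one space
    if ignore_case:
        line = line.casefold()
    return line

def quiet_compare(text_a, text_b, **options):
    """Return 0 if the texts match under the chosen options, else 1.
    Prints nothing, in the spirit of the proposed /Q."""
    lines_a = [normalise(line, **options) for line in text_a.split("\n")]
    lines_b = [normalise(line, **options) for line in text_b.split("\n")]
    return 0 if lines_a == lines_b else 1

# Compressing whitespace alone does not forgive trailing blanks:
print(quiet_compare("SET var=1  \r\n", "SET var=1\r\n", compress_ws=True))
print(quiet_compare("SET var=1  \r\n", "SET var=1\r\n", strip_trailing=True))
```

The return value could be handed to `sys.exit()` so that a batch file tests ERRORLEVEL. Note that this takes the "compress" reading of /W; expanding tabs to columns would need a tab width, which is exactly the part nobody agrees on.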
Perhaps there is yet time for you to develop the ultimate file comparison program...
/Al