Re: FindFirstFile, how much faster than FindNextFile?
From: Joseph M. Newcomer (newcomer_at_flounder.com)
Date: 02/11/05
- Next message: Scott McPhillips [MVP]: "Re: Dialoog Box Close Buttons Won't Work"
- Previous message: Scott McPhillips [MVP]: "Re: AfxBeginThread argument is altered"
- In reply to: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Next in thread: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: Alexander Grigoriev: "Re: FindFirstFile, how much faster than FindNextFile?"
- Messages sorted by: [ date ] [ thread ]
Date: Fri, 11 Feb 2005 10:43:26 -0500
You have many issues here. There's nothing wrong with your code that a complete rethinking
and rewrite won't solve.
First, you are presuming the files are in the root directory. This would at best be a
horrible assumption; in some installations, the C:\ directory is actually protected, so
the presumption that this is a valid place to have a program put or find files is not a
particularly good assumption. For that matter, why do you think C:\ has any meaning? In a
multi-boot system the main partition might be D:\ or E:\ or something. It might even be
that they are in the My Documents folder, which is the more likely place to find them in
most real installations.
Next, you say you want an arbitrary substring of the filename, but that's not what your
patterns are suggesting. If I want to find any file that contains "KeyA" anywhere in it,
the correct pattern is not
*KeyA.txt
but
*KeyA*.txt
At least, this is what the dir command does. I do not know if FindFirstFile supports this
particular pattern match or not (I suspect it does).
But the whole notion that you would hand-code 50 unique patterns is remarkably silly. What
is the meaning of the index. This goes back to my original question, where do you get the
strings for comparison? You erroneously answered where you got the filenames for the other
half of the comparison; I knew that you got them from FindFirstFile. But why do you need a
case at all? Just do
FindFirstFile( strings[i], ...)
and you don't need something as silly as 50 cases to handle a trivial problem. Just an
array of 50 strings!
Your code is inefficient to write, and consequently hard to maintain. You are doing this
in the worst possible way.
You should not presume anything about the viability of C:\ as a path. Instead, the path
should be independently computed and concatenated to the filename pattern, e.g.,
FindFirstFile( basedirectory + strings[i], ...)
so the whole discussion about whether a switch statement is "faster" is irrelevant. It
should not even exist!
Since it is clear the table is compile-time constant,
static const LPCTSTR strings[] = {
_T("*KeyA*.txt"),
_T("*KeyB*.txt"),
...
NULL // end of table
};
I threw a NULL in for end-of-table in case you need to iterate over the table. If you need
a second component to use as a value then just create
static const struct {
LPCTSTR pattern;
LPCTSTR name } strings[] = {
{ _T("*KeyA*.txt"), _T("KeyA.txt") },
{_T("*KeyB*.txt"), _T("KeyB.txt") },
...
{ NULL, NULL} // end of table
};
then you could write
FileHandle = FindFile(basedirectory + strings[i].pattern, &FindData);
FileName = strings[i].name;
Why are you using strcpy? This already suggests serious problems with the code. It is not
Unicode-aware programming for one thing, and it implies you have a fixed-size buffer into
which you are copying a string, which immediately suggests that you are begging for some
serious bug to arise. if FileName were a CString, then a simple assignment would have
worked.
Hint: if you ever write strcpy or strcat, THINK REAL HARD ABOUT WHY YOU DID IT. The answer
is usually that you made a fundamental coding error. I use strcpy and strcat so
infrequently that I find it a surprise when I need to do so (actually, the most often use
I make of strcpy, or to be precise, _tcscpy, is putting the font name into a LOGFONT
structure). I can't recall the last time I wrote strcat. Use CString, or if you like the
C++ standard library, the string type. There is rarely a need in modern MFC programming to
use cpy or cat functions. So rare that I don't believe it is appropriate here.
Lose the switch statement. Rethink the problem and if the answer does not seem obvious,
ask it in terms of a program construct that would make sense.
In fact, when I re-read your message, the actual solution is even simpler, if the ? is
unique.
CString suffix = _T("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"); // for example
FileHandle = FindFirstFile( MakePattern(keybase + suffix[i]), &FindData);
FileName = keybase + suffix[i] + extension;
CString MakePattern(const CString & keybase)
{
return basedirectory + _T("*") + keybase + _T("*") + extension;
}
And PLEASE do not suggest that this is "inefficient"; it is so many orders of magnitude
faster than the standard deviation of the time required for FindFirstFile that the cost of
building the string will not even be measurable.
In rereading the code once more, I see that you really DIDN'T mean case 65; you almost
certainly meant case 'A', which is what you should have written. In this case, the code is
even more trivial: you just concatenate the character you would have used in the switch
statement!
TCHAR value;
FileHandle = FindFirstFile( MakePattern(keybase + value), &FindData);
FileName = keybase + value;
A good refresher on basic principles of the C language seems a good investment of your
time. If you want to represent the value of the character 'A', the value 'A' is a much
better choice than the value 65. So if you wanted a case based on letter values, the
preferred way of writing it would be
case 'A':
... something here
break;
case 'B':
... something for the B case
break;
Note that if you had actually indicated this by something as obvious as using the value
you meant, 'A', instead of some bizarre numeric encoding of the character, the code would
have been more obvious. The fact that it took me three careful readings of your code to
try to make sense out of it--that you had written code so convoluted and obscure that its
meaning was not obvious on first reading--already suggests so many failures of the coding
process that you need to scrap this entirely and rewrite it.
Why write a switch statement with 4 lines of code for each of 50 cases (the case plus the
three lines of executable code--that's 200 lines of code!) when you can write the whole
thing in about three lines of code? And have it more robust and reliable, because silly
artifacts like "C:\\" are not hardwired!
I've just reduced 200 lines of code to 3 lines of code, which is 66:1 compression. And,
more to the point, I've reduced 200 lines of incredibly bizarre and hard-to-write,
hard-to-modify, hard-to-read and hard-to-maintain code to three lines of straight-line
code that were trivial to read, modify, write and maintain.
Remember this: on a 2GHz machine, each machine cycle for computing costs 1/2 of a
nanosecond. A high-performance 10,000rpm hard drive requires 6ms to complete one
revolution, so even if the seek head is exactly positioned on the right cylinder, it takes
on the average 3ms to transfer the data (worst case is 6ms). That's 6,000,000 machine
cycles for the average case. Most instructions take one cycle if the data is ready, and I
haven't even accounted for the superscalar nature of the Pentium, where you can be
executing as many as three instructions per CPU clock cycle, or such features as
speculative read, instruction fetch-ahead, etc. which minimize the delays of waiting for
memory because data will be prefetched to the L2 cache. And if the disk arm has to move to
locate something, the cost of FindFirstFile is off the scale compared to CPU operations.
(And yes, the file system cache will expedite this, so it may be only a couple orders of
magnitude slower to FindFirstFile than to do CPU-intensive comparse, but that's better
than the five to seven orders of magnitude difference that could result if multiple seeks
are required).
I think you need to focus less on what is "efficient" in terms of the API interactions and
more on basic principles of constructing code that is easy to write, read, understand and
maintain, and is robust and correct under actual user scenarios. Your example meets none
of the previous criteria. Get the code RIGHT first, then worry about efficiency. And
"right" does not simply mean "computes the result I want". It also means the code meets
those criteria of clarity, maintainability and robustness.
joe
On Thu, 10 Feb 2005 20:56:56 GMT, John Doe <jdoe@usenet.is.the.real.thing.com> wrote:
>"Scott McPhillips [MVP]" <org-dot-mvps-at-scottmcp> wrote:
>>John Doe wrote:
>
>>> The switch/case method requires falling through up to 45 cases
>>> and then executing one FindFirstFile().
>>>
>>> The FindFirstFile()/FindNextFile() requires executing one
>>> FindFirstFile() and then up to 49 FindNextFile() iterations.
>>>
>>> Both require one FindFirstFile(). I think my question boils down
>>> to whether falling through the cases is faster than looping
>>> through the FindNextFile()s.
>
>>Using FindFirstFile with 50 patterns is going to read through the
>>directory file 50 times.
>
>Is using FindFirstFile() to search a folder for the less specific
>string "*.txt" and finding it sooner much faster than using
>FindFirstFile() to search one time for the single more specific
>string like "*KeyB.txt"?
>
>Thank you.
>
>
>
>
>Just in case this needs clarification:
>
>This is the switch statement method which enters only one case and
>searches only once for a more specific string. It doesn't use
>FindNextFile().
>
>>> case 65:
>>> {
>>> FileHandle=FindFirstFile("C:\\*KeyA.txt",&FindData);
>>> strcpy(FileName,"KeyA.txt");
>>> break;
>>> }
>>> case 66:
>>> {
>>> FileHandle=FindFirstFile("C:\\*KeyB.txt",&FindData);
>>> strcpy(FileName,"KeyB.txt");
>>> break;
>>> }
>>> ... 43 additional similar cases, the only thing that
>>> changes is the string "Key?.txt"
>
>This method searches for a less specific string first and then loops
>searching for the more specific string.
>
>>> FileHandle=FindFirstFile("C:\\*.txt",&dirInfo);
>>> if(FileHandle!=INVALID_HANDLE_VALUE)
>>> {
>>> do
>>> {
>>> using strstr() to search 50 filenames for a given
>>> substring like "KeyB"
>>> }
>>> while(::FindNextFile(FileHandle,&FindData));
>>> }
>>> FindClose(FileHandle);
>
>
>
>
>
>>Using the FindNextFile loop is going to read
>>through the same directory file only once - much more efficient.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
- Next message: Scott McPhillips [MVP]: "Re: Dialoog Box Close Buttons Won't Work"
- Previous message: Scott McPhillips [MVP]: "Re: AfxBeginThread argument is altered"
- In reply to: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Next in thread: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: John Doe: "Re: FindFirstFile, how much faster than FindNextFile?"
- Reply: Alexander Grigoriev: "Re: FindFirstFile, how much faster than FindNextFile?"
- Messages sorted by: [ date ] [ thread ]