Re: High-performance IO
- From: "anton bassov" <soviet_bloke@xxxxxxxxxxx>
- Date: 24 Jul 2006 01:29:00 -0700
Hi mate
In my previous post I answered some of your questions, but somehow
overlooked the general picture. Therefore, let me tell you what I
think.
First of all, the "one file, one thread" strategy is not the best one in
your situation. Why?
Let's say thread X processes file A, and thread Y processes file B.
These files may be physically located in totally different parts of the
disk. Therefore, every time a context switch occurs and the other thread
issues its IO, the disk head has to be repositioned, i.e. the disk has to
seek all the time. As a result, you get performance degradation instead
of improvement. I think the best thing to do is to process the files
sequentially, i.e. one by one.
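To make the "one by one" idea concrete, here is a minimal sketch in portable C, not tuned Win32 code. It reuses a single buffer and finishes each file before touching the next one. The names process_block(), process_file() and process_files_sequentially() are made up for illustration, the byte-flipping "processing" is only a placeholder, and the in-place update is an assumption about what your program does with the data.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-block processing step; here it just inverts every byte. */
static void process_block(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)~buf[i];
}

/* Read, update and write back one file in place, block by block.
 * Returns 0 on success, -1 on error. Uses long offsets, so it assumes
 * files that fit in a long (fine for the ~0.25 GiB files discussed). */
static int process_file(const char *path, unsigned char *buf, size_t block)
{
    FILE *f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    int rc = 0;
    long pos = 0;
    for (;;) {
        size_t n = fread(buf, 1, block, f);
        if (n == 0) {
            if (ferror(f))
                rc = -1;
            break;
        }
        process_block(buf, n);

        /* Step back to where this block started and overwrite it. */
        if (fseek(f, pos, SEEK_SET) != 0 || fwrite(buf, 1, n, f) != n) {
            rc = -1;
            break;
        }
        pos += (long)n;

        /* C requires a positioning call between a write and the next read. */
        if (fseek(f, pos, SEEK_SET) != 0) {
            rc = -1;
            break;
        }
    }
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Process the files strictly one after another, sharing one buffer. */
int process_files_sequentially(const char *const *paths, size_t count,
                               size_t block)
{
    unsigned char *buf = malloc(block);
    if (buf == NULL)
        return -1;

    int rc = 0;
    for (size_t i = 0; i < count && rc == 0; i++)
        rc = process_file(paths[i], buf, block);

    free(buf);
    return rc;
}
```

Because only one file is open for IO at any moment, the disk is never asked to alternate between two distant regions, which is the whole point of the suggestion above.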
Second, as I have already mentioned in my previous message, you should
try to avoid page faults. Imagine what happens if the size of your
buffer exceeds the amount of physical memory that is available to your
process - the system will be paging your buffer out to the disk and back
into memory all the time. If you have the SeLockMemoryPrivilege
privilege, it probably makes sense to allocate the memory with
AllocateUserPhysicalPages() and map it with MapUserPhysicalPages() - if
you do it this way, you will make sure that your buffer is always
resident in RAM. As a result, you will be able to read data, update it,
write it back to the disk and proceed to the next range, without a SINGLE
page fault. If you do it this way, you can do everything synchronously,
so there is no need for the FILE_FLAG_OVERLAPPED flag.
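For what it's worth, a rough sketch of the AWE allocation in C might look like the code below. Treat it as an outline under assumptions rather than tested production code: it assumes SeLockMemoryPrivilege has already been enabled in the process token, and the names pages_for(), resident_buf and alloc_resident_buffer() are made up for illustration. The Win32-specific part is guarded so that the small page-count helper also builds elsewhere.

```c
#include <stddef.h>

/* Number of pages needed to hold `bytes` bytes, rounded up. */
size_t pages_for(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Hypothetical descriptor for a buffer backed by AWE pages. */
typedef struct {
    void      *base;    /* virtual address the pages are mapped at */
    ULONG_PTR  npages;  /* number of physical pages */
    ULONG_PTR *frames;  /* page frame numbers, needed again to free them */
} resident_buf;

/* Fills *out and returns 0 on success, -1 on failure.
 * Assumes SeLockMemoryPrivilege is already enabled for the process. */
int alloc_resident_buffer(size_t bytes, resident_buf *out)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    ULONG_PTR wanted = pages_for(bytes, si.dwPageSize);
    if (wanted == 0)
        return -1;

    ULONG_PTR *frames = malloc(wanted * sizeof *frames);
    if (frames == NULL)
        return -1;

    /* Ask for physical pages; the call may hand back fewer than requested. */
    ULONG_PTR got = wanted;
    if (!AllocateUserPhysicalPages(GetCurrentProcess(), &got, frames)) {
        free(frames);
        return -1;
    }

    void *va = NULL;
    if (got == wanted) {
        /* Reserve an address range that physical pages can be mapped into. */
        va = VirtualAlloc(NULL, wanted * si.dwPageSize,
                          MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
        if (va != NULL && !MapUserPhysicalPages(va, wanted, frames)) {
            VirtualFree(va, 0, MEM_RELEASE);
            va = NULL;
        }
    }
    if (va == NULL) {
        /* Give back whatever pages were obtained before failing. */
        FreeUserPhysicalPages(GetCurrentProcess(), &got, frames);
        free(frames);
        return -1;
    }

    out->base = va;
    out->npages = wanted;
    out->frames = frames;
    return 0;
}
#endif
```

When the buffer is no longer needed, the pages would be unmapped by calling MapUserPhysicalPages() with a NULL page array, released with FreeUserPhysicalPages(), and the reserved range returned with VirtualFree(); that cleanup is left out here to keep the sketch short.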
To summarize, the fact that you are about to access the files only once
and do so sequentially gives you a HUGE potential for optimization in
terms of minimizing the number of disk accesses and disk seeks.
Therefore, you should take advantage of it - in your situation it
offers much more than multithreading and asynchronous IO do.
Anton Bassov
Piotr Wyderski wrote:
I would like to obtain the highest possible throughput
of disk IO under the following conditions. The program
simultaneously reads and writes several large files,
~0.25+ GiB each. Each file must be processed in order,
so no explicit fine-grained parallelization provided by IOCP
is possible -- the file has its own processing thread. The
files are accessed sequentially and only once, so no caching
will help. Currently I use four independent memory buffers
per file and overlapped IO to separate the phases of
reading/writing and processing. The IO completion notifications
are issued through Win32 event objects.
I set the FILE_FLAG_NO_BUFFERING flag to bypass
the filesystem cache and (hopefully) cause the system to
perform explicit DMA transfers directly into the buffers.
But several questions remain open:
1. How big should the data block be to obtain the highest
performance? Currently it is hardcoded to 512KiB, but it
is an early implementation and is subject to change. If
it is OS/filesystem specific, then how can I get the best
value at run time using WinAPI?
2. Do FILE_FLAG_NO_BUFFERING and
FILE_FLAG_OVERLAPPED interfere somehow
with FILE_FLAG_SEQUENTIAL_SCAN?
Should I avoid the latter hint when the first two flags
are specified?
3. How should I allocate the memory buffer? A simple
VirtualAlloc will do, but it can optimize (cache coloring,
clustering etc.) the virtual->physical address mapping
for memory access, not for IO, especially for large DMA
transfers. Is there a way to allocate a contiguous block of
physical memory? Will it help much under NT/XP? I'm
asking, because I don't know the details of NT's low-level
disk IO architecture and implementation.
4. Can I directly read into/write from an AWE buffer?
5. Is there something I forgot to use in order to obtain
the highest throughput? Except IOCP, of course. :-)
Best regards
Piotr Wyderski