Re: High-performance IO
- From: "anton bassov" <soviet_bloke@xxxxxxxxxxx>
- Date: 29 Jul 2006 11:41:38 -0700
Hi Piotr
The size of a contiguous buffer is meaningful for common-buffer DMA
transfers - the larger the buffer, the more data you can transfer in
one go. However, using common-buffer DMA transfers for reading or
writing data from/to the disk is an unreasonable thing to do -
the system would have to copy data between the common buffer and its
destination/source buffer. Therefore, the system is going to use
scatter-gather DMA for the disk IO anyway, so the memory does not
have to be physically contiguous.
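To illustrate the scatter-gather idea, here is a minimal sketch in portable C. It is not the Windows DMA API: the sg_element type, the build_sg_list() helper and the 4 KiB page size are all assumptions made up for this example. It takes the page frame numbers of a locked buffer and merges physically adjacent pages into one element, which is why a fragmented buffer still works, only with more (and shorter) elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical scatter-gather element: one physically contiguous run. */
typedef struct {
    uint64_t phys_addr;  /* physical start address of the run */
    uint32_t length;     /* length of the run in bytes */
} sg_element;

/*
 * Build a scatter-gather list from an array of page frame numbers,
 * merging pages that are physically adjacent. Returns the number of
 * elements written, or 0 if 'out' is too small (or page_count is 0).
 */
size_t build_sg_list(const uint64_t *pfns, size_t page_count,
                     sg_element *out, size_t out_capacity)
{
    size_t used = 0;

    assert(pfns != NULL && out != NULL);

    for (size_t i = 0; i < page_count; i++) {
        uint64_t addr = pfns[i] * SKETCH_PAGE_SIZE;

        if (used > 0 &&
            out[used - 1].phys_addr + out[used - 1].length == addr) {
            /* This page directly follows the previous run: extend it. */
            out[used - 1].length += SKETCH_PAGE_SIZE;
        } else {
            /* Physical discontinuity: start a new element. */
            if (used == out_capacity)
                return 0;
            out[used].phys_addr = addr;
            out[used].length = SKETCH_PAGE_SIZE;
            used++;
        }
    }
    return used;
}
```

For page frames 100, 101, 102, 500, 501, 7 this yields three elements of 3, 2 and 1 pages respectively.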
In order to do DMA, one has to provide an MDL. All pages that the MDL
describes have to be physically present and locked in RAM. If they are
currently paged out, the system has to load them into RAM and lock them
before it can proceed to DMA. This is the only bottleneck that you can
avoid - you can make sure that your buffer is always locked in RAM. You
should not be bothered about the rest - let the system take care of
that.
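As a rough illustration of how many pages have to be resident and locked for a given buffer, the sketch below reproduces the usual "pages spanned" arithmetic (the WDK has a macro for this, ADDRESS_AND_SIZE_TO_SPAN_PAGES). The pages_spanned() name and the 4 KiB page size are assumptions for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Number of pages touched by a buffer of 'size' bytes starting at
 * virtual address 'va'. An unaligned buffer can span one page more
 * than size / page_size suggests.
 */
size_t pages_spanned(uintptr_t va, size_t size)
{
    size_t offset = (size_t)(va & (SKETCH_PAGE_SIZE - 1));

    /* Guard against wrap-around in the rounding below. */
    assert(size <= SIZE_MAX - 2u * SKETCH_PAGE_SIZE);

    return (offset + size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-aligned 4 KiB buffer spans one page, while the same buffer starting in the middle of a page spans two, and both of them have to be locked.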
a) what is the upper limit of a DMA disk transfer, how is it
computed on a Windows XP-based machine and by whom?
I would say that for scatter-gather DMA it depends on how much memory
can be locked.
According to the IoAllocateMdl() documentation, the maximum buffer size
an MDL can describe is
PAGE_SIZE * (65535 - sizeof(MDL)) / sizeof(ULONG_PTR). However, it
does not necessarily mean that all pages the MDL describes may be
successfully locked. Therefore, it depends on the situation. In your
case it does not matter anyway - after all, you are going to make sure
that your buffer is always locked in RAM.
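To put a number on that formula, here is a small portable sketch that just evaluates it. The max_mdl_buffer_bytes() helper is hypothetical, and the sizeof(MDL) values used below (28 bytes on 32-bit x86, 48 bytes on x64) are my assumptions about the structure layout, so check them against your WDK headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Evaluate PAGE_SIZE * (65535 - sizeof(MDL)) / sizeof(ULONG_PTR)
 * with the platform-dependent sizes passed in as parameters, so the
 * sketch does not depend on the Windows headers.
 */
uint64_t max_mdl_buffer_bytes(uint64_t page_size,
                              uint64_t sizeof_mdl,
                              uint64_t sizeof_ulong_ptr)
{
    assert(sizeof_mdl < 65535 && sizeof_ulong_ptr > 0);
    return page_size * (65535 - sizeof_mdl) / sizeof_ulong_ptr;
}
```

With 4 KiB pages and the assumed sizes, max_mdl_buffer_bytes(4096, 28, 4) gives 67079168 bytes (just under 64 MiB) for 32-bit x86, and max_mdl_buffer_bytes(4096, 48, 8) gives 33529344 bytes (just under 32 MiB) for x64.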
BTW, apart from using AWE, you can also lock pages in memory with
ZwLockVirtualMemory() by specifying 0x2 as a LockType. However,
VirtualLock() is not going to do the job - the only thing it can do is
to lock memory in the working set of the target process (it
specifies 0x1 as a LockType in its call to ZwLockVirtualMemory()). It is
understandable that such an approach does not allow you to utilize more
than 4 GB of RAM.
Anton Bassov
Piotr Wyderski wrote:
Hi Anton,
First of all, memory does not have to be physically contiguous - the
only thing you need is to make it resident and locked
Surely, but discontinuities limit the length of the DMA transfers.
If the length of a contiguous memory block is equal to the size
of a virtual memory page, the DMA transfer length will be at
most 4KiB too -- not a very impressive value... The other limiting
factor is the cluster size, so the buffer doesn't have to be contiguous
that much.
But I don't know whether the disk drivers are supposed to support 64-bit
(36, in fact) physical addressing.
Drivers are not concerned about things like that - after all, they deal
only with linear addresses.
But eventually they must issue a DMA transfer request (does
anyone still use PIO?) and they work only with physical memory.
That raises two questions:
a) what is the upper limit of a DMA disk transfer, how is it
computed on a Windows XP-based machine and by whom?
b) is the underlying hardware able to access physical memory
above 4GiB? There may still be some 32-bit oddities etc.
I know that AWE can give the programmer a way to use
huge memory, but the problem is whether a physical memory
block from the AWE area is guaranteed to be reachable via
PCI etc. If one completes a data block in AWE and issues
an IO request, it would not be desirable to receive an error
code like E_INVALID_PHYSICAL_MEMORY_RANGE
or something. I ask, because the following excerpt from MSDN
is the seed of doubt:
"A similar restriction is that AWE window address ranges and memory
pools cannot be used as data buffers for graphics or video calls."
And since AGP and PCI buses are being implemented in a quite
similar way, this remark naturally scales up the problem to a question
"does this restriction apply to disks too?"
Unfortunately, there is not much information available at the
level of detail I would like to know... :-(
The only problem is that AWE requires SeLockMemory privilege
I believe it will not be a big problem, especially that it is just an
alternative control flow path -- without that privilege my application
will simply use the good old VirtualAlloc-based way. But indeed,
the temptation to use AWE is very strong... :-)
Best regards
Piotr Wyderski
PS. The current, improved implementation is damn fast
and no longer is the performance limiting factor. :-)