Re: Cache coherency issues using AllocateCommonBuffer(..)



The common buffer DMA APIs provided by NT on x86, x64, and ia64 platforms
assume that DMA is coherent with the processor caches. Non-coherent DMA is
something that MS folks have consistently tried to discourage for many years
now. I will get the AllocateCommonBuffer doc fixed to talk about lack of
support for non-coherent DMA.

You can work around this either by calling into Mm directly, as you have
already done, or by using MDLs rather than a common buffer: apply the
appropriate noncached attribute to the pages locked into the MDLs, and then
use the standard non-common-buffer DMA APIs.

Mapping the buffer noncached into user mode might fail because Mm detects
the conflicting cached mapping generated by AllocateCommonBuffer and rejects
the noncached mapping to avoid cache attribute aliasing. I have seen that
failure with ZwMapViewOfSection.

-Eliyas

"Calvin Guan" <hguan@xxxxxxxxxxxxxxxxxxx> wrote in message
news:umpx98wTGHA.4772@xxxxxxxxxxxxxxxxxxxxxxx

I can't tell why MmAllocateContiguousMemorySpecifyCache works but
AllocateCommonBuffer does not.

Some things I'd check:
1. Did you see the actual DMA write operation on the remote system?
2. What do the TLP attributes look like, especially the NoSnoop bit?
3. How soon do you read from host memory after you believe the DMA write
is done? This could be a write-posting issue. A read from a device
register should flush the posted write data to host memory.
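
For item 2, here is a rough sketch of pulling the attribute bits out of the
first four bytes of a captured TLP header. The type and function names are
made up for illustration, and the bit positions assume the standard PCIe
header layout (Fmt/Type in byte 0, Attr[1:0] in byte 2, bits 5:4, with
Attr[0] being No Snoop). Treat it as a cross-check against what the analyzer
decodes, not as a reference decoder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ordering/snoop attributes from the first DWORD of a TLP header.
 * Names are hypothetical; only the bit positions come from the PCIe layout. */
typedef struct {
    bool relaxed_ordering; /* Attr[1]: byte 2, bit 5 */
    bool no_snoop;         /* Attr[0]: byte 2, bit 4 */
} tlp_attr_t;

/* hdr holds the first four header bytes in wire order (byte 0 first). */
static tlp_attr_t tlp_decode_attr(const uint8_t hdr[4])
{
    tlp_attr_t attr;
    attr.relaxed_ordering = (hdr[2] & 0x20u) != 0;
    attr.no_snoop         = (hdr[2] & 0x10u) != 0;
    return attr;
}

/* True for a memory write request: Type 00000 with a data payload,
 * i.e. Fmt 010 (3DW header) or Fmt 011 (4DW header, 64-bit address). */
static bool tlp_is_memory_write(const uint8_t hdr[4])
{
    uint8_t fmt  = (uint8_t)(hdr[0] >> 5);
    uint8_t type = (uint8_t)(hdr[0] & 0x1Fu);
    return type == 0 && (fmt == 2 || fmt == 3);
}
```

If the inbound writes turn out to carry No Snoop, the receiving side is not
expected to keep them coherent with the CPU caches, which would be consistent
with the stale reads described below.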

Interesting issue. Good luck,
--
Calvin Guan (Windows DDK MVP)

"smann" <ssm11b@xxxxxxxxxxx> wrote in message
news:32565AE2-17A2-4EEF-BDD1-20FD87CFCAC6@xxxxxxxxxxxxxxxx
Hi,
I am seeing a cache coherency issue with memory allocated through
AllocateCommonBuffer(..). The issue occurs with a PCI Express driver.
The application runs in a dual-host environment, where each host (PC) can
write directly to the other's main memory. This is achieved through a
special feature in the PCIe switch. Through various debugging techniques I
have verified that the local host correctly writes through the PCIe switch
to the physical memory of the other (remote) host. Once the data is written,
the remote host is signaled through the PCIe switch to read the new data in
its buffer. I have also verified that the data the remote application reads
from its shared memory buffer is stale, i.e. cached data.

Through another debugging mechanism I can invalidate the cache on the remote
host. When I do this, the behavior of the remote host is corrected and it
reads the contents of the buffer, not its cache.

The driver uses AllocateCommonBuffer from the DMA_OPERATIONS structure.
I tried the other, deprecated MmAllocateContiguousMemorySpecifyCache()
function instead, and with it the cache coherency issue is resolved.

So I am baffled. I am almost certain I am allocating the buffer correctly
with AllocateCommonBuffer(...). Here is a snapshot of what I am doing. In
the driver I allocate with the following call:

    // Attempt to allocate the buffer
    pKernelVa =
        pdx->pDmaAdapter->DmaOperations->AllocateCommonBuffer(
            pdx->pDmaAdapter,
            BufferSize,
            &BufferLogicalAddress,
            bCacheEnabled           // Enable caching for buffer?
            );

Here bCacheEnabled is set to FALSE. I then allocate the MDL with the
following call:

    pBufferObj->pMdl =
        IoAllocateMdl(
            pKernelVa,
            BufferSize,
            FALSE,          // Is this a secondary buffer?
            FALSE,          // Charge quota?
            NULL            // No IRP associated with MDL
            );

    MmBuildMdlForNonPagedPool(
        pBufferObj->pMdl
        );

Next I get the user virtual address for the buffer through the
following call:

    pUserVa =
        MmMapLockedPagesSpecifyCache(
            pBufferObj->pMdl,
            UserMode,
            CacheMode,      // CacheMode == MmNonCached
            NULL,
            FALSE,
            NormalPagePriority
            );

I return the virtual address and the physical address to the user-mode
application. The physical address of the remote host's memory is passed to
the local host and is used for writing to the remote memory. After the
remote memory is written, the remote host uses the virtual address to read
data from the shared buffer.

As I mentioned, the data it reads is the data in its cache, even though the
buffer was allocated with the non-cacheable attribute. When I ran multiple
iterations of writes, I could see that the data read by the remote host was
from one of the previous writes.

If anyone has any idea why the buffer is being cached, I would appreciate
a response.

--
smann



