Re: 1394 and Vista64



I had my fair share of aggravation running DMA above 4 GB. The reason
was that my card, the PCI Express bridge chip, or both, and
only in some machines, did not handle 64-bit accesses well. I
also have a lingering suspicion that those configurations don't
like accesses that cross a 4 GB boundary. But I'm the first to admit
that I still haven't gotten to the bottom of this issue.

I had some success forcing every buffer that takes part in a DMA to
stay entirely within a single 4 GB window, that is, the upper 32 bits
of the buffer's physical address stay the same during the whole DMA.
I used to see a fair number of hangs, and they pretty much disappeared
once I did that. I still experienced the rare hang, so right now my
driver forces all DMA buffers to be located at physical addresses
below 4 GB. Before, I had several hangs a day; then I had a hang a
week; now I have a hang every few weeks. I still see the problem, and
I really don't know what's going on.
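
For illustration, the two rules above (same upper 32 bits for the whole
transfer, or everything below 4 GB) can be written as simple range
checks. This is only a portable C sketch, not my driver code: the
function names are hypothetical, and in a real driver the physical
address would come from the OS rather than being passed in as a plain
integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [phys, phys + len) keeps the same upper 32 address bits,
 * i.e. the transfer never crosses a 4 GB boundary. */
static bool dma_stays_in_4gb_window(uint64_t phys, uint64_t len)
{
    assert(len > 0);
    uint64_t last = phys + len - 1;      /* last byte touched by the DMA */
    if (last < phys)                     /* address arithmetic wrapped */
        return false;
    return (phys >> 32) == (last >> 32);
}

/* True if the whole buffer is addressable with 32 bits. */
static bool dma_below_4gb(uint64_t phys, uint64_t len)
{
    assert(len > 0);
    uint64_t last = phys + len - 1;
    return last >= phys && last <= UINT32_MAX;
}
```

A buffer that ends exactly at 0xFFFFFFFF passes both checks; one byte
more and it straddles the boundary and fails both.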

I dislike the idea of bounce buffers, but the alternative in my case
is to DMA directly from user space, and then I cannot control the
physical address range. What I do is allocate a large slab of
physically contiguous memory at start time (my standard allocation is
256 MB, which seems to work well even on 32-bit systems) and then
suballocate all my DMA buffers from that slab. That allows me to keep
my buffers within any physical address range I need. I still need to
copy buffers at DMA time, but my application is so chip-intensive that
I could probably run my processor at 100% utilization and I wouldn't
see any speed degradation.
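
To make the suballocation idea concrete, here is a minimal sketch of a
bump allocator over one contiguous slab. It is an assumption-laden
illustration, not my actual allocator: the type and function names are
hypothetical, there is no free list or locking, and the slab is
described only by its physical base and size. The one interesting bit
is that a buffer that would straddle a 4 GB boundary gets pushed up to
start at the boundary instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t phys_base;  /* physical address of the start of the slab */
    uint64_t size;       /* slab size in bytes */
    uint64_t next;       /* offset of the first free byte */
} dma_slab;

static void dma_slab_init(dma_slab *s, uint64_t phys_base, uint64_t size)
{
    s->phys_base = phys_base;
    s->size = size;
    s->next = 0;
}

/* Carve len bytes out of the slab. align must be a power of two, and
 * both len and align must be at most 4 GB. On success stores the
 * buffer's physical address in *out and returns true. */
static bool dma_slab_alloc(dma_slab *s, uint64_t len, uint64_t align,
                           uint64_t *out)
{
    const uint64_t four_gb = 1ULL << 32;
    assert(len > 0 && len <= four_gb);
    assert(align != 0 && align <= four_gb && (align & (align - 1)) == 0);

    /* Round the next free physical address up to the alignment. */
    uint64_t phys = (s->phys_base + s->next + align - 1) & ~(align - 1);
    uint64_t last = phys + len - 1;

    if ((phys >> 32) != (last >> 32)) {
        /* Would cross a 4 GB boundary: start at the boundary instead. */
        phys = (last >> 32) << 32;
        last = phys + len - 1;
    }
    if (last >= s->phys_base + s->size)
        return false;                    /* slab exhausted */

    s->next = last + 1 - s->phys_base;
    *out = phys;
    return true;
}
```

Restricting everything to below 4 GB is then just a matter of where the
slab itself is allocated; the suballocator does not need to know.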

One nice thing about bounce buffers is that they give me the
flexibility to do real-time data verification. When the
user has a "DmaVerify" flag set in the Registry, I allocate two bounce
buffers instead of one. Say it's a DMA to the chip: I copy the user
buffer to one of the bounce buffers, and I throw a command stream at
the chip that does a DMA from that buffer to the chip memory and
immediately bounces it back, doing a DMA to the second buffer. Only
then do I generate the completion interrupt. This means that my DPC can
compare the two buffers and flag any discrepancy. I normally don't
use that in production mode, but it comes in really handy when we see
unexplained artifacts in a rendered image and we want to rule out
DMA corruption as the cause.
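
The comparison step itself is trivial. As a rough sketch (the function
name is hypothetical, and this is plain C rather than what actually
runs in my DPC), it amounts to finding the first byte where the buffer
that was sent and the buffer that came back disagree:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare the bounce buffer that was sent to the chip with the one the
 * chip wrote back. Returns the offset of the first differing byte, or
 * len if the two buffers are identical. */
static size_t dma_verify_first_mismatch(const uint8_t *sent,
                                        const uint8_t *readback,
                                        size_t len)
{
    assert(sent != NULL && readback != NULL);
    for (size_t i = 0; i < len; i++) {
        if (sent[i] != readback[i])
            return i;
    }
    return len;
}
```

Returning the offset rather than a yes/no answer makes it easier to log
where in the transfer the corruption started.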

Alberto.


On Feb 6, 8:37 am, stevesummers0909_at_gmail_dot_com
<stevesummers0...@xxxxxxxxx> wrote:
I have a 1394 driver that works on Vista64 machines, but only when
there is < 4GB of memory. When more memory is installed, I get dropped
packets, repeated packets, etc.

Researching Windows DMA I've learned that going over 4 GB means that
"bounce buffers" are required, at least whenever the physical address
of the buffers exceeds 4 GB. In most cases, I take this to mean that
the system copies data from/to an area addressable by the 1394 card
(<4GB) to/from buffers which are physically above 4 GB.

However, I don't see any mention in the DDK docs of restrictions on
the type of memory used for either descriptor lists or the associated
MDLs for the packet data buffers. Nor is there any mention of DMA
cache flushing APIs whose job it is to either:
     - flush CPU caches ( KeFlushIoBuffers() )
     - ensure that each DMA transfer is complete ( FlushAdapterBuffers() )

So... what rules for allocating the packet buffers and descriptor
lists apply here? And is it necessary to use these cache flushing
functions?
