Re: DMA transfer speed issues on PCI card with PCI and PCI-X busses




Oliver wrote:
Hi all,

we're using a PCI card with large on-board memory and a PCI interface chip with a bus-master DMA engine
(PCI9656 from PLX, but only using 32-bit width). Data transfer is done using scatter-gather DMA and
runs stably. The card works in PCI and PCI-X busses at 33 MHz and 66 MHz.


You can't use a PCI-X _bus_ with the PCI9656. The PCI9656 is a PCI-only device.
What you are using is a PCI-X _slot_ running in PCI mode.

When measuring the DMA transfer speed I get some strange results:

used in a standard 32 bit PCI slot:
read: card to PC memory: 80-100 MB/s
write: PC memory to card: 80-100 MB/s

used in a 64 bit PCI-X slot with 66 MHz:
read: card to PC memory: 200-220 MB/s
write: PC memory to card: 40-50 MB/s

using an older design with a pure 33 MHz 32 bit interface chip (PCI9080) in a PCI-X slot:
read: card to PC memory: 80-100 MB/s
write: PC memory to card: 40-50 MB/s

In the read direction the speed is as expected, but as soon as a PCI-X slot is involved the
bus transfer speed in the write direction drops a lot. The scatter-gather list is located in PC memory,
which should make access easier in the write direction, as the PCI bus need not change direction.

We tested at least 20 different PCI-X systems and get these results within ±10%. It can't be a local
bus issue, as the card is able to stream data at 100 MB/s or more in a 32 bit PCI slot.

Does anybody have an idea where to start searching and which setup to change?
When monitoring the PCI bus signals I see that in the write direction the transfer stops and is restarted
after only a few bytes. But why? We modified the design to be sure that the local bus is
able to run at up to 250 MB/s with no latency.
There are no other card accesses while DMA is running; the driver simply waits for the interrupt.

Any help is appreciated.

Best regards
Oliver

First, you seem confused about the naming of PCI transfers.
For bus-master cards:
PCI-to-memory = Write
Memory-to-PCI = Read

Second, in PLX I/O accelerators you can program the PCI command used
in DMA transactions (see register 6Ch). The default for memory-to-PCI
DMA is "memory read line". Some chipsets, e.g. the AMD 8132, interpret
the "memory read line" command as "read one processor cache line", i.e.
64 bytes in the case of Opteron. Reading just 64 bytes per transaction
leads to suboptimal performance.
Program the DMA to use the "memory read multiple" command and on many
systems you will see an immediate increase in performance.
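
A minimal sketch of that change, in C, in case it helps. The PCI command
encodings (Memory Read Line = 1110b, Memory Read Multiple = 1100b) are the
standard ones from the PCI specification. Everything else is an assumption
to check against the PCI9656 data book: that the DMA read command code sits
in bits [3:0] of register 6Ch, and that the register is reached through a
memory-mapped pointer to the chip's local configuration registers. The
names cntrl_set_dma_read_cmd and plx_use_read_multiple are made up for
illustration and are not part of any PLX SDK.

```c
#include <stdint.h>

/* Standard PCI bus command encodings (C/BE[3:0]# during the address phase). */
#define PCI_CMD_MEM_READ          0x6u
#define PCI_CMD_MEM_READ_MULTIPLE 0xCu
#define PCI_CMD_MEM_READ_LINE     0xEu

/* ASSUMPTION: the DMA read command code occupies bits [3:0] of the
 * register at offset 6Ch. Verify against the PCI9656 data book. */
#define PLX_REG_6C                0x6Cu
#define CNTRL_DMA_READ_CMD_MASK   0x0000000Fu

/* Return the register value with only the DMA read command field replaced. */
uint32_t cntrl_set_dma_read_cmd(uint32_t cntrl, uint32_t pci_cmd)
{
    return (cntrl & ~CNTRL_DMA_READ_CMD_MASK)
         | (pci_cmd & CNTRL_DMA_READ_CMD_MASK);
}

/* Read-modify-write of register 6Ch so that all other fields are kept.
 * 'regs' is a hypothetical pointer to the mapped PLX register block. */
void plx_use_read_multiple(volatile uint32_t *regs)
{
    uint32_t old = regs[PLX_REG_6C / 4];
    regs[PLX_REG_6C / 4] =
        cntrl_set_dma_read_cmd(old, PCI_CMD_MEM_READ_MULTIPLE);
}
```

Do this once at driver initialisation, before the first DMA is started.
The read-modify-write matters because the same register holds other
control fields that should not be disturbed.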

Third, you didn't tell us about your local bus. Maybe the bottleneck
is on the local side of the bridge?

Fourth, your question doesn't really belong in m.p.d.d.d. You are just
lucky that I am bored.
