Re: Direct Copying To Share Memory In NDIS ProtocolReceive
- From: Anton Bassov <AntonBassov@xxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Sat, 3 Mar 2007 11:41:13 -0800
Thomas,
Do make an estimate of the event interactions that must be used to
synchronize each "slot". Events that cross the user-kernel boundary will be
a performance bottleneck. That's the number that I believe must be minimized
to achieve decent performance. (Others may have different thoughts...).

Actually, this is the main reason why I said that the performance improvement
may be rather slim, if any at all.....
No matter how you look at it, apps still have to synchronize their access to
the shared buffer, which means that they will inevitably have to call
WaitXXX() functions that involve a user-to-kernel transition before they can
access the buffer. If you just make your driver pend IRP_MJ_READ, then,
apparently, the async IO completion routine will have to re-submit the
request, i.e. call ReadFileEx(), upon every invocation. Is ReadFileEx() much
more expensive than the WaitXXX() functions? I don't think so - after all,
when you make a call that involves a transition to kernel mode, the lion's
share of time is spent on parameter validation, rather than on processing
the call itself.
Therefore, sharing a buffer does not seem to offer an advantage so
significant that it is worth all the pain of dealing with things that you
don't need to think about when using "standard" IO.
Consider having your "Prime" application allocate user-mode memory that will
eventually be shared amongst all your user and kernel components. Pass it to
one driver using an IRP that is pended, etc. Standard stuff...
Another concern is locking memory. If you use METHOD_BUFFERED, the system
will have to copy all data from the system buffer to the user one, and vice
versa. If you use METHOD_DIRECT, the system does not have to copy data.
However, again, the same question arises - is copying data such an expensive
operation? You save a few machine cycles, but, as a result, have to lock
memory for extended periods of time (i.e. while the "prime" app is active).
This is not a big deal if the shared section is just a few pages in size, but
if it is reasonably large, locking memory for extended periods of time may
lead to overall performance degradation - you get the pain of dealing with
all the additional complexities... and get performance degradation, rather
than improvement!!!!
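
The trade-off argued above can be put as simple arithmetic. What follows is only a rough sketch, not a benchmark: it assumes both designs pay one user-kernel transition per frame (a WaitXXX() call for the shared section, a ReadFileEx() re-submit for standard IO) and that the only thing sharing removes is one buffer copy. The function names and all three constants are placeholders invented for illustration, not measurements, and the model ignores the cost of keeping a large section locked in memory.

```python
def per_frame_cost_ns(transitions, transition_ns, copy_bytes, copy_ns_per_byte):
    """Time to deliver one frame: kernel transitions plus any data copy."""
    return transitions * transition_ns + copy_bytes * copy_ns_per_byte


def saving_fraction(standard_ns, shared_ns):
    """Fraction of the standard path's per-frame cost that sharing removes."""
    return (standard_ns - shared_ns) / standard_ns


# Placeholder inputs, NOT measurements: replace with numbers from your system.
TRANSITION_NS = 1000.0     # hypothetical cost of one user-kernel transition
COPY_NS_PER_BYTE = 0.25    # hypothetical cost of copying one byte
FRAME_BYTES = 1514         # assumed maximum Ethernet frame size

# Standard IO: one ReadFileEx() re-submit per frame plus one buffer copy.
standard = per_frame_cost_ns(1, TRANSITION_NS, FRAME_BYTES, COPY_NS_PER_BYTE)
# Shared section: one WaitXXX() call per frame, no copy.
shared = per_frame_cost_ns(1, TRANSITION_NS, 0, COPY_NS_PER_BYTE)

print(f"standard IO: {standard:.1f} ns/frame")
print(f"shared:      {shared:.1f} ns/frame")
print(f"saving:      {saving_fraction(standard, shared):.1%}")
```

Whatever numbers are plugged in, the saving can never exceed the copy's share of the per-frame cost, so the useful thing to measure first is how the copy compares with a single transition.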
If I were in the OP's place, I would first try to do everything "stupid and
simple", and see how it all works - I would start thinking about optimization
if and only if I were not happy with the performance. However, the OP starts
thinking about it at the very beginning of his project.
According to Knuth, "in 95% of cases optimization is the mother of all evil".
I think that there is a good chance that the OP's case falls into these
95%......
Anton Bassov
"Thomas F. Divine" wrote:
"Le Chaud Lapin" <jaibuduvin@xxxxxxxxx> wrote in message
news:1172849765.066871.323980@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hi Thomas & Anton,
On Mar 2, 1:02 am, "Thomas F. Divine" <tdivine@NOpcausaSPAM> wrote:
Surely with enough work and debugging you can get your scheme to work. Do
sit back and seriously consider my personal motto: "Keep it simple,
stupid!".
I'm considering, i'm considering. :)
I guess I asked the question in the wrong way. What I should have
done was give the context of the unchangeable requirements, and ask what
the best way to do the packet copies is.
I have several processes that share a section in RAM, say 27 processes,
and one of the processes (the prime process) loads a driver that binds to
miniports to send and receive frames. The system is designed so that every
frame that is transmitted to the miniports via the protocol must
originate in the shared section, and every frame received from a
miniport via the protocol must be implanted in the shared section. I
currently have 128 frame slots in the shared section, and whenever the
prime process needs to send a frame, it might not have the opportunity to
batch the send, as there might be only 1 frame to send. On receives,
I would like to batch the receives in ProtocolReceive for bursty
traffic, but bypass the copy into a queue, copying directly to the shared
section. Since the frame slots are limited, I do not want to try to
guess the optimum preallocation count in ReadFile to batch reads, or I
will starve the other processes of frames.
So I guess my question is, given that the target of all
miniport-indicated frames must be that shared section, what will be the
performance penalty of copying full packets from the large buffers
set up by ReadFile to the corresponding buffers in the shared
section? Is it much less significant than the kernel transitions?
Also note that this scheme will require a kernel transition anyway,
when ReadFile attempts to lower the semaphore on the frame slots in
the shared section.
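
To make the slot scheme described above concrete, here is a minimal single-process model of it, offered as a sketch rather than as driver code. Only the 128-slot count comes from the description; the class name `FrameSlots`, the 1514-byte slot size, the free list and the choice to drop a frame when no slot is free are all assumptions made for illustration. A `bytearray` stands in for the shared section, and the semaphore is acquired without blocking on the assumption that a receive handler cannot wait.

```python
import threading

SLOT_COUNT = 128   # from the description above
SLOT_SIZE = 1514   # assumption: room for one full Ethernet frame


class FrameSlots:
    """Fixed pool of frame slots guarded by a counting semaphore."""

    def __init__(self, count=SLOT_COUNT, size=SLOT_SIZE):
        self.size = size
        self.section = bytearray(count * size)   # stands in for the shared section
        self.lengths = [0] * count               # bytes in use per slot
        self.free = list(range(count))           # indices of unused slots
        self.free_sem = threading.BoundedSemaphore(count)
        self.lock = threading.Lock()             # protects the free list

    def receive(self, frame):
        """Copy a frame straight into a free slot.

        Returns the slot index, or None if every slot is in use
        (the caller would then drop or queue the frame).
        """
        if len(frame) > self.size:
            raise ValueError("frame larger than a slot")
        if not self.free_sem.acquire(blocking=False):
            return None
        with self.lock:
            slot = self.free.pop()
        start = slot * self.size
        self.section[start:start + len(frame)] = frame   # the single copy
        self.lengths[slot] = len(frame)
        return slot

    def read(self, slot):
        """Return a copy of the frame currently stored in a slot."""
        start = slot * self.size
        return bytes(self.section[start:start + self.lengths[slot]])

    def release(self, slot):
        """Hand a slot back once its consumer is done with the frame."""
        with self.lock:
            self.free.append(slot)
        self.free_sem.release()
```

In this model each frame is copied exactly once, into its final slot, and the only synchronization per frame is one semaphore operation on the way in and one on the way out.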
-Le Chaud Lapin-
Do make an estimate of the event interactions that must be used to
synchronize each "slot". Events that cross the user-kernel boundary will be
a performance bottleneck. That's the number that I believe must be minimized
to achieve decent performance. (Others may have different thoughts...).
You should still be able to preserve the "safety net" provided by the OS
with the inverted call mechanism.
Consider having your "Prime" application allocate user-mode memory that will
eventually be shared amongst all your user and kernel components. Pass it to
one driver using an IRP that is pended, etc. Standard stuff...
I don't see why your "Prime" application couldn't pass the same memory to a
second driver also. Having done so, the "Prime" application and two drivers
have access to a common memory area. In addition, the system will
automatically call both drivers' Cleanup/Close routines when the "Prime"
application exits. This is the feature that you really don't want to
re-invent yourself.
Now the "Prime" application is all set. What about the other processes?
You can share memory across processes using "named shared memory" and "named
events", etc. A little work here, but in user-mode only, with much less
chance of crashing the system.
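
As a rough user-mode illustration of the "named shared memory" idea, the sketch below uses Python's standard `multiprocessing.shared_memory` module, which on Windows is backed by a named file mapping. The section name, the helper functions and the 128 x 1514-byte layout are assumptions for illustration only. Cross-process signalling (the "named events" part) is deliberately left out.

```python
import os
from multiprocessing import shared_memory

SLOT_COUNT = 128   # slot count mentioned earlier in the thread
SLOT_SIZE = 1514   # assumption: one full Ethernet frame per slot
SECTION_NAME = f"frame_section_{os.getpid()}"   # hypothetical section name


def create_section(name=SECTION_NAME):
    """Called once by the "Prime" process to create the named section."""
    return shared_memory.SharedMemory(
        name=name, create=True, size=SLOT_COUNT * SLOT_SIZE
    )


def attach_section(name=SECTION_NAME):
    """Called by every other process to open the existing section by name."""
    return shared_memory.SharedMemory(name=name)


def write_slot(shm, slot, frame):
    """Copy a frame into the given slot of the section."""
    start = slot * SLOT_SIZE
    shm.buf[start:start + len(frame)] = frame


def read_slot(shm, slot, length):
    """Copy `length` bytes back out of the given slot."""
    start = slot * SLOT_SIZE
    return bytes(shm.buf[start:start + length])
```

A typical sequence would be `create_section()` in the "Prime" process, `attach_section()` in each of the others, and `close()` on every handle when done, with the creator also calling `unlink()`.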
Just a thought...
Thomas F. Divine