Re: ReadFileScatter and WriteFileGather





"Slava M. Usov" <stripit.slough@xxxxxxx> wrote in message
news:e5SV8c8RGHA.4920@xxxxxxxxxxxxxxxxxxxxxxx
"Beverly Brown" <beverly.brown@xxxxxxxxx> wrote in message
news:%23Pmjry7RGHA.4688@xxxxxxxxxxxxxxxxxxxxxxx
Looking at the docs for these functions, I see that the buffers in the
array must be page-sized and page-aligned. I haven't quite figured out how
this is useful in a real-world example. Why would an application want
discontiguous page-sized, page-aligned virtual buffers?

Page-aligned page-size buffers can be passed down to bare metal without any
transformations, which is required for maximum efficiency. If you don't care
about this, you should not care about scatter-gather IO, either.

As a device driver person, I care a lot about scatter-gather, but the
virtual address layout of the buffer doesn't matter at all to me. It's the
contiguity of the physical memory it gets locked into that matters. Being
able to move data without having to copy it to a physically contiguous
buffer first is the big win with scatter-gather. But I can do that with a
single large user buffer.

If you break that user-mode buffer down into page-sized chunks in the
application, the IO manager still has to lock them all into physical address
space and give the driver an MDL describing the physical pages. It seems that
in addition to the application having more overhead to manage the buffer
space, the IO manager's job just got harder because now it has more virtual
addresses to deal with for the same amount of data. On the surface, that
doesn't look like maximum efficiency.


The usefulness of discontiguous buffers cannot be seen in an example that can
be posted here.

One example is a DB server that must cache disk data without using the OS
cache. While the cache is relatively free, you can read contiguous disk data
into contiguous buffers. But eventually the cache will fill up, and then
parts of it will be purged, and you suddenly cannot read contiguous disk
data into contiguous buffers, unless you grow the cache, or de-fragment the
cache, or issue multiple IO requests, or just read contiguous data into
discontiguous buffers. Do I need to explain why the latter option is the
best?

Ahhh. OK. I wasn't thinking about the cache effects. I was thinking about
the IO manager making MDLs for the user-mode buffers and it seemed like any
benefit gained was lost with the app tracking the buffers that way. (I don't
play in the filesystem space - I work mostly with non-storage devices where
cache plays a much smaller role).


In order to get such an array, VirtualAlloc would need to be called
repeatedly, requesting single pages each time.

This can be done, but this is exactly what should not be done. Instead, you
allocate a big contiguous chunk, maintain a free page list/bitmap, and use
whatever free pages happen to be in the list. You do not re-allocate
each time you do IO.
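
A minimal sketch of the "one big chunk plus a free-page bitmap" idea, written in portable C so it can be tried anywhere. Everything named here (`page_pool`, `pool_take`, `pool_give`, the 4096-byte page size, the 64-page capacity) is a hypothetical illustration, not anything from the Win32 API. On Windows the region would presumably come from a single VirtualAlloc call and the page size from the system rather than from `aligned_alloc` and a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_PAGE_SIZE 4096u  /* assumed page size; real code should query the OS */
#define POOL_PAGES     64u    /* hypothetical capacity, one bit per page in a uint64_t */

typedef struct {
    unsigned char *base;  /* one contiguous, page-aligned region */
    uint64_t       used;  /* bit i set means page i is handed out */
} page_pool;

/* Allocate the whole region once; returns 0 on success, -1 on failure. */
int pool_init(page_pool *p) {
    p->base = aligned_alloc(POOL_PAGE_SIZE, (size_t)POOL_PAGE_SIZE * POOL_PAGES);
    p->used = 0;
    return p->base ? 0 : -1;
}

/* Hand out the lowest-numbered free page, or NULL if the pool is full. */
void *pool_take(page_pool *p) {
    for (unsigned i = 0; i < POOL_PAGES; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(p->used & bit)) {
            p->used |= bit;
            return p->base + (size_t)i * POOL_PAGE_SIZE;
        }
    }
    return NULL;
}

/* Return a page to the pool; no memory is released to the OS. */
void pool_give(page_pool *p, void *page) {
    size_t off = (size_t)((unsigned char *)page - p->base);
    assert(off % POOL_PAGE_SIZE == 0 && off / POOL_PAGE_SIZE < POOL_PAGES);
    p->used &= ~(UINT64_C(1) << (off / POOL_PAGE_SIZE));
}

/* Free the whole region in one call. */
void pool_destroy(page_pool *p) {
    free(p->base);
    p->base = NULL;
    p->used = 0;
}
```

After some pages have been taken and given back in arbitrary order, the free pages are scattered through the region, which is the situation where a scatter read into whatever pages are free avoids both copying and re-allocation.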

[...]

Until you explained about the cache, I couldn't see any benefit to doing
that. It seemed like a lot of overhead to read data into a buffer.

One more question though. Why is the API restricted to single-page buffers?
Why not allow arbitrary-sized buffers? Unix systems have a readv function
that is similar, but it does not restrict the size of the buffers to a single
page. I'm thinking this could be useful for non-storage devices if multiple
buffers of arbitrary size could be specified. The IO manager could coalesce
the buffers into a single MDL (or an MDL chain) and only one call into the
driver would be necessary for multiple buffers. An application like the one
you described could still manage its own page-sized buffers. Nothing would
prevent it from doing so. But other types of applications could take
advantage of that single call into the driver without being forced to manage
their buffers in page-sized chunks. They could use whatever size is more
natural for the application and device in question.
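
For comparison, a small sketch of the POSIX readv call mentioned above, filling two buffers of different, non-page-sized lengths with one system call. The helper name `read_header_and_payload` and the header/payload split are hypothetical; only `readv` and `struct iovec` are the real POSIX interface.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Read one record into two separate buffers of arbitrary size.
 * Returns the total number of bytes read, or -1 on error (as readv does). */
ssize_t read_header_and_payload(int fd,
                                void *hdr, size_t hdr_len,
                                void *payload, size_t payload_len) {
    struct iovec iov[2];

    assert(fd >= 0);

    iov[0].iov_base = hdr;       /* filled first */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = payload;   /* filled only after hdr is full */
    iov[1].iov_len  = payload_len;

    return readv(fd, iov, 2);
}
```

Neither buffer has to be page-aligned or a multiple of the page size here; the kernel fills them in array order.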


Also, if the buffers need to be page-aligned as the docs say, why bother
having a member specifying Alignment in the FILE_SEGMENT_ELEMENT structure?

It is not a structure, it is a union. The Alignment member ensures that each
element is 64-bit aligned.
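
A portable sketch of that point. `segment_element` below is a hypothetical stand-in that mirrors the shape of FILE_SEGMENT_ELEMENT (a buffer pointer overlaid with a 64-bit integer); it is not the Windows definition itself, and `fill_segments` is an illustrative helper. The array ends with an all-zero element because the ReadFileScatter documentation requires a terminating NULL entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef union {
    void    *buffer;     /* address of one page-sized, page-aligned buffer */
    uint64_t alignment;  /* carries no data; forces every element to 64 bits */
} segment_element;       /* hypothetical mirror of FILE_SEGMENT_ELEMENT */

/* The union is 8 bytes whether pointers are 32 or 64 bits wide. */
static_assert(sizeof(segment_element) == 8, "each element occupies 64 bits");

/* Build a NULL-terminated segment array from a list of page addresses.
 * segs must have room for n + 1 elements; returns the number written. */
size_t fill_segments(segment_element *segs, void *const *pages, size_t n) {
    for (size_t i = 0; i < n; i++) {
        segs[i].alignment = 0;      /* clear all 64 bits first (matters on 32-bit) */
        segs[i].buffer = pages[i];  /* then store the page address */
    }
    segs[n].alignment = 0;          /* terminating NULL element */
    return n + 1;
}
```

So the member is not a second alignment requirement on the buffers; it only fixes the layout of the array elements themselves.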

Oops! I missed that in my reading. Thanks for setting me straight.

Beverly

S






