Re: Decoding strategy
- From: "Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx>
- Date: Wed, 11 Oct 2006 13:47:34 -0700
<marcin.rzeznicki@xxxxxxxxx> wrote in message
news:1160597995.002328.207620@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[...]
So, memory mapped view is sure too be usable for a while, so I
think it pays off too keep that in memory. That characteristic also
ensures me that OS cache can be helpful and performance will not suffer
from misses/disk reads very often.
That's well and good. However, those characteristics assist in ensuring
that the file data is cached when using other forms of i/o as well,
including using a FileStream. The benefit is not unique to memory mapped
file i/o.
Yes, I know. I was referring to something else. Sorry for being
unclear. Docs say:
"(..)must specify an offset within the file that matches the memory
allocation granularity of the system, or the function fails. That is,
the offset must be a multiple of the allocation granularity".
I know that this is going to be aligned to sth in VM, but I do not
care, this is transparent unless you write kernel-mode stuff or,
generally, very low level stuff. What I do care is that I cannot choose
FILE offset at wich mapping starts. And that leads to "tearing"
Okay, I think I understand better what you meant. I'm going to snip a bunch
of stuff here, and hopefully jump to the core of the issue...
[...]
Well, that's very close to what I have now. Let me specify the details.
I read few "blocks" a time, namely 4, which is 256kb of data (block for
me is memory allocation granularity, as that it is the smallest
addressable part of file when it comes to memory mapping). I try to
adjust offset a little, so that: I always read the whole data I am
requested, and, immediate reads in the neighbourhood will not cause
remapping, whch is close to your idea. But then, how do you know
whether the very first byte of current "window" is the first block of
character?
Thanks. The code you posted helps me understand better what's going on.
In fact, as near as I can tell, you are using memory mapping in practically
the same way as my proposed multiple-buffer solution deals with things.
That is, you're windowing the file with memory mapping the same way I'm
doing it with the buffers.
Here's a dumb question: is there any particular reason you're NOT mapping
the entire file at once? I've mentioned the possibility in previous
messages, making assumptions that you have your reasons for not doing so.
But if you could, all of these issues just go away. Are you genuinely
concerned that you won't have enough contiguous virtual address space to map
the whole file?
Anyway, for the moment let's assume that you can only map a portion of the
file at a time...
Depending on what the actual performance is, it seems to me that either
method would be the correct solution. I suspect that the answer is to
simply do the memory mapping a little differently, but I don't have enough
experience with memory mapped files to know for sure.
Specifically: what if you modified your code that maps the file, so that it
maps a range *around* the starting point, the way I suggested with the
buffers? At certain points (perhaps only when you got right to the very
edge and attempted to read a byte outside your mapped range), you would
remap the file, shifting the window so that the bytes you want to deal with
are within the mapped range.
When you index the data, I would recommend the high-level code using an
index relative to the file beginning. That is, your index is just the file
offset for the data (note that I'm not using the word "index" to relate to
the broader index you calculate for the file data...I just mean a way to
identify which byte you're working on at the moment). Then you translate
that to the actual offset within the mapped range as necessary. That way,
you can be changing the mapped range on the fly without affecting how the
higher-level code that actually processes the data works.
I believe that performance should be fine doing this. When you remap the
file, most of the file should still be in physical RAM and I suspect the OS
will correctly reattach the newly mapped range to the portions of the range
that are already resident in RAM. Only the newly mapped portions of the
file should need to be read.
[...]
:-( That's pain in the ass for me. If I knew that I could always look
back for missing parts of single character, then mix of your solution
with memory mapping would be the best scheme
Well, at some point you need to come up with some mechanism for finding the
beginning of a valid character. :) How you access the file might make this
easier or harder, but the problem exists even if you can map the entire file
at once. Sorry I can't be more helpful on that front. I agree, that part
actually seems to be the "hard part" of this problem, in spite of all the
space we've consumed discussing the file i/o part. :)
Pete
.
- Follow-Ups:
- Re: Decoding strategy
- From: marcin . rzeznicki
- Re: Decoding strategy
- References:
- Decoding strategy
- From: marcin . rzeznicki
- Re: Decoding strategy
- From: Kevin Spencer
- Re: Decoding strategy
- From: marcin . rzeznicki
- Re: Decoding strategy
- From: Peter Duniho
- Re: Decoding strategy
- From: marcin . rzeznicki
- Re: Decoding strategy
- From: Peter Duniho
- Re: Decoding strategy
- From: marcin . rzeznicki
- Re: Decoding strategy
- From: Peter Duniho
- Re: Decoding strategy
- From: marcin . rzeznicki
- Decoding strategy
- Prev by Date: Re: Accessing DataSet values easily
- Next by Date: Re: when casting, receiving: System.InvalidCastException (Specified cast is not valid)
- Previous by thread: Re: Decoding strategy
- Next by thread: Re: Decoding strategy
- Index(es):
Relevant Pages
|