Re: Decoding strategy
- From: "Peter Duniho" <NpOeStPeAdM@xxxxxxxxxxxxxxxx>
- Date: Tue, 10 Oct 2006 12:11:24 -0700
<marcin.rzeznicki@xxxxxxxxx> wrote in message
news:1160502641.574989.204890@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> I didn't test performance with FileStream, but maybe you can confirm -
> Does File Stream caches contents of file in memory?
FileStream does buffer, which is in a sense a kind of caching. You can
specify the buffer size when you create the FileStream.
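As a rough illustration of picking the buffer size at open time, here is a minimal sketch. It is written in Python rather than .NET purely to keep it short and runnable (that substitution is an assumption of this example); FileStream has constructor overloads that take a bufferSize argument in the same spirit. The helper name open_buffered and the 64 KB default are hypothetical.

```python
def open_buffered(path, buffer_size=64 * 1024):
    """Open a file for binary reading with an explicit buffer size.

    Hypothetical helper: the name and the 64 KB default are illustrative only.
    """
    # For binary files, a buffering value greater than 1 requests a
    # buffered reader whose internal buffer has that many bytes.
    return open(path, "rb", buffering=buffer_size)
```

A larger buffer means fewer trips to the operating system for small sequential reads; it does not keep the whole file in memory.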
> I think there is
> slight speedup when using memory mapping in that I do not have to hit
> the disk all the time.
IMHO, the two major benefits to memory mapping are 1) convenience (as long
as your file access fits within the addressable space available to you), and
2) minimal and efficient virtual memory usage (the physical memory storage
of the data can be backed by the file itself, rather than using up swap file
space).
Any i/o speed advantage you can get with memory mapping, you can get with
normal file i/o using appropriate techniques.
> In my solution I simply open mapping over whole
> file and create views as needed. Anyway, let's say that I did it using
> FileStream, I can read some bytes from it, but I still face the same
> problem - how to interpret first bytes I have read, whether they are
> beginning of character, or maybe end of "previous" character?
I'm not entirely sure I understand the question. Even using a memory mapped
file, if you jump into a random location in the middle, you can't tell
whether you're at the beginning of a new character or in the middle of one.
You need some point of reference to tell the difference.
If the file is entirely made up of contiguous Unicode characters (two-byte
UTF-16 code units, which is what .NET's "char" holds), and thus each one
always starts on an even offset from the start of the file, then that's one
easy way to tell whether you are at the beginning or in the middle of a
character. If that's the case though, then you could easily preserve that
characteristic even when reading the file using FileStream.
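The even-offset test can be sketched in a couple of lines. This is only an illustration in Python, and it assumes fixed two-byte units packed from offset 0 with no header or other data in front of them; the function names is_char_start and align_down are hypothetical.

```python
def is_char_start(absolute_offset, bytes_per_unit=2):
    """True when the offset falls on a code-unit boundary.

    Assumes fixed-width units packed from file offset 0.
    """
    return absolute_offset % bytes_per_unit == 0


def align_down(absolute_offset, bytes_per_unit=2):
    """Back up to the start of the code unit that contains this offset."""
    return absolute_offset - (absolute_offset % bytes_per_unit)
```

The check depends only on the absolute file offset, so it works the same way whether the bytes came from a mapped view or from a read buffer.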
On the other hand, if you are dealing with some other multibyte character
set, or it's all Unicode but there's other data that can cause the Unicode
characters to get shifted to odd offsets, then even using memory mapped
files you need to find a good point of reference before you decide whether
you're dealing with the start of a Unicode character.
Basically, I don't see how using the FileStream class versus using memory
mapping alters the underlying issue of determining what the character
boundaries are. You can read sections of the file using FileStream, and as
long as you keep track of what absolute file position those sections come
from, you can always translate the address of a byte from a partial section
back to an absolute file position, giving you the exact same position
information you'd have when using memory mapping.
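To make the bookkeeping concrete, here is a minimal sketch of reading a file in sections while remembering where each section starts. It is in Python as a stand-in for FileStream (an assumption of this example); read_sections, to_absolute and the 4096-byte section size are hypothetical names and values.

```python
def read_sections(path, section_size=4096):
    """Yield (section_start, data) pairs.

    section_start is the absolute file offset of data[0].
    """
    with open(path, "rb") as f:
        while True:
            section_start = f.tell()  # position before this read
            data = f.read(section_size)
            if not data:  # empty result means end of file
                break
            yield section_start, data


def to_absolute(section_start, index_in_section):
    """Translate an index within a section back to a file offset."""
    return section_start + index_in_section
```

With the section start in hand, any byte in a section maps back to the same absolute position a memory-mapped view would give you.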
It *is* true that when reading the file into buffers by sections using the
FileStream class, you could wind up with partial data at the beginning or
end of one of these sections. The question there though is not knowing what
you've got (since as I point out above, you can determine that just as
easily whether using FileStream or memory mapping), but rather how to get
back the other part. To deal with that, you'd need an additional layer of
processing that can piece together the data that straddle read boundaries.
I agree that this is an area in which memory mapped files are more
convenient, but it shouldn't be that hard for you to maintain a small
"workspace" buffer in which this sort of reconstruction can take place. In
the simplest case, it need only be a single "char": you pull one byte at a
time out of the buffer read by FileStream and combine the bytes in pairs
into that "char" (that may or may not be efficient, depending on the level
at which you're processing the data...if you have to look at each and
every character anyway, it may not be all that bad).
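A rough sketch of that workspace idea follows, again in Python as a stand-in for the .NET code. It assumes little-endian two-byte units and that the first section starts on a unit boundary; iter_code_units is a hypothetical name.

```python
def iter_code_units(sections):
    """Yield 16-bit code units from an iterable of byte sections.

    Assumes little-endian data whose first section starts on a unit
    boundary. A byte left over at the end of one section is carried
    into the next one.
    """
    carry = b""  # the "workspace": at most one leftover byte
    for data in sections:
        data = carry + data
        usable = len(data) - (len(data) % 2)  # largest even prefix
        for i in range(0, usable, 2):
            # Combine a pair of bytes: low byte first, then high byte.
            yield data[i] | (data[i + 1] << 8)
        carry = data[usable:]
    if carry:
        raise ValueError("data ended in the middle of a code unit")
```

The section boundaries can fall anywhere; the carry byte is what stitches a split unit back together. If you would rather not hand-roll this, the Decoder returned by Encoding.GetDecoder() in .NET keeps that kind of leftover state between calls for you.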
Pete