Re: Decoding strategy
- From: "Kevin Spencer" <spam@xxxxxxx>
- Date: Tue, 10 Oct 2006 08:12:39 -0400
I would use a FileStream instance to read the file. The FileStream class
supports random access to files, allowing you to jump around in the file.
You can read as little or as much as you want into memory when you need to.
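
A minimal sketch of that idea, written in Python only because it is short enough to run anywhere. It seeks to an offset, reads a small chunk, and feeds the chunk to a stateful decoder that holds back an incomplete trailing byte sequence until the next chunk arrives. In .NET the corresponding pieces would be FileStream.Seek/Read and a System.Text.Decoder obtained from Encoding.GetDecoder(), which likewise keeps state between GetChars calls. The helper names (read_at, decode_file_in_chunks, roundtrip) and the sample text are made up for illustration.

```python
import codecs
import os
import tempfile


def read_at(path, offset, count):
    """Random-access read: return up to `count` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)


def decode_file_in_chunks(path, encoding, chunk_size):
    """Decode a file chunk by chunk; only one raw chunk is held at a time."""
    # The incremental decoder buffers a trailing incomplete byte sequence
    # and completes it when the next chunk is passed in.
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []  # collected here only so the demo can compare the result
    offset = 0
    while True:
        chunk = read_at(path, offset, chunk_size)
        if not chunk:
            break
        pieces.append(decoder.decode(chunk))
        offset += len(chunk)
    # final=True makes the decoder raise if the file ended mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


def roundtrip(text, encoding, chunk_size):
    """Write `text` to a temporary file, then decode it back in chunks."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode(encoding))
        return decode_file_in_chunks(path, encoding, chunk_size)
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = "zażółć gęślą jaźń"
    # A 3-byte chunk size splits several two-byte UTF-8 characters.
    print(roundtrip(sample, "utf-8", 3) == sample)
```

This only covers consecutive chunks. After a jump to an unrelated offset the decoder's buffered state is stale, so it should be reset and the read position aligned to a character boundary first.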
--
HTH,
Kevin Spencer
Microsoft MVP
Chicken Salad Shooter
http://unclechutney.blogspot.com
A man, a plan, a canal, a palindrome that has.. oh, never mind.
<marcin.rzeznicki@xxxxxxxxx> wrote in message
news:1160431060.920574.5670@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hello everyone
I've got a little problem choosing the best decoding strategy for
a nasty problem. I have to deal with very large files which contain
text in various encodings. Their size makes loading a whole
file into memory in a single pass inappropriate. I solved this
by implementing memory mapping using P/Invoke, and I load
the file's contents in chunks. Since the files are in different
encodings, what I really do is map a portion of the file into memory and
then decode that part using System.Text.Encoding. So far, so good,
but it's not difficult to imagine a serious problem with this approach.
Since file processing is not, and cannot be, sequential, and
furthermore memory mapping restricts the offsets at which a mapping can
start, a mapping can "tear" a character apart. How should I deal with
this? I thought of implementing a decoder fallback which would check a few
bytes before the current mapping and try to substitute the unrecognized
chars, but I don't know whether that is feasible. I do not know whether the
decoder might accidentally mistake a broken char for some valid, but
different from expected, character. I guess it depends on the encoding
used. What do you think?
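
For UTF-8 specifically, the "check a few bytes behind" idea can be sketched as below (Python, as a language-neutral illustration; the function name utf8_char_start is hypothetical). It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx and lead bytes never do, so a torn character cannot be mistaken for a different valid one, and stepping back at most three bytes finds the start of the character. This does not carry over to every encoding: some legacy multi-byte encodings, Shift-JIS for example, allow trail bytes that overlap the single-byte range, so a boundary there cannot be found by looking at a few bytes in isolation.

```python
def utf8_char_start(buf, pos):
    """Return the index of the first byte of the UTF-8 character covering pos.

    Assumes `buf` holds valid UTF-8 and 0 <= pos < len(buf).
    """
    start = pos
    # Continuation bytes look like 0b10xxxxxx; a character has at most three.
    while start > 0 and pos - start < 3 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    return start


if __name__ == "__main__":
    data = "aż€".encode("utf-8")  # 1-byte, 2-byte and 3-byte characters
    # Offset 5 is the last byte of the 3-byte euro sign, which starts at 3.
    print(utf8_char_start(data, 5))
```

Decoding would then begin at the returned index instead of the raw mapping offset, which means keeping the few bytes before the mapping boundary available.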