Re: How to read html files AS IS. Encoding seems to change the characters.
- From: Göran Andersson <guffa@xxxxxxxxx>
- Date: Sun, 01 Apr 2007 15:45:03 +0200
Zoro wrote:
Thanks again Goran for your help.
You are writing it back as UTF-8, as you are not specifying any encoding
in the WriteAllText method call.
It looks like I may be able to do it with string after all.
It looks like the problem with my test was - like you suggested - that
I didn't specify the write encoding. When I do, as long as I use the
same encoding when reading and writing, it works with all 3 codes you
have suggested (but not with any of the built-in codes - e.g. UTF-n!).
Then you have successfully decoded the file into text, as you are not losing any characters.
If you save the file using UTF-8, all the characters will still be there, as strings are Unicode and UTF-8 can store any Unicode character. The reason that your test did not succeed with the Unicode encodings is that the utility you are using doesn't support Unicode. You would need the "Pro" version for that.
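A minimal sketch of that round trip, in case it helps. The file names are made up, and ISO-8859-1 (code page 28591) is only an assumption standing in for whichever single-byte code page you ended up using:

```csharp
using System;
using System.IO;
using System.Text;

static class RoundTripSketch
{
    // Decode a file to a string with one encoding, then encode it again
    // with another. Nothing is lost as long as the write encoding can
    // represent every character in the string.
    public static void Transcode(string sourcePath, string targetPath,
                                 Encoding readEncoding, Encoding writeEncoding)
    {
        string text = File.ReadAllText(sourcePath, readEncoding);
        File.WriteAllText(targetPath, text, writeEncoding);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for the code page used in the test.
        Encoding latin1 = Encoding.GetEncoding(28591);

        // "café" as ISO-8859-1 bytes; 0xE9 is the single byte for 'é'.
        File.WriteAllBytes("in.html", new byte[] { 0x63, 0x61, 0x66, 0xE9 });

        // Same encoding in and out: the bytes should come back unchanged.
        Transcode("in.html", "same.html", latin1, latin1);

        // Different encoding out: same characters, different bytes.
        Transcode("in.html", "utf8.html", latin1, Encoding.UTF8);

        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes("same.html")));
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes("utf8.html")));
    }
}
```

The second file should come out longer than the first, because 'é' takes two bytes in UTF-8, but the text in it is the same.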
I am still not clear on how it's going to work away from the test -
using the database situation, but I am HOPING it would work as
follows:
1. I will use one of these codes to read the file
2. then store the string in an nvarchar field and add a note informing
users of the encoding I used
3. specify the same encoding when creating the file, after reading the
string from the db.
Do you think this would work?
Thanks again,
zoro.
As you successfully decoded the file to a string, you can store that in an nvarchar/ntext field and you are done. You can also store the encoding used if you would like to recreate the file exactly, but you can create a file using any encoding that supports the characters in the text.
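Roughly, keeping the encoding alongside the text could look like the sketch below. The StoredDocument class is hypothetical and only stands in for a database row (the text in an nvarchar/ntext column, the encoding name in an ordinary text column); the actual database code is left out:

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-in for a database row.
class StoredDocument
{
    public string Text;          // would go in the nvarchar/ntext column
    public string EncodingName;  // e.g. "iso-8859-1"
}

static class StoreSketch
{
    // Decode the file and remember which encoding was used.
    public static StoredDocument Load(string path, Encoding encoding)
    {
        StoredDocument doc = new StoredDocument();
        doc.Text = File.ReadAllText(path, encoding);
        doc.EncodingName = encoding.WebName;
        return doc;
    }

    // Recreate the file using the encoding recorded with the text.
    public static void Recreate(StoredDocument doc, string path)
    {
        Encoding encoding = Encoding.GetEncoding(doc.EncodingName);
        File.WriteAllText(path, doc.Text, encoding);
    }
}
```

Storing the name is only needed if the recreated file has to match the original byte for byte; otherwise Recreate could just as well write with any encoding that covers the characters in the text.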
One advantage of using the UTF-8 encoding is that a BOM (byte order mark) can be placed at the beginning of the file, which can be used to identify the encoding used. Encoding.UTF8 writes one when you pass it to WriteAllText explicitly; calling WriteAllText without an encoding argument writes UTF-8 without a BOM. If you use File.ReadAllText to read a file that contains a BOM, it will read the file correctly, even if you specify a completely different encoding.
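You can check the BOM behaviour with a small test like this one. It is only a sketch; the file name is made up and ISO-8859-1 is just an arbitrary "wrong" encoding to pass to the reader:

```csharp
using System;
using System.IO;
using System.Text;

static class BomSketch
{
    public static void Main()
    {
        string text = "caf\u00E9";

        // Passing Encoding.UTF8 explicitly writes the BOM (EF BB BF) first.
        File.WriteAllText("bom.html", text, Encoding.UTF8);

        // Deliberately ask for a different encoding. ReadAllText looks for
        // a BOM first and, when it finds one, decodes according to the BOM.
        Encoding latin1 = Encoding.GetEncoding(28591);
        string readBack = File.ReadAllText("bom.html", latin1);

        // Should report that the text survived the mismatch.
        Console.WriteLine(readBack == text);
    }
}
```

Without the BOM the reader has nothing to go on, so the encoding you pass in is the one that gets used.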
--
Göran Andersson
_____
http://www.guffa.com