Re: Filename Encoding Help



Adhal wrote:
Hello,
On Vista & XP, I want to store filenames in a text file. What encoding should I use?

UTF16 (Encoding.Unicode)
OR
UTF32 (Encoding.UTF32).

I think UTF16 is enough? Anyway that is what I am currently using.

You can use any Unicode encoding, like UTF-7, UTF-8, UTF-16 or UTF-32.

I suggest UTF-8. It's the most compact encoding for text that is mostly ASCII, and it's the default for the methods that read and write text files in .NET.
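
As a rough illustration of the size difference, here is a small sketch. It is written in Python rather than .NET, on the assumption that the byte counts depend only on the encoding and not on the language. The two file names and the helper `encoded_sizes` are made up for the example.

```python
# Compare how many bytes each Unicode encoding needs for the same file name.
# The names below are made-up examples, not taken from the question.
names = ["report_2008.txt", "日本語ファイル.txt"]


def encoded_sizes(name):
    """Return the byte length of `name` in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(name.encode("utf-8")),
        "utf-16": len(name.encode("utf-16-le")),
        "utf-32": len(name.encode("utf-32-le")),
    }


for name in names:
    print(name, encoded_sizes(name))
```

The ASCII-only name takes 15 bytes in UTF-8 against 30 in UTF-16 and 60 in UTF-32. The Japanese name takes 25 bytes in UTF-8 against 22 in UTF-16, so if nearly all the file names are Japanese, UTF-16 can come out slightly smaller.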

One other question. If I have a Japanese system and I again want to store the filenames in a text file, but this time not in Unicode, should I use:

A) ANSI (Encoding.Default)
B) ASCII (Encoding.ASCII)

My understanding is that I should use Encoding.Default as it is set according to the system. I really don't know the difference between the two as they seem alike.

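That depends on what characters the file names contain. The ASCII encoding only handles characters with character codes from 0 to 127, so it cannot store any Japanese characters. The ANSI encoding handles every character in the ASCII character set plus the additional characters in the system's code page, so a file written with Encoding.Default is only read back correctly on a system that uses the same code page.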

It also depends on what you are going to use the file for. Is there any other program that will read the file?
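
The ASCII versus ANSI difference described above can be sketched as follows. This is again Python, not .NET. It assumes that the ANSI code page on the Japanese system is 932, which Python calls "cp932", and it uses `errors="replace"` to imitate the way Encoding.ASCII substitutes a question mark for characters it cannot encode. The file name and the helper `round_trip` are made up for the example.

```python
# Round-trip a file name through a non-Unicode encoding and see what survives.
# "cp932" is Python's name for Windows code page 932 (assumed here to be the
# ANSI code page of the Japanese system); the file name is a made-up example.
name = "日本語ファイル.txt"


def round_trip(text, encoding):
    """Encode `text`, replacing unsupported characters with '?', then decode it."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(round_trip(name, "ascii"))   # Japanese characters become '?'
print(round_trip(name, "cp932"))   # unchanged on the Japanese code page
print(round_trip(name, "cp1252"))  # '?' again on a Western European code page
```

Only the cp932 round trip returns the original name. That is the case for Encoding.Default as long as the file stays on Japanese systems, and the case against it if the file might be read on a machine with a different code page.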

--
Göran Andersson
_____
http://www.guffa.com