Re: Convert ANSI to UTF-8?

I have some legacy code that saves a file in ANSI (using FILE* fp; fopen_s( &fp, ..., "w"); fwrite(...)).
I am writing 'char*' strings to the file (not wchar_t).

The file is saved as ANSI. A third-party application uses the file, but one of their new requirements is that the file be UTF-8.

Is there a way of saving my file so it is UTF-8?

My application uses _UNICODE apart from that file saving class.
I don't mind changing all the chars to wchar_t, but I am not sure that doing so will change the format of the file (at least not in the way I understand UTF-8).
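On the format question: UTF-8 is defined by the bytes that end up in the file, not by the C++ character type. On Windows, switching to wchar_t stores text as UTF-16, so writing those wchar_t values out with fwrite would produce a UTF-16 file, not a UTF-8 one. The minimal sketch below instead converts existing single-byte char* text straight to UTF-8. It ASSUMES the ANSI text is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point; Windows-1252 differs from Latin-1 in the 0x80-0x9F range, so on Windows the more robust route is MultiByteToWideChar with CP_ACP followed by WideCharToMultiByte with CP_UTF8. The name Latin1ToUtf8 is hypothetical.

```cpp
#include <string>

// Hypothetical helper: converts single-byte text to UTF-8, ASSUMING the
// input is ISO-8859-1 (Latin-1), where byte value == Unicode code point.
// Windows-1252 bytes 0x80-0x9F would need a lookup table instead.
std::string Latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte becomes 2 bytes
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8
            utf8.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the two UTF-8 bytes 0xC3 0xA9, while plain ASCII passes through unchanged.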

Hi Simon,

to add to Mihai's correct answer, you may want to try a library I recently shared on Code Gallery:

http://code.msdn.microsoft.com/UTF8Helpers

In the downloads section there is a simple console-based app showing basic usage, a more complex MFC-based GUI app, and a PowerPoint presentation that walks through basic class usage.

There are a couple of classes (UTF8TextFileReader and UTF8TextFileWriter) that encapsulate the logic of converting between Unicode UTF-16 (i.e. standard Windows Unicode: wchar_t*) and Unicode UTF-8 (char*).

Writing a UTF-8 file using these classes is very easy, for example:

UTF8TextFileWriter outFile( filename );
...
outFile.WriteLine( /* your Unicode CString here */ );

The library also exports two static methods to convert strings between UTF-16 and UTF-8:

- UTF8Convert::ToUTF16() : converts UTF-8 --> UTF-16
- UTF8Convert::FromUTF16() : converts UTF-16 --> UTF-8
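For readers curious about what a UTF-16 to UTF-8 conversion involves, here is a minimal portable sketch. It is NOT the code behind UTF8Convert::FromUTF16; the name Utf16ToUtf8 is hypothetical, it takes a std::u16string instead of a CString so it compiles outside Windows, and replacing unpaired surrogates with U+FFFD is a design choice of this sketch.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one Unicode code point to 'out'.
static void AppendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx + 2 continuation
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: 11110xxx + 3 continuation
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: UTF-16 -> UTF-8. Surrogate pairs are combined into
// one code point; unpaired surrogates become U+FFFD (replacement character).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cu = in[i];
        if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: decode the pair
            char32_t cp = 0x10000 + ((cu - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            AppendUtf8(out, cp);
            ++i;  // skip the low surrogate just consumed
        } else if (cu >= 0xD800 && cu <= 0xDFFF) {
            AppendUtf8(out, 0xFFFD);  // lone surrogate
        } else {
            AppendUtf8(out, cu);
        }
    }
    return out;
}
```

Since wchar_t is a 16-bit UTF-16 code unit on Windows, the same loop applies to a wchar_t buffer there; the reverse direction (UTF-8 to UTF-16) decodes these byte patterns back into code points.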

So you could still use these static methods for the Unicode conversion and manage file input/output yourself, if you don't want to use the UTF8TextFileWriter/Reader classes.
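If you manage the file I/O yourself, it stays close to the legacy fopen/fwrite code: once the text is UTF-8, it is just bytes. A minimal sketch, assuming the text has already been converted to a UTF-8 std::string; the helper name WriteUtf8File is hypothetical. Binary mode ("wb") is used so the C runtime writes the bytes exactly as given, with no newline translation.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes an already-UTF-8-encoded string to 'path'.
// Returns true only if the file was opened and every byte was written.
bool WriteUtf8File(const char* path, const std::string& utf8)
{
    // "wb": binary mode, so no newline translation alters the bytes.
    // (MSVC's fopen_s could be used instead; plain fopen keeps this portable.)
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr)
        return false;

    std::size_t written = std::fwrite(utf8.data(), 1, utf8.size(), fp);
    bool ok = (written == utf8.size());
    // fclose can also report a write error (e.g. when flushing buffers)
    if (std::fclose(fp) != 0)
        ok = false;
    return ok;
}
```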
Note that an overload of the UTF8TextFileWriter constructor also lets you specify that a BOM be written (i.e. a sequence of bytes at the beginning of the file marking its content as UTF-8).
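The UTF-8 BOM is the three bytes EF BB BF (the UTF-8 encoding of U+FEFF). A minimal sketch of adding one before writing and detecting/skipping one after reading; the function names are hypothetical and not part of the UTF8Helpers library. Whether to write a BOM at all depends on the consumer, so it is worth checking what the third-party application expects.

```cpp
#include <string>

// The UTF-8 byte order mark: U+FEFF encoded as EF BB BF.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: returns the UTF-8 text with a BOM prepended.
std::string AddUtf8Bom(const std::string& utf8)
{
    return kUtf8Bom + utf8;
}

// Hypothetical helper: true if 'data' begins with the UTF-8 BOM.
// compare() safely returns nonzero when 'data' is shorter than 3 bytes.
bool HasUtf8Bom(const std::string& data)
{
    return data.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: removes a leading BOM, if any, before parsing.
std::string StripUtf8Bom(const std::string& data)
{
    return HasUtf8Bom(data) ? data.substr(kUtf8Bom.size()) : data;
}
```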

For more details on Unicode, you may want to read the FAQ at Unicode.org:

http://www.unicode.org/faq/

and I believe that for very detailed questions and answers, you will benefit from Mihai's help here.

Giovanni



This looks very interesting; I will certainly have a look at it.

Thanks for sharing,

Simon
