Re: Creating ANSI text files with international characters
From: Lars-Erik Aabech (lars-erik.aabech_at_markedspartner.no)
Date: 03/03/04
- Next message: Ryan Gregg: "Re: Deploy the aspx file under apache"
- Previous message: Paul: "SendKeys - PostMessage problem"
- In reply to: Jon Skeet [C# MVP]: "Re: Creating ANSI text files with international characters"
- Next in thread: Lars-Erik Aabech: "Re: Creating ANSI text files with international characters"
- Reply: Lars-Erik Aabech: "Re: Creating ANSI text files with international characters"
- Reply: Lars-Erik Aabech: "Re: Creating ANSI text files with international characters"
- Reply: Jon Skeet [C# MVP]: "Re: Creating ANSI text files with international characters"
- Messages sorted by: [ date ] [ thread ]
Date: Wed, 3 Mar 2004 15:16:13 +0100
OK, first of all, thanks for reading the spec for me ;) *a little ashamed*
I'll just recap and give you my new status / interpretation of my
restrictions.
First of all, I've been using the wrong file extension for vCalendar files:
I've been using the extension for iCalendar files, which can be encoded with
UTF-8. Now that I've changed to the vCalendar format, only the various ANSI
codepages are accepted, but they are apparently read as ASCII. (am I at least
getting better at this? ;) )
Anyway - using iso-8859-1 encoding with codepage 1252, which is the
encoding/codepage my Outlook uses when exporting to .vcf files, and setting the
encoding parameter to quoted-printable for the summary/description properties
of the vCalendar object - I'm able to use =0D, =3A etc. for special
characters, but =E5 (å) is stripped when I try to open the file in Outlook.
There's no apparent relevant difference between a file Outlook exports and
reads perfectly with æøå in it, and the files I generate. Both use
iso-8859-1 encoding with codepage 1252. This is how they look:
Outlook's (displayed correctly):
BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
UID:20042603T022612@webcompetence.no
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:M=E5l- og resultatsamtale mellom Lars=
-Erik Aabech og Lars-Erik Aabech=0D=0ATid: 10.03.2004 08:00=0D=0ASam=
taletype: Resultatsamtale=0D=0AKanskje dette funker..=0D=0A
SUMMARY;ENCODING=QUOTED-PRINTABLE:Invitasjon til m=E5l- og resultatsamtale
PRIORITY:3
END:VEVENT
END:VCALENDAR
Mine:
BEGIN:VCALENDAR
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
UID:20044703T024702@webcompetence.no
LOCATION:
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
DTSTAMP:20040303T134702Z
SUMMARY;ENCODING=QUOTED-PRINTABLE:Invitasjon til m=E5l- og resultatsamtale
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:M=E5l- og resultatsamtale mellom
Lars-Erik Aabech og Lars-Erik Aabech=0DTid: 10.03.2004 08:00=0DSamtaletype:
Resultatsamtale=0D=0DKanskje dette funker..
CLASS:PUBLIC
END:VEVENT
END:VCALENDAR
So, I'm at a complete loss as far as vCalendar files go.
But I found out that iCalendar and vCalendar files use approximately the same
syntax (although I have to admit I haven't read the specs well enough to
describe the differences), and if I export an iCalendar file from Outlook it is
encoded with UTF-8 using codepage 65001 - æøå is saved as plain text, and
line breaks are saved as \n in plain text :)
So, for now I'm gonna change encoding, codepage, syntax & file extension and
pray iCalendar is easier than vCalendar :)
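For anyone taking the same route, here is a minimal sketch of the text escaping the Outlook iCalendar export above appears to use: line breaks become a literal \n. As far as I read the iCalendar spec, backslash, semicolon and comma are also backslash-escaped in text values. The class and method names (ICalendarText, Escape) are made up for illustration, and line folding is not handled.

```csharp
using System;
using System.Text;

public static class ICalendarText
{
    // Escapes a plain string for use as an iCalendar text value
    // (e.g. after "DESCRIPTION:"). Hypothetical helper, not a library API.
    public static string Escape(string value)
    {
        // Treat CRLF and lone CR as a single line break first.
        string normalized = value.Replace("\r\n", "\n").Replace('\r', '\n');

        var result = new StringBuilder();
        foreach (char c in normalized)
        {
            switch (c)
            {
                case '\\': result.Append("\\\\"); break; // backslash -> \\
                case ';':  result.Append("\\;");  break; // semicolon -> \;
                case ',':  result.Append("\\,");  break; // comma     -> \,
                case '\n': result.Append("\\n");  break; // newline   -> literal \n
                default:   result.Append(c);      break; // æøå etc. stay as-is
            }
        }
        return result.ToString();
    }
}
```

The escaped value would then be written out with Encoding.UTF8, so that æøå are stored as plain UTF-8 text the way Outlook does it.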
Thanks a lot for your help, Jon! I've learnt a lot today, although I'm
partly giving up :)
Lars-Erik
-
BTW.. here's the file from Outlook in iCalendar format :D
BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook 11.0 MIMEDIR//EN
VERSION:2.0
METHOD:PUBLISH
BEGIN:VEVENT
DTSTART:20040310T070000Z
DTEND:20040310T080000Z
TRANSP:OPAQUE
SEQUENCE:0
UID:20044703T024702@webcompetence.no
DTSTAMP:20040303T134702Z
DESCRIPTION:Mål- og resultatsamtale mellom Lars-Erik Aabech og Lars-Erik
Aabech\nTid: 10.03.2004 08:00\nSamtaletype: Resultatsamtale\n\nKanskje
dette funker..\n
SUMMARY:Invitasjon til mål- og resultatsamtale
PRIORITY:5
X-MICROSOFT-CDO-IMPORTANCE:1
CLASS:PUBLIC
END:VEVENT
END:VCALENDAR
"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
news:MPG.1aafe34fe7535be298a2f6@msnews.microsoft.com...
Lars-Erik Aabech <lars-erik.aabech@markedspartner.no> wrote:
> > What exactly do you mean by "it works with any codepage"? *What* works
> > with any codepage?
>
> There's obviously a lot I don't know about encoding (although I'd like to).
> Do you know some place on the net (not too academic) where I can learn more?
See http://www.pobox.com/~skeet/csharp/unicode.html - that's my best
explanation, and it's got some other things in as well.
> I thought ANSI and ASCII were more or less the same 256-byte set, except that
> the last x bytes represent different special characters depending on the
> codepage specified.
ASCII is only 7-bit to start with.
Different ANSI code pages tend to share the first 128 values with
ASCII, and then have different values for the last 128 values. That's
what I mean when I say there's no such thing as "the ANSI encoding".
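A small sketch to make that concrete: the same string is turned into bytes with a few encodings and shown as hex. HexBytes and EncodingDemo are illustrative names only; the encodings used are ones that need no extra code page registration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Returns the bytes that `encoding` produces for `text`,
    // formatted as space-separated hex pairs.
    public static string HexBytes(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return BitConverter.ToString(bytes).Replace("-", " ");
    }
}
```

With this, HexBytes("A", ...) gives 41 for ASCII, iso-8859-1 and UTF-8 alike, while HexBytes("\u00E5", ...) (å) gives 3F for Encoding.ASCII (a '?', because å does not exist in 7-bit ASCII), E5 for Encoding.GetEncoding("iso-8859-1") and C3 A5 for Encoding.UTF8.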
> Anyway - I'm creating a vCalendar file (http://www.imc.org/pdi/) which will
> be mailed as an attachment to Outlook users (hopefully it will work with
> other apps too). Outlook complains if the file isn't encoded correctly. So I
> tried to open one of the generated files with Notepad, saved it as ANSI
> instead of UTF-8, and then it works. These are the facts I based my
> statements on ;) (works, doesn't work, ANSI etc.)
Looking at the specification, it seems Outlook is being a little too
generous, but that there's a way you can get round it anyway. From the
spec, section 2.1.5:
<quote>
The default character set is ASCII. The default character set can be
overridden for an individual property value by using the "CHARSET"
property parameter. This property parameter may be used on any
property. However, the use of this parameter on some properties may not
make sense.
Any character set registered with the Internet Assigned Numbers
Authority (IANA) can be specified by this property parameter. For
example, ISO 8859-8 or the Latin/Hebrew character set is specified by:
DESCRIPTION;CHARSET=ISO-8859-8:...
Some transports (e.g., MIME based electronic mail) may also provide a
character set property at the transport wrapper level. This property
can be used in these cases for transporting a vCalendar data stream
that has been defined using a default character set other than ASCII
(e.g., UTF-8).
</quote>
I would suggest that you should output ASCII without any CHARSET= tag
where there are no non-ASCII characters, and use UTF-8 otherwise,
specifying CHARSET=UTF-8.
I would certainly *hope* that would work.
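As a rough sketch of that suggestion (untested against Outlook, so treat it as an assumption rather than a known-good recipe): emit the property without a CHARSET parameter when the value is pure ASCII, and add CHARSET=UTF-8 otherwise. VCalendarProperty, IsAscii and Format are hypothetical names.

```csharp
using System;

public static class VCalendarProperty
{
    // True if every character fits in 7-bit ASCII.
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127) return false;
        }
        return true;
    }

    // Builds "NAME:value" for ASCII-only values and
    // "NAME;CHARSET=UTF-8:value" when non-ASCII characters are present.
    public static string Format(string name, string value)
    {
        return IsAscii(value)
            ? name + ":" + value
            : name + ";CHARSET=UTF-8:" + value;
    }
}
```

The file itself would then have to be written with Encoding.UTF8 so the bytes match the declared character set. This sketch does nothing about the 7-bit default from section 2.1.4.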
Note section 2.1.4, however, which specifies the encoding for the whole
object - it defaults to only 7 bit.
> > If the receiving application is *really* expecting code page 865, then
> > use Encoding.GetEncoding(865).
>
> I'm getting closer at least..
> I've tried the following, and all of them were accepted by Outlook 2003,
> with assorted presentations of the Norwegian characters: (?, +, empty, etc.
> :) )
>
> System.Text.Encoding enc = System.Text.Encoding.GetEncoding(865);
> System.Text.Encoding enc = System.Text.Encoding.GetEncoding(1252);
> System.Text.Encoding enc = System.Text.Encoding.GetEncoding(20127);
> System.Text.Encoding enc = System.Text.Encoding.GetEncoding("iso-8859-1");
>
> etc. etc.
>
> Which means that Outlook doesn't give a **** what codepage I use.
It must, because you're *potentially* creating different data. What you
might have seen is either Outlook guessing (which means it might guess
it wrong) or you picking encodings which use the same mappings for
those particular characters.
> I've exported a calendar element from Outlook with special characters (while
> writing this post) and it appears I have to replace the special characters
> with '=E6' etc. and insert some more parameters in the vCalendar file.
> Example:
> SUMMARY;ENCODING=QUOTED-PRINTABLE:V=E6ret er r=F8tent i =E5r =C6=D8=C5
> instead of
> SUMMARY:Været er røtent i år ÆØÅ
That would be due to using quoted printable, as specified in section
2.1.4.
> So, the last question I have would be... Anyone got a magic way to do this
> or do I have to do string.replace("æ", "=E6").replace("ø", "=xx").... ???
> (Maybe a loop using String.charCodeAt or such, but still....)
Basically you'd want to look through the created byte array, and any
byte greater than 127 should be quoted - along with '=' presumably (I
haven't checked the quoted printable spec for a while).
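A minimal sketch of that loop, assuming the bytes come from whichever encoding the file declares. Besides bytes above 127 and '=', it also quotes control characters, since Outlook's export writes line breaks as =0D=0A. QuotedPrintable.Encode is an illustrative name, and the sketch leaves out the soft line breaks ("=" at the end of a line) and trailing-space handling that the quoted-printable rules also require.

```csharp
using System;
using System.Text;

public static class QuotedPrintable
{
    // Converts `text` to bytes using `encoding`, then writes each byte either
    // literally (printable ASCII other than '=') or as "=XX" in upper-case hex.
    public static string Encode(string text, Encoding encoding)
    {
        var result = new StringBuilder();
        foreach (byte b in encoding.GetBytes(text))
        {
            bool literal = b >= 32 && b <= 126 && b != (byte)'=';
            if (literal)
            {
                result.Append((char)b);
            }
            else
            {
                result.Append('=').Append(b.ToString("X2"));
            }
        }
        return result.ToString();
    }
}
```

For example, QuotedPrintable.Encode("m\u00E5l", Encoding.GetEncoding("iso-8859-1")) gives m=E5l, which matches the SUMMARY line in Outlook's export; the same call with Encoding.UTF8 gives m=C3=A5l.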
--
Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too