Re: Open html as source



Hi Tobias:

OK, so Open as "Text" and see if you can see what is wrong with the header:
change it, re-save it...

I have a horrible feeling you are running into a built-in limitation of
Word X -- Word X can't display most Unicode.

It would appear that the document might be coded for a double-byte character
set (e.g. Japanese). If so, those underscores show that it's recognising the
charset tag perfectly, but Word X can't display those characters! Word 2004
can.

I see in a second post you admitted the thing is in Japanese :-) Sorry:
You are going to struggle with that in Word X. Time to upgrade (wait for
2008 -- it will be further improved in its ability to do HTML-y and
Unicode-y things).

Cheers

--

Please reply in the group. Please do NOT email me unless I ask you to.

http://jgmcghie.fastmail.com.au/

John McGhie, Consultant Technical Writer
McGhie Information Engineering Pty Ltd
Sydney, Australia. GMT + 10 Hrs
+61 4 1209 1410, mailto:john@xxxxxxxxxxx

"Tobias Weber" <towb@xxxxxxx> wrote in message
news:towb-9D9464.14041930042007@xxxxxxxxxxxxxxxx
In article <#kr1dUwiHHA.2408@xxxxxxxxxxxxxxxxxxxx>,
"John McGhie [MVP Word, Word Mac]" <john@xxxxxxxxxxx> wrote:

Well, Word does not need to recognise the "Meta tag", it's the CharSet tag

I'm sure we both mean <meta http-equiv="content-type"
content="text/html; charset=utf-8">

it needs to get hold of. It should recognise that OK, provided that the

It does when displaying rendered HTML. "View source" shows wide
characters as two symbols, so I suppose it forgot that it's UTF-8.

content really is UTF-8.

It is, although sans BOM.
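
For anyone following along, here is a minimal Python sketch of the two points above: checking whether a file's bytes start with a UTF-8 BOM, and why one "wide" character turns into two symbols when UTF-8 bytes are read as a single-byte encoding. The function name has_utf8_bom is made up for illustration, and Latin-1 is only a stand-in for whatever single-byte encoding Word actually falls back to (an assumption, not something confirmed in this thread).

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


# One non-ASCII character, encoded as UTF-8 without a BOM: two bytes.
utf8_bytes = "é".encode("utf-8")

# Reading those two bytes as a single-byte encoding yields two symbols.
# Latin-1 is an assumed stand-in for the fallback encoding.
misread = utf8_bytes.decode("latin-1")

if __name__ == "__main__":
    print(has_utf8_bom(utf8_bytes))                    # file "sans BOM"
    print(has_utf8_bom(codecs.BOM_UTF8 + utf8_bytes))  # same content with a BOM
    print(len(utf8_bytes), len(misread))
```

Without a BOM, a reader that ignores the charset declaration has nothing in the first bytes to tell it the file is UTF-8, which would be consistent with what "View source" is doing here.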

And there is a way around it: Set Word>Preferences>General> "Confirm
conversions at open" to ON, then use File>Open (MUST be from within Word) to

That's what I was looking for. Thanks!

open the file. You will then get a dialog asking you what format the file
is in. Choose UTF-8 at that point.

There is no UTF-8 in the list, only "Unicode Text", which apparently
expects UTF-16, as my document comes out as only underscores.
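
If "Unicode Text" really does expect UTF-16, one possible workaround is to re-encode the file outside Word before opening it. The sketch below is only that, a sketch: utf8_to_utf16 is a hypothetical helper name, and whether Word wants a BOM or a particular byte order is an assumption. It also does nothing for a version of Word that cannot display the characters in the first place.

```python
import codecs


def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes (with or without a BOM) as UTF-16 with a BOM."""
    # "utf-8-sig" strips a leading UTF-8 BOM if present, and is harmless if not.
    text = data.decode("utf-8-sig")
    # Python's "utf-16" codec writes a BOM followed by native-byte-order data.
    return text.encode("utf-16")


if __name__ == "__main__":
    original = "日本語 test".encode("utf-8")  # sample content, sans BOM
    converted = utf8_to_utf16(original)
    print(converted[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
    print(converted.decode("utf-16"))
```

To convert a real file, read it in binary mode, pass the bytes through utf8_to_utf16, and write the result to a new file in binary mode so the original stays untouched.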

--
Tobias Weber

