Re: client side script and encoding

Hi Wen Yuan...

I have to agree with Anthony. While the ASP.NET 3.5 patch is interesting,
the only reason I put the client-side script generation in an aspx page
was to get better control of the IIS output encoding than a flat file gives
me.

This is *entirely* a client-side (IE) issue. The script arrives perfectly
valid in the declared encoding; it's IE's *application* of the code that
gets it wrong.

Originally, I tried to replicate the whole thing in IIS with flat files,
but was disappointed to find that IIS doesn't appear to apply even the
rudimentary encoding detection that Notepad provides. It just serves bytes off
disk, without trying to work out their encoding and set the headers accordingly.
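The simplest form of the detection IIS skips is checking for a byte-order mark (BOM). Below is a minimal Python sketch of that idea; it is not IIS or Notepad code, `sniff_charset` is a hypothetical helper, and the ISO-8859-1 fallback for files without a BOM is an assumption.

```python
import codecs

# BOMs checked longest-first: the UTF-32 LE BOM begins with the
# same two bytes (FF FE) as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_charset(data: bytes, default: str = "iso-8859-1") -> str:
    """Guess a charset from a leading BOM; fall back to `default` (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default
```

A server doing this could put the result into the `charset=` parameter of the Content-Type header instead of serving the bytes blind.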

Interestingly, I found that if I saved the client script from Notepad as
UTF-8, the file got a BOM. When IE sees the BOM in the stream, it seems to
do the right thing. Having the HTTP content headers set properly but *not
having the BOM* (which is how asp(x) pages work) appears to make IE get it
wrong.
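The UTF-8 BOM is the three bytes EF BB BF, and Python's `utf-8-sig` codec writes the same prefix, so the two byte streams being compared here can be shown side by side. A small sketch; the sample word is taken from the test data further down:

```python
import codecs

text = "priorité"

no_bom = text.encode("utf-8")        # like the aspx output: no BOM
with_bom = text.encode("utf-8-sig")  # like a BOM-prefixed UTF-8 file

print(no_bom.hex(" "))    # 70 72 69 6f 72 69 74 c3 a9
print(with_bom.hex(" "))  # ef bb bf 70 72 69 6f 72 69 74 c3 a9

# The only difference is the three-byte marker at the front.
assert with_bom == codecs.BOM_UTF8 + no_bom
```

The payload bytes are identical; only the marker IE appears to rely on differs.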

Bottom line, it appears that IE ignores (or doesn't trust) an explicit
charset in the HTTP header for anything except form encoding for uploads.

The aspx page delivers valid UTF-8 (just without a BOM) and says so in
the headers, but IE ignores that.

I suppose another workaround would be to have the aspx page emit the BOM
manually before any other output.
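In ASP.NET this would mean writing the BOM bytes to the response before anything else. As a language-neutral illustration, here is a minimal Python WSGI sketch of the same idea. `script_app`, the `text/javascript` content type, and the script text are assumptions for the example, not part of the original sample.

```python
import codecs

# Hypothetical script body, shortened from the repro sample.
SCRIPT = "document.write('<div>Décidé à être le président</div>')"


def script_app(environ, start_response):
    # Prefix the BOM manually, before any other output, and still
    # declare the charset in the header.
    body = codecs.BOM_UTF8 + SCRIPT.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/javascript; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, script_app).serve_forever()` serves it on port 8000. Sending both signals means a client that honours either the header or the BOM should decode the script correctly.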

To see what I'm talking about, make a test.js file by removing all the
Response.Write etc. code from the testScript.aspx sample. Save test.js as
UTF-8 (you can see the first 3 bytes are a marker, EF BB BF). Change
testHTML.htm to reference test.js instead of testScript.aspx.
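To confirm the marker is there without a hex editor, read the first three bytes of the saved file. A minimal Python check; `has_utf8_bom` is a hypothetical helper name:

```python
import codecs


def has_utf8_bom(path: str) -> bool:
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:  # binary mode: inspect raw bytes, no decoding
        return f.read(3) == codecs.BOM_UTF8
```

With test.js in the current directory, `has_utf8_bom("test.js")` should return True after a UTF-8 save from Notepad.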

Thanks
Mark


"WenYuan Wang [MSFT]" wrote:

Hello Anthony,
Thanks for your suggestion.

I'm sorry, but I cannot post the bug ID in this reply. However, if you are
interested, I can provide a repro sample.

testScript.aspx
<%@ Page language="c#" EnableSessionState="false"%>
<%
Response.Write ("document.write('<div>");
Response.Write ("Décidé à être le président qui réformera la France, M. Sarkozy ");
Response.Write ("a fait de la revalorisation du travail sa priorité.");
Response.Write ("</div>')");
%>

testHTML.htm
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head><meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type"/></head>
<body>
<script src="testScript.aspx" language="JavaScript"></script>
</body>
</html>

When you browse to testHTML.htm under ASP.NET 2.0, you will notice that the
characters are garbled. However, it works fine if the page is running against
ASP.NET 3.5 Beta 2.
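A likely explanation, assumed here rather than confirmed in the thread, is that IE decodes the BOM-less UTF-8 script using the host page's declared charset (ISO-8859-1 in testHTML.htm). Python reproduces that kind of mismatch:

```python
sample = "le président, sa priorité"

utf8_bytes = sample.encode("utf-8")        # bytes the server actually sends
garbled = utf8_bytes.decode("iso-8859-1")  # decoded with the page's charset

print(garbled)  # le prÃ©sident, sa prioritÃ©

# ISO-8859-1 maps every byte to exactly one character, so the damage is reversible.
assert garbled.encode("iso-8859-1").decode("utf-8") == sample
```

Each é (UTF-8 bytes C3 A9) becomes the two characters Ã©, the typical signature of UTF-8 read as Latin-1.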

Have a great day,
Best regards,

Wen Yuan
Microsoft Online Community Support
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.




