Re: different encoding handling between old ASP and ASP.Net



Hi Mark,

Thanks for your posting.
Yes, I can well believe the output you saw. However, it is in fact not
caused by a difference in the underlying charset processing between ASP and
ASP.NET. More precisely, it comes from the different globalization support
and default configuration of the two.

In ASP, globalization configuration is limited, so there are generally two
things we need to set:
1. The code page for the server-side page, through
<%@ Language="VBScript" CodePage="65001" %> or
<%
Session.CodePage = 65001
%>

Both approaches set the server-side page's request-processing charset to
UTF-8 (code page 65001), so the incoming querystring is decoded as UTF-8.
If you set neither of them, ASP uses the default charset (the system locale
on the server) to decode the strings in the incoming request.

In ASP.NET we don't need to set these, because ASP.NET by default uses UTF-8
as the request and response encoding. The setting is controlled by the
<globalization> element in web.config (its requestEncoding and
responseEncoding attributes).

2. When the server page writes content to the client, the browser picks an
encoding to display the page. In ASP we can state the charset explicitly
with the following code (if we don't, the server's default charset is
used):
<%
Response.Charset = "UTF-8"
%>

In ASP.NET, as I mentioned above, UTF-8 is also the default here. This
setting tells the client browser which encoding to choose when displaying
the page content. If the charset is not sent explicitly, we may need to
switch the browser's View --> Encoding to UTF-8 by hand to display the
correct content.
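
To make the effect of the request charset concrete, here is a minimal sketch in Python (not ASP) that percent-decodes the querystring from this thread under two charsets. The helper name decode_query and the choice of euc_kr as a stand-in for a server's system locale are assumptions for illustration only; the thread does not say which locale the server actually uses.

```python
from urllib.parse import parse_qs

# Raw querystring as it arrives on the wire: the percent-escapes are just bytes.
raw_query = "str=%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6"


def decode_query(query: str, charset: str) -> str:
    """Percent-decode the 'str' parameter, reading the bytes as `charset`.

    Hypothetical helper, not part of ASP or ASP.NET.
    """
    # errors="replace" substitutes U+FFFD for sequences invalid in `charset`.
    values = parse_qs(query, encoding=charset, errors="replace")
    return values["str"][0]


as_utf8 = decode_query(raw_query, "utf-8")     # a UTF-8 request encoding
as_legacy = decode_query(raw_query, "euc_kr")  # assumed legacy server locale

print(len(as_utf8), "\ufffd" in as_utf8)      # prints: 7 True
print(len(as_legacy), "\ufffd" in as_legacy)  # prints: 5 False
```

Under these assumptions the same ten bytes produce two different strings: the UTF-8 reading contains replacement characters, while the legacy reading does not. That is the kind of difference a missing code page setting can produce.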

Now, as for the byte sequence you mentioned:

%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6

When these bytes are decoded as UTF-8, they are parsed as three
undisplayable characters, so we should see three empty squares on the page
(this is the correct behavior). We can confirm this by running the code
below in a .NET WinForms app:
================
byte[] bytes = {0xC7,0xD1,0xB1,0xDB,0xBA,0xA3,0xB3,0xCA,0xB9,0xE6};

string str = System.Text.Encoding.UTF8.GetString(bytes);

MessageBox.Show(string.Format("string:{0}, length:{1}",str,str.Length));
================
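
As a cross-check that does not depend on a particular .NET version, the same bytes can be examined in Python. This is only a sketch: the variable names are illustrative, and it shows Python's decoder, not the behavior of System.Text.Encoding.

```python
data = bytes([0xC7, 0xD1, 0xB1, 0xDB, 0xBA, 0xA3, 0xB3, 0xCA, 0xB9, 0xE6])

# Strict decoding rejects the input outright: it is not well-formed UTF-8.
try:
    data.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Lenient decoders differ in how they treat the invalid bytes.
replaced = data.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
dropped = data.decode("utf-8", errors="ignore")    # invalid bytes are discarded

print(is_valid_utf8)                                 # prints: False
print(len(replaced), replaced.count("\ufffd"))       # prints: 7 4
print(len(dropped), [f"U+{ord(c):04X}" for c in dropped])
# prints: 3 ['U+0471', 'U+06FA', 'U+02B9']
```

Only three two-byte sequences in the input (D1 B1, DB BA, CA B9) are valid UTF-8. A decoder that drops the invalid bytes yields three characters, consistent with the count given above, while one that substitutes U+FFFD for each invalid byte yields seven. Which policy a given runtime applies should be checked against its own documentation.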

The different behavior you got in ASP is probably because ASP uses your
server's system locale, rather than UTF-8, to parse the querystring. So I
suggest you try the following page, which explicitly sets the server page's
code page and the response charset to UTF-8:
==============================
<%@ Language="VBScript" %>

<%
Session.CodePage = 65001
%>

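<%
' Set the response charset before any output is written, so the header
' is still valid even when response buffering is turned off
Response.CharSet = "utf-8"

Dim str

str = Request.QueryString("str")

Response.Write("<br>String: " & str)
Response.Write("<br>Length: " & Len(str))
%>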

========================

Then, when we pass
%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6

as the querystring, we get three empty squares displayed on the page (make
sure the client browser is using UTF-8 to display it), which is identical
to the behavior of the ASP.NET page using UTF-8 request/response encoding.

If anything is unclear, or if you have any other related questions, please
feel free to post here. Thanks,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)



