IIS 6.0 / UTF-8 Include File Issue


From: MK (mattk_at_nospam.nospam)
Date: 06/01/04


Date: Tue, 1 Jun 2004 12:31:37 +0100

We have multiple sites running off a single source tree, built that way for
load-balancing reasons.

The sites are basically the same but with different languages. E.g. .CO.UK
is in English, .FR in French, .CN in Chinese.

All the language-specific text is held in variables in UTF-8 include files.
The ASP pages themselves are stored as ANSI.

We've finally got almost everything displaying correctly by setting the
codepage etc. according to language type:

  ' UTF8 is a flag set per site according to its language type
  If UTF8 Then
    Response.Charset = "utf-8"
    Response.CodePage = 65001   ' UTF-8
    Response.Write "<META HTTP-EQUIV=""Content-Type"" CONTENT=""text/html; charset=UTF-8"">" & vbCrLf
  Else
    Response.Charset = "iso-8859-1"
    Response.CodePage = 1252    ' Windows-1252 (Western European)
    Response.Write "<META HTTP-EQUIV=""Content-Type"" CONTENT=""text/html; charset=iso-8859-1"">" & vbCrLf
  End If

The problem is that when the page is delivered as Latin-1 (the Else branch
above), IIS still seems to treat text submitted from the page as UTF-8.

So if a user enters text on the .FR site (charset=iso-8859-1), all the
accented characters are lost on input. The problem goes away if we change
the charset to UTF-8 or if we remove the UTF-8 includes.
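
A minimal sketch of the suspected mismatch, written in Python rather than ASP
purely for illustration. It assumes the browser submits the form as
ISO-8859-1 bytes and the server decodes those bytes as UTF-8, silently
dropping anything invalid. The function name is hypothetical and this is not
IIS's actual code path, only a model of the symptom:

```python
def simulate_form_roundtrip(text: str, page_charset: str, server_codepage: str) -> str:
    """Model a form post: the browser encodes with the page charset,
    the server decodes with whatever codepage it assumes (hypothetical helper)."""
    submitted_bytes = text.encode(page_charset)
    # errors="ignore" mimics a decoder that drops byte sequences it cannot read
    return submitted_bytes.decode(server_codepage, errors="ignore")


if __name__ == "__main__":
    # Page served as iso-8859-1, input decoded as UTF-8: the accent disappears
    print(simulate_form_roundtrip("café", "iso-8859-1", "utf-8"))  # prints "caf"
    # Charset and codepage agree: the text survives
    print(simulate_form_roundtrip("café", "utf-8", "utf-8"))       # prints "café"
```

If this model is right, it would explain why switching the charset to UTF-8
makes the symptom disappear: the bytes the browser sends then match what the
server expects.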

Does anyone have a fix for this?

Just making all the pages UTF-8 causes other display problems, as IIS 6.0
seems to make assumptions that previous versions didn't.

It's been suggested internally that we store all the text in SQL and write
it to the screen from there, so that we have no UTF-8 includes, but that's a
'hammer to crack a nut' at the moment.

Any input appreciated.

Matt


