Re: different encoding handling between old ASP and ASP.Net



Mark wrote:

> Hi Joerg...
>
> Actually, none of the vaporized characters in the original example
> are prohibited from utf-8 per se; what was broken about the original
> example was that %C7 was followed by %D1; to be legal utf-8, it
> would have to have been followed by %BF or lower.

Yep -- I was talking about bytes, not characters.
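
For anyone following along, here is a minimal sketch of the byte-level point, in Python rather than ASP.NET (the helper name is_valid_utf8 is made up for illustration). %C7 is the lead byte of a two-byte sequence, so the next byte has to be a continuation byte in the range %80 to %BF; %D1 is itself a lead byte, so the pair is rejected.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding: any bad byte raises
        return True
    except UnicodeDecodeError:
        return False


# %C7 followed by another lead byte (%D1): not legal UTF-8.
bad = unquote_to_bytes("%C7%D1")
# %C7 followed by a continuation byte (%BF): legal, decodes to U+01FF.
good = unquote_to_bytes("%C7%BF")

print(is_valid_utf8(bad))
print(is_valid_utf8(good))
```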

> Taken together, the example string that was supposed to be utf-8 *as
> a whole* is invalid, and the question was more about what's the
> appropriate way to respond to that. ASP responded to an invalid
> utf-8 string by not trying to find valid bits in it but by giving as
> close to a "raw" approximation as it could.
>
> ASP.Net treats it like panning for gold. It sifts through the stream
> until it finds byte combos that are legal, keeps those, and drops the
> rest. It doesn't even put in ? as a placeholder, like so many of the
> other apis do. I don't see how that's any less "wrong" than what ASP
> does.

As I pointed out, replacement characters are misleading, because you
have no idea whether the '?' is genuine or a replacement.
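
A rough illustration of that ambiguity, again in Python and only as a sketch: the "qmark" error handler below is something I registered myself to mimic APIs that substitute '?'. It is not a built-in handler, and it is not what ASP.NET does (ASP.NET's dropping behaviour is closer to errors="ignore").

```python
import codecs


def question_mark_handler(err):
    """Hypothetical handler: replace each undecodable run with '?'."""
    if isinstance(err, UnicodeDecodeError):
        return ("?", err.end)  # resume decoding after the bad bytes
    raise err


codecs.register_error("qmark", question_mark_handler)

# A genuine question mark typed by the user.
genuine = b"a?b".decode("utf-8", errors="qmark")
# A stray %C7 lead byte that gets replaced by '?'.
damaged = b"a\xC7b".decode("utf-8", errors="qmark")
# The same input with the bad byte silently dropped.
dropped = b"a\xC7b".decode("utf-8", errors="ignore")

print(genuine == damaged)  # both results are the same string
print(repr(dropped))
```

Once decoded, the first two strings cannot be told apart, and the third gives no hint that anything was lost.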

What we really need here is an HttpRequest property that indicates
whether form data or the query string were decoded without skipping
input bytes.
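
To be clear, no such property exists as far as I know. The idea can be sketched outside ASP.NET like this (Python; decode_query_value and its return shape are invented for the example, and '+' handling is left out): try a strict decode first, and only fall back to skipping bytes while reporting that it happened.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw: str):
    """Return (text, lossless) for one percent-encoded value.

    lossless is False when input bytes had to be skipped.
    Hypothetical helper, not part of any framework.
    """
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8"), True
    except UnicodeDecodeError:
        # Fall back to dropping the offending bytes, but say so.
        return data.decode("utf-8", errors="ignore"), False


print(decode_query_value("caf%C3%A9"))
print(decode_query_value("x%C7%D1y"))
```

With a flag like this the application can decide for itself whether to reject the request, retry with another encoding, or accept the lossy result.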

Cheers,

--
http://www.joergjooss.de
mailto:news-reply@xxxxxxxxxxxxx