Re: different encoding handling between old ASP and ASP.Net
- From: "Mark" <mmodrall@xxxxxxxxxxxxx>
- Date: Mon, 6 Jun 2005 13:57:05 -0700
Hi Joerg...
Actually, none of the vaporized characters in the original example are
prohibited from utf-8 per se; what was broken about the original example was
that %C7 was followed by %D1. To be legal utf-8, %C7 would have to have been
followed by a continuation byte, i.e. something from %80 through %BF.
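
To make the byte rule concrete, here is a small sketch in Python (used only as
a neutral illustration; it is not ASP or .Net code, and the helper names are
made up for this example). It checks the bit patterns involved and shows why
%C7%D1 fails while %D1%B1 is fine.

```python
def is_continuation(b: int) -> bool:
    """True if b has the bit pattern 10xxxxxx, i.e. 0x80..0xBF."""
    return (b & 0xC0) == 0x80


def is_two_byte_lead(b: int) -> bool:
    """True if b can start a 2-byte sequence (110xxxxx).

    0xC0 and 0xC1 are excluded because they could only encode
    overlong forms, which utf-8 forbids.
    """
    return 0xC2 <= b <= 0xDF


# %C7 is a legal lead byte, but %D1 is not a continuation byte.
print(is_two_byte_lead(0xC7), is_continuation(0xD1))  # True False

# %D1 followed by %B1 is a well-formed 2-byte sequence.
print(is_two_byte_lead(0xD1), is_continuation(0xB1))  # True True
```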
Taken together, the example string that was supposed to be utf-8 is invalid
*as a whole*, and the question was more about the appropriate way to respond
to that. ASP responded to an invalid utf-8 string by not trying to find valid
bits in it, instead giving as close to a "raw" approximation as it could.
ASP.Net treats it like panning for gold. It sifts through the stream until
it finds byte combos that are legal, keeps those, and drops the rest. It
doesn't even put in "?" as a placeholder, like so many of the other APIs do. I
don't see how that's any less "wrong" than what ASP does.
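
The policies being compared can be reproduced on the example query string
with a short Python sketch. This is only an analogy: Python's "ignore" and
"replace" error handlers stand in for the ASP.Net and placeholder behaviors,
and Latin-1 is an assumed stand-in for ASP's one-character-per-byte result
(the code page ASP actually used is not stated in this thread).

```python
from urllib.parse import unquote_to_bytes

# The query value from the original example, as raw bytes.
raw = unquote_to_bytes("%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6")

# "Panning for gold": keep the legal sequences, silently drop the rest.
dropped = raw.decode("utf-8", errors="ignore")

# Placeholder policy: U+FFFD for each undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# "Raw" policy: one character per input byte (Latin-1 assumed here).
per_byte = raw.decode("latin-1")

print([ord(c) for c in dropped])                # [1137, 1786, 697]
print(len(replaced), replaced.count("\ufffd"))  # 7 4
print(len(per_byte))                            # 10
```

Only the "ignore" variant loses the fact that anything went wrong; the other
two at least leave a trace for every offending byte.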
What perplexes me more is why there's a discontinuity at all. It's just
another thing that won't work the same way when migrating from ASP to
ASP.Net. If there's a rationale for why picking out bits and pieces from an
invalid stream is better than not trying to translate it at all, I'd be
curious to know.
If I were God, I'd say that the "right" way to do it in .Net would be to
throw an invalid format exception when garbage is fed to an Encoding class.
But given how expensive Exception processing is, I could understand why they
might not want to do that. Next down on my most "right" list would be to
have HttpUtility.UrlDecode() return an instance of an object where one
member would be the successfully translated string (if any) and another
member would be an array of the raw bytes. Then you could test the result
and make use of the bits if you chose.
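
A rough sketch of that second idea, again in Python rather than .Net. The
names DecodeResult and url_decode_checked are hypothetical and do not exist
in any framework; this only illustrates the shape of the result object I have
in mind.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote_to_bytes


@dataclass
class DecodeResult:  # hypothetical type, for illustration only
    text: Optional[str]  # decoded string, or None if the bytes were invalid
    raw: bytes           # the undecoded bytes, always available

    @property
    def ok(self) -> bool:
        return self.text is not None


def url_decode_checked(value: str) -> DecodeResult:
    """Percent-decode value, then attempt a strict utf-8 decode."""
    raw = unquote_to_bytes(value)
    try:
        return DecodeResult(raw.decode("utf-8"), raw)
    except UnicodeDecodeError:
        # Invalid input: no partial string, but the caller keeps the bytes.
        return DecodeResult(None, raw)


result = url_decode_checked("%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6")
print(result.ok, len(result.raw))  # False 10
```

This sketch happens to catch an exception internally because that is how
Python reports a strict decode failure; the point is only that the caller
gets to test the result and still has the raw bytes to work with.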
Thanks
_mark
"Joerg Jooss" wrote:
> Mark wrote:
>
> > Hi...
> >
> > Just noticed something odd... In old ASP if you had query parameters
> > that were invalid for their encoding (broken utf-8, say), ASP would
> > give you back chars representing the 8-bit byte value of the broken
> > encoding, so you still got something for every input byte.
> >
> > This appears to have changed radically in ASP.Net, going down to the
> > base System.Text.Encoding object. Now, it appears to simply vaporize
> > bytes that don't fit in the encoding. You don't even get a ?
> > placeholder like you get in so many other contexts in asp.
> >
> > Could anyone explain why there was such a dramatic change in the
> > handling of error cases? Is there a way using the .net framework to
> > know if you had an encoding error?
> >
> > An example of the input:
> > /test.aspx?query=%C7%D1%B1%DB%BA%A3%B3%CA%B9%E6
> > In the above, C7, A3, B3, and E6 don't make a valid utf-8 stream, but
> > looking for Request.QueryString ("query") gives me the decoded
> > version, just missing any representation of the offending characters,
> > i.e. three characters 1137, 1786, and 697 (which don't render in IE
> > either by the way).
> >
> > Request.QueryString ("query") in ASP would yield a 10-character
> > string, with each of the original bytes converted to the raw 8-bit
> > value.
>
> What does that mean? 0xC7, 0xA3, 0xB3, and 0xE6 are all meaningless in
> UTF-8. There's no way to replace these bytes with a replacement
> character, because that character's meaning would be ambiguous -- is it
> the real character or a replacement? Whatever ASP does in this
> situation, it's wrong.
>
> Cheers,
>
> --
> http://www.joergjooss.de
> mailto:news-reply@xxxxxxxxxxxxx
>