Re: Iterating over the characters in a string

From: Nick Malik (nickmalik_at_hotmail.nospam.com)
Date: 09/23/04


Date: Thu, 23 Sep 2004 12:55:12 GMT

Hi Carlo,

Jon is right... your code, as posted, will convert "Fubar" (with its quotes) to Fuba.
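
For reference, here is a corrected version of the quote-stripping code. It is a sketch in C#, and the class and method names are made up for illustration. The main fix is that String.Substring takes (startIndex, length), so dropping the last character needs a length of Length - 1, not Length - 2. It also guards against empty input, where indexing [0] would throw.

```csharp
public static class QuoteStripper
{
    // Hypothetical helper: removes one leading and one trailing double quote, if present.
    public static string StripQuotes(string value)
    {
        // Guard against null/empty input: value[0] would throw on "".
        if (string.IsNullOrEmpty(value))
            return value;

        if (value[0] == '"')
            value = value.Substring(1); // drop the first character

        // Re-check Length: the string may now be empty (the input was a lone quote).
        if (value.Length > 0 && value[value.Length - 1] == '"')
            value = value.Substring(0, value.Length - 1); // Substring(start, length): keep all but the last char

        return value;
    }
}
```

String.Trim('"') is shorter, but it removes every leading and trailing quote, so a field written as """quoted""" would also lose its escaped quotes.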

I, too, am confused by the error you are getting.

Also, when parsing, beware of simple solutions. In the CSV format, a
double-quote character can be embedded within a string. I believe it
appears twice, as in:
"The word ""misspelled"" is often spelled incorrectly"

(Not 100% certain about that, but my memory tells me that this is the case.
Also, commas can occur in the quoted string too, so Split() may not work
very well either.)
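
Because quoted fields can contain both commas and doubled quotes, one way to handle them is to scan the line a character at a time with an "inside quotes" flag. This is a minimal sketch. It assumes the doubled-quote convention described above (the same rule RFC 4180 uses) and a single-line record, so quoted fields that span line breaks are not handled. CsvSplitter.SplitLine is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvSplitter
{
    // Hypothetical helper: splits one CSV record into fields.
    // Assumes quoted fields may contain commas, and "" inside a quoted
    // field stands for one literal quote. Single-line records only.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // "" inside quotes is an escaped quote; a single " closes the quoted section.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++; // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;
                    }
                }
                else
                {
                    current.Append(c); // commas are literal inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // the opening quote is not copied into the field
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // an unquoted comma ends the field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing comma
        return fields;
    }
}
```

Given the example line above followed by ,42, the sketch returns two fields: The word "misspelled" is often spelled incorrectly, and 42.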

Good Luck,
--- Nick

"Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
news:MPG.1bbcd392eb51b3598b4d9@msnews.microsoft.com...
> Carlo Razzeto <crazzeto@hotmail.com> wrote:
> > Hello, I have a question in regards to .Net string manipulation,
> > specifically about iterating over individual characters in a string.
> > The problem is that I have a CSV parser that successfully parses
> > quoted CSV files; the only issue is that it leaves the leading and
> > trailing quotes intact. Before I go on, I do realize that since it's
> > a CSV I could do stringval.Replace( "\"", "" ); but I wanted to take
> > the chance to learn how to iterate over string values. Anyway, what I
> > had originally written to do this was:
> >
> > if ( stringval[0] == '"' ) {
> >     stringval = stringval.Substring( 1, ( stringval.Length - 1 ) );
> > }
> > if( stringval[( stringval.Length - 1 )] == '"' ) {
> >     stringval = stringval.Substring( 0, ( stringval.Length - 2 ) );
> > }
> >
> > I was never stripping off the final ", and I now realize that the
> > problem has to do with .Net storing strings in Unicode, which allows
> > pairs of characters to represent a single character. So my question
> > is: how does one iterate over the character values in a string and
> > replace a character's value if necessary?
>
> Although Unicode (UTF-16 in particular) allows surrogate pairs, I don't
> think that's the real problem. What exactly are you seeing?
>
> Note that your code as posted above will remove the character *before*
> the final " as well.
>
> --
> Jon Skeet - <skeet@pobox.com>
> http://www.pobox.com/~skeet
> If replying to the group, please do not mail me too
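
On the original question of iterating and replacing: .NET strings are immutable, so the string indexer is read-only and "replacing" a character means building a new string. One option is to copy the string into a StringBuilder, whose indexer is writable. As for surrogate pairs, UTF-16 surrogate code units lie in the range U+D800 to U+DFFF, so an ASCII character such as '"' can never be half of a pair, and comparing individual chars against '"' is safe. That fits Jon's point. The sketch below (C#, hypothetical names) shows a replacement loop, plus a loop that steps over surrogate pairs with char.IsSurrogatePair for the cases where whole code points matter.

```csharp
using System.Text;

public static class CharReplacer
{
    // Hypothetical helper: copies the string into a StringBuilder, whose
    // indexer (unlike string's) is writable, and replaces matching chars.
    public static string ReplaceChar(string input, char from, char to)
    {
        var sb = new StringBuilder(input);
        for (int i = 0; i < sb.Length; i++)
        {
            // Comparing single UTF-16 code units is safe for an ASCII target such as '"',
            // because surrogate code units (U+D800..U+DFFF) never equal it.
            if (sb[i] == from)
                sb[i] = to;
        }
        return sb.ToString();
    }

    // Hypothetical helper: counts Unicode code points rather than UTF-16 chars.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i))
                i++; // a high/low surrogate pair encodes a single code point
            count++;
        }
        return count;
    }
}
```

For a plain one-for-one substitution, the built-in String.Replace(char, char) does the same job in one call; the explicit loop is only needed when the replacement depends on position or context.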


