Re: Challenge - Regular Expression that divides a string at tokens


Not answering your question as asked, but it seems to me it would be trivial
to do what you want using Mystring.Split(array), where your array has "{" and
"}" in it. Then you could re-concatenate the braces as appropriate.

Thus you would get an array like:

"This is "
"blue"
", this is "
"red"
", and this is "
"green"
"."
From there, without much thinking, I think you could figure out how to make
all the odd-numbered elements of that array (counting from zero) start with
"{" and end with "}".
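
A rough sketch of that idea, in case it helps. The class and method names
(BraceSplitDemo, SplitTokens) are made up for illustration, and it assumes the
braces are balanced and never nested. Empty entries are deliberately kept so
that the odd/even pattern still holds when the string begins with a token.

```csharp
using System;

static class BraceSplitDemo
{
    // Hypothetical helper: splits on '{' and '}' and then puts the braces
    // back around every odd-indexed piece (index 1, 3, 5, ...).
    // Assumes balanced, non-nested braces.
    public static string[] SplitTokens(string input)
    {
        // Empty entries are kept on purpose: "{a} b" splits into "", "a", " b",
        // so the token still lands on an odd index.
        string[] parts = input.Split(new[] { '{', '}' });

        for (int i = 1; i < parts.Length; i += 2)
        {
            parts[i] = "{" + parts[i] + "}";
        }

        return parts;
    }
}
```

Calling BraceSplitDemo.SplitTokens on the example string should give the seven
pieces from the original post. With nested tokens such as {item:{category}}
the odd/even assumption no longer holds, so this would need more work.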

Good luck.

Reece

"Roger Frost" <RogerFrost@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:5D396FB2-CE63-436D-AFC9-3CF7073780B0@xxxxxxxxxxxxxxxx
Hi all

I've been messing with this since early yesterday. I thought it might come
to me in my sleep, but no such luck.

Here is the basic problem: I need to split a given string into sub-strings
of its "token" and "non-token" parts.

For instance, the string
"This is {blue}, this is {red}, and this is {green}."

Should result in:

"This is "
"{blue}"
", this is "
"{red}"
", and this is "
"{green}"
"."

Now, I can do this in two parts, separating the tokens from the literals
(the output includes "}" and/or "{" on the literals, but I can deal with
this). What I can't seem to do is combine the two to get the above results,
which is what I need. It would allow me to rebuild the string in the correct
order easily with minimal code. Never mind that, though; the important part
is that I need to do this with Regular Expressions.

Here is a complete example program:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Press Enter To Start.");
            Console.ReadLine();

            string mystr = "{id}: The best {item:{category}} of all {items} "
                         + "in {country}{industry} of the world.";

            // Just some validation to make things simpler
            mystr = mystr.Replace("}{", "} {");

            string matchTokens = @"{(.+?)(}?)}";
            string matchLiterals =
                @"^([^{}]+?){|}([^{}]+?){|}([^{}]+?)$|^([^{}]+?)$";

            Regex findTokens = new Regex(matchTokens);
            Regex findLiterals = new Regex(matchLiterals);

            MatchCollection tokens = findTokens.Matches(mystr);
            MatchCollection literals = findLiterals.Matches(mystr);

            // Print the tokens first...
            foreach (Match m in tokens)
            { Console.WriteLine(m.Value); }

            Console.WriteLine();

            // ...then the literals (these still carry a "{" and/or "}").
            foreach (Match m in literals)
            { Console.WriteLine(m.Value); }

            Console.WriteLine();

            Console.WriteLine("Press Enter To Exit.");
            Console.ReadLine();
        }
    }
}

I've tried the following pattern:

string matchTokens =
@"({(.+?)(}?)})|^([^{}]+?){|}([^{}]+?){|}([^{}]+?)$|^([^{}]+?)$";

It's just a combination of the two, but outputs the same as the matchTokens
pattern in the example.
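
For comparison, here is a minimal sketch of a single pattern that does not
consume the braces as part of the literals. It is only an illustration, not a
tested solution: the names SinglePatternDemo and Pieces are invented, and the
pattern assumes tokens are never nested, so it would not handle
{item:{category}} as one token.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SinglePatternDemo
{
    // Hypothetical helper: returns tokens and literals in document order.
    // First alternative: a "{...}" token with no braces inside it.
    // Second alternative: a run of characters containing no braces at all.
    // Assumes tokens are not nested.
    public static List<string> Pieces(string input)
    {
        var pieces = new List<string>();

        foreach (Match m in Regex.Matches(input, @"\{[^{}]*\}|[^{}]+"))
        {
            pieces.Add(m.Value);
        }

        return pieces;
    }
}
```

Because neither alternative takes a brace that belongs to a neighbouring
match, the tokens and the literals can both be found in one pass over the
string.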

If any

