Re: Regex question



What Jeff is saying is that your approach is backwards. Take a step back and think about what you're trying to do. If you want to pull a valid date from a string, then you have to write a pattern that defines the lexical structure of the date you're trying to extract.

First, here's the easy button: www.regexlib.com

Second, regular expressions are not by any means easy. What you're trying to do is simple by Regex standards, but still requires a lot more specificity than "a digit or a slash". For example, in Regex you can specify quantifiers and alternation constructs.

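\d{1,2}(-|/)\d{1,2}(-|/)(\d{4}|\d{2})

This matches a 1- or 2-digit month, a 1- or 2-digit day, and a 2- or 4-digit year, each separated by a - or /. The 4-digit alternative is listed first because alternatives are tried left to right; with \d{2} first, 09/19/2008 would only match as 09/19/20.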

However, that doesn't guarantee you a valid date. For example, it matches 99/99/9999. It also only handles US-style month/day/year dates written in this specific numeric format.
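
To make that concrete, here is a rough C# sketch that pulls the date-shaped text out of the sample strings posted earlier in this thread. The class and method names (DateExtractDemo, ExtractCandidate) are made up for illustration. The pattern lists the 4-digit year first, since .NET tries alternatives left to right and would otherwise stop at 09/19/20 for 09/19/2008.

```csharp
using System;
using System.Text.RegularExpressions;

static class DateExtractDemo
{
    // 1-2 digit month, 1-2 digit day, 4- or 2-digit year, separated by - or /.
    // The 4-digit year alternative is listed first on purpose.
    static readonly Regex DatePattern =
        new Regex(@"\d{1,2}(-|/)\d{1,2}(-|/)(\d{4}|\d{2})");

    // Hypothetical helper: returns the first date-shaped substring,
    // or null when nothing in the input fits the pattern.
    public static string ExtractCandidate(string input)
    {
        Match m = DatePattern.Match(input);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        string[] samples =
        {
            "05/07/08(-4%)",
            "09/19/08 DOM 55",
            "09/19/2008 DOM 53",
            "FOR 09/15/08 -23",
            "99/99/9999",   // fits the pattern, but is not a real date
        };

        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, ExtractCandidate(s) ?? "(no match)");
        }
    }
}
```

The first four samples give back 05/07/08, 09/19/08, 09/19/2008 and 09/15/08. The last one comes back unchanged as 99/99/9999, which is exactly the problem: the pattern only checks the shape, not whether the date exists.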

That is why another poster recommended a second pass using DateTime.TryParse. Regex will get you most of the way there, but trying to construct a pattern that guarantees a valid date within the range allowed by T-SQL would be a really lousy use of Regex in the first place.
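
A minimal sketch of that two-pass idea, assuming US month/day/year order and only - or / as separators. TryExtractDate and the list of formats are my own illustrative choices, not something from the earlier posts, and it makes no attempt to check the T-SQL range.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateValidateDemo
{
    // Pass 1: find something that is shaped like a date.
    static readonly Regex DatePattern =
        new Regex(@"\d{1,2}(-|/)\d{1,2}(-|/)(\d{4}|\d{2})");

    // Pass 2: formats accepted by the parser (assumption: US month/day/year).
    static readonly string[] Formats =
    {
        "M/d/yyyy", "M/d/yy", "M-d-yyyy", "M-d-yy",
    };

    // Hypothetical helper: true only if the input contains a date-shaped
    // substring AND that substring parses as a real calendar date.
    public static bool TryExtractDate(string input, out DateTime date)
    {
        date = default(DateTime);
        Match m = DatePattern.Match(input);
        return m.Success && DateTime.TryParseExact(
            m.Value, Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out date);
    }

    static void Main()
    {
        string[] samples = { "FOR 09/15/08 -23", "09/19/2008 DOM 53", "99/99/9999" };

        foreach (string s in samples)
        {
            DateTime d;
            if (TryExtractDate(s, out d))
                Console.WriteLine("{0} -> {1}", s,
                    d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
            else
                Console.WriteLine("{0} -> no valid date", s);
        }
    }
}
```

Using TryParseExact with an explicit format list rather than plain TryParse keeps the result independent of the machine's culture settings. It also rejects mixed separators such as 1-2/08, which the pattern alone lets through.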

Josh

"tshad" <tfs@xxxxxxxxxxxxxx> wrote in message news:ekbDF9tZJHA.1352@xxxxxxxxxxxxxxxxxxxxxxx

"Jeff Johnson" <i.get@xxxxxxxxxxx> wrote in message news:%23%23J8PgFZJHA.4596@xxxxxxxxxxxxxxxxxxxxxxx
"tshad" <tfs@xxxxxxxxxxxxxx> wrote in message news:OpJPJBkYJHA.5828@xxxxxxxxxxxxxxxxxxxxxxx

This is really a regex question.

I am wondering if anyone knows a good regular expression that would pull a valid date from a string.

I have used:

strValue = Regex.Replace(valueIn, @"[^\d/]", "");

which works most of the time.

But I have some cases where I have strings like:

05/07/08(-4%)
09/19/08 DOM 55
09/19/2008 DOM 53
FOR 09/15/08 -23

Stop using Replace to get rid of the stuff you don't want, because clearly it's causing problems. Instead, examine all the possible inputs you might get and then craft a regex to EXTRACT those parts. Then TEST what you've extracted to see if it's a date. After all, 99/76/23 might fit the regex, but it isn't a valid date.

And how would you suggest I do that??? These are just examples of some of the inputs I am getting. I can't really get rid of any parts as I don't know what will be where.

I have no control over what the user will enter in this case.

I need to be able to find a date in the input. Using a variety of possible (probable) date formats, I should be able to extract the date from the input, if one exists.

If you are positive that the separator will always be a slash and that you'll only have digits (not 10/Dec/2008), you might get away with this:

That was what I was looking for.

Regex dateRegex = new Regex(@"\d{1,2}/\d{1,2}/\d{2,4}");

Then you'll use the Match() method (or Matches()) and see if you get anything, and then DateTime.TryParse[Exact]() to see if it's a real date.

That is what I was planning to do. I just wasn't sure of the regex.

Thanks,

Tom



