Re: Regex question




"Josh Einstein" <josheinstein@xxxxxxxxxxx> wrote in message
news:uFiXHJ5ZJHA.1336@xxxxxxxxxxxxxxxxxxxxxxx
What Jeff is saying is that your approach is backwards. Take a step back
and think about what you're trying to do. If you want to pull a valid date
from a string, then you have to write a pattern that defines the lexical
structure of the date you're trying to extract.

First, here's the easy button: www.regexlib.com

Second, regular expressions are not by any means easy. What you're trying
to do is simple by Regex standards, but still requires a lot more
specificity than "a digit or a slash". For example, in Regex you can
specify quantifiers and alternation constructs.

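\d{1,2}(-|/)\d{1,2}(-|/)(\d{4}|\d{2})

This will match a 1 or 2 digit month, a 1 or 2 digit day, and a 2 or 4 digit
year, each separated by a - or /. (The 4 digit alternative is listed first
because the engine takes the first alternative that succeeds; with \d{2}
first, 2008 would be matched as 20.)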

However, that doesn't guarantee you a valid date. For example, it matches
99/99/9999. Not to mention the fact that it's only valid for US dates with
a pretty specific format.
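
As a rough illustration of that point, here is a minimal sketch that runs a pattern of this shape over the kind of input discussed in this thread. The class and method names (DatePatternDemo, FirstDateLike) are made up for the example. The pattern lists the 4 digit year alternative first, since .NET alternation takes the first branch that succeeds and \d{2}|\d{4} would stop at "20" in "2008".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class DatePatternDemo
{
    // 1-2 digits, a - or /, 1-2 digits, a - or /, then a 4 or 2 digit year.
    // Assumption: the 4 digit branch is tried first so "2008" is kept whole.
    private static readonly Regex DateLike =
        new Regex(@"\d{1,2}(-|/)\d{1,2}(-|/)(\d{4}|\d{2})");

    // Returns the first date-like substring, or null when nothing matches.
    public static string FirstDateLike(string input)
    {
        Match m = DateLike.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

Calling DatePatternDemo.FirstDateLike("09/19/2008 DOM 53") should give back "09/19/2008", but "99/99/9999" is returned just as happily, which is why the pattern alone says nothing about validity.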


That is fine. I only need to be able to extract something that looks like a
date. The information is being added into a form that is completely
freeform (have no control over that) and the user can enter whatever he
wants. The user knows it is a date field so most of the time it will have a
valid date in the text somewhere. We then get the information via XML that
we put into SQL. We then have a table with cleaned values, including these
date values that we need for reporting purposes. If it isn't a valid date -
it won't get reported on.

So another poster had recommended a second pass using DateTime.TryParse.

I like this as well and am using both now, which gets me what I need. There
will be some problem records, but they are very few and won't affect
anything.
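
For anyone reading along, the two-pass idea might look roughly like the sketch below: the regex picks out candidates and DateTime.TryParse rejects the ones that are not real dates. This is only a sketch under assumptions: the names DateExtractor and TryExtractDate are invented here, the separator is assumed to be a slash, and the en-US culture is assumed so that 05/07/08 is read as month/day/year.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class DateExtractor
{
    // First pass: anything shaped like n/n/yyyy or n/n/yy.
    private static readonly Regex Candidate =
        new Regex(@"\d{1,2}/\d{1,2}/(\d{4}|\d{2})");

    // Assumption: input uses US month/day/year ordering.
    private static readonly CultureInfo UsCulture =
        CultureInfo.GetCultureInfo("en-US");

    // Second pass: return the first candidate that parses as a real date.
    public static bool TryExtractDate(string input, out DateTime date)
    {
        foreach (Match m in Candidate.Matches(input))
        {
            if (DateTime.TryParse(m.Value, UsCulture, DateTimeStyles.None, out date))
            {
                return true;
            }
        }

        date = default(DateTime);
        return false;
    }
}
```

A row where TryExtractDate returns false can simply be left without a cleaned date, which matches the "it won't get reported on" behaviour described above.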

Regex will help you get most of the way there but trying to construct a
pattern that will ensure a valid date within the range allowed by T-SQL
would be a really lousy use of Regex in the first place.

I don't know why it is a lousy use of it. It seems like a better way to find
a date pattern in my text field than trying to parse the field with for
loops to find a date.

Thanks,

Tom
Josh

"tshad" <tfs@xxxxxxxxxxxxxx> wrote in message
news:ekbDF9tZJHA.1352@xxxxxxxxxxxxxxxxxxxxxxx

"Jeff Johnson" <i.get@xxxxxxxxxxx> wrote in message
news:%23%23J8PgFZJHA.4596@xxxxxxxxxxxxxxxxxxxxxxx
"tshad" <tfs@xxxxxxxxxxxxxx> wrote in message
news:OpJPJBkYJHA.5828@xxxxxxxxxxxxxxxxxxxxxxx

This is really a regex question.

I am wondering if anyone knows a good regular expression that would pull a
valid date from a string.

I have used:

strValue = Regex.Replace(valueIn, @"[^\d/]", "");

which works most of the time.

But I have some cases where I have strings like:

05/07/08(-4%)
09/19/08 DOM 55
09/19/2008 DOM 53
FOR 09/15/08 -23

Stop using Replace to get rid of the stuff you don't want, because
clearly it's causing problems. Instead, examine all the possible inputs
you might get and then craft a regex to EXTRACT those parts. Then TEST
what you've extracted to see if it's a date. After all, 99/76/23 might
fit the regex, but it isn't a valid date.

And how would you suggest I do that??? These are just examples of some
of the inputs I am getting. I can't really get rid of any parts as I
don't know what will be where.

I have no control over what the user will enter in this case.

I need to be able to find a date in the input. Using a variety of possible
(probable) date formats, I should be able to extract the date from the
input, if one exists.

If you are positive that the separator will always be a slash and that
you'll only have digits (not 10/Dec/2008), you might get away with this:

That was what I was looking for.

Regex dateRegex = new Regex(@"\d{1,2}/\d{1,2}/\d{2,4}");

Then you'll use the Match() method (or Matches) and see if you get
anything, and then DateTime.TryParse[Exact]() to see if it's a real date.

That is what I was planning to do. I just wasn't sure of the regex.
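
A minimal sketch of the Matches plus TryParseExact variant, assuming slash separators and month/day/year order. The names SlashDateFinder and FindDates and the two format strings are assumptions made for this example, not something given in the thread. Note that \d{2,4} also lets a 3 digit year through the regex; the exact formats are what filter it out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SlashDateFinder
{
    private static readonly Regex DateRegex =
        new Regex(@"\d{1,2}/\d{1,2}/\d{2,4}");

    // Assumption: only month/day/year with a 2 or 4 digit year is accepted.
    private static readonly string[] Formats = { "M/d/yy", "M/d/yyyy" };

    // Returns every regex match that is also a real calendar date.
    public static List<DateTime> FindDates(string input)
    {
        var dates = new List<DateTime>();
        foreach (Match m in DateRegex.Matches(input))
        {
            DateTime parsed;
            if (DateTime.TryParseExact(m.Value, Formats,
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            {
                dates.Add(parsed);
            }
        }
        return dates;
    }
}
```

TryParseExact is stricter than TryParse, so it suits the case where the expected formats are known up front; TryParse is the more forgiving choice when they are not.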

Thanks,

Tom





