Re: Username regex



Hello Alexey,

On Aug 1, 11:08 pm, Alexey Smirnov <alexey.smir...@xxxxxxxxx> wrote:

On Aug 1, 9:32 pm, Jesse Houwing <Jesse.houw...@xxxxxxxxxxxxxxxx>
wrote:

Hello Alexey,

On Aug 1, 8:04 pm, Jesse Houwing <Jesse.houw...@xxxxxxxxxxxxxxxx>
wrote:

Hello Mick,

Hi All,

I would like to know how I can limit users to registering only usernames
that contain alphanumeric characters and one underscore.
From what I understand, I can use a regex to do this, but I have no
idea how to do it. I have only used a custom field validator before, and the
reason I can't do that now is the design of the page itself.
So basically I need to know how, upon a button click, I can compare
the text entered in a textbox to a regex: if it's invalid, throw an
error; if it's valid, proceed.
Thanks.

You can indeed use a regex for this. It's actually a simple expression.

Because a regex describes the input from left to right, you need to
account for all possible locations of the single '_' you're allowing.

It would be even simpler to just allow underscores as a rule.

The expression comes down to this:

^(_[a-zA-Z0-9]+|[a-zA-Z0-9]+_?|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$

^                          First make sure we're matching from the beginning of the input.
(                          We have multiple options:
_[a-zA-Z0-9]+              an underscore, followed by one or more alphanumeric characters
|                          OR...
[a-zA-Z0-9]+_?             one or more alphanumeric characters, followed by an optional underscore
|                          OR...
[a-zA-Z0-9]+_[a-zA-Z0-9]+  one or more alphanumeric characters, followed by an underscore, followed by one or more alphanumeric characters
)                          No more options.
$                          End of the string.

Now add a Regex validator to your page. Set its expression to the
expression above and its control to validate to your textbox, and
you're all done.
Jesse
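
The check described above (compare the textbox text to the expression on a button click, raise an error if it is invalid, proceed otherwise) can be sketched outside the page framework. This is only an illustration in Python, not the validator control from the post: `is_valid_username` and `on_register_click` are hypothetical names, and only the expression itself comes from the thread.

```python
import re

# The three-alternative expression from the post.
USERNAME_RE = re.compile(
    r"^(_[a-zA-Z0-9]+|[a-zA-Z0-9]+_?|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$"
)


def is_valid_username(text: str) -> bool:
    """True if text has at least one alphanumeric character,
    only alphanumerics otherwise, and at most one underscore."""
    # fullmatch also rejects a trailing newline, which a bare $ would tolerate.
    return USERNAME_RE.fullmatch(text) is not None


def on_register_click(text: str) -> str:
    """Hypothetical button-click handler: raise on invalid input, else proceed."""
    if not is_valid_username(text):
        raise ValueError(f"invalid username: {text!r}")
    return text  # proceed with registration


for candidate in ("mick", "_mick", "mick_", "mi_ck", "mi__ck", "m_i_ck", "mick#", "_"):
    print(f"{candidate!r:10} valid={is_valid_username(candidate)}")
```

The first four candidates are accepted; the rest contain a second underscore, a character outside the allowed set, or no alphanumeric character at all, and are rejected.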

I would just add that if the underscore is allowed only in the middle
of a word (which I think is usual for usernames), then the expression
can be:

^([a-zA-Z0-9]+_[a-zA-Z0-9]+)$

This regex *forces* the username to contain a '_'.

And don't use this:
^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$
as it will cause excessive backtracking and use lots of CPU under
the wrong circumstances.

Jesse
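
The difference between the two middle-underscore expressions in this exchange is easy to check by hand. The sketch below uses Python's `re` module purely for illustration (the thread itself is about a page validator); the constant names are made up for this example.

```python
import re

# Underscore required in the middle (the expression that "forces" a '_').
MIDDLE_REQUIRED = re.compile(r"^([a-zA-Z0-9]+_[a-zA-Z0-9]+)$")
# Underscore optional in the middle (the expression warned against).
MIDDLE_OPTIONAL = re.compile(r"^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$")

for name in ("mick", "mi_ck", "_mick", "mick_", "m"):
    required = MIDDLE_REQUIRED.fullmatch(name) is not None
    optional = MIDDLE_OPTIONAL.fullmatch(name) is not None
    print(f"{name!r:8} required={required} optional={optional}")
```

Note that the optional variant needs at least two alphanumeric characters, one for each `+`, so a one-character name such as "m" is rejected by both expressions.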

sorry, it's just a test post: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

Hmm, I think I deleted the (?) from the expression by mistake. It's a
silly typo, sorry about that.

But I don't get what's wrong with the following expression:

^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

Basically, it's a copy of your expression; I just deleted the first two
parts from it.




The problem is that it allows for excessive backtracking. Think of a string, preferably a very long one, that contains only alphanumeric characters but ends in a # sign. This regex will try every combination of the first and the second part until all options are exhausted. This can take quite a while.

I'll try to explain it more graphically so that it's easier to understand.

Usually what would happen is this:

input: aaaaa#
regex: ^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$

The first part will try to match and captures
aaaaa

Then the engine finds the # sign. It's not a _, so the optional _? is skipped, and it's not another [a-zA-Z0-9] either, so the engine backtracks one position.
aaaa

Now the second [a-zA-Z0-9]+ can match the last a
aaaa|a (| is to show which part of the regex matched what).

But now it still cannot match the # at the end. So the engine backtracks one more position and the second part matches the last two a's.
aaa|aa

Still no match...
aa|aaa

Still no match
a|aaaa

It cannot backtrack further. The first [a-zA-Z0-9]+ isn't optional. The engine finally concludes that no match can be found.

Remember that when the input gets longer and longer the amount of backtracking increases. This is especially bad if there's more regex to come after that...
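
The sequence of splits walked through above can be reproduced with a few lines of code. This is a simplified model of the trace only, not a regex engine: `split_attempts` is a hypothetical helper that lists how the alphanumeric run is divided between the first and the second [a-zA-Z0-9]+ as the first part gives back one character at a time.

```python
def split_attempts(run: str) -> list[str]:
    """List the first|second splits tried, in order, for an all-alphanumeric
    run that is followed by a character the pattern cannot match."""
    # The first part starts greedy and gives back one character per step.
    # It must keep at least one character, because + is not optional.
    return [f"{run[:k]}|{run[k:]}" for k in range(len(run) - 1, 0, -1)]


print(split_attempts("aaaaa"))
# ['aaaa|a', 'aaa|aa', 'aa|aaa', 'a|aaaa']

print(len(split_attempts("a" * 100)))
# 99
```

Together with the initial greedy attempt, that makes 5 tries for "aaaaa#" and 100 for a run of 100 characters. How much work an actual engine does at each split depends on the engine and on any optimizations it applies.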

Now back to my regular expression.

I've split the problem into three possible solutions. The engine will try each solution and, if it fails, try the next.

So again:

input: aaaaa#
regex: ^([a-zA-Z0-9]+_?|_[a-zA-Z0-9]+|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$

The engine first tries the first part of the regex, [a-zA-Z0-9]+_?

It captures
aaaaa

but then fails. There's still another character in the input and it's not a _. There's no need for backtracking. There's no other solution possible. The first part of the regex is ignored. So it tries the second part.

It captures nothing; the string doesn't start with a _.

It tries the last part. It again captures aaaaa, but then fails because there's no _.

The end result is the same: no match.

But my regex took only 3 attempts to find that out. And regardless of the length of the input, it will keep taking 3 tries.

Yours took 5 tries and will take an extra try for every additional valid character at the start of the string. So if you have an input of 100 characters, it would do 100 passes.
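
One way to see how the two expressions behave on longer inputs is simply to time the rejection. The sketch below uses Python's `re` module, which is also a backtracking engine, as a stand-in for the engine discussed in this thread; the helper name `time_rejection` and the chosen input lengths are assumptions for illustration, and the absolute numbers will differ by engine, version, and machine.

```python
import re
import timeit

# The single-branch expression and the three-alternative expression from the post.
SINGLE = re.compile(r"^([a-zA-Z0-9]+_?[a-zA-Z0-9]+)$")
ALTERNATION = re.compile(
    r"^([a-zA-Z0-9]+_?|_[a-zA-Z0-9]+|[a-zA-Z0-9]+_[a-zA-Z0-9]+)$"
)


def time_rejection(pattern: re.Pattern, length: int, repeats: int = 5) -> float:
    """Best-of-N time in seconds to reject 'aaa...a#' with the given pattern."""
    bad_input = "a" * length + "#"
    # Both expressions must reject this input; only the cost differs.
    assert pattern.match(bad_input) is None
    return min(
        timeit.repeat(lambda: pattern.match(bad_input), number=1, repeat=repeats)
    )


for length in (100, 400, 1600):
    single = time_rejection(SINGLE, length)
    alternation = time_rejection(ALTERNATION, length)
    print(f"length={length:5d} single={single:.6f}s alternation={alternation:.6f}s")
```

No expected timings are given here on purpose: the point is to compare how each column grows as the input gets longer, not to rely on any particular figure.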

I hope this helps you in understanding why your regex has some issues ;)

Jesse

