Re: Extract domain names out of URLs



On Wed, 23 Apr 2008 10:23:45 -0400, "Rick Rothstein (MVP - VB)"
<rick.newsNO.SPAM@xxxxxxxxxxxxxxxxxx> wrote:

re.Pattern = "\b((https?|ftp)://)?([\-A-Z0-9.]+)(/[\-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[\-A-Z0-9+&@#/%=~_|!:,.;]*)?"

Now that is what I miss about Regular Expressions from my days many years
ago working with them in the UNIX world... their clarity and readability.<g>

Rick

<ggg>

And even when you write out the explanation:

===============================
URL capturing

\b((https?|ftp)://)?([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?

Options: case insensitive

Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference number 1 «((https?|ftp)://)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the regular expression below and capture its match into backreference number 2 «(https?|ftp)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «https?»
         Match the characters “http” literally «http»
         Match the character “s” literally «s?»
            Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «ftp»
         Match the characters “ftp” literally «ftp»
   Match the characters “://” literally «://»
Match the regular expression below and capture its match into backreference number 3 «([-A-Z0-9.]+)»
   Match a single character present in the list below «[-A-Z0-9.]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      The character “-” «-»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      The character “.” «.»
Match the regular expression below and capture its match into backreference number 4 «(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character “/” literally «/»
   Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|!:,.;]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      The character “-” «-»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      One of the characters “+&@#/%=~_|!:,.;” «+&@#/%=~_|!:,.;»
Match the regular expression below and capture its match into backreference number 5 «(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character “?” literally «\?»
   Match a single character present in the list below «[-A-Z0-9+&@#/%=~_|!:,.;]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      The character “-” «-»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      One of the characters “+&@#/%=~_|!:,.;” «+&@#/%=~_|!:,.;»


Created with RegexBuddy
======================================
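
For anyone who wants to try the pattern outside VBA, here is a minimal sketch in Python rather than the VB used in this thread. The helper name extract_domain and the sample URLs are my own illustration, not anything from the posts above. It assumes each input string holds a single URL and nothing else: because the scheme group is optional, the pattern will happily match any bare word as a "domain" if you run it over free text.

```python
import re

# Pattern copied from the explanation above; the "case insensitive"
# option becomes re.IGNORECASE.
URL_RE = re.compile(
    r"\b((https?|ftp)://)?([-A-Z0-9.]+)"
    r"(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?"
    r"(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?",
    re.IGNORECASE,
)


def extract_domain(url):
    """Return backreference 3 (the host part), or None if nothing matches.

    Hypothetical helper: assumes `url` contains one URL and nothing else.
    """
    m = URL_RE.match(url.strip())
    return m.group(3) if m else None


if __name__ == "__main__":
    # Made-up sample URLs, with and without a scheme.
    for sample in (
        "http://www.example.com/path/page.html?x=1",
        "ftp://ftp.example.org",
        "www.example.com/index.html",
    ):
        print(sample, "->", extract_domain(sample))
```

Backreferences 4 and 5 hold the path and the query string, so the same match object can return those through m.group(4) and m.group(5) if they are needed.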
--ron