Re: Extract domain names out of URLs
- From: Ron Rosenfeld <ronrosenfeld@xxxxxxxxxx>
- Date: Wed, 23 Apr 2008 14:33:31 -0400
On Wed, 23 Apr 2008 10:23:45 -0400, "Rick Rothstein (MVP - VB)"
<rick.newsNO.SPAM@xxxxxxxxxxxxxxxxxx> wrote:
>> re.Pattern =
>> "\b((https?|ftp)://)?([\-A-Z0-9.]+)(/[\-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[\-A-Z0-9+&@#/%=~_|!:,.;]*)?"
>
> Now that is what I miss about Regular Expressions from my days many years
> ago working with them in the UNIX world... their clarity and readability.<g>
>
> Rick
<ggg>
And even when you write out the explanation:
===============================
URL capturing
\b((https?|ftp)://)?([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?
Options: case insensitive
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference
number 1 «((https?|ftp)://)?»
   Between zero and one times, as many times as possible, giving back as needed
   (greedy) «?»
   Match the regular expression below and capture its match into backreference
   number 2 «(https?|ftp)»
      Match either the regular expression below (attempting the next
      alternative only if this one fails) «https?»
         Match the characters "http" literally «http»
         Match the character "s" literally «s?»
            Between zero and one times, as many times as possible, giving back
            as needed (greedy) «?»
      Or match regular expression number 2 below (the entire group fails if
      this one fails to match) «ftp»
         Match the characters "ftp" literally «ftp»
   Match the characters "://" literally «://»
Match the regular expression below and capture its match into backreference
number 3 «([-A-Z0-9.]+)»
   Match a single character present in the list below «[-A-Z0-9.]+»
      Between one and unlimited times, as many times as possible, giving back
      as needed (greedy) «+»
      The character "-" «-»
      A character in the range between "A" and "Z" «A-Z»
      A character in the range between "0" and "9" «0-9»
      The character "." «.»
Match the regular expression below and capture its match into backreference
number 4 «(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?»
   Between zero and one times, as many times as possible, giving back as needed
   (greedy) «?»
   Match the character "/" literally «/»
   Match a single character present in the list below
   «[-A-Z0-9+&@#/%=~_|!:,.;]*»
      Between zero and unlimited times, as many times as possible, giving back
      as needed (greedy) «*»
      The character "-" «-»
      A character in the range between "A" and "Z" «A-Z»
      A character in the range between "0" and "9" «0-9»
      One of the characters "+&@#/%=~_|!:,.;" «+&@#/%=~_|!:,.;»
Match the regular expression below and capture its match into backreference
number 5 «(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?»
   Between zero and one times, as many times as possible, giving back as needed
   (greedy) «?»
   Match the character "?" literally «\?»
   Match a single character present in the list below
   «[-A-Z0-9+&@#/%=~_|!:,.;]*»
      Between zero and unlimited times, as many times as possible, giving back
      as needed (greedy) «*»
      The character "-" «-»
      A character in the range between "A" and "Z" «A-Z»
      A character in the range between "0" and "9" «0-9»
      One of the characters "+&@#/%=~_|!:,.;" «+&@#/%=~_|!:,.;»
Created with RegexBuddy
======================================
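
For anyone who would rather see the groups in action than read the breakdown, here is a rough sketch of the same pattern. It uses Python's re module purely as a stand-in for the VBScript RegExp object in the thread, so that it runs on its own. The function name split_url and the example.com addresses are made up for illustration; the group numbers follow the explanation above, with backreference 3 being the domain name.

```python
import re

# Same pattern as in the explanation above; re.IGNORECASE stands in for
# the "case insensitive" option.
URL_PATTERN = re.compile(
    r"\b((https?|ftp)://)?([-A-Z0-9.]+)"
    r"(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?"
    r"(\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?",
    re.IGNORECASE,
)


def split_url(text):
    """Return the captured pieces of the first match in text, or None.

    split_url is a hypothetical helper name, not something from the thread.
    """
    m = URL_PATTERN.search(text)
    if m is None:
        return None
    return {
        "scheme": m.group(2),  # http, https or ftp; None if absent
        "host": m.group(3),    # the domain name (backreference 3)
        "path": m.group(4),    # starts with "/"; None if absent
        "query": m.group(5),   # starts with "?"; None if absent
    }


if __name__ == "__main__":
    print(split_url("http://www.example.com/path/page.html?id=5"))
    print(split_url("www.example.com/index"))
```

One caveat: because the scheme is optional, the pattern will match any run of letters, digits, dots and hyphens that starts at a word boundary. Given "see http://www.example.com" it would report "see" as the host, so it probably works best when the string contains nothing but the URL.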
--ron