Re: Fast Substring to get URL?




Hello Pavel,

> On Apr 20, 9:15 am, SnapDive <SnapD...@xxxxxxxxxxxxxxxx> wrote:
>
>> I have a line of text like this:
>>
>> <my_y:dev rel="rdfs:seeAlso"
>> value="http://testurl.com.local:40/test/images/image1.jpg" >
>>
>> I am trying to come up with the fastest or cleanest way to find just
>> the inner URL part of that, read it into a temp string, and then
>> replace the
>>
>> http://testurl.com.local:447/test/images/image1.jpg
>>
>> with
>>
>> http://testurl.com/newtest/images2/image1.jpg
>>
>> How can I do that?
>
> Use System.Uri class, which provides complete facilities for URI
> parsing and creation.
>
> Don't use Regex. URIs aren't as simple as they might look, and most
> likely you'll get it wrong for subtle corner cases (I had to maintain
> a custom regex-based URI parser in the past, and in 2 years after it
> was first written, we still kept hitting those corner cases).

From my experience in the SpamAssassin project I'd have to agree, but at
the same time, if the content has a fixed pattern like this, it should
be pretty simple to do with a regex. The crux lies in the "fixed
pattern" part. If you control the text that needs to be replaced, then
by all means use the tool you know best, even if that is a regex. If
you do not control it, use the most robust method, which in this case
is the Uri class, as Pavel suggested.
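
A rough sketch of the Uri-based route, assuming the mapping implied by
the single example in the question: new host testurl.com, no explicit
port, and the file name moved under /newtest/images2/. The class name
UrlRewriter and that mapping are assumptions for illustration, not
something stated in the thread:

```csharp
using System;

static class UrlRewriter
{
    // Hypothetical mapping, inferred from the one example in the question.
    public static string Rewrite(string oldUrl)
    {
        // Let System.Uri do the parsing instead of slicing the string by hand.
        var old = new Uri(oldUrl);

        // The last path segment is the file name, e.g. "image1.jpg".
        string fileName = old.Segments[old.Segments.Length - 1];

        var builder = new UriBuilder(old)
        {
            Host = "testurl.com",
            Port = -1, // -1 means "use the scheme's default port", so none is written out
            Path = "/newtest/images2/" + fileName
        };
        return builder.Uri.AbsoluteUri;
    }
}
```

Scheme, query string and fragment are carried over from the original
URL unchanged, which is the main thing you get for free compared with
plain string surgery.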

I see two options. One:
1) read the content as an XML structure into memory
2) find the values that need to be replaced using XPath
3) replace them, using either Regex or a custom function based on Uri
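
Option one could look roughly like this. It assumes the input is
well-formed XML (the element is closed and the my_y prefix is declared
somewhere), which the snippet in the question does not show. The names
XmlUrlReplacer and rewrite are made up for the example; rewrite stands
for whatever old-to-new URL function you choose, Uri-based or not:

```csharp
using System;
using System.Linq;
using System.Xml;

static class XmlUrlReplacer
{
    public static string ReplaceUrls(string xml, Func<string, string> rewrite)
    {
        // 1) read the content into memory as an XML structure
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // 2) find every "value" attribute that holds an http URL.
        //    The list is copied first so the document is not modified
        //    while the XPath result is still being enumerated.
        var attributes = doc
            .SelectNodes("//@value[starts-with(., 'http://')]")
            .Cast<XmlAttribute>()
            .ToList();

        // 3) replace each value using the supplied function
        foreach (XmlAttribute attribute in attributes)
        {
            attribute.Value = rewrite(attribute.Value);
        }
        return doc.OuterXml;
    }
}
```

If the content is not reliably well-formed, LoadXml will throw an
XmlException, and this route is off the table.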

Two:
1) run Regex.Replace over the whole content
2) if needed, use a MatchEvaluator in combination with a Regex, a custom function based on Uri, or some other string manipulation
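
A minimal sketch of option two. The pattern assumes the URL always sits
in a double-quoted value="..." attribute and starts with http://, which
is exactly the "fixed pattern" caveat. RegexUrlReplacer and rewrite are
names invented for the example; rewrite is the function that turns an
old URL into a new one:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexUrlReplacer
{
    // Matches the URL that directly follows value=" and runs up to the
    // closing quote. The lookbehind keeps value=" out of the match itself.
    private static readonly Regex ValueUrl =
        new Regex("(?<=value=\")http://[^\"]+", RegexOptions.Compiled);

    public static string ReplaceUrls(string text, Func<string, string> rewrite)
    {
        // The lambda is converted to a MatchEvaluator and is called once
        // per match; whatever it returns replaces the matched URL.
        return ValueUrl.Replace(text, match => rewrite(match.Value));
    }
}
```

This also works on input that is not valid XML, since the regex never
looks at the surrounding markup.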

--
Jesse Houwing
jesse.houwing at sogeti.nl

