realbasic-nug

Re: finding links with RegEx?

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: finding links with RegEx?
From: Kevin Ballard <kevin at sb dot org>
Date: Wed, 27 Feb 2002 22:49:28 -0500
On 2/27/02 10:26 PM, "Thomas Reed" <thomasareed at earthlink dot net> wrote:

> <P><A HREF="test.html >some text</A></P>
> 
> <P><A HREF="another.html">another</A></P>
> 
> Your expression above will match a section of text including both A tags.

Which is exactly how OmniWeb interprets it. I felt that if a web browser
interprets it this way, then RegEx code certainly has license to treat
it the same way.
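The earlier expression Thomas is replying to isn't quoted in this message. The sketch below therefore uses a hypothetical stand-in, `QUOTED_HREF`, that reads an href value from its opening double quote to the next double quote. It runs that pattern over Thomas's broken HTML, using Python's `re` module in place of REALbasic's RegEx class, to show how the missing closing quote makes a single match run into the second A tag.

```python
import re

# Hypothetical stand-in pattern (not the expression from earlier in the
# thread): an href value runs from its opening double quote to the next one.
QUOTED_HREF = re.compile(r'<[^>]*?\bhref\s*=\s*"([^"]*)"', re.IGNORECASE)

# Thomas's example: the first HREF value is missing its closing quote.
html = ('<P><A HREF="test.html >some text</A></P>\n\n'
        '<P><A HREF="another.html">another</A></P>')

values = QUOTED_HREF.findall(html)
print(len(values))      # 1: a single match covers both A tags
print(repr(values[0]))  # the value swallows the link text and the start of the second tag
```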

> Here's another try, taking these things into account.  Any other thoughts?
> 
> <[^>]*(src|background|href)[\s]*=[\s]*""?([^\s"">]+)[\s""]*[^>]*>
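Here is a rough Python translation of Thomas's expression for experimenting outside REALbasic. It assumes the doubled quotes (`""`) are REALbasic string-literal escapes for a single `"`. It also assumes case-insensitive matching. `urls` is a hypothetical helper that returns the captured URL group.

```python
import re

# Thomas's expression with each "" treated as one literal double quote
# (an assumption about REALbasic string escaping); IGNORECASE is assumed too.
THOMAS = re.compile(
    r'<[^>]*(src|background|href)[\s]*=[\s]*"?([^\s">]+)[\s"]*[^>]*>',
    re.IGNORECASE,
)

def urls(html):
    """Return the captured URL (group 2) of every match."""
    return [m.group(2) for m in THOMAS.finditer(html)]

broken = ('<P><A HREF="test.html >some text</A></P>\n\n'
          '<P><A HREF="another.html">another</A></P>')
print(urls(broken))  # ['test.html', 'another.html']
print(urls('<A HREF="javascript:alert(\'a b\')">go</A>'))  # URL cut short at the space
```

On the broken HTML each A tag is matched on its own. Because the URL group stops at whitespace, `"` and `>`, however, a `javascript:` URL that contains a space is truncated. That is the trade-off discussed in the reply.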

You are allowed to use the '>' character inside a quoted attribute value,
although in a URL it would probably be invalid. I think JavaScript is also
allowed in URLs if they begin with javascript:, so you don't want to stop
at spaces or '>' inside a quoted string. I suggest going with the last
expression I posted, to keep behaviour consistent with OmniWeb and to allow
javascript: URLs to include spaces and the '>' character.
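A minimal Python sketch of the quote-aware approach described here. The pattern `LINK` and the helper `link_values` are hypothetical illustrations, not the exact expression referred to above. A double-quoted value is taken as a unit, so spaces and '>' inside the quotes survive, while an unquoted value still ends at whitespace or '>'.

```python
import re

# Hypothetical quote-aware pattern: group 1 is a double-quoted value
# (anything up to the closing quote), group 2 an unquoted value.
LINK = re.compile(
    r'<[^>]*?\b(?:src|background|href)\s*=\s*(?:"([^"]*)"|([^\s">]+))',
    re.IGNORECASE,
)

def link_values(html):
    """Return the attribute value of each src/background/href match."""
    values = []
    for m in LINK.finditer(html):
        quoted, bare = m.group(1), m.group(2)
        # group(1) is None when the unquoted alternative matched instead
        values.append(quoted if quoted is not None else bare)
    return values

print(link_values('<A HREF="javascript:alert(\'a > b\')">go</A>'))
print(link_values('<IMG SRC=logo.gif>'))  # ['logo.gif']
```

Anchoring on the quotes is what lets a javascript: value keep its spaces and '>'. The cost, as discussed above, is that an unbalanced quote makes the match run on to the next quote in the document.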

-- 
Kevin Ballard
kevin at sb dot org
Email from Korea or China must go to <kevin dot nb at sb dot org>
http://kevin.sb.org/


