On 2/27/02 10:26 PM, "Thomas Reed" <thomasareed at earthlink dot net> wrote:
> <P><A HREF="test.html >some text</A></P>
>
> <P><A HREF="another.html">another</A></P>
>
> Your expression above will match a section of text including both A tags.
Which is exactly how OmniWeb interprets it. I felt that if a web browser
interprets it this way, then some RegEx code certainly has license to treat
it that way.
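
To make that concrete, here is a minimal Python sketch. The pattern in it is a hypothetical quote-aware expression written for illustration, not necessarily the one discussed earlier in the thread; the only assumption is that a double-quoted attribute value runs until the next '"' character, whatever lies in between.

```python
import re

# Hypothetical quote-aware pattern (an assumption for illustration):
# a double-quoted attribute value runs until the next '"',
# no matter what characters sit in between.
QUOTED_ATTR = re.compile(
    r'<[^>]*?\b(src|background|href)\s*=\s*"([^"]*)"[^>]*>',
    re.IGNORECASE,
)

# The sample from the quoted message: the first HREF has no closing quote.
html = (
    '<P><A HREF="test.html >some text</A></P>\n'
    '\n'
    '<P><A HREF="another.html">another</A></P>'
)

match = QUOTED_ATTR.search(html)
matched_text = match.group(0)   # the whole span the pattern consumed
url_value = match.group(2)      # what the pattern takes to be the URL

print(repr(matched_text))
print(repr(url_value))
```

Because the first quote is never closed, the captured value runs on until the quote that opens "another.html", so the single match covers both A tags.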
> Here's another try, taking these things into account. Any other thoughts?
>
> <[^>]*(src|background|href)[\s]*=[\s]*""?([^\s"">]+)[\s""]*[^>]*>
You are allowed to use the '>' character in a URL attribute, although it
would probably produce an invalid URL. I think JavaScript is allowed in URLs
if you put javascript: at the beginning, so you don't want to stop at spaces
or '>' inside a quoted string. I suggest going with the last expression I
suggested, to keep the behaviour consistent with OmniWeb and to allow
javascript: URLs to include spaces and the '>' character.
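
A small Python sketch of the difference, under two assumptions: the doubled "" in the quoted expression is string-literal escaping for a single '"', and UNTIL_QUOTE is a hypothetical stand-in for a pattern that only ends a quoted value at its closing quote (it is not necessarily the expression referred to above). The sample tag is made up for the test.

```python
import re

# The quoted expression, with each doubled "" read as a single '"'
# (assumption: the doubling is string-literal escaping).
STOP_AT_SPACE = re.compile(
    r'<[^>]*(src|background|href)[\s]*=[\s]*"?([^\s">]+)[\s"]*[^>]*>'
)

# Hypothetical alternative: a quoted value ends only at the closing quote.
UNTIL_QUOTE = re.compile(
    r'<[^>]*?\b(src|background|href)\s*=\s*"([^"]*)"[^>]*>'
)

# Made-up sample: a javascript: URL containing spaces and a '>'.
tag = '<a href="javascript:alert(1 > 0)">check</a>'

truncated = STOP_AT_SPACE.search(tag).group(2)
full = UNTIL_QUOTE.search(tag).group(2)

print(truncated)
print(full)
```

The first pattern cuts the value off at the first space, and its match ends at the '>' inside the script; the second keeps the whole quoted string.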
--
Kevin Ballard
kevin at sb dot org
Email from Korea or China must go to <kevin dot nb at sb dot org>
http://kevin.sb.org/