realbasic-nug

Re: finding links with RegEx?

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: finding links with RegEx?
From: Kevin Ballard <kevin at sb dot org>
Date: Wed, 27 Feb 2002 20:32:29 -0500

On 2/27/02 8:20 PM, "Thomas Reed" <thomasareed at earthlink dot net> wrote:

> aRegEx.SearchPattern = "<[^>]*(src|background|href)[\s\n]*=[\s\n]*""?([^\s""]+)[\s""][^>]*>"

What happens if, for some reason, someone writes a link like:

<A HREF="test test2">?

It will capture only the first word, test, as the link. Also, I may be wrong,
but I think it will also match

<A HREF="> <hrm lala>

with the link being '>' (the capture stops at the first whitespace).
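
For anyone who wants to check the two cases above, here is a minimal sketch in Python rather than REALbasic. It assumes the quoted pattern was a single line before the mail client wrapped it, that "" is simply how a literal double quote is written inside a REALbasic string, and that matching is case-insensitive (hence re.IGNORECASE). The names THOMAS_PATTERN and captured_link are made up for illustration.

```python
import re

# The quoted pattern, with REALbasic's "" written as a plain " (assumption:
# the original was one unbroken line and matching ignores case).
THOMAS_PATTERN = re.compile(
    r'<[^>]*(src|background|href)[\s\n]*=[\s\n]*"?([^\s"]+)[\s"][^>]*>',
    re.IGNORECASE,
)

def captured_link(tag):
    """Return what the pattern captures as the link, or None if no match."""
    m = THOMAS_PATTERN.search(tag)
    return m.group(2) if m else None

# Quoted value containing a space: only the part before the space is captured.
print(captured_link('<A HREF="test test2">'))

# Unterminated quote: the tag is still accepted, and the capture is whatever
# non-space, non-quote characters follow the opening quote.
print(captured_link('<A HREF="> <hrm lala>'))
```

Under those assumptions the first tag yields test, and the second tag is accepted as a whole with the capture stopping at the first whitespace.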

Try this RegEx. It is my previously posted one with a slight modification:
\x20 is replaced with \s, and \x22 is replaced with "" (I forgot about that
notation). I actually tested it and it seems to work. As far as I can tell,
this is the best one posted because it matches everything it's supposed to
and blocks out more bad tags than the others would.

<[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*(""([^""]*)""|([^\s>]*))[^>]*>
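
As a rough way to try the pattern above outside REALbasic, here is a minimal Python sketch. It assumes that "" stands for a single literal double quote and that matching is case-insensitive (re.IGNORECASE). The names LINK_PATTERN and find_links are hypothetical helpers, not part of any library. The quoted and unquoted branches capture into different groups, so the helper takes whichever one took part in the match.

```python
import re

# The pattern from the message, with REALbasic's "" written as a plain ".
# Group 3 holds a quoted value, group 4 an unquoted one.
LINK_PATTERN = re.compile(
    r'<[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*("([^"]*)"|([^\s>]*))[^>]*>',
    re.IGNORECASE,
)

def find_links(html):
    """Return the attribute values captured from every matching tag."""
    links = []
    for m in LINK_PATTERN.finditer(html):
        quoted, unquoted = m.group(3), m.group(4)
        # Exactly one of the two branches participates in a match.
        links.append(quoted if quoted is not None else unquoted)
    return links

print(find_links('<A HREF="test test2">'))
print(find_links('<a href="a.html"><img src=b.gif>'))
print(find_links('<A HREF="> <hrm lala>'))
```

With these assumptions, the quoted value with a space comes back whole as test test2, and the unquoted src value comes back as b.gif. The unterminated tag is still matched through the unquoted branch, but only up to the first >, and the captured value is a lone quote character, so a caller may want to discard values like that.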

-- 
Kevin Ballard
kevin at sb dot org
Email from Korea or China must go to <kevin dot nb at sb dot org>
http://kevin.sb.org/


