Okay, so I thought I would contribute a little regex of my own...
<\s*(?=[ABI])\w+\s+(?=[SBH])(src|background|href)\s*=\s*[""']?([-=?\w./:@&';\(\)%]+)[^>]*>
It may not catch everything, although it is trying hard...
The focus is put on the three main types of links:
A HREF
BODY BACKGROUND
IMG SRC
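The positive lookaheads are not strictly necessary, but they speed things up a little: a tag or attribute whose first letter cannot start a match is rejected right away.
The second group, the character class that captures the URL itself, is an attempt to catch everything... but not too much!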
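As a rough illustration of what the pattern does and does not catch, here is a minimal sketch in Python rather than REALbasic. The function name extract_links and the sample HTML are made up for the example, and the doubled quote from the REALbasic string is written as a single quote character.

```python
import re

# The pattern from this post as a Python raw string. In REALbasic the
# double quote inside a string literal is doubled (""); here one " is enough.
LINK_PATTERN = re.compile(
    r"""<\s*(?=[ABI])\w+\s+(?=[SBH])(src|background|href)\s*=\s*["']?([-=?\w./:@&';\(\)%]+)[^>]*>""",
    re.IGNORECASE,  # same effect as rg.options.caseSensitive=false
)


def extract_links(html):
    """Return the second subexpression (the URL) of every match, in order.

    extract_links is a hypothetical helper name used only for this sketch.
    """
    return [m.group(2) for m in LINK_PATTERN.finditer(html)]


# Made-up sample input covering the three link types.
sample = (
    '<body background="bg.gif">'
    '<a href="http://example.com/page.html?x=1">link</a>'
    '<img src=logo.png>'
    '<p class="intro">no link here</p>'
)

for url in extract_links(sample):
    print(url)
```

Two things show up when playing with it: the link attribute has to be the first attribute of the tag (so <img alt="x" src="y.png"> is skipped), and because ' is part of the URL character class, a single-quoted value comes back with its closing quote attached.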
Sample program:
dim rg as regex
dim m as regexMatch
dim s,t as string
dim k as integer
dim t1,t2 as double
s=editField1.text
rg=new regex
rg.searchPattern="<\s*(?=[ABI])\w+\s+(?=[SBH])(src|background|href)\s*=\s*[""']?([-=?\w./:@&';\(\)%]+)[^>]*>"
rg.options.caseSensitive=false
// start timing
t1=microseconds
m=rg.search(s)
listBox1.deleteAllRows
while m<>nil
  k=k+1
  // subexpression 2 is the captured URL
  listBox1.addRow m.subExpressionString(2)
  // resume the search just past the end of the current match
  m=rg.search(s,len(m.subExpressionString(0))+m.SubExpressionStart(0))
wend
t2=microseconds
// elapsed time in microseconds
t2=t2-t1
staticText1.text=str(t2)
staticText2.text=str(k)+" matches"
HTH
--
Didier Barbas
Dilettante programmer and linguist
http://ww.sungnyemun.org
On 02/28/2002 10:20, "Thomas Reed" <thomasareed at earthlink dot net> wrote:
> Okay, with all the suggestions, I've put together a regular expression
> that appears to work -- but as I don't feel 100% comfortable with
> building regular expressions, I'd like to run it by folks and see if
> anyone can find any problems with it. Here's what I'm doing:
>
> aRegEx.SearchPattern = "<[^>]*(src|background|href)[\s\n]*=[\s\n]*""?([^\s""]+)[\s""][^>]*>"
>
> -Thomas
>
> Personal web page: http://home.earthlink.net/~thomasareed/
> My shareware: http://home.earthlink.net/~thomasareed/shareware/
> Pixel Pen web pub. guide: http://home.earthlink.net/~thomasareed/pixelpen/
>
> Any closet is a walk-in closet if you try hard enough.