>what happens if, for some reason, someone writes a link like:
>
><A HREF="test test2">?
It matches only "test" as the link, which I think is reasonable since a
space is an illegal character in a URL anyway.
>Also, I may be wrong, but I think it will also match
>
><A HREF="> <hrm lala>
>
>with the link being '> <hrm '
Nope, it didn't match this at all.
However, variations of this are a potential problem I hadn't thought about.
>try this RegEx
>
><[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*(""([^""]*)""|([^\s>]*))[^>]*>
Actually, that doesn't work so well. In particular, if the closing
quote is missing, you get weird behavior. Take this example:
<P><A HREF="test.html >some text</A></P>
<P><A HREF="another.html">another</A></P>
Your expression above will match a section of text including both A tags.
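For anyone who wants to reproduce this, here is a minimal sketch in Python. It assumes the doubled quotes ("") in the quoted expression are just string-literal escaping for a single ", and that Python's re module behaves like the engine in question for this pattern; the names QUOTED_PATTERN, SAMPLE and first_match are made up for the illustration.

```python
import re

# The quoted expression, with "" read as one literal " (assumed escaping).
QUOTED_PATTERN = re.compile(
    r'<[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*("([^"]*)"|([^\s>]*))[^>]*>'
)

# The two-line example from above; the first HREF lacks its closing quote.
SAMPLE = (
    '<P><A HREF="test.html >some text</A></P>\n'
    '<P><A HREF="another.html">another</A></P>'
)


def first_match(text):
    """Return the full text of the first match, or None if nothing matches."""
    m = QUOTED_PATTERN.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # The printed match runs from the first A tag into the second one.
    print(first_match(SAMPLE))
```

The quoted alternative ("([^"]*)") keeps consuming text until it finds the next quote, which is the opening quote of the second HREF, so a single match swallows both tags.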
Here's another try, taking these things into account. Any other thoughts?
<[^>]*(src|background|href)[\s]*=[\s]*""?([^\s"">]+)[\s""]*[^>]*>
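A rough way to check the revised expression against the cases discussed in this thread, sketched in Python. Two assumptions: the doubled quotes ("") stand for a single literal ", and the match is case-insensitive (the expression is lowercase while the examples use HREF). LINK_PATTERN and extract_links are hypothetical names for the illustration.

```python
import re

# The revised expression, with "" read as one literal " (assumed escaping).
# IGNORECASE is an assumption so that lowercase "href" matches "HREF".
LINK_PATTERN = re.compile(
    r'<[^>]*(src|background|href)[\s]*=[\s]*"?([^\s">]+)[\s"]*[^>]*>',
    re.IGNORECASE,
)


def extract_links(html):
    """Return the captured link text (group 2) for every matching tag."""
    return [m.group(2) for m in LINK_PATTERN.finditer(html)]


if __name__ == "__main__":
    cases = [
        '<A HREF="test test2">',
        '<A HREF="> <hrm lala>',
        '<P><A HREF="test.html >some text</A></P>\n'
        '<P><A HREF="another.html">another</A></P>',
    ]
    for case in cases:
        print(repr(case), "->", extract_links(case))
```

Because the link group stops at whitespace, a quote, or >, a missing closing quote can no longer drag the match into the next tag.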
Thanks for everyone's help refining this! Also, if anyone compares this
method with the two-step method mentioned by someone else before I do,
I'd be curious about the speed difference.
-Thomas
Personal web page: http://home.earthlink.net/~thomasareed/
My shareware: http://home.earthlink.net/~thomasareed/shareware/
Pixel Pen web pub. guide: http://home.earthlink.net/~thomasareed/pixelpen/
I won't rise to the occasion, but I'll slide over to it.