
Re: finding links with RegEx?

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: finding links with RegEx?
From: Didier BARBAS <lists at sungnyemun dot org>
Date: Thu, 28 Feb 2002 13:30:50 +0900
Well, you can still extend the pattern, if you find it limiting. It was
meant as an example...
As for SRC, the only other place I know of is after <SCRIPT, and I assumed
(wrongly, maybe) that you didn't want the links to javascript files.
There is maybe one other place you want to check: the value="..." of embedded
applets and such. For flash thingies, it is a little bit more complex, since
the <param name=movie value="xxx.swf"> is only a link to a .swf animation,
which has to be downloaded and inspected to find the links inside it. If you
are brave enough, look for:
urlxxx.htmlwindowxxxFrame
inside the .swf file to get the URL and target info.
But that would have to be dealt with separately.
Javascript calls are caught by my regex (that's what the \( \) : and ; are
there for...), so this is not an issue...
I don't see anything else to catch, but, then again, I am not an HTML
expert.

This one will also catch flash <param name=movie value="xxx.swf"> tags:

<\s*(?=[ABIP])\w+\s+([-=\w./:@&';\(\)%]+\s+)*(?=[VSBH])(src|background|href|value)\s*=\s*[""']?([-=?\w./:@&';\(\)%]+)[^>]*>
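
For anyone who wants to try the pattern outside REALbasic, here is a minimal
sketch in Python. It rests on two assumptions that are not stated in the post:
that the doubled quote ("") is just REALbasic string escaping for a single ",
and that the match is meant to be case-insensitive (the lookaheads use capitals
while the attribute names are lowercase). The names LINK_PATTERN, find_links
and the sample HTML are made up for illustration.

```python
import re

# The pattern from the post, joined onto one line.
# Assumption: "" in the post stands for one literal " (REALbasic escaping).
# Assumption: matching is case-insensitive, hence re.IGNORECASE.
LINK_PATTERN = re.compile(
    r"""<\s*(?=[ABIP])\w+\s+([-=\w./:@&';\(\)%]+\s+)*"""
    r"""(?=[VSBH])(src|background|href|value)\s*=\s*["']?"""
    r"""([-=?\w./:@&';\(\)%]+)[^>]*>""",
    re.IGNORECASE,
)


def find_links(html):
    """Return (attribute, url) pairs for every tag the pattern matches."""
    # Group 2 is the attribute name, group 3 is the captured URL.
    return [(m.group(2).lower(), m.group(3)) for m in LINK_PATTERN.finditer(html)]


# Hypothetical sample page, one tag per link type discussed in the thread.
sample = (
    '<body background="bg.png">'
    '<a href="index.html">home</a>'
    '<img alt=logo src="logo.gif">'
    '<param name=movie value="intro.swf">'
    '<script src="menu.js"></script>'
)

for attribute, url in find_links(sample):
    print(attribute, url)
```

The first lookahead only lets through tags whose name starts with A, B, I or P,
so the <script src=...> tag in the sample is skipped, as intended in the post.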

HTH
-- 
Didier Barbas
Dilettante programmer and linguist
http://ww.sungnyemun.org

On 02/28/2002 12:30, "Thomas Reed" <thomasareed at earthlink dot net> wrote:

>> The focus is put on the three main types of links:
>> A HREF
>> BODY BACKGROUND
>> IMG SRC
> 
> Well, this is a bit limiting.  I know that the SRC attribute is not
> unique to the IMG tag.  I believe it occurs in a number of URLs, and my
> goal would be to match it in all of those as well as in the IMG tag.
> 
> Speaking of this sort of thing, am I missing any other attributes (other
> than HREF, BACKGROUND and SRC) that contain URLs?  It's been a while
> since I worked with HTML on this level...
> 
> -Thomas


