Re: finding links with RegEx?

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: finding links with RegEx?
From: Thomas Reed <thomasareed at earthlink dot net>
Date: Wed, 27 Feb 2002 21:26:19 -0600
>what happens if, for some reason, someone writes a link like:
>
><A HREF="test test2">?

It matches only "test" as the link, which I think is reasonable since a
space is an illegal character in a URL anyway.

>Also, I may be wrong, but I think it will also match
>
><A HREF="> <hrm lala>
>
>with the link being '> <hrm '

Nope, it simply doesn't match this at all.

However, variations of this are a potential problem I hadn't thought about.

>try this RegEx
>
><[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*(""([^""]*)""|([^\s>]*))[^>]*>

Actually, that doesn't work so well.  In particular, if the closing quote
is missing, you get weird behavior.  Take this example:

<P><A HREF="test.html >some text</A></P>

<P><A HREF="another.html">another</A></P>

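Your expression above will match a single stretch of text spanning both A
tags, because the quoted-value part keeps going until it finds the next
quote, which is the opening quote of the second HREF.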
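
For anyone who wants to reproduce the runaway match outside REALbasic, here is a rough sketch in Python. It assumes that Python's re module behaves like REALbasic's RegEx for this pattern, that matching is case-insensitive, and that the doubled quotes ("") are only string-literal escaping and stand for a single quote in the actual expression. The names SUGGESTED, HTML, and first_match are made up for illustration.

```python
import re

# The suggested pattern from the quoted message. The REALbasic doubled
# quotes ("") are written as single quotes here (an assumption about how
# the string literal was escaped).
SUGGESTED = re.compile(
    r'<[^>]*(SRC|HREF|BACKGROUND)[\s\n]*=[\s\n]*("([^"]*)"|([^\s>]*))[^>]*>',
    re.IGNORECASE,
)

# The example from the message: the first HREF is missing its closing quote.
HTML = (
    '<P><A HREF="test.html >some text</A></P>\n'
    '\n'
    '<P><A HREF="another.html">another</A></P>\n'
)


def first_match(pattern, text):
    """Return the full text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


matched = first_match(SUGGESTED, HTML)

# The single match starts at the first A tag and runs through the second one.
print(repr(matched))
print(matched.count("<A "))
```

The count printed at the end is the number of A tags swallowed by the one match, which is the quickest way to see the problem.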

Here's another try, taking these things into account.  Any other thoughts?

<[^>]*(src|background|href)[\s]*=[\s]*""?([^\s"">]+)[\s""]*[^>]*>
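
A quick way to check the expression above against the cases raised in this thread is sketched below in Python. The same assumptions apply: Python's re module stands in for REALbasic's RegEx, matching is case-insensitive, and the doubled quotes ("") are string-literal escaping for a single quote. REVISED, BROKEN_QUOTE, and find_links are hypothetical names, and the unquoted IMG tag is an extra test case that was not raised in the thread.

```python
import re

# The revised pattern, with the doubled quotes written as single quotes.
# Group 2 captures the link itself.
REVISED = re.compile(
    r'<[^>]*(src|background|href)[\s]*=[\s]*"?([^\s">]+)[\s"]*[^>]*>',
    re.IGNORECASE,
)


def find_links(html):
    """Return the captured link text for every matching tag in html."""
    return [m.group(2) for m in REVISED.finditer(html)]


# The missing-closing-quote example from this message.
BROKEN_QUOTE = (
    '<P><A HREF="test.html >some text</A></P>\n'
    '\n'
    '<P><A HREF="another.html">another</A></P>\n'
)

print(find_links('<A HREF="test test2">'))    # ['test']
print(find_links('<A HREF="> <hrm lala>'))    # []
print(find_links(BROKEN_QUOTE))               # ['test.html', 'another.html']
print(find_links('<IMG SRC=pic.gif>'))        # ['pic.gif']
```

Because the captured group stops at whitespace, a quote, or ">", a missing closing quote can no longer pull the match into the next tag.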

Thanks for everyone's help refining this!  Also, if anyone compares this
method with the 2-step method someone else mentioned before I get to it,
I'd be curious about the speed difference.

-Thomas

Personal web page:                 http://home.earthlink.net/~thomasareed/
My shareware:            http://home.earthlink.net/~thomasareed/shareware/
Pixel Pen web pub. guide: http://home.earthlink.net/~thomasareed/pixelpen/

I won't rise to the occasion, but I'll slide over to it.


