Grabbing the href attribute of an A element

Posted 2018-12-31 00:24

I'm trying to find the links on a page.

My regex is:

/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/

but it seems to fail on

<a title="this" href="that">what?</a>

How would I change my regex to deal with href not placed first in the a tag?

Tags: php html dom
9 answers
ら面具成の殇う
Reply #2 · 2018-12-31 00:46

Why don't you just match

"<a.*?href\s*=\s*['"](.*?)['"]"

<?php

$str = '<a title="this" href="that">what?</a>';

$res = array();

preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);

var_dump($res);

?>

then

$ php test.php
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(27) "<a title="this" href="that""
  }
  [1]=>
  array(1) {
    [0]=>
    string(4) "that"
  }
}

which works. I've just removed the first capturing group.
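To pull the href values out of a page with several links, the same pattern can be read back through the second element of the result array. This is only a sketch: the `$html` sample and the `$links` variable are made up for illustration, and the pattern does not check that the opening and closing quotes are the same character.

```php
<?php
// Hypothetical sample markup: href sits in a different position in each tag.
$html = '<a title="this" href="that">what?</a> <a href=\'other\' class="x">more</a>';

// Same pattern as above; capture group 1 holds the href value.
preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $html, $res);

// $res[1] collects group 1 from every match, in document order.
$links = $res[1];
print_r($links); // Array ( [0] => that [1] => other )
```

Because `preg_match_all` defaults to `PREG_PATTERN_ORDER`, `$res[0]` holds the full matches and `$res[1]` the captured hrefs.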

低头抚发
Reply #3 · 2018-12-31 00:46

I modified your regex a bit to suit your needs.

<a.*?href=("|')(.*?)("|').*?>(.*)<\/a>

Personally, I suggest you use an HTML parser.

EDIT: Tested
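
For the HTML parser route suggested above, a minimal sketch with PHP's bundled DOM extension might look like this. The `$html` sample and the `$links` variable are assumptions for illustration; attribute order inside the tag does not matter to the parser.

```php
<?php
// Hypothetical sample markup, including the tag from the question.
$html = '<p><a title="this" href="that">what?</a> <a href="other">more</a></p>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Skip anchors without an href (for example, named anchors).
    if ($a->hasAttribute('href')) {
        $links[] = $a->getAttribute('href');
    }
}
print_r($links); // Array ( [0] => that [1] => other )
```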

萌妹纸的霸气范
Reply #4 · 2018-12-31 00:46

For anyone who still hasn't found a solution, this is very easy and fast using SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com

It's working for me.
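
Applied to the tag from the question, where href is not the first attribute, the same SimpleXML approach could be sketched as follows. The `$href` and `$text` names are only illustrative, and this assumes the fragment is well-formed XML; SimpleXMLElement throws an exception on markup it cannot parse, so it is less forgiving than an HTML parser.

```php
<?php
// The tag from the question, with title placed before href.
$a = new SimpleXMLElement('<a title="this" href="that">what?</a>');

// Attribute access returns a SimpleXMLElement, so cast to get a plain string.
$href = (string) $a['href'];
// Casting the element itself yields its text content.
$text = (string) $a;

echo $href, "\n"; // that
echo $text, "\n"; // what?
```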

时光乱了年华
Reply #5 · 2018-12-31 00:46

preg_match_all("/(<a[^>]*>)(.*?)(<\/a)/", $contents, $impmatches, PREG_SET_ORDER);

It is tested and fetches every a tag from any HTML code.

余生无你
Reply #6 · 2018-12-31 00:53

I'm not sure what you're trying to do here, but if you're trying to validate the link, then look at PHP's filter_var().

If you really need to use a regular expression, then check out this tool; it may help: http://regex.larsolavtorvik.com/
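
As a rough illustration of the filter_var() suggestion, the sketch below keeps only the values that pass `FILTER_VALIDATE_URL`. The `$candidates` list and the `$valid` variable are hypothetical. Note that a relative href such as `that` from the question is rejected, since the filter expects a full URL with a scheme.

```php
<?php
// Hypothetical href values, as they might come out of an extraction step.
$candidates = ['http://example.com/page', 'that'];

$valid = [];
foreach ($candidates as $href) {
    // FILTER_VALIDATE_URL returns the value on success and false on failure.
    if (filter_var($href, FILTER_VALIDATE_URL) !== false) {
        $valid[] = $href;
    }
}
print_r($valid); // Array ( [0] => http://example.com/page )
```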

流年柔荑漫光年
Reply #7 · 2018-12-31 00:57

I agree with Gordon: you MUST use an HTML parser to parse HTML. But if you really want a regex, you can try this one:

/^<a.*?href=(["\'])(.*?)\1.*$/

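This matches <a at the beginning of the string, followed by any number of any characters (non-greedy, .*?), then href=, followed by the link surrounded by either " or '. The backreference \1 requires the closing quote to be the same character as the opening one.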

$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);

Output:

array(3) {
  [0]=>
  string(37) "<a title="this" href="that">what?</a>"
  [1]=>
  string(1) """
  [2]=>
  string(4) "that"
}
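
Two properties of this pattern are worth seeing in action; the inputs below are made up for illustration. Because the closing quote is matched with \1, a single-quoted href may contain double quotes. And because of the ^ anchor, the pattern only matches when the string starts with the tag.

```php
<?php
$pattern = '/^<a.*?href=(["\'])(.*?)\1.*$/';

// Hypothetical input: single-quoted href that contains double quotes.
$str = '<a title="this" href=\'say "hi"\'>what?</a>';
preg_match($pattern, $str, $m);
$quote = $m[1]; // the quote character that opened the attribute: '
$href  = $m[2]; // say "hi"

// Hypothetical input with leading text: the ^ anchor prevents a match.
$anchored = preg_match($pattern, 'See <a href="that">what?</a>');

var_dump($quote, $href, $anchored);
```

To scan a whole page rather than a single tag, the ^ and $ anchors would have to go, which is one more reason a parser is the safer choice.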