preg_match_all for <a href> links

Hello, I want to extract links like <a href="/portal/clients/show/entityId/2121" >, and I want a regex that gives me /portal/clients/show/entityId/2121. The number at the end (2121) is different in other links. Any idea?

Tags: php preg-match hyperlink

6 answers

甜甜的少女心

#2 · 2019-01-09 18:56

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply take everything after the last forward slash and you have your number. No regex needed.
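A minimal sketch of that idea, assuming the href has already been extracted by a parser (the $href value below is just the example from the question): strrpos() finds the position of the last "/" and substr() returns what follows it.

```php
<?php
// Example href from the question; in real code it would come from the HTML parser.
$href = '/portal/clients/show/entityId/2121';

// Position of the last "/" in the path.
$lastSlash = strrpos($href, '/');

// Everything after that slash is the entity ID (still a string).
$id = substr($href, $lastSlash + 1);

echo $id, "\n"; // prints 2121
```

Cast with (int) if you need a number rather than a string.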

兄弟一词,经得起流年.

#3 · 2019-01-09 18:58

Simple PHP HTML Dom Parser example:

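<?php
// Simple HTML DOM is a third-party library; this assumes simple_html_dom.php is available
include 'simple_html_dom.php';

// Create DOM from a string ($links holds the HTML)
$html = str_get_html($links);

// or load it from a URL (include the scheme, otherwise it is treated as a local file path)
$html = file_get_html('http://www.example.com');

foreach ($html->find('a') as $link) {
    echo $link->href . '<br />';
}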

爷、活的狠高调

#4 · 2019-01-09 19:00

This is my solution:

<?php
// download the contents of www.example.com
$website = file_get_contents("http://www.example.com");

// save the value of every href="..." attribute; "/" is the regex delimiter
preg_match_all('/<a href="(.+?)"/', $website, $matches);

// no cleanup needed: $matches[1] already holds only the captured URLs
// (str_replace() on the nested $matches array would leave them unchanged anyway)

// output all matches
print_r($matches[1]);
?>
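To check the pattern without downloading a page, here is a sketch run against an inline string shaped like the question's markup (the sample HTML is made up). It assumes links are written as <a href="..."> with double quotes and href as the first attribute; single-quoted or reordered attributes would be missed.

```php
<?php
// Sample markup modeled on the link in the question (hypothetical, not a real page).
$html = '<p><a href="/portal/clients/show/entityId/2121" >Client A</a>'
      . '<a href="/portal/clients/show/entityId/57">Client B</a></p>';

// "~" is used as the delimiter so "/" in the pattern needs no escaping.
// Group 1 captures the text between the double quotes of href="...".
preg_match_all('~<a\s+href="([^"]+)"~i', $html, $matches);

// $matches[0] holds the full matches; $matches[1] holds only the captured hrefs.
print_r($matches[1]);
```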

I recommend avoiding XML-based parsers, because you will not always know whether the document/website is well formed.

Best regards

一纸荒年 Trace。

#5 · 2019-01-09 19:04

When "parsing" HTML I mostly rely on phpQuery (http://code.google.com/p/phpquery/) rather than regex.

冷血范

#6 · 2019-01-09 19:05

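A regex for parsing links looks something like this (written as a PHP single-quoted string, so the single quotes inside the pattern are escaped):

'/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'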

Given how horrible that is, I would recommend using Simple HTML DOM at least for getting the links. You could then check each link with a very basic regex on its href.
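A sketch of such a "very basic regex", assuming the hrefs have already been collected into an array by whichever parser you use (the list below is hypothetical). The pattern is anchored, so only hrefs that consist of the client path followed by digits are accepted, and group 1 is the ID.

```php
<?php
// Hypothetical list of href values, e.g. as collected by Simple HTML DOM or DOMDocument.
$hrefs = [
    '/portal/clients/show/entityId/2121',
    '/portal/clients/list',
    '/portal/clients/show/entityId/57',
];

$ids = [];
foreach ($hrefs as $href) {
    // Anchored pattern: the whole href must be the client path followed by digits.
    if (preg_match('~^/portal/clients/show/entityId/(\d+)$~', $href, $m)) {
        $ids[] = (int) $m[1]; // group 1 is the numeric ID
    }
}

print_r($ids);
```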

劳资没心，怎么记你

#7 · 2019-01-09 19:18

Don't use regular expressions for processing XML/HTML. This can be done very easily using the built-in DOM parser:

// don't print warnings for the malformed markup found on many real pages
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//a/@href');
for ($i = 0; $i < $nodeList->length; $i++) {
    # Xpath query for attributes gives a NodeList containing DOMAttr objects.
    # http://php.net/manual/en/class.domattr.php
    echo $nodeList->item($i)->value . "<br/>\n";
}
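Since the question only wants the client links, the XPath query can do the filtering itself. A sketch, assuming every wanted href starts with the path shown in the question; the sample HTML is made up for illustration, and XPath 1.0's starts-with() does the matching.

```php
<?php
// Hypothetical input; in practice $htmlAsString would hold the downloaded page.
$htmlAsString = '<html><body>'
    . '<a href="/portal/clients/show/entityId/2121">A</a>'
    . '<a href="/about">About</a>'
    . '<a href="/portal/clients/show/entityId/57">B</a>'
    . '</body></html>';

// Collect libxml warnings instead of printing them (real pages are often malformed).
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);

// starts-with() keeps only the href attributes of client links.
$prefix = '/portal/clients/show/entityId/';
$nodeList = $xpath->query('//a[starts-with(@href, "' . $prefix . '")]/@href');

$hrefs = [];
foreach ($nodeList as $attr) {
    // Each item is a DOMAttr; ->value is the attribute text.
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

The trailing ID can then be taken from each href with substr() and strrpos(), as suggested in answer #2.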
