preg_replace change link from href

Posted 2020-06-30 01:32

Question:

I need to rewrite the URLs in a page fetched with curl so that images and links point to the correct absolute addresses. My PHP code is:

<?php

$result = '<a href="http://host.org"><img src="./sec.png"></a>
<link href="./styles.css" rel="alternate stylesheet" type="text/css" />
<script type="text/javascript" src="./style.js"></script>';

echo $result;
if (!preg_match('/src="https?:\/\/"/', $result)) {
    $result = preg_replace('/src="(http:\/\/([^\/]+)\/)?([^"]+)"/', "src=\"http://google.com/\\3\"", $result);
}
echo $result;
if (!preg_match('/href="https?:\/\/"/', $result)) {
    $result = preg_replace('/href="(http:\/\/([^\/]+)\/)?([^"]+)"/', "href=\"http://google.com/\\3\"", $result);
}
echo $result;

?>

Output is:

//original links
<a href="http://host.org"><img src="./sec.png"></a>
<link href="./styles.css" type="text/css" />
<script src="./style.js"></script><br />

//fixed SRC path
<a href="http://host.org"><img src="http://google.com/./sec.png"></a>
<link href="./styles.css" type="text/css" />
<script src="http://google.com/./style.js"></script>

//fixed HREF path
<a href="http://google.com//google.com/./sec.png"></a>
<link href="http://google.com/./styles.css" type="text/css" />
<script src="http://google.com/./style.js"></script>

But when the tag is an "a" link, the href replacement swallows the nested img tag and leaves only the href value.

//from
<a href="http://host.org"><img src="./sec.png"></a>
//to src fix:
<a href="http://host.org"><img src="http://google.com/./sec.png"></a>
//ERROR: the href fix produces:
<a href="http://google.com//google.com/./sec.png"></a>

Can anybody help me fix it? Thank you.

Answer 1:

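Remove this unnecessary part from your regexes: ([^\/]+)\/

Because [^\/]+ matches any character except a slash, including quotes and angle brackets, it lets your regular expressions run past the end of the attribute value, all the way to the URL in the next tag.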
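To see the over-match concretely, here is a small sketch that runs the question's original href pattern against the first input line as it looks after the src replacement, and prints what each capture group consumed. The variable names `$html` and `$m` are only for illustration.

```php
<?php
// First line of the question's input, after the src replacement has already run.
$html = '<a href="http://host.org"><img src="http://google.com/./sec.png"></a>';

// The question's original href pattern, unchanged.
preg_match('/href="(http:\/\/([^\/]+)\/)?([^"]+)"/', $html, $m);

// Group 2 is what [^\/]+ consumed: it runs through the closing quote
// of the href and into the <img> tag, stopping at the next slash.
echo $m[2], "\n"; // host.org"><img src="http:

// Group 3 is the only part that the replacement keeps.
echo $m[3], "\n"; // /google.com/./sec.png
```

Everything between `href="` and the closing quote of the src value is part of one match, which is why the img tag disappears from the output.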

Code:

$result = preg_replace('/src="(http:\/\/)?([^"]+)"/', "src=\"http://google.com/\\2\"", $result);
$result = preg_replace('/href="(http:\/\/)?([^"]+)"/', "href=\"http://google.com/\\2\"", $result);

Result:

<a href="http://google.com/host.org"><img src="http://google.com/./sec.png"></a> 
<link href="http://google.com/./styles.css" rel="alternate stylesheet" type="text/css" /> 
<script type="text/javascript" src="http://google.com/./style.js"></script>

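But I think what you really want is to turn relative URLs into absolute ones and leave URLs that are already absolute alone. For that you can use these regexes, which use a negative lookahead to skip values starting with http:// (with these you can drop the if checks):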

$result = preg_replace('/src="(?!http:\/\/)([^"]+)"/', "src=\"http://google.com/\\1\"", $result);
$result = preg_replace('/href="(?!http:\/\/)([^"]+)"/', "href=\"http://google.com/\\1\"", $result);
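Note that the lookahead above only skips http://, so an https:// or protocol-relative (//host/...) URL would still get the prefix. A possible variation, offered as a sketch and not a drop-in fix, handles both attributes in one pattern and also skips those two cases. The `$base` and `$pattern` variables, the https link and the cdn.example.com address are assumptions added for illustration; they are not in the question.

```php
<?php
// Sample input modelled on the question, with an https link and a
// protocol-relative script URL added (both hypothetical).
$result = '<a href="https://host.org"><img src="./sec.png"></a>
<link href="./styles.css" rel="alternate stylesheet" type="text/css" />
<script type="text/javascript" src="//cdn.example.com/style.js"></script>';

// Base URL to prepend; the question uses http://google.com/ as a placeholder.
$base = 'http://google.com/';

// Match src="..." or href="...", but skip values that already start with
// http://, https:// or // (protocol-relative).
$pattern = '/\b(src|href)="(?!https?:\/\/|\/\/)([^"]+)"/';

// $1 is the attribute name, $2 the relative path.
$fixed = preg_replace($pattern, '$1="' . $base . '$2"', $result);
echo $fixed;
```

With this input only ./sec.png and ./styles.css are rewritten. The pattern still treats mailto: links, #fragments and single-quoted attributes naively, so for arbitrary pages an HTML parser such as DOMDocument is likely the safer tool.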