Regular Expression to filter tracking parameters f

Posted 2019-08-08 02:56

I've got strings that contain a tracking string that I want to remove. Regular expressions seemed to be the best solution, but I can't come up with a regular expression that works.

Example URLs:

tracking=foo should be removed, where foo can be pretty much anything except &. URLs without tracking shouldn't be touched.

The best I've got working is /(http:\/\/[^?]*?.*)tracking=[^&]*&?(.*?["|\'])/i, but the [^&]* part matches too much: if there isn't a second parameter on the URL after the tracking string, it eliminates everything after the link.

At the moment I'm using it like this, where $html contains the whole HTML of the page to be output and I want to remove the tracking from all URLs within it:

$html = preg_replace($pattern, '$1$2', $html);

So the minimum the $html would contain would be something like this:

<body>
 <a href="[one of the examples above]">Some Link</a>
</body>

3 answers
混吃等死
#2 · 2019-08-08 03:51
/tracking=.*?(?=(&|$|\r|"))/

This should match all tracking=foo parameters. Just replace each match with an empty string.

http://regexr.com?30ofo
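
As a rough illustration of the replacement described above, here is a minimal sketch using preg_replace with that pattern. The two sample URLs are made up for the example. Note that the pattern removes only the tracking=foo pair itself, so the neighbouring ? or & separator stays behind.

```php
<?php
// Pattern from the answer above: "tracking=" followed by as little as possible,
// up to (but not including) an ampersand, end of string, carriage return, or quote.
$pattern = '/tracking=.*?(?=(&|$|\r|"))/';

// Hypothetical sample URLs, one with the parameter first and one with it last.
$with_more = 'http://example.com/bar.php?tracking=foo&param=baz';
$at_end    = 'http://example.com/bar.php?param=baz&tracking=foo';

$stripped_more = preg_replace($pattern, '', $with_more);
$stripped_end  = preg_replace($pattern, '', $at_end);

echo $stripped_more, "\n"; // http://example.com/bar.php?&param=baz
echo $stripped_end, "\n";  // http://example.com/bar.php?param=baz&
```

The leftover "?&" or trailing "&" is something to tidy in a second pass if the exact form of the URL matters.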

Aperson
#3 · 2019-08-08 03:55

As a modification to your own regex: (http:\/\/[^?]*?.*)(tracking=[^&]*)(.*)?

If it matches, remove the second group from the string (the one with the tracking), keeping the first and third groups.
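
A minimal sketch of that idea, assuming the replacement string '$1$3' (keep groups one and three, drop group two) and a made-up sample URL:

```php
<?php
// Regex from the answer above, wrapped in / delimiters.
$pattern = '/(http:\/\/[^?]*?.*)(tracking=[^&]*)(.*)?/';

// Hypothetical sample input.
$url = 'http://example.com/bar.php?tracking=foo&param=baz';

// Keep everything before and after the tracking pair.
$result = preg_replace($pattern, '$1$3', $url);

echo $result, "\n"; // http://example.com/bar.php?&param=baz
```

Because both .* parts are greedy, a single match can span the rest of the line. When run over a whole HTML page, this would likely remove only the last tracking parameter on each line, so it seems better suited to one URL at a time.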

叛逆
#4 · 2019-08-08 03:59

You should do this by parsing the URL, using parse_url and parse_str. It makes things much easier than using a regular expression.

<?php
$params = array();

$url = "http://example.com/bar.php?param=baz&tracking=foo";
$url_parts = parse_url($url);

// Split the query string into an associative array;
// fall back to an empty string when the URL has no query part
parse_str($url_parts['query'] ?? '', $params);

// Remove the "tracking" parameter (unset() does nothing if the key is absent)
unset($params['tracking']);

Now you just have to rebuild the string using the parts in $url_parts and the rest of the params in $params. You can do this with http_build_query.

Try something like this, although I haven't tested it so it will need some modifications:

$url = $url_parts['scheme'] . '://' . $url_parts['host'] . $url_parts['path'] . '?' . http_build_query( $params);
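
To make the rebuild step a little more concrete, here is a minimal sketch that wraps the whole round trip in one function. The name remove_query_param is hypothetical, not a PHP built-in, and the sketch assumes PHP 7 or later. Unlike the one-liner, it returns URLs without a query string untouched, omits the "?" when no parameters remain, and keeps the port and fragment if present. It does not handle user:password URLs.

```php
<?php
// remove_query_param() is a hypothetical helper name used for illustration.
function remove_query_param(string $url, string $name): string
{
    $parts = parse_url($url);

    // Malformed URL or no query string: leave the input as it is.
    if ($parts === false || !isset($parts['query'])) {
        return $url;
    }

    parse_str($parts['query'], $params);
    unset($params[$name]);

    // Reassemble only the components that were actually present.
    $rebuilt = '';
    if (isset($parts['scheme'])) {
        $rebuilt .= $parts['scheme'] . '://';
    }
    $rebuilt .= $parts['host'] ?? '';
    if (isset($parts['port'])) {
        $rebuilt .= ':' . $parts['port'];
    }
    $rebuilt .= $parts['path'] ?? '';
    if ($params) {
        $rebuilt .= '?' . http_build_query($params);
    }
    if (isset($parts['fragment'])) {
        $rebuilt .= '#' . $parts['fragment'];
    }

    return $rebuilt;
}

$clean = remove_query_param('http://example.com/bar.php?param=baz&tracking=foo', 'tracking');
echo $clean, "\n"; // http://example.com/bar.php?param=baz
```

One thing to keep in mind: parse_str and http_build_query decode and re-encode the remaining parameters, so their encoding may come back slightly different from the original URL.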

For your specific use-case, I would use PHP's DOMDocument class to parse the HTML, then grab all of the URLs from that, then use the above to remove the tracking parameter. However, if you must use a regular expression, you can use a generic regular expression to find just URLs, then apply the above to each URL you find using preg_replace_callback.
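
The regex fallback mentioned above could look roughly like the sketch below. The URL pattern and the helper name strip_tracking are assumptions made for this example. The pattern is deliberately simple and treats anything from http:// or https:// up to the next quote, whitespace, or angle bracket as a URL.

```php
<?php
// strip_tracking() is a hypothetical helper that applies the
// parse_url / parse_str approach to a single URL.
function strip_tracking(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['query'])) {
        return $url;
    }

    parse_str($parts['query'], $params);
    if (!isset($params['tracking'])) {
        return $url; // URLs without tracking are not touched
    }
    unset($params['tracking']);

    $clean = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $clean .= ':' . $parts['port'];
    }
    $clean .= $parts['path'] ?? '';
    if ($params) {
        $clean .= '?' . http_build_query($params);
    }
    if (isset($parts['fragment'])) {
        $clean .= '#' . $parts['fragment'];
    }
    return $clean;
}

// Made-up sample page with one tracked and one untracked link.
$html = '<body><a href="http://example.com/bar.php?param=baz&tracking=foo">Some Link</a>'
      . '<a href="http://example.com/other.php?x=1">Other</a></body>';

// Assumed generic URL pattern; ~ is used as the delimiter to avoid escaping slashes.
$pattern = '~https?://[^\s"\'<>]+~i';

$clean_html = preg_replace_callback($pattern, function (array $m): string {
    return strip_tracking($m[0]);
}, $html);

echo $clean_html, "\n";
```

If the page writes ampersands inside href attributes as &amp;, the query string would need to be entity-decoded before parsing; this sketch does not do that, which is one reason the DOMDocument route is probably the more robust choice.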
