I've got strings that contain a tracking string that I want to remove. Regular expressions seemed to be the best solution, but I can't figure out a regular expression that will work.
Example URLs:
- http://example.com?tracking=foo
- http://example.com/bar.html?tracking=foo
- http://example.com?tracking=foo&param=baz
- http://example.com/bar.php?param=baz&tracking=foo
tracking=foo should be removed, where foo can be pretty much anything except &. URLs without tracking shouldn't be touched.
The best shot I got working is /(http:\/\/[^?]*?.*)tracking=[^&]*&?(.*?["|\'])/i but it matches too much with the [^&]* part, thus eliminating everything after the link if there isn't a second parameter on the URL after the tracking string.
I'm using it like this at the moment. $html contains the whole HTML for the page to be output, and I want to remove the tracking from all URLs within it:
$html = preg_replace($pattern, '$1$2', $html);
So at a minimum, $html would contain something like this:
<body>
<a href="[one of the examples above]">Some Link</a>
</body>
This should match all tracking=foo variables. Just replace each match with an empty string. http://regexr.com?30ofo
As a modification to your own regex:
(http:\/\/[^?]*?.*)(tracking=[^&]*)(.*)?
If it matches, remove the second group from the string (the one with the tracking).
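One way to sidestep the over-matching is to match only the parameter itself rather than the whole URL. The following is a minimal sketch, not taken from either answer above: the function name strip_tracking_regex is hypothetical, and it assumes parameters are separated by a raw & (not &amp;) and that a value ends at &, #, a quote, or whitespace. It also touches any ?tracking=... or &tracking=... in the page text, not only inside links.

```php
<?php
// Hypothetical helper: removes tracking=<value> wherever it follows ? or &.
function strip_tracking_regex(string $html): string
{
    // Group 1: the separator in front of the parameter (? or &).
    // Group 2: an optional & right after the value (empty if nothing follows).
    $pattern = '~([?&])tracking=[^&#"\'\s]*(&?)~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // More parameters follow: keep the leading separator, drop the trailing &.
        // Nothing follows: drop the leading separator along with the parameter.
        return $m[2] === '&' ? $m[1] : '';
    }, $html);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}
```

Because the value is matched with a character class that stops at the closing quote, the replacement cannot run past the end of the link the way the original pattern does.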
You should do this by parsing the URL, using parse_url and parse_str. It makes things much easier than using a regular expression.
Once the URL is parsed and the tracking entry is removed from the query parameters, you just have to rebuild the string using the parts in $url_parts and the rest of the params in $params. You can do this with http_build_query.
Try something like this, although I haven't tested it, so it may need some modifications:
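A minimal sketch of that approach follows. It is not the answerer's original code: the function name remove_query_param and its $name argument are assumptions made for illustration, while $url_parts and $params follow the names used above. Note that http_build_query re-encodes the remaining parameters, so their escaping may differ slightly from the input.

```php
<?php
// Hypothetical helper: strip one query parameter (default "tracking") from a URL.
function remove_query_param(string $url, string $name = 'tracking'): string
{
    $url_parts = parse_url($url);
    if ($url_parts === false || !isset($url_parts['query'])) {
        return $url; // unparsable, or no query string: leave untouched
    }

    // Split the query string into an associative array.
    parse_str($url_parts['query'], $params);
    if (!array_key_exists($name, $params)) {
        return $url; // nothing to remove: leave untouched
    }
    unset($params[$name]);

    // Rebuild the URL from its parts.
    $rebuilt = '';
    if (isset($url_parts['scheme'])) {
        $rebuilt .= $url_parts['scheme'] . '://';
    } elseif (isset($url_parts['host'])) {
        $rebuilt .= '//'; // protocol-relative URL
    }
    if (isset($url_parts['user'])) {
        $rebuilt .= $url_parts['user'];
        if (isset($url_parts['pass'])) {
            $rebuilt .= ':' . $url_parts['pass'];
        }
        $rebuilt .= '@';
    }
    $rebuilt .= $url_parts['host'] ?? '';
    if (isset($url_parts['port'])) {
        $rebuilt .= ':' . $url_parts['port'];
    }
    $rebuilt .= $url_parts['path'] ?? '';
    if ($params) {
        $rebuilt .= '?' . http_build_query($params);
    }
    if (isset($url_parts['fragment'])) {
        $rebuilt .= '#' . $url_parts['fragment'];
    }

    return $rebuilt;
}
```

Calling remove_query_param on each of the example URLs from the question should drop tracking=foo and leave a URL without that parameter unchanged.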
For your specific use case, I would use PHP's DOMDocument class to parse the HTML, then grab all of the URLs from that, then use the above to remove the tracking parameter. However, if you must use a regular expression, you can use a generic regular expression to find just URLs, then apply the above to each URL you find using preg_replace_callback.
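The preg_replace_callback variant could look roughly like the sketch below. The function names drop_tracking and strip_tracking_from_html are hypothetical, and the sketch assumes URLs appear only in quoted href attributes with raw & separators; drop_tracking is a compact stand-in for whatever per-URL cleanup you settle on.

```php
<?php
// Hypothetical per-URL cleanup: removes the "tracking" query parameter.
function drop_tracking(string $url): string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $url; // no query string, or the URL could not be parsed
    }

    parse_str($query, $params);
    if (!array_key_exists('tracking', $params)) {
        return $url; // leave URLs without tracking untouched
    }
    unset($params['tracking']);

    // Everything before the first "?" is kept exactly as written.
    $base = substr($url, 0, strpos($url, '?'));
    $fragment = parse_url($url, PHP_URL_FRAGMENT);

    return $base
        . ($params ? '?' . http_build_query($params) : '')
        . (is_string($fragment) ? '#' . $fragment : '');
}

// Hypothetical wrapper: applies drop_tracking to every quoted href value.
function strip_tracking_from_html(string $html): string
{
    // Group 1: 'href=' with optional spaces, group 2: the opening quote,
    // group 3: the URL, \2: the matching closing quote.
    $pattern = '~(\bhref\s*=\s*)(["\'])(.*?)\2~is';

    $result = preg_replace_callback($pattern, function (array $m): string {
        return $m[1] . $m[2] . drop_tracking($m[3]) . $m[2];
    }, $html);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}
```

Used as $html = strip_tracking_from_html($html);, this only rewrites links that actually carry the tracking parameter, so the rest of the page is passed through unchanged.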