I have an HTML document similar to the one below, as seen in view-source mode (simplified for brevity):
<html>
<head>
<title>System version: {{variable:system_version}}</title>
</head>
<body>
<p>You are using system version {{variable:system_version}}</p>
{{block:welcome}}
<form>
<input value="System version: {{variable:system_version}}">
<textarea>
You are using system version {{variable:system_version}}.
</textarea>
</form>
</body>
</html>
I have written some functions that can replace these {{...}} strings, but they need to be replaced selectively. In the example above, I want them replaced in <title> and <p>, but not in <input> and <textarea>. That content is user-provided input, entered via a WYSIWYG editor or form, and it must be saved exactly as received from the user. The {{block:welcome}} placeholder must also be replaced with whatever content that block contains.
When rendering the output I sanitize it, and the result should then look something like this:
<html>
<head>
<title>System version: 6.0</title>
</head>
<body>
<p>You are using system version 6.0</p>
<div>
This was the content of the welcome block.
</div>
<form>
<input value="System version: {{variable:system_version}}">
<textarea>
You are using system version {{variable:system_version}}.
</textarea>
</form>
</body>
</html>
Here is what I have tried. In the code below, $var is '6.0', $val is '{{variable:system_version}}', and $data is the entire string to be searched:
if (!preg_match('/<textarea|<input|<select(.+?)' . $val . '(.+?)<\/textarea|<\/input|<\/select\>/s', $data)) {
    $data = str_replace($val, $var, $data);
}
Please advise what is wrong with my regex. As it stands it replaces nothing whatsoever, so the if condition is never satisfied. If I do the str_replace without the if, the replacements are made in all cases.
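A likely explanation, offered as a sketch rather than a definitive diagnosis: | has the lowest precedence in a PCRE pattern, so the pattern above is five independent alternatives (<textarea, <input, <select(.+?){{variable:system_version}}(.+?)<\/textarea, <\/input, <\/select\>) rather than "opening tag, then token, then closing tag". Any page that merely contains <input therefore matches, !preg_match(...) is false, and str_replace() is never reached. The cut-down $data below is an assumption standing in for the real page.

```php
<?php
// Cut-down stand-in for the real page (assumption): any page containing <input behaves the same.
$data = '<p>You are using system version {{variable:system_version}}</p>'
      . '<input value="System version: {{variable:system_version}}">';
$val = '{{variable:system_version}}';
$var = '6.0';

// The pattern from the question. Because "|" binds loosest, this is five
// independent alternatives, and the bare "<input" alternative matches on its own.
$pattern = '/<textarea|<input|<select(.+?)' . $val . '(.+?)<\/textarea|<\/input|<\/select\>/s';

preg_match($pattern, $data, $m);
var_dump($m[0]); // string(6) "<input"

// The guard is therefore false and str_replace() is never reached.
if (!preg_match($pattern, $data)) {
    $data = str_replace($val, $var, $data);
}
```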
EDIT 1
After some help from @Emma, the replacement still does not work. Below is the code that currently does the replacement:
function replace_variable($matches, $data)
{
    $ci =& get_instance();
    if (!empty($matches['variable_matches'])) {
        foreach ($matches['variable_matches'][0] as $key => $val) {
            $vals = explode(':', $val);
            $ci->load->module('core');
            $var = $ci->core->get_variable(rtrim($vals[1], '}}'));
            $re1 = '/<(?:textarea|select)[\s\S]*?>[\s\S]*?(' . $val . ')[\s\S]*?<\/(?:textarea|select)>/';
            $re2 = '/<(?:input)[\s\S]*?(' . $val . ')[\s\S]*?>/';
            if (!preg_match($re1, $data) && !preg_match($re2, $data)) {
                $data = str_replace($val, $var, $data);
            }
        }
    }
    return $data;
}
Here are the matches found via preg_match. I then try to replace them via str_replace only where they are NOT inside a form field (select/textarea/input).
Array
(
[0] => Array
(
[0] => {{variable:system_version}}
[1] => {{variable:system_version}}
[2] => {{variable:system_version}}
[3] => {{variable:system_version}}
)
[1] => Array
(
[0] => system_version
[1] => system_version
[2] => system_version
[3] => system_version
)
)
So there are four matches on the page: two inside form fields and two outside. The check runs on the entire buffered output, which contains all four, yet preg_match matches in every case despite the regex. Any ideas what I am doing wrong?
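A minimal reproduction, assuming the buffered output resembles the sample at the top (trimmed here to one occurrence outside a form field and one inside), suggests why. preg_match() runs against the whole of $data, which does not change between loop iterations, so it answers "is this token inside a textarea anywhere on the page?" rather than "is this particular occurrence inside one?". All four entries in $matches are the same string, so every pass gets the same answer. Even a true guard would not help, because str_replace() rewrites every copy regardless of position.

```php
<?php
// Trimmed stand-in for the buffered page (assumption): one occurrence outside
// a form field, one inside a <textarea>.
$data = '<title>System version: {{variable:system_version}}</title>'
      . '<textarea>You are using system version {{variable:system_version}}.</textarea>';
$val = '{{variable:system_version}}';

// $re1 exactly as in replace_variable().
$re1 = '/<(?:textarea|select)[\s\S]*?>[\s\S]*?(' . $val . ')[\s\S]*?<\/(?:textarea|select)>/';

// The foreach runs this check once per entry in $matches. $data and $val are
// identical on every pass, so the result is identical too.
$passForTitle    = preg_match($re1, $data); // 1
$passForTextarea = preg_match($re1, $data); // 1

// str_replace() has no notion of position: it rewrites every copy,
// including the one inside <textarea>.
$replacedEverywhere = str_replace($val, '6.0', $data);
echo $replacedEverywhere, PHP_EOL;
```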
My guess is that you are designing an expression similar to:
which you would probably want to modify, and then use to replace whatever you need to replace.
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com, and in this link you can watch how it matches against some sample inputs.
Edit, for doing it in two steps: Demo 1, Demo 2, Demo 3
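The expression from this answer is not reproduced here, so the following is a swapped-in regex technique rather than a reconstruction of it: PCRE's (*SKIP)(*F) idiom lets a single preg_replace_callback() pass consume whole form-field spans without touching them and replace tokens everywhere else. It is a minimal sketch; $lookup is a hypothetical stand-in for $ci->core->get_variable(). Regex over HTML stays brittle (comments, attribute values containing >, unusual markup), which is one reason to prefer the DOM approach in the next answer.

```php
<?php
// Hypothetical lookup table standing in for $ci->core->get_variable().
$lookup = ['system_version' => '6.0'];

$html = <<<'HTML'
<title>System version: {{variable:system_version}}</title>
<p>You are using system version {{variable:system_version}}</p>
<input value="System version: {{variable:system_version}}">
<textarea>
You are using system version {{variable:system_version}}.
</textarea>
HTML;

$pattern = '~
    (?:<(textarea|select)\b.*?</\1\s*>   # a whole textarea or select element
      | <input\b[^>]*>                   # or a whole input tag
    )(*SKIP)(*F)                         # consume it, then fail; matching resumes after it
  | \{\{variable:(\w+)\}\}               # a token outside those spans
~isx';

$out = preg_replace_callback($pattern, function (array $m) use ($lookup) {
    // Unknown names are left untouched rather than blanked.
    return $lookup[$m[2]] ?? $m[0];
}, $html);

echo $out;
```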
I was about to post an answer on your follow-up question, but Casimir closed it before I got the chance. So I am posting a proper HTML parse-then-replace technique here, for your benefit and that of future researchers.
There aren't too many tricks involved.

Parse the HTML with DOMDocument and write a filtering query with XPath which requires nodes not to be textarea|select|input tags and to contain {{{ in their text. There are several "magical" ways to filter the DOM; this is just one way that feels efficient and direct to me.

I use preg_replace_callback() to perform replacements based on a lookup array.

To avoid use() in the callback syntax, I make the lookup available inside the callback's scope by declaring it as a constant (I can't imagine you need it to be a variable anyhow).

I found during testing that DOMDocument didn't like the <section> tags, so I silenced the complaints with libxml_use_internal_errors(true);.
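The steps above can be sketched roughly as follows. This is not the answer's original code, which is not reproduced here. Assumptions: the {{{type:name}}} token format suggested by the answer's mention of {{{, a LOOKUP constant with placeholder contents, and a small sample page. The XPath is one variant of the filter described: it selects text nodes that contain {{{ and have no textarea or select ancestor. <input> needs no explicit exclusion because its value lives in an attribute, which a text() query never visits.

```php
<?php
// Placeholder lookup, declared as a constant so the callback can read it without use().
const LOOKUP = [
    'variable' => ['system_version' => '6.0'],
];

// Sample page (assumption): tokens in {{{type:name}}} form, plus an HTML5 <section>.
$html = <<<'HTML'
<html><head><title>System version: {{{variable:system_version}}}</title></head>
<body>
<section><p>You are using system version {{{variable:system_version}}}</p></section>
<form>
<input value="System version: {{{variable:system_version}}}">
<textarea>You are using system version {{{variable:system_version}}}.</textarea>
</form>
</body></html>
HTML;

// libxml's HTML parser predates HTML5 and reports tags like <section> as invalid.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Text nodes containing "{{{" that are not inside a textarea or select.
$query = "//text()[contains(., '{{{') and not(ancestor::textarea or ancestor::select)]";

foreach ($xpath->query($query) as $textNode) {
    $textNode->nodeValue = preg_replace_callback(
        '~\{\{\{(\w+):(\w+)\}\}\}~',
        function (array $m): string {
            $lookup = LOOKUP;                       // local copy of the constant
            return $lookup[$m[1]][$m[2]] ?? $m[0];  // unknown tokens stay as-is
        },
        $textNode->nodeValue
    );
}

$out = $dom->saveHTML();
echo $out;
```

Because each replacement is written into a text node, any HTML in a replacement value (such as the welcome block's <div>) would be escaped on output; inserting real markup would need something like a DOMDocumentFragment instead.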