Replacing Tags with Includes in PHP with RegExps

Posted 2019-09-06 09:22

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.

I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.

How is this useful? It's useful in hooking WordPress.

5 Answers
在下西门庆
#2 · 2019-09-06 09:33

You can do it without regexes (god forbid), something like:

//return true if $str ends with $sub
function endsWith($str,$sub) {
    return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}

$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
// split() was removed in PHP 7; explode() does the same job for a literal delimiter
$splitStr = explode(" ", $theStringWithVars);
foreach ($splitStr as $part) {
    if (endsWith(trim($part), $sub)) {
        //file_get_contents(trim($part)) etc...
    }
}
forever°为你锁心
#3 · 2019-09-06 09:34

Off the top of my head, you want this:

// load the "template" file
$input = file_get_contents($template_file_name);

// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
  // match zero will be the entire match - eg {FOO}. 
  // match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
  // convert it to lowercase and append ".html", so you're loading foo.html

  // then return the contents of that file.
  // BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
  return file_get_contents( strtolower($matches[1]) . ".html" );
}
// run the actual replace method giving it our pattern, the callback, and the input file contents
// the pattern needs delimiters (the slashes), and the callback is passed by name as a string
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);

// todo: print the output

Now I'll explain the regex

 \{([-A-Z]+)\}
  • The \{ and \} just tell it to match the curly braces. The backslashes escape them, since { and } are special characters in a regex (they delimit quantifiers such as {2,5}).
  • The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to capture just the text inside the braces, without the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying.
  • The [-A-Z] says "match any uppercase letter, or a -".
  • The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
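
For what it's worth, here is a slightly more defensive sketch of the same idea. The function name expandTags and the $dir parameter are made up for illustration, and the ".html" suffix is carried over from the code above. It keeps every lookup inside one directory and leaves a tag untouched when there is no file for it.

```php
<?php
// Hypothetical helper: expands {TAG} placeholders from files in one fixed directory.
// $dir and the ".html" suffix are assumptions for illustration.
function expandTags(string $input, string $dir): string
{
    return preg_replace_callback(
        '/\{([-A-Z]+)\}/',
        function (array $matches) use ($dir): string {
            // $matches[1] can only contain A-Z and "-", so it cannot hold "/" or "..".
            $path = $dir . DIRECTORY_SEPARATOR . strtolower($matches[1]) . '.html';
            // Leave the tag as it was when no matching file exists.
            return is_file($path) ? file_get_contents($path) : $matches[0];
        },
        $input
    );
}
```

Because the character class only admits uppercase letters and hyphens, a tag cannot name a file outside $dir. Widening the class (to allow dots or slashes, say) would remove that guarantee.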
smile是对你的礼貌
#4 · 2019-09-06 09:34

Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use them there. After all, you know exactly what you are replacing, so why do you need a fuzzy search?

Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.

For example:

$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
...
);

$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
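
If the tag names are not known up front, one possible way to combine the two steps is sketched below: a single regex pass to find which tags occur, then str_replace for the substitution itself. The function name substituteTags, the $dir parameter, and the [-A-Z0-9] character class are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: $dir and the ".php" suffix are assumptions for illustration.
function substituteTags(string $contents, string $dir): string
{
    $substitutions = array();
    // One regex pass, only to discover which tag names occur.
    preg_match_all('/\{([-A-Z0-9]+)\}/', $contents, $matches);
    foreach (array_unique($matches[1]) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name . '.php';
        if (is_file($path)) {   // tags without a file are left alone
            $substitutions['{' . $name . '}'] = file_get_contents($path);
        }
    }
    // Plain string replacement does the actual substitution.
    return str_replace(array_keys($substitutions), array_values($substitutions), $contents);
}
```

One thing to keep in mind: str_replace works through the pairs one after another, so text loaded for an earlier tag is also scanned for the later tags.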
Anthone
#5 · 2019-09-06 09:35

You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.

  1. First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/.

  2. Next you'll need to pass the PREG_OFFSET_CAPTURE flag to preg_match to get the offset location in the page data. This offset will let you divide the string into the before, matching, and after parts of the match.

  3. Once you have the 3 parts, you'll need to run your include, and stick them back together.

  4. Lather, rinse, repeat.

  5. Stop when you find no more variables.
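
The steps above could look roughly like this. It is only a sketch: expandByOffset is a made-up name, $loadData stands in for whatever does the include (any callable that maps a tag name to text), and a capture group has been added to the pattern to pull out the name.

```php
<?php
// $loadData is a hypothetical callable mapping a tag name to its replacement text.
function expandByOffset(string $page, callable $loadData): string
{
    $offset = 0;
    // The fifth argument of preg_match is the byte offset to start searching from.
    while (preg_match('/\{(\w+)\}/', $page, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $tag   = $m[0][0];   // full match, e.g. "{FOO}"
        $start = $m[0][1];   // byte offset of the match in $page
        $name  = $m[1][0];   // text inside the braces
        $before = substr($page, 0, $start);
        $after  = substr($page, $start + strlen($tag));
        $replacement = $loadData($name);
        // Stick the three parts back together.
        $page = $before . $replacement . $after;
        // Resume after the inserted text so it is not scanned again.
        $offset = strlen($before) + strlen($replacement);
    }
    return $page;
}
```

Moving the offset past the inserted text is a design choice: it keeps a loaded file that happens to contain {SOMETHING} from triggering another include.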

This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:

  1. Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);

  2. Write a recursive function over the parts with the following criteria:

    • Halt when the length of $arg is < 3
    • Else, return a new array composed of
    • $arg[0] . load_data($arg[1]) . $arg[2]
    • plus whatever is left in $arg[3...]
  3. Run your function over $parts.
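
A rough sketch of that layout, assuming the braces are balanced and only ever appear around tag names, so the parts alternate text, tag, text, and so on. The names joinParts and expandBySplit are made up, and $loadData is a stand-in for load_data passed in as a callable.

```php
<?php
// Recursive sketch of the preg_split layout. $loadData is a hypothetical callable.
function joinParts(array $parts, callable $loadData): string
{
    // Halt when fewer than 3 parts remain: nothing left to substitute.
    if (count($parts) < 3) {
        return implode('', $parts);
    }
    // Collapse text, tag, text into a single text part ...
    $head = $parts[0] . $loadData($parts[1]) . $parts[2];
    // ... then recurse on it plus whatever is left.
    return joinParts(array_merge(array($head), array_slice($parts, 3)), $loadData);
}

function expandBySplit(string $page, callable $loadData): string
{
    return joinParts(preg_split('/[{}]/', $page), $loadData);
}
```

Note that splitting on /[{}]/ accepts anything between braces as a tag name, so $loadData has to do its own validation if the input is not trusted.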

Anthone
#6 · 2019-09-06 09:36

Orion above has the right solution, but it's not really necessary to use a callback function in your simple case.

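Assuming that the filenames are A-Z + hyphens you can do it in 1 line using PHP's /e flag in the regex (note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this only runs on older versions):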

$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);

This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
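
On PHP 7 and later, where /e is gone, roughly the same effect needs preg_replace_callback with an anonymous function. A minimal sketch, with the wrapper name includeTags and the $dir parameter made up for illustration:

```php
<?php
// Hypothetical wrapper; $dir is an assumed base directory for the .html files.
function includeTags(string $str, string $dir = '.'): string
{
    return preg_replace_callback(
        '/{([-A-Z]+)}/',
        function ($m) use ($dir) {
            // $m[1] is the tag name, e.g. "VAR" for {VAR}
            return file_get_contents($dir . '/' . $m[1] . '.html');
        },
        $str
    );
}
```

Unlike /e, nothing here is evaluated as PHP code; the matched name is only used to build a filename.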

There are the same vague security worries as outlined above, but I can't think of anything specific.
