Saving a .php file and saving the includes too (po

Posted 2019-04-13 14:26

Question:

The setup:

I have a standard .php file (index.php) that contains two includes, one for header (header.php) and one for footer (footer.php). The index.php file looks like this:

index.php

<?php
include 'header.php';
?>

<h2>Hello</h2>
<p class="editable">Lorem ipsum dolar doo dah day</p>

<?php
include 'footer.php';
?>

header.php like this:

<html>
<head>
<title>This is my page</title>
</head>
<body>
<h1 class="editable">My Website rocks</h1>

and footer.php like this:

<p>The end of my page</p>
</body>

I am writing a PHP script that allows you to edit any of the ".editable" items on a page. My problem is that these editable regions could appear in any included files as well as the main body of index.php.

My PHP code grabs the index.php file with file_get_contents(), which works well. I am also able to edit and save any ".editable" regions in index.php.

My issue:

I have been unable to find a way of "finding" the includes and parse through those for ".editable" regions as well. I am looking for suggestions on how I would work through all the includes in index.php - checking them for editable regions. Would I need to use regular expressions to find "include *.php"? I am unsure of where to even start...

For those of you who wish to see my PHP code: I am using the simple_html_dom PHP class [1], which allows me to write code like:

// load the class and file
$html = new simple_html_dom();
$html->load_file("index.php");

// find the first editable area and change its content to "edited"  
$html->find('*[class*=editable]', 0)->innertext = "Edited";

// save the file
$html->save("index.php");

[1]: http://simplehtmldom.sourceforge.net/manual_api.htm simple php dom parser


UPDATE

I have been playing around with regular expressions to try and match the includes. I am pretty rubbish at regex but I think I am getting close. Here is what I have so far:

$findinclude = '/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?]|[^\?\>])*\?>)/i';

This matches fairly well although it does seem to return the odd ) and ' when using preg_match. I am trying to add a bit of security into the regex to ensure it only matches between php tags - this part: (?=(?:[^\<\?]|[^\?>])*\?>) - but it only returns the first include on a page. Any tips on how to improve this regular expression? (I have been at it for about 6 hours)

Answer 1:

What type of system are you creating?

If it's going to be used by the public, you'd have serious security concerns. People could include their own PHP code or JavaScript in the supplied content.

This isn't the standard way at all to create dynamic content. For most purposes, you'd want to create a single template, and then allow users to save their changes into a database. You'd then fill in the info into the template from the database for display.

If you allow them to include HTML, use something like HTML Purifier to clean it up, and insert the data into your database with a prepared statement using PDO. I'm sure people here would be happy to answer any questions you may have about using a database.
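
A minimal sketch of that flow, under some loud assumptions: it uses an in-memory SQLite database through PDO (so it needs the pdo_sqlite extension), and the table name page_content and the helpers clean_html(), save_region() and load_region() are hypothetical names invented for illustration. HTML Purifier is a separate library, so strip_tags() stands in for it here; it is a far cruder filter and only a placeholder.

```php
<?php
// Hypothetical schema and names; the original post does not specify any.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE page_content (region TEXT PRIMARY KEY, body TEXT)');

// Stand-in for HTML Purifier: remove every tag outside a small allow-list.
// strip_tags() keeps the text inside removed tags and does not filter
// attributes, so this is NOT a complete sanitizer.
function clean_html(string $dirty): string
{
    return strip_tags($dirty, '<b><i><em><strong>');
}

// Save one editable region with a prepared statement, so the user's text
// is bound as data and never concatenated into the SQL string.
function save_region(PDO $pdo, string $region, string $body): void
{
    $stmt = $pdo->prepare(
        'REPLACE INTO page_content (region, body) VALUES (:region, :body)'
    );
    $stmt->execute([':region' => $region, ':body' => clean_html($body)]);
}

// Load a region, falling back to a default when nothing has been saved.
function load_region(PDO $pdo, string $region, string $default = ''): string
{
    $stmt = $pdo->prepare('SELECT body FROM page_content WHERE region = :region');
    $stmt->execute([':region' => $region]);
    $body = $stmt->fetchColumn();
    return $body === false ? $default : (string) $body;
}

save_region($pdo, 'heading', 'My <b>Website</b> rocks<script>alert(1)</script>');
echo load_region($pdo, 'heading'), "\n";
```

The template would then print load_region($pdo, 'heading', 'My Website rocks') inside the h1 instead of the script rewriting the .php files themselves.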



Answer 2:

I've misunderstood you, disregard everything after the hr.

To do what you want, I guess the simplest way is to present the page to the browser, write some JavaScript that finds and edits the editable areas, and submit the result to a PHP file via AJAX.

The PHP file would then receive the new content and the place where it should be applied. I still don't understand very well how static CMSs do it, but there are some open source projects; check here and here. I suggest you study their code to find out how they do it.

----------


That's really simple. Instead of loading the file like this:

file_get_contents('/path/to/file.php');

You have to do it like this:

file_get_contents('http://your-host.com/path/to/file.php');

Also, take a look at QueryPath; it seems to be a lot better than SimpleHTMLDom.



Answer 3:

Based on the regex you provided, I've optimized it a bit and fixed some crucial bugs:

~<[?].*?(?:include|require)(?:_once)?\s*?(?:[(]?['"])(.+?)(?:['"][)]?)\s*?;.*?(?:[?]>)?~is

And in preg_match_all():

preg_match_all('~<[?].*?(?:include|require)(?:_once)?\s*?(?:[(]?[\'"])(.+?)(?:[\'"][)]?)\s*?;.*?(?:[?]>)?~is', $html, $includes);

It should match filenames containing letters, digits, dashes, underscores, slashes, spaces, dots and so on.

Also, the filename is stored in capture group 1, and the closing PHP tag is optional.

It's worth mentioning that the token_get_all() function is much more reliable than regular expressions.
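
As a rough illustration of the tokenizer route, here is a sketch that walks the output of token_get_all() and collects the quoted filename following each include/require keyword. The function name find_includes() and the sample source are made up for this example, and it assumes the tokenizer extension is enabled (it is by default).

```php
<?php
// Return the quoted filenames used in include/require statements.
// Sketch only: it records the first string literal after each keyword, so
// for an expression like __DIR__ . '/x.php' it returns just '/x.php'.
function find_includes(string $source): array
{
    $keywords = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $files = [];
    $inStatement = false;

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens arrive as plain strings; ';' ends a statement.
            if ($token === ';') {
                $inStatement = false;
            }
            continue;
        }
        [$id, $text] = $token;
        if (in_array($id, $keywords, true)) {
            $inStatement = true;
        } elseif ($id === T_CLOSE_TAG) {
            $inStatement = false;
        } elseif ($inStatement && $id === T_CONSTANT_ENCAPSED_STRING) {
            // Strip the surrounding quotes from the literal.
            $files[] = substr($text, 1, -1);
            $inStatement = false;
        }
    }
    return $files;
}

$source = <<<'PHP'
<?php include 'header.php'; ?>
<p class="editable">include "not-code.php";</p>
<?php require_once("inc/footer.php"); ?>
PHP;

print_r(find_includes($source));
```

Because the tokenizer separates PHP code from inline HTML, the include-like text inside the paragraph is ignored, which is the part a regular expression struggles to get right.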



Answer 4:

If users can submit content into these and then they get included into a PHP file, then you are in some serious trouble.

You should have simple templates that have little or no PHP in them, which get parsed -- then and only then should you insert content into the DOM, after it has been properly sanitized.

The way to resolve your 'finding the includes' issue -- you don't need to, PHP does that for you -- maybe use ob_start et al. and then include the template file. Then grab the buffer contents (which will be HTML) and then parse the already assembled template with the DOM parser.
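
A minimal sketch of the output-buffering idea, assuming the DOM extension is available. It writes throwaway copies of the question's three files into a temp directory so that it runs on its own; render_template() is a hypothetical helper name, and DOMDocument is swapped in for simple_html_dom only to keep the example dependency-free.

```php
<?php
// Build throwaway copies of the templates in a temp directory.
$dir = sys_get_temp_dir() . '/ob_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/header.php",
    '<html><head><title>This is my page</title></head><body>' . "\n" .
    '<h1 class="editable">My Website rocks</h1>' . "\n");
file_put_contents("$dir/footer.php",
    '<p>The end of my page</p>' . "\n" . '</body></html>' . "\n");
file_put_contents("$dir/index.php", <<<'TPL'
<?php include __DIR__ . '/header.php'; ?>
<h2>Hello</h2>
<p class="editable">Lorem ipsum dolar doo dah day</p>
<?php include __DIR__ . '/footer.php'; ?>
TPL
);

// Let PHP resolve the includes, capturing the assembled HTML instead of
// sending it to the browser.
function render_template(string $file): string
{
    ob_start();
    include $file;
    return (string) ob_get_clean();
}

$html = render_template("$dir/index.php");

// Parse the assembled page and list every element with the "editable" class.
libxml_use_internal_errors(true); // tolerate imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " editable ")]';
$editable = [];
foreach ($xpath->query($query) as $node) {
    $editable[] = $node->nodeName . ': ' . trim($node->textContent);
}
print_r($editable);

// Clean up the temp files.
array_map('unlink', glob("$dir/*.php"));
rmdir($dir);
```

One caveat with this approach: the assembled HTML no longer records which file each region came from, so writing an edit back still needs some way of mapping a region to its source.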

Please, please PLEASE make sure that you sanitize whatever you are injecting into the DOM.

Otherwise, tyranny and destruction are certain to rain down upon your web site (and you, depending on what else is on your server).



Answer 5:

You need to just store the user-inputted text somewhere and load it into, and output it with, your PHP template.

I'd look into learning to use a database. There is nothing heavy-weight or slow about it, and really, this is what they're for. If you don't want to use a database, you can use files instead. I'd suggest storing the data in the file in JSON format to give it some structure.

Here's a very simple system to use files to store and retrieve JSON encoded data.

Make an array of what you want to save after editing:

$user_data=array('title'=>$user_supplied_info,'content'=>$user_supplied_words);
$json_data=json_encode($user_data);
file_put_contents('path_to/user_data/thisuser',$json_data);

Then, when it's time to display the page:

<?php
// default content, used for any section the user has not edited
$user_data=array('title'=>'My page rocks!','content'=>'lorems ipso diddy doo dah');

$file_data=file_get_contents('path_to/user_data/thisuser');
if($file_data===false){$no_data=true;}//file not found
$data_array=json_decode((string)$file_data,true);
if(!is_array($data_array))
  { $no_data=true; }//maybe the json could not be parsed
else
  { $user_data=array_merge($user_data,$data_array); }//saved values override the defaults
?>
<html>
<head>
<title>This is my page</title>
</head>
<body>
<h1 class="editable"><?php echo htmlspecialchars($user_data['title']); ?></h1>

And so on. The $user_data array starts out holding the default content for the editable sections, which is printed if the user has not supplied any. If they have, their data is loaded from the file and merged with the defaults. In the array_merge step, any value loaded from the file overwrites the default with the same key.
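
To make the merge behaviour concrete, here is a small worked example. The JSON string is a made-up stand-in for the contents of a saved file in which only the title was edited.

```php
<?php
// Defaults for every editable region (values taken from the example above).
$defaults = ['title' => 'My page rocks!', 'content' => 'lorems ipso diddy doo dah'];

// Pretend this JSON came from the user's saved file; only the title was edited.
$saved = json_decode('{"title":"Edited title"}', true);

// With string keys, values from later arrays win, so saved values replace
// the defaults and untouched regions keep their default text.
$page = array_merge($defaults, is_array($saved) ? $saved : []);

print_r($page);
```

Here $page ends up with the edited title and the default content, which is the fallback behaviour described above.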



Answer 6:

OK, I finally worked it out. If anyone is looking to find any include, include_once, require or require_once statement in a .php file, you can use the following regular expression with a PHP function like preg_match_all.

'/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?])*\?>)/i';

This looks for any includes etc. within PHP tags. Referring back to my original example, my code looks like this:

$html = new simple_html_dom();
$html->load_file("index.php");

$findinclude = '/(?:include|include_once|require|require_once)\s*(?:[a-z]|"|\(|\)|\'|_|\.|\s|\/)*(?=(?:[^\<\?])*\?>)/i';

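if (preg_match_all($findinclude, $html, $includes)):

    // $includes[0] holds the full text of every match
    $incfiles = $includes[0];

    // loop through the matches and print each filename
    foreach ($incfiles as $inc) {
       // drop the keyword, then the quotes and brackets, leaving just the path
       $path = preg_replace('/^(?:include|require)(?:_once)?/i', '', $inc);
       $path = trim(preg_replace('/[^a-zA-Z0-9\s\.\_\/]/', '', $path));
       print basename($path) . "\n";
    }
endif;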

Job done! I can now work through this to edit each file as required.