I have this source of text which contains HTML tags and PHP code at the same time:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>foo</h1>
</body>
</html>
and I need place my own text (for example: MY_TEXT) after opened tag and get this result:
<html>
<head>
<title><?php echo "title here"; ?></title>
<head>
<body>
<h1 <?php echo "class='big'" ?>>MY_TEXTfoo</h1>
</body>
</html>
thus I need consider nested braces
if I will use regex it will creates problems (I need consider any level of nested braces). I need another strategy.
now my idea try to use pyparsing, but I can't get it now, too complicated for my current level
could anybody make solution please?
Pyparsing has a helper method called
nestedExpr
that makes it easy to match strings of nested open/close delimiters. Since you have nested PHP tags within your<h1>
tag, then I would use anestedExpr
like:However, this will match every tag in your input HTML source:
gives:
You want to match only those tags whose opening text is 'h1'. We can add a condition to an expression in pyparsing using
addCondition
:Now we will match only the desired tag. Just a few more steps...
First of all,
nestedExpr
gives back nested lists of matched items. We want the original text that was matched. Pyparsing includes another helper for that, unimaginatively namedoriginalTextFor
- we combine this with the previous definition to get:Lastly, we have to add one more parse-time callback action, to append "MY_TEXT" to the tag:
Now that we are able to match the desired
<h1>
tag, we can usetransformString
method of the expression to do the search-and-replace work for us:With your original sample saved as a variable named
html
, we get:Note: this will add "MY_TEXT" after every
<h1>
tag. If you want this to be applied only after<h1>
tags containing PHP, then write the appropriate condition and add it tonested_angle_braces_with_h1
.