What is the best way to select all the text between two tags, for example the text between all the 'pre' tags on the page?
This is what I would use:

(?<=(<pre>))(\w|\d|\n|[().,\-:;@#$%^&*\[\]"'+–/\/®°⁰!?{}|~]| )+?(?=(</pre>))

Basically what it does is:

(?<=(<pre>))
The selection has to be preceded by the <pre> tag.

(\w|\d|\n|[().,\-:;@#$%^&*\[\]"'+–/\/®°⁰!?{}|~]| )
This is just the regular expression I want to apply. In this case, it selects a letter, a digit, a newline character, or one of the special characters listed in the square brackets. The pipe character | simply means "OR".

+?
The plus character states to select one or more of the above; order does not matter. The question mark changes the default behavior from 'greedy' to 'ungreedy'.

(?=(</pre>))
The selection has to be followed by the </pre> tag.

Depending on your use case you might need to add some modifiers (such as i or m).
Here I performed this search in Sublime Text, so I did not have to use modifiers in my regex.
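As a rough illustration of the lookaround approach described above, here is a minimal Python sketch. It is an assumption-laden variant rather than the exact pattern from the answer: the explicit alternation of allowed characters is swapped for `.` with `re.DOTALL`, and the name `pre_bodies` is hypothetical. Python's `re` module only accepts fixed-width lookbehind, which `(?<=<pre>)` satisfies.

```python
import re

# Lookbehind for <pre>, lazy body, lookahead for </pre>.
# re.DOTALL lets "." match newlines (assumption: replaces the explicit
# character alternation used in the answer).
PRE_BODY = re.compile(r"(?<=<pre>).+?(?=</pre>)", re.DOTALL)


def pre_bodies(html):
    """Return the text found between each <pre> ... </pre> pair."""
    return PRE_BODY.findall(html)


sample = "<pre>one</pre><p>skip</p><pre>two\nlines</pre>"
print(pre_bodies(sample))
```

Because the tags sit inside lookarounds, they are not part of the returned matches.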
Older JavaScript engines do not support lookbehind
The above example should work fine with languages such as PHP, Perl, and Java. JavaScript engines that predate ES2018, however, do not support lookbehind, so there we have to forget about using
(?<=(<pre>))
and look for some kind of workaround. Perhaps simply strip the leading <pre> (five characters) from each match, as in the question "Regex match text between tags".
Also look at the JavaScript regex documentation for non-capturing parentheses.
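The strip-the-opening-tag workaround can be sketched as follows. This is written in Python only to keep the examples in one language; the idea of matching the opening tag and then slicing it off does not depend on the regex engine. The names `OPEN_TAG` and `pre_bodies_without_lookbehind` are hypothetical.

```python
import re

OPEN_TAG = "<pre>"


def pre_bodies_without_lookbehind(html):
    """Match '<pre>' as part of the result, then slice it off again."""
    # No lookbehind here: the opening tag is consumed by the match itself.
    matches = re.findall(r"<pre>.+?(?=</pre>)", html, re.DOTALL)
    # Drop the leading "<pre>" from every match.
    return [m[len(OPEN_TAG):] for m in matches]


print(pre_bodies_without_lookbehind("<pre>one</pre> x <pre>two</pre>"))
```

A capture group around the body, such as `<pre>(.+?)</pre>`, is another common way to avoid lookbehind altogether.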
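You can use
Pattern pattern = Pattern.compile("<tagname>(.+?)</tagname>", Pattern.DOTALL);
Replace tagname with the element name; group 1 of each match then holds the text between the tags.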
Use the pattern below to get the content between the tags of an element. Replace [tag] with the actual element you wish to extract the content from.
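A minimal sketch of what such a pattern could look like in Python, assuming a lazy capture group between the opening and closing tag. The helper name `content_between` is hypothetical, and the `[^>]*` part is an added assumption that lets the opening tag carry attributes.

```python
import re


def content_between(html, tag):
    """Return the content between each <tag ...> and </tag> pair."""
    # re.escape guards against regex metacharacters in the tag name.
    name = re.escape(tag)
    # \b keeps "a" from also matching "<abbr>"; [^>]* allows attributes.
    pattern = rf"<{name}\b[^>]*>(.+?)</{name}>"
    return re.findall(pattern, html, re.DOTALL | re.IGNORECASE)


html = '<a href="x.html">link</a> <abbr>no</abbr> <a>two</a>'
print(content_between(html, "a"))
```

The lazy `.+?` stops at the first closing tag, so this only behaves well when the element is not nested inside itself.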
Sometimes tags will have attributes, like an anchor tag having href; the pattern then has to allow for attributes inside the opening tag.

You shouldn't be trying to parse HTML with regexes; see this question and how it turned out.
In the simplest terms, HTML is not a regular language, so you can't fully parse it with regular expressions.
Having said that, you can parse subsets of HTML when no similar tags are nested. So as long as anything between <tag> and </tag> is not that tag itself, this will work:
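One pattern of this kind, sketched in Python, uses a backreference so that the closing tag must repeat the name captured in the opening tag. The exact pattern and the name `first_element` are assumptions for illustration.

```python
import re

# Group 1: tag name, group 2: content; \1 requires the same name in the closing tag.
ANY_TAG = re.compile(r"<(\w+)[^>]*>(.*?)</\1>", re.DOTALL)


def first_element(html):
    """Return (tag name, content) of the first element found, or None."""
    m = ANY_TAG.search(html)
    return (m.group(1), m.group(2)) if m else None


# Different tags nested inside each other are fine.
print(first_element('<div class="a">hello <b>world</b></div>'))
# The same tag nested inside itself is cut off at the first closing tag.
print(first_element("<div>a<div>b</div>c</div>"))
```

The second call shows the limitation: the lazy match ends at the inner closing tag, so the outer element's content is truncated.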
A better idea is to use a parser, like the native DOMDocument, to load your html, then select your tag and get the inner html which might look something like this:
And since this is a proper parser it will be able to handle nesting tags etc.
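The parser approach can be sketched in Python as well, with the standard library's `html.parser` swapped in for PHP's DOMDocument. The class and function names are hypothetical, and this sketch collects only the text inside the element rather than its full inner HTML.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every outermost occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0     # how many target tags are currently open
        self.buffer = []   # text pieces of the element being read
        self.results = []  # one finished string per outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer))
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def text_between(html, tag):
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results


print(text_between("<div>a<div>b</div>c</div><div>d</div>", "div"))
```

The depth counter is what lets this handle a tag nested inside itself, the case where the lazy regex gives up too early.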
For multiple lines:
The tag can be closed on a later line. This is why \n needs to be added to the pattern.
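A small Python sketch of the difference, assuming a `<pre>` element whose closing tag sits on a later line; the variable names are hypothetical.

```python
import re

text = "<pre>first line\nsecond line</pre>"

# Without \n in the pattern, "." stops at the line break and nothing matches.
single_line = re.findall(r"<pre>(.+?)</pre>", text)

# Allowing \n explicitly lets the match run on to the closing tag.
multi_line = re.findall(r"<pre>((?:.|\n)+?)</pre>", text)

print(single_line)
print(multi_line)
```

In Python, passing `re.DOTALL` has the same effect as adding `\n` to the alternation.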