I have HTML code in an Excel cell.
How to count all words which aren't between <
and >
Alternatively, how can I count all words between >
and <
I want to get a number of words, which aren't HTML tags, classes, attributes etc. I Have i.e. in A1 code like:
<div class="x" style="padding:0px;">Lorem <strong>ipsum</strong> <i>bla</i> <a href="#" title="xmix" target="_blank">opana</a> test <span>word</span> bla</div>
After calculation in B1 i want to get 7, as a number of clean words Lorem ipsum bla opana test word bla
Well I like a challenge so this is a formula solution to the problem. This version only accepts <, > and space as delimiters: it would not be too difficult having got this far to deal with commas, full stops etc.
The algorithm is
If current character is <, extract all characters up to next >
If current character is " ", extract all characters up to next non-space
If current character is any other character, extract all characters up to next space or <.
Then count number of strings starting with an alpha character.
The formula is
=IF(LEFT(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1)="<",
MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,FIND(">",MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1))&">")),
IF(LEFT(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1)=" ",
MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,FIND(LEFT(TRIM(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)))),MID($A$1,SUMPRODUCT(LEN($C$1:C1))+2
,LEN($A$1)))),
MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,AGGREGATE(15,6,SEARCH({" ","<"},MID($A$1,SUMPRODUCT(LEN($C$1:C1))+2,LEN($A$1))&" "),1))))
and to get the counts just
=COUNTIF(C2:C100,">=a")
The second helper column (column D) is not necessary but is included as a check.