Count words in cell, don't count HTML tags

2019-09-19 13:45发布

问题:

I have HTML code in an Excel cell.

How to count all words which aren't between < and >

Alternatively, how can I count all words between > and <

I want to get a number of words, which aren't HTML tags, classes, attributes etc. I Have i.e. in A1 code like:

<div class="x" style="padding:0px;">Lorem <strong>ipsum</strong> <i>bla</i> <a href="#" title="xmix" target="_blank">opana</a> test <span>word</span> bla</div>

After calculation in B1 i want to get 7, as a number of clean words Lorem ipsum bla opana test word bla

回答1:

Well I like a challenge so this is a formula solution to the problem. This version only accepts <, > and space as delimiters: it would not be too difficult having got this far to deal with commas, full stops etc.

The algorithm is

If current character is <, extract all characters up to next >
If current character is " ", extract all characters up to next non-space
If current character is any other character, extract all characters up to next space or <.

Then count number of strings starting with an alpha character.

The formula is

=IF(LEFT(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1)="<",
MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,FIND(">",MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1))&">")),
IF(LEFT(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1)=" ",
MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,FIND(LEFT(TRIM(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)))),MID($A$1,SUMPRODUCT(LEN($C$1:C1))+2

,LEN($A$1)))), MID(MID($A$1,SUMPRODUCT(LEN($C$1:C1))+1,LEN($A$1)),1,AGGREGATE(15,6,SEARCH({" ","<"},MID($A$1,SUMPRODUCT(LEN($C$1:C1))+2,LEN($A$1))&" "),1))))

and to get the counts just

=COUNTIF(C2:C100,">=a")

The second helper column (column D) is not necessary but is included as a check.