Extract hashtags from a string in Excel

2019-10-03 04:15发布

How can I extract multiple hashtags from a string in Excel? I played around with MID and SEARCH formulas but couldn't come up with anything good.

**Example input:**
Daniel Craig primed for action on SKYFALL (2012) and SPECTRE (2015). #007 #JamesBond #DanielCraig

**Example output:**
#007
#JamesBond
#DanielCraig

标签: excel
3条回答
一夜七次
2楼-- · 2019-10-03 04:37

Well, here are two examples showing mid() with find().

Expand as you wish, you may want to check out iferror()

I suggest you take them apart to see and understand what is happening.

enter image description here

查看更多
兄弟一词,经得起流年.
3楼-- · 2019-10-03 04:47

I found a way of doing this, but I needed to add an excel add-on. (credit and instructions to add go to this website http://blog.malcolmp.com/2010/regular-expressions-excel-add-in ) If you add this add-on, then the formula you'd need is =xMATCHALL("#[\S]+",A1,FALSE)

image is from excel, showing this in action

I've added some variables into your example to show that it picks just the words immediately following the hash tag

查看更多
Melony?
4楼-- · 2019-10-03 04:50

If you have Excel 2013+ with the FILTERXML function you can:

  • convert the string into an XML, using the spaces for the different nodes
    • "<t><s>" & SUBSTITUTE(A$1," ","</s><s>") & "</s></t>"
  • use an Xpath to extract the nodes containing the #
    • "//s[contains(.,'#')]
  • in the formula, [" & ROWS($1:1) & "]") becomes a position argument in the xpath so it will sequentially return the first, second, ...nth node that matches the condition.
  • The IFERROR is to blank out the result if you fill down more than there are hashtags.

=IFERROR(FILTERXML("<t><s>" & SUBSTITUTE(A$1," ","</s><s>") & "</s></t>","//s[contains(.,'#')][" & ROWS($1:1) & "]"),"")

In the example, I placed the formula in A3 and filled down five rows.

enter image description here

查看更多
登录 后发表回答