I have the following - really messy looking - data in my excel cell, which consists of around 100 rows of HTML Tags:
Find below two examples:
<ul class=""list-unstyled"">
<li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://google.org/"" target=""_blank"">Website</a></li>
<li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://www.google.com/"" target=""_blank"">Website 2</a></li>
<li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""http://amazon.com"" target=""_blank"">Product</a></li>
<li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""https://amazon.de/"" target=""_blank"">Product 2</a></li>
<li><span title=""Tags"" class=""glyphicon glyphicon glyphicon-tag text-gray""></span>
<small><span class=""label label-warning"">Available</span></small>
<small><span class=""label label-warning"">Country</span></small>
</li>
</ul>
or
<ul class=""list-unstyled"">
<li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://google.org/"" target=""_blank"">Website</a></li>
<li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://www.google.com/"" target=""_blank"">Website 2</a></li>
<li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""http://amazon.com"" target=""_blank"">Product</a></li>
<li><span title=""Tags"" class=""glyphicon glyphicon glyphicon-tag text-gray""></span>
<small><span class=""label label-warning"">Not Available</span></small>
<small><span class=""label label-warning"">State</span></small>
</li>
</ul>
My goal is to create a table that looks like the following:
| Website 1 | Website 2 | Website 3 | Product 1 | Product 2 | Product 3 | Available | Country |
|---------------------|---------------------|-----------|-------------------|--------------------|-----------|---------------|---------|
| https://google.org/ | https://google.com/ | | http://amazon.com | https://amazon.de/ | | Available | Country |
| https://google.org/ | https://google.com/ | | http://amazon.com | | | Not Available | State |
I honestly have no clue how to approach this challenge.
Any suggestions from your side?
The approach is: create function, that will take
HTML
code as string as a parameter and will return dictionary with keys same as your table headers. The body of a function is:So, to sum up, the function returns dictionary with keys
"Website 1"
,"Website 2"
,"Website 3"
,"Product 1"
,"Product 2"
,"Product 3"
,"Available"
,"Country"
.Now, having that function, it's easy to fill the table you want. Here's one way of doing it:
It works with table arranged like that (already filled):
It's very important to have these headers in column.
Important
Before running anything, in your VBA editor go to: Tools -> References, and in the window that will pop-up you need to select Microsoft Scripting Runtime.
I have written a VB script to parse the xml data.
Assumptions:-
Your XML data is stored in COlumn A from Second Row (First row is header)
Xml data is parsed and stored in the adjacent columns. For xml data in A2, the parsed rows are stored in B2:H2
Nodes Website, Website2, Website3, Product1, Product2, Available and Country can only be derived from this code.
If you want to add more nodes in future, Create a replica of the if loop for the new node
Assuming that your data is in Cell A2 and you are applying formula in Cell B2 for Websites you can use below formula.
Copy down and across.
And for Products in cell E2
Benefit: It uses native Excel functions so there's no need for VBA. And it is non-array formula i.e. doesn't need CTRL+SHIFT+ENTER.
Disadvantage: Formula is complicated and may prove difficult to maintain.
I have uploaded sample file on Dropbox for your ease of understanding and implementation.
Drop Box Link to sample file
I have a few ideas:
If you do this in all VBA ( don't use any libraries), you could search the html as string & look for the
<a> & </a>
tags. Once you pull out substrings that have this:<a href=""https://google.org/"" target=""_blank"">Website</a>
then you do more substring searches for the href & url.Another option is to use regular expressions. It looks like the VBA script DLL has regex capabilities, you can look at that.
One final option is the HTML agility pack. That is designed for working with HTML. I used it from a .net project once. I don't remember the details now, but I remember it being straightforward to work with.
You can do this in Excel, if that's what you're looking for.
First, use Text to Columns to parse the data.
Hope this is what you were looking for