I don't have internet explorer on any of the computers at work, therefore creating a object of internet explorer and using ie.navigate to parse the html and search for the tags isn't possible. My question is, how can I pull certain data with a tag automatically from a frame source to my spreadsheet without using IE? Example of code in answers would be very useful :) Thanks
相关问题
- Views base64 encoded blob in HTML with PHP
- How to fix IE ClearType + jQuery opacity problem i
- Is there a way to play audio on a mobile browser w
- HTML form is not sending $_POST values
- implementing html5 drag and drop photos with knock
Anyone who has done some web scraping will be familiar with creating an instance of Internet Explorer (IE) and the navigating to a web address and then once the page is ready start navigating the DOM using the 'Microsoft HTML Object Library' (MSHTML) type library. The question asks if IE is unavailable what to do. I am in the same situation for my box running Windows 10.
I had suspected it was possible to spin up an instance of MSHTML.HTMLDocument independent of IE but its creation is not obvious. Thanks to the questioner for asking this now. The answer lies in the MSHTML.IHTMLDocument4.createDocumentFromUrl method. One needs a local file to work (EDIT: actually one can put a webby url in as well!) with but we have a nice tidy Windows API function called URLDownloadToFile to download a file.
This codes runs on my Windows 10 box where Microsoft Edge is running and not Internet Explorer. This is an important find and thanks to the questioner for raising it.
Thanks for asking this.
P.S. I have used TaskList to verify that IExplore.exe is not created under the hoods when this code runs.
P.P.S If you liked this then see more at my Excel Development Platform blog
You could use XMLHTTP to retrieve the HTML source of a web page:
I wouldn't suggest using this as a worksheet function, or else the site URL will be re-queried every time the worksheet recalculates. Some sites have logic in place to detect scraping via frequent, repeated calls, and your IP could become banned, temporarily or permanently, depending on the site.
Once you have the source HTML string (preferably stored in a variable to avoid unnecessary repeat calls), you can use basic text functions to parse the string to search for your tag.
This basic function will return the value between the
<tag>
and</tag>
:This will find only basic
<tag>
's but you could add/remove/change the text functions to suit your needs.Example Usage: