How to load HTML data into SQL Server (non-table f

Posted 2019-08-31 14:45

Question:

I'm posting this here because I haven't found this scenario covered anywhere on the web so far. I have a webpage that contains a set of reports in both XLS and PDF formats. I need to download the Excel files from the page and load them into my database. I wish I could use the URL of the XLS file directly, but the file naming convention may change from year to year (Sales_Quarter1.xlsx can become Sales_Q1.xlsx the next year). The only thing that stays constant in the following example is the text "Sales for Calendar Year". I need to look up the file that corresponds to this text and download it before loading it into a database table.

I would like to know from the experts whether this is possible.

<li>
   <sub>Sales for Calendar Year 2015--All Countries&#160;</sub> 
   <a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.xlsx"> 
   <sub>[XLS]</sub></a><sub>&#160;, <a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.pdf"><sub>[PDF]</sub></a><sub>​</sub></sub>
</li>

PS: I am using SQL Server 2014.

Thanks!

Answer 1:

Have a look at Integration Services (SSIS). Create a package that uses a Script Task to pull the web page, with package variables holding the local file names of the downloaded HTML file and Excel file (you will also have to parse the link out of the HTML file). Then use an Excel Source as the next step in the package.

The variable that holds the Excel file name also needs to be set as ReadWrite in the Script Task (its ReadWriteVariables property) so that the script can assign it.
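
The link-parsing step is the part that depends on the constant text rather than on the file name, so it may help to see the logic on its own. The following is a minimal sketch in Python using only the standard library; it is not the Script Task code itself (which would normally be written in C# or VB.NET), and the names `ReportLinkFinder` and `find_excel_links` as well as the base URL `https://example.com/reports/` are hypothetical placeholders. It assumes each report sits in its own `<li>` item, as in the snippet from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReportLinkFinder(HTMLParser):
    """Collect Excel links from <li> items whose text contains a fixed label."""

    def __init__(self, label):
        super().__init__(convert_charrefs=True)
        self.label = label.lower()
        self.matches = []   # Excel hrefs found in matching <li> items
        self._depth = 0     # nesting depth of <li> elements
        self._text = []     # text chunks seen inside the current <li>
        self._hrefs = []    # hrefs seen inside the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._depth += 1
            if self._depth == 1:
                self._text, self._hrefs = [], []
        elif tag == "a" and self._depth:
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Normalize whitespace (including non-breaking spaces) before matching.
                text = " ".join("".join(self._text).split()).lower()
                if self.label in text:
                    self.matches += [
                        h for h in self._hrefs
                        if h.lower().endswith((".xls", ".xlsx"))
                    ]


def find_excel_links(html, label, base_url):
    """Return absolute URLs of Excel files listed next to the given label."""
    finder = ReportLinkFinder(label)
    finder.feed(html)
    finder.close()
    return [urljoin(base_url, href) for href in finder.matches]


# Sample markup copied from the question; the base URL is an assumption.
SAMPLE = """
<li>
   <sub>Sales for Calendar Year 2015--All Countries&#160;</sub>
   <a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.xlsx">
   <sub>[XLS]</sub></a><sub>&#160;, <a href="/Data/Downloads/Documents/Sales/Sales_Quarter1.pdf"><sub>[PDF]</sub></a><sub></sub></sub>
</li>
"""

links = find_excel_links(SAMPLE, "Sales for Calendar Year", "https://example.com/reports/")
print(links)
# ['https://example.com/Data/Downloads/Documents/Sales/Sales_Quarter1.xlsx']
```

Because the match is made on the label text and the file extension, the lookup keeps working if the file is later renamed from Sales_Quarter1.xlsx to Sales_Q1.xlsx. The same approach carries over to a Script Task: download the page, find the list item containing the label, and store the resolved URL or local file name in the ReadWrite variable.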

You can also schedule the resulting package with a SQL Server Agent job if you plan to run this on a recurring basis, placing any additional logic in the script or in the package's execution paths.