Automate daily CSV file download from website button

Posted 2019-02-19 03:12

Question:

I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a URL.

I have been trying to use PhantomJS and CasperJS to automate this process, but haven't had any success.

I recently tried brandon's solution from "Grab the resource contents in CasperJS or PhantomJS".

Here is my code for that:

var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();

casper.start('http://www.example.com/page_with_download_button');

casper.then(function() {
    this.click('#download_button');
});

casper.on('resource.received', function (resource) {
    "use strict";
    var i; // declared explicitly: strict mode rejects implicit globals
    for (i = 0; i < resource.headers.length; i++) {
        // Match on the media type only; the value may carry a charset suffix.
        if (resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"].indexOf("text/csv") === 0) {
            cache.includeResource(resource);
        }
    }
});

casper.on('load.finished', function(status) {
    var i;
    for (i = 0; i < cache.cachedResources.length; i++) {
        var file = cache.cachedResources[i].cacheFileNoPath;
        // Use the loop variable i here; the original referenced an undefined "index".
        var ext = mimetype.ext[cache.cachedResources[i].mimetype];
        var finalFile = file.replace("." + cache.cacheExtension, "." + ext);
        fs.write('downloads/' + finalFile, cache.cachedResources[i].getContents(), 'b');
    }
});

casper.run();

I think the problem could be caused by my cachePath being incorrect in cache.js:

exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';

Should I be using something in addition to the backslashes to define the path?

When I run

 casperjs --disk-cache=true export_script.js

nothing is downloaded. After a little debugging, I found that cache.cachedResources is always empty.
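One thing worth checking when cache.cachedResources stays empty is whether the header comparison ever matches. The handler compares against one exact Content-Type string, while servers commonly send variations such as "text/csv; charset=UTF-8" or a differently cased header name. The helper below is a minimal sketch of a more tolerant check. The name isCsvResponse is hypothetical, and the code assumes the headers arrive as an array of name/value objects, as in the resource.received handler above.

```javascript
// Hypothetical helper: returns true when the response headers declare CSV content.
// Assumes headers is an array of { name: ..., value: ... } objects.
function isCsvResponse(headers) {
    for (var i = 0; i < headers.length; i++) {
        var name = String(headers[i].name).toLowerCase();
        // Drop parameters such as "; charset=UTF-8" before comparing.
        var mediaType = String(headers[i].value).split(';')[0].trim().toLowerCase();
        if (name === 'content-type' && mediaType === 'text/csv') {
            return true;
        }
    }
    return false;
}
```

Inside the handler this would be called as isCsvResponse(resource.headers). If it still never returns true, logging the header values the server actually sends may show that the file is served under a different media type.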

I would also be open to solutions outside of PhantomJS/CasperJS.


UPDATE

I am no longer trying to accomplish this with CasperJS/PhantomJS. I am using the Chrome extension Tampermonkey, suggested by dandavis. Tampermonkey was extremely easy to figure out: I installed it, navigated to the page with the download link, clicked New Script in the Tampermonkey menu, and added my JavaScript code:

document.getElementById("download_button").click();
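For readers unfamiliar with Tampermonkey, a complete userscript for this approach might look like the sketch below. The metadata block (@name, @match, @grant) tells Tampermonkey which pages the script runs on. The script name is made up, the @match URL is the example address used elsewhere in this post, and clickDownloadButton is a hypothetical helper that also guards against the button being absent.

```javascript
// ==UserScript==
// @name         Auto-download export CSV (hypothetical name)
// @match        http://www.example.com/page-with-dl-button
// @grant        none
// ==/UserScript==

// Clicks the download button if it exists; returns whether a click happened.
function clickDownloadButton(doc) {
    var button = doc.getElementById('download_button');
    if (!button) {
        return false; // button not present on this page
    }
    button.click();
    return true;
}

// Only run automatically when a browser document is available.
if (typeof document !== 'undefined') {
    clickDownloadButton(document);
}
```

Passing the document in as a parameter is only a convenience for trying the function outside a browser; the one-line version above does the same job on the real page.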

Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this:

rem Use a custom variable name so the built-in %DATE% is not overwritten.
rem The substring offsets assume a %DATE% format like "Tue 02/19/2019".
set stamp=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%stamp%.csv"

I set that batch script to run nightly using the Windows Task Scheduler.

Success!
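A possible refinement: the fixed "timeout 10" in the batch script assumes the download always finishes within ten seconds. The move step could instead poll until the file appears. The sketch below uses Node.js, which is not part of the original setup, and the function names, polling interval, and timeout are all illustrative assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Format a Date as YYYY_MM_DD for an export_YYYY_MM_DD.csv file name.
function dateStamp(date) {
    const pad = (n) => String(n).padStart(2, '0');
    return `${date.getFullYear()}_${pad(date.getMonth() + 1)}_${pad(date.getDate())}`;
}

// Poll until `source` exists, then move it to destDir/export_<stamp>.csv.
// Resolves with the destination path; rejects if the file never shows up.
function moveWhenDownloaded(source, destDir, { intervalMs = 500, timeoutMs = 30000, now = new Date() } = {}) {
    const target = path.join(destDir, `export_${dateStamp(now)}.csv`);
    const deadline = Date.now() + timeoutMs;
    return new Promise((resolve, reject) => {
        (function check() {
            if (fs.existsSync(source)) {
                fs.renameSync(source, target); // a rename cannot cross drives/volumes
                resolve(target);
            } else if (Date.now() >= deadline) {
                reject(new Error(`Timed out waiting for ${source}`));
            } else {
                setTimeout(check, intervalMs);
            }
        })();
    });
}
```

With the paths from the batch script, the call would be moveWhenDownloaded('C:\\Users\\user\\Downloads\\export.csv', 'C:\\path\\to\\dir'). Chrome normally writes to a temporary .crdownload file and renames it on completion, so the final file name appearing is a reasonable signal that the download is done.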

Answer 1:

Your button most likely issues a POST request to the server. To track it down:

  1. Open the Network tab in Chrome developer tools.
  2. Navigate to the page and click the button.
  3. Note which request led to the file download. Right-click it and choose Copy as cURL.
  4. Run the copied cURL command.

Once you have the cURL command working, you can schedule the download using cron or Task Scheduler, depending on the operating system you are using.
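Once the request is known, the same replay can also be scripted rather than run through cURL. The sketch below uses the fetch function built into Node.js 18 and later. The POST method, the header, and the form field are placeholders standing in for whatever the Network tab actually shows, and downloadCsv is a hypothetical name.

```javascript
const fs = require('fs/promises');

// Replay the request captured in the Network tab and save the response body.
// `request` holds the method, headers and body copied from that request.
// Returns the number of bytes written.
async function downloadCsv(url, request, destination, fetchImpl = fetch) {
    const response = await fetchImpl(url, request);
    if (!response.ok) {
        throw new Error(`Download failed with HTTP ${response.status}`);
    }
    const body = Buffer.from(await response.arrayBuffer());
    await fs.writeFile(destination, body);
    return body.length;
}

// Placeholder values: substitute those from the copied cURL command.
const exampleRequest = {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: 'format=csv', // hypothetical form field
};
```

A call such as downloadCsv(requestUrl, exampleRequest, 'export.csv'), where requestUrl is the address from the copied command, could then be scheduled the same way as the cURL command. If the site requires a login, the cookies from the copied request would need to go into the headers as well, and they may expire.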