downloading a file that comes as an attachment in

Posted 2019-01-03 10:03

I want to download a CSV file that is generated on a button click through a POST request. I searched the CasperJS and PhantomJS forums as best I could and came back empty-handed. In a normal browser such as Firefox, a download dialog appears after the POST request. How can I handle this case in PhantomJS? The response headers are:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
Content-disposition: attachment;filename=ExportData.csv
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Fri, 19 Apr 2013 23:26:40 GMT
Content-Length: 65183

4 answers
萌系小妹纸
#2 · 2019-01-03 10:29

I've found a way to do this using CasperJS (it should work with PhantomJS alone if you implement the download function using XMLHttpRequest, but I have not tried that).

I'll leave you the working example, which tries to download the most recent PDF from this page. When you click the download link, some JavaScript code is triggered that generates hidden input fields, which are then POSTed.

What we do is replace the form's onsubmit handler so that it cancels the submission and records the form destination (action) and all its fields. We use this information later to do the actual download.

var casper=require('casper').create();
casper.start("https://sede.gobcan.es/tributos/jsf/publico/notificaciones/comparecencia/ultimosanuncios.jsp", function() {

    var theFormRequest = this.page.evaluate(function() {
        var request = {}; 
        var formDom = document.forms["resultadoUltimasNotif"];
        formDom.onsubmit = function() {
            //iterate the form fields
            var data = {};
            for(var i = 0; i < formDom.elements.length; i++) {
               data[formDom.elements[i].name] = formDom.elements[i].value;
            }
            request.action = formDom.action;
            request.data = data;
            return false; //Stop form submission
        }

        //Trigger the click on the link.
        var link = $("table.listado tbody tr:first a");
        link.click();

        return request; //Return the form data to casper
    });

    //Start the download
    casper.download(theFormRequest.action, "downloaded_file.pdf", "POST", theFormRequest.data);
});

casper.run(); 

Note: you have to run it with --ignore-ssl-errors=true, because the CA the site uses is not in the default CA list.

casperjs --ignore-ssl-errors=true downloadscript.js
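The loop in the script above copies every form element, including unnamed ones and unchecked checkboxes, which a browser would not submit. A minimal sketch of a stricter collection step follows. The helper name collectFormData is hypothetical, it assumes elements expose the usual name, value, type, checked and disabled properties, and it covers only the common cases (no multi-select or file inputs):

```javascript
// Hypothetical helper: build the POST data object from a form's elements,
// keeping only the fields a browser would normally submit.
function collectFormData(elements) {
    var skippedTypes = { submit: true, button: true, reset: true, image: true, file: true };
    var data = {};
    for (var i = 0; i < elements.length; i++) {
        var el = elements[i];
        // Unnamed and disabled controls are never part of a submission.
        if (!el.name || el.disabled) {
            continue;
        }
        var type = String(el.type || '').toLowerCase();
        // Unchecked checkboxes and radio buttons are not sent.
        if ((type === 'checkbox' || type === 'radio') && !el.checked) {
            continue;
        }
        // Buttons are skipped here; if the server expects the clicked
        // button's name/value pair, add it to the result by hand.
        if (skippedTypes[type]) {
            continue;
        }
        data[el.name] = el.value;
    }
    return data;
}
```

Because the onsubmit handler runs in the page context, the helper would have to be defined inside the evaluate() callback and called there as collectFormData(formDom.elements).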
Fickle 薄情
#3 · 2019-01-03 10:30

You can listen to the page.resource.received event and download() the file once it has been received:

casper.on('page.resource.received', function(resource) {
    if (resource.stage !== "end") {
        return;
    }
    if (resource.url.indexOf('ExportData.csv') > -1) {
        this.download(resource.url, 'ExportData.csv');
    }
});
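Matching on the URL only helps when the file name appears in it. In the question the name shows up only in the Content-disposition header, so a check on the response headers may be more robust. A minimal sketch, assuming the resource object has the shape PhantomJS passes to its resource-received callback (a stage string and a headers array of name/value pairs); the helper name isFinishedAttachment is hypothetical:

```javascript
// Hypothetical helper: true when a received resource has finished loading
// and was sent with a "Content-Disposition: attachment" header.
function isFinishedAttachment(resource) {
    if (!resource || resource.stage !== 'end') {
        return false;
    }
    var headers = resource.headers || [];
    for (var i = 0; i < headers.length; i++) {
        // Header names are case-insensitive, so normalise before comparing.
        var name = String(headers[i].name).toLowerCase();
        var value = String(headers[i].value).toLowerCase().replace(/^\s+/, '');
        if (name === 'content-disposition' && value.indexOf('attachment') === 0) {
            return true;
        }
    }
    return false;
}
```

Inside the listener, the URL test could then become if (isFinishedAttachment(resource)). Note that this.download(resource.url, ...) issues a GET unless a method and data are passed, so a file generated by a POST would still need those arguments.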
甜甜的少女心
#4 · 2019-01-03 10:31

@julianjm's approach is almost the solution, but in my case I did not have the correct form name to replace the form submission.

So I found another solution using a PhantomJS beta:

There is a beta version of PhantomJS 2.0 that includes an event handler that solves this issue.

It is still a beta version, so there is no debugging.

So I developed the clicks and the page handling on the release version and then switched the PhantomJS version to make the download work.

casper.start('http://www.website.com.br/', function() {
    this.page.onFileDownload = function(status) {
        console.log('onFileDownload(' + status + ')');
        // The system will detect the download, but you have to name the file yourself!
        return "ContactList_08-25-14.csv";
    };
});

casper.then(function() {
    // Do your stuff here to click on the download link.
});

casper.run();

Download: Phantom 2.0 BETA

Download the exe, rename the release version of phantom.exe to phantom.bkp.exe and put this 2.0 version in its place. Then, in CasperJS, you will need to add some lines at the beginning of casperjs/bin/bootstrap.js, right after the license header:

 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 *
 */
var system = require('system');
var argsdeprecated = system.args;
argsdeprecated.shift();
phantom.args = argsdeprecated;

Also comment out the version check (same file):

(function(version) {
        // required version check
      /*  if (version.major !== 1) {
            return __die('CasperJS needs PhantomJS v1.x');
        } if (version.minor < 8) {
            return __die('CasperJS needs at least PhantomJS v1.8 or later.');
        }
        if (version.minor === 8 && version.patch < 1) {
            return __die('CasperJS needs at least PhantomJS v1.8.1 or later.');
        } */
    })(phantom.version);

Remember, this is a tweak!

These lines in bootstrap.js will cause problems if you want to run the PhantomJS release version or SlimerJS.

So develop on the release version, then switch to this tweaked version to be able to download. If you need to debug, you will have to remove the lines from bootstrap.js.
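One way to make the bootstrap.js tweak less intrusive might be to apply it only when phantom.args is actually missing, so the same file can stay in place for the release version. This is an untested sketch: the helper name shimPhantomArgs is hypothetical, and it assumes the only incompatibility is the absent phantom.args property:

```javascript
// Hypothetical helper: restore the legacy phantom.args property only where
// it is missing, and leave it untouched where it already exists.
function shimPhantomArgs(phantomObj, systemArgs) {
    if (typeof phantomObj.args === 'undefined') {
        // system.args[0] is the script name, which the legacy phantom.args
        // did not include. slice() copies, so system.args is not modified.
        phantomObj.args = Array.prototype.slice.call(systemArgs, 1);
    }
    return phantomObj.args;
}
```

In bootstrap.js this would be called as shimPhantomArgs(phantom, require('system').args) in place of the four lines shown earlier. The version check would still need separate handling.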

Evening l夕情丶
#5 · 2019-01-03 10:34

I have to deal with a site written with some kind of ASP.NET framework that sends a remarkable amount of POST data with each request (some 100 KB of data, of which about 95 never seem to change between requests; apparently view-state related).

However, no method I could find worked for me. I've looked into intercepting XHR, I've even found someone who is tackling the very same framework (at least judging from the selectors) but with a simpler case, inspired by this very question. I found out that back in the day this couldn't be done with PhantomJS.

My problem is that a click on a button starts a chain of AJAX requests culminating with the sending of this enormous POST form, to which finally the server replies with a "Content-Disposition: attachment".

In the end, I found this approach which works for me, even if it is network-inefficient:

...setting up everything, until I just need to click on a button...

// Declared with var to avoid creating implicit globals.
var phantomData    = null;
var phantomRequest = null;

// Here, I just recognize the form being submitted and copy it.

casper.on('resource.requested', function(requestData, request) {
    for (var h in requestData.headers) {
        if (requestData.headers[h].name === 'Content-Type') {
            if (requestData.headers[h].value === 'application/x-www-form-urlencoded') {
                phantomData         = requestData;
                phantomRequest      = request;
            }
        }
    }
});

// Here, I recognize when the request has FAILED because PhantomJS does
// not support straight downloading.

casper.on('resource.received', function(resource) {
    for (var h in resource.headers) {
        // Header names are case-insensitive, so normalise before comparing.
        if (resource.headers[h].name.toLowerCase() === 'content-disposition') {
            if (resource.stage === 'end') {
                if (phantomData) {
                    // to do: get name from resource.headers[h].value
                    casper.download(
                        resource.url,
                        "output.pdf",
                        phantomData.method,
                        phantomData.postData
                    );
                } else {
                    // Something went wrong.
                }
                // Possibly, remove listeners?
            }
        }
    }
});

// Now, click on the button and initiate the dance.
casper.click(pdfLinkSelector);

The download works flawlessly, even if I can see that the file gets requested (and sent) twice.
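The "to do" in the listener above (taking the file name from the header value instead of hard-coding output.pdf) could be sketched as follows. The helper name filenameFromContentDisposition is hypothetical, and the parsing is deliberately simple: it handles the plain filename= parameter, quoted or unquoted, and ignores the encoded filename*= form:

```javascript
// Hypothetical helper: extract the file name from a Content-Disposition
// header value, or return the fallback when none can be found.
function filenameFromContentDisposition(headerValue, fallback) {
    var match = /filename\s*=\s*("([^"]*)"|[^;]+)/i.exec(String(headerValue || ''));
    if (!match) {
        return fallback;
    }
    // match[2] is set for the quoted form, match[1] holds the unquoted one.
    var name = match[2] !== undefined ? match[2] : match[1];
    name = name.replace(/^\s+|\s+$/g, '');
    // Keep only the last path segment so the server cannot choose a directory.
    name = name.split(/[\\\/]/).pop();
    return name || fallback;
}
```

In the listener, the second argument to casper.download() could then be filenameFromContentDisposition(resource.headers[h].value, "output.pdf").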

[debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
[debug] [application] GOT FORM, REQUEST DATA SAVED
[warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
[debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
[debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
[debug] [remote] sendAJAX(): Using HTTP method: 'POST'
[debug] [phantom] Downloaded and saved resource in output.pdf
[debug] [application] TERMINATING SUCCESSFULLY
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"

(Next, I'll probably modify the script to try invoking request.abort() from inside the resource.requested listener, set a semaphore and invoke again the downloader - I won't be able to get the attachment filename, but that matters little to me).
