How to wget from URL that needs a key press

Posted 2019-08-11 03:20

Question:

I am trying to download from this URL:

http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2

with bash wget.

However, I have to manually click the link to the right of "Download Historical Data Here" to start the download.

Is there a way to do this in code from the command line?

EDIT 1

A solution in Java would be great too.

Answer 1:

I think you will need to write some code to accomplish this, using an HTML client library that supports JavaScript, such as PhantomJS, as mentioned in the answers to this question.

Other options include Python's mechanize library, and some of the things mentioned in this answer.

If you're looking for a headless browsing library in Java, I would take a look at HtmlUnit. I have not used it personally though, so I can't vouch for its stability or ease of use.



Answer 2:

You can't download it directly with wget because the download is triggered through JavaScript. A workaround is to download the file on your regular computer and then upload it to another server that gives you direct HTTP access to the file. Then you can download it from the command line.



Answer 3:

Since I wanted to learn PhantomJS myself, I attempted this with it, but PhantomJS does not seem mature enough to handle this case correctly. Since I had already taken the time to understand how the link works, here is a solution in PHP instead. The link submits a form named file_down to get.php, so the script fetches the page, reads the cookie and the form fields, and replays the POST. You should be able to copy and paste it into Download.php and run it from the command line, assuming you have php-cli installed with the curl and dom extensions. I hope it will also be useful as a sample for people trying to script this kind of thing in the future.

<?php

/**
  * Usage: php Download.php <URL> <FileName>
  * Example: 
  * php Download.php http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2 Output.zip
  */

// Configuration parameters
if ($argc < 3) {
    die("Usage: php Download.php <URL> <FileName>\n");
}
$post_url = 'http://www.histdata.com/get.php';
$init_url = $argv[1];
$filename = $argv[2];

// Step 1: fetch the download page, keeping the response headers in the
// output so that the cookies can be read from them.
$ch = curl_init($init_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);

$output = curl_exec($ch);
if ($output === false) {
    die('Could not fetch the download page: ' . curl_error($ch) . "\n");
}

// Pull out the cookies: the name=value part of every Set-Cookie header
preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $output, $m);
$cookies = implode('; ', $m[1]);

// Get the POST parameters from the form.
$post_array = getPostArray($output);
if (empty($post_array)) {
    die("Could not find the download form on the page.\n");
}
$post_data = http_build_query($post_array);

$header = array(
    'Accept: text/xml,application/xml,application/xhtml+xml,'
        . 'text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    'Cache-Control: max-age=0',
    'Connection: keep-alive',
    'Keep-Alive: 300',
    'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Accept-Language: en-us,en;q=0.5',
    'Pragma: ',
    'Content-Type: application/x-www-form-urlencoded',
);

// Step 2: replay the form submission with the cookies from step 1.
$ch = curl_init($post_url);
curl_setopt($ch, CURLOPT_COOKIE, $cookies);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
// Note: this Referer is hard-coded for the EURUSD 2014/2 example above.
curl_setopt($ch, CURLOPT_REFERER, 'http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2/HISTDATA_COM_ASCII_EURUSD_M1_201402.zip');

$output = curl_exec($ch);
if ($output === false) {
    die('Download request failed: ' . curl_error($ch) . "\n");
}
$fp = fopen($filename, 'wb') or die('Cannot open file for writing! ' . $filename);
fwrite($fp, $output);
fclose($fp);

function getPostArray($doc) {
    // Start with an empty array so that a page without the form
    // returns an empty result instead of an undefined variable.
    $post_items = array();
    $dom_doc = new DOMDocument;
    if (! @$dom_doc->loadHTML($doc))
    {
        die('Could not load html!');
    }
    else
    {
        $xpath = new DOMXPath($dom_doc);

        foreach($xpath->query('//form[@name="file_down"]//input') as $input)
        {
            //get name and value of input
            $input_name = $input->getAttribute('name');
            $input_value = $input->getAttribute('value');
            $post_items[$input_name] = $input_value;
        }
        return $post_items;
    }
}
?>
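
For anyone who would rather not install PHP, the same two-step idea (fetch the page, collect the inputs of the file_down form, POST them to get.php with the session cookie) can be sketched in Python with only the standard library. This is an untested-against-the-site sketch, not a drop-in replacement: the helper names (FormInputParser, extract_form_fields, download) are made up for illustration, and sending the page URL as the Referer is my assumption, since the PHP script above sends a hard-coded one.

```python
import sys
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Form target and form name are taken from the PHP script above.
POST_URL = "http://www.histdata.com/get.php"
FORM_NAME = "file_down"


class FormInputParser(HTMLParser):
    """Collect name/value pairs of <input> tags inside one named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_form_fields(html, form_name=FORM_NAME):
    """Return the inputs of the named form as a dict (empty if not found)."""
    parser = FormInputParser(form_name)
    parser.feed(html)
    return parser.fields


def download(page_url, filename):
    # An opener with a cookie processor keeps the cookies from step 1
    # and sends them back automatically in step 2.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())

    # Step 1: fetch the page that contains the download form.
    with opener.open(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    fields = extract_form_fields(html)
    if not fields:
        raise SystemExit("Could not find the download form on the page.")

    # Step 2: replay the form submission and save the response body.
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(
        POST_URL,
        data=data,
        headers={"Referer": page_url},  # assumption: page URL as Referer
    )
    with opener.open(request) as resp, open(filename, "wb") as out:
        out.write(resp.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    download(sys.argv[1], sys.argv[2])
```

Saved as, say, download.py (the file name is arbitrary), it would be run the same way as the PHP version: python download.py <URL> Output.zip. Quote the URL in bash, because it contains a question mark.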


Tags: java bash wget