I am trying to download the file from this URL using wget in bash:
http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2
However, the download only starts when I manually click the link to the right of "Download Historical Data Here".
Is there a way to do this in code from the command line?
EDIT 1
Doing it from Java would be great too.
I think you will need to write some code to accomplish this, using an HTML client library that supports JavaScript, such as PhantomJS, as mentioned in the answers to this question.
Other options include Python's mechanize library and some of the things mentioned in this answer.
If you're looking for a headless browsing library in Java, I would take a look at HtmlUnit.
I have not used it personally, though, so I can't vouch for its stability or ease of use.
You can't download it with wget because the download is triggered through JavaScript.
You'd be better off downloading it on your normal computer and then uploading it to another server that gives you direct HTTP access to the file. Then you can download it from the command line.
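Since I wanted to learn PhantomJS myself, I attempted it, but it seems that PhantomJS is not mature enough to handle this correctly.
Having taken the time to understand how the link works, here's a solution in PHP instead. The link essentially submits a hidden form named file_down to http://www.histdata.com/get.php, so the script fetches the page, picks up the session cookie and the form's input values, and replays that POST with a Referer header. You should be able to copy and paste it into Download.php and run it from the command line, assuming you have php-cli installed with the cURL and DOM extensions enabled. I hope it will also be useful as a sample for anyone trying to script this kind of thing in the future.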
<?php
/**
 * Usage: php Download.php <URL> <FileName>
 * Example:
 * php Download.php http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2 Output.zip
 */

if ($argc < 3) {
    fwrite(STDERR, "Usage: php Download.php <URL> <FileName>\n");
    exit(1);
}

// Configuration parameters
$post_url = 'http://www.histdata.com/get.php';
$init_url = $argv[1];
$filename = $argv[2];

// Step 1: load the page, keeping the response headers so the cookies can be read.
$ch = curl_init($init_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
$output = curl_exec($ch);
if ($output === false) {
    die('Could not fetch page: ' . curl_error($ch) . "\n");
}
// Split the raw response into headers and HTML body.
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);
$response_headers = substr($output, 0, $header_size);
$body = substr($output, $header_size);

// Pull out the "name=value" part of every Set-Cookie header.
preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $response_headers, $m);
$cookies = implode('; ', $m[1]);

// Get the POST parameters from the hidden download form.
$post_data = http_build_query(getPostArray($body));

$header = array();
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,"
    . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = "Pragma: ";
$header[] = "Content-Type: application/x-www-form-urlencoded";

// Step 2: replay the form submission that the JavaScript link performs.
$ch = curl_init($post_url);
curl_setopt($ch, CURLOPT_COOKIE, $cookies);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
// Note: this Referer is hard-coded for the EURUSD 2014/02 example file; change it for other files.
curl_setopt($ch, CURLOPT_REFERER, 'http://www.histdata.com/download-free-forex-historical-data/?/ascii/1-minute-bar-quotes/eurusd/2014/2/HISTDATA_COM_ASCII_EURUSD_M1_201402.zip');
$output = curl_exec($ch);
if ($output === false) {
    die('Download failed: ' . curl_error($ch) . "\n");
}
curl_close($ch);

if (file_put_contents($filename, $output) === false) {
    die('Cannot open file for writing: ' . $filename . "\n");
}

/**
 * Collect the name/value pairs of every <input> in the form named "file_down",
 * i.e. what the browser submits when the download link is clicked.
 */
function getPostArray($html) {
    $dom_doc = new DOMDocument();
    // Real-world HTML usually triggers libxml warnings, so suppress them while parsing.
    if ($html === '' || !@$dom_doc->loadHTML($html)) {
        die("Could not load html!\n");
    }
    $xpath = new DOMXPath($dom_doc);
    $post_items = array();
    foreach ($xpath->query('//form[@name="file_down"]//input') as $input) {
        $post_items[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $post_items;
}
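The network calls make the script hard to check, but the interesting part is the parsing: reading the session cookie from the Set-Cookie headers and reading the inputs of the file_down form. Below is a minimal sketch that pulls those two jobs into one hypothetical helper, extractDownloadRequest(), which works on a captured response string so the parsing can be tested without hitting histdata.com. It assumes you pass the header length, such as the value curl_getinfo($ch, CURLINFO_HEADER_SIZE) reports, and it collects every Set-Cookie pair rather than only the first. The sample response is invented; the real page's cookie and field names will differ.

```php
<?php
/**
 * Hypothetical helper (not part of any library): given a raw HTTP response
 * (headers + body) and the length of its header block, return the Cookie
 * header value and the POST body that the "file_down" form would submit.
 */
function extractDownloadRequest(string $raw, int $header_size): array
{
    $headers = substr($raw, 0, $header_size);
    $body = substr($raw, $header_size);

    // Keep only the "name=value" part of each Set-Cookie header.
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]*)/mi', $headers, $m);
    $cookie = implode('; ', $m[1]);

    // Collect name/value pairs from every input inside the download form.
    $fields = array();
    if ($body !== '') {
        $doc = new DOMDocument();
        @$doc->loadHTML($body); // suppress libxml warnings about messy HTML
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//form[@name="file_down"]//input') as $input) {
            $fields[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }

    return array('cookie' => $cookie, 'post' => http_build_query($fields));
}

// Usage with an invented response; "token" and "month" are placeholder field names.
$headers = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: PHPSESSID=abc123; path=/\r\n"
    . "Set-Cookie: lang=en; path=/\r\n\r\n";
$body = '<html><body><form name="file_down" method="post" action="get.php">'
    . '<input type="hidden" name="token" value="xyz">'
    . '<input type="hidden" name="month" value="201402">'
    . '</form></body></html>';
$request = extractDownloadRequest($headers . $body, strlen($headers));
echo $request['cookie'], "\n"; // PHPSESSID=abc123; lang=en
echo $request['post'], "\n";   // token=xyz&month=201402
```

Keeping the parsing separate from the curl calls also makes it easy to save a real response to disk once and rerun the parser against it while adjusting the XPath expression.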