I want to automate the download of a CSV file "Projects.csv" from this website:
https://www.vcsprojectdatabase.org/#/projects/st_/c_/ss_0/so_/di_/np_
The CSV can be downloaded manually by clicking the CSV icon, but I'm not sure how to automate this download in Python and store the CSV file locally on my drive.
So far I have tried inspecting the button element in the Chrome developer console and found this URL in the Network tab:
'https://www.vcsprojectdatabase.org/services/publicViewServices/fetchProjectsExport'
But I'm unsure whether this URL should include the file name at the end, like so:
'https://www.vcsprojectdatabase.org/services/publicViewServices/fetchProjectsExport/Projects.csv'
This is what I have tried but it just writes a blank file:
import requests
url = 'https://www.vcsprojectdatabase.org/services/publicViewServices/fetchProjectsExport/Projects.csv'
r = requests.get(url)
with open('a.csv', 'wb') as f:
    f.write(r.content)
How do I get the CSV file to properly download and save?
First of all, you should understand that HTTP is a request-based protocol. The end result of the page's JavaScript is an HTTP request, to which the server responds with the file content. You need to "reverse-engineer" the web page: find out how it builds that request, then reproduce it as closely as you can.
So, let's try to do this step by step:
The element to look for has the id frmDownload. Go back to the "Inspector" tab and type this id into the search box.
It turns out that this element is an HTML form. This form sends a POST request to the URL
https://www.vcsprojectdatabase.org/services/publicViewServices/fetchProjectsExport
with the following data:
This information is enough to try to repeat the request in Python.
Let's write a small script that builds and sends the same request and saves the result to a .csv file:
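A minimal sketch of such a script, assuming the requests library is installed. The form fields are not listed here, so form_data is a placeholder: fill it with the name/value pairs of the form's inputs as shown in the Inspector. The function names build_export_request and download_export are hypothetical.

```python
import requests

EXPORT_URL = (
    "https://www.vcsprojectdatabase.org"
    "/services/publicViewServices/fetchProjectsExport"
)


def build_export_request(form_data):
    """Prepare a POST request like the one the download form submits."""
    # data= sends the fields form-encoded
    # (application/x-www-form-urlencoded), which is what an HTML form
    # submits by default.
    return requests.Request("POST", EXPORT_URL, data=form_data).prepare()


def download_export(form_data, out_path="res.csv", timeout=30):
    """Send the request and save the response body to out_path."""
    prepared = build_export_request(form_data)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    if not response.content:
        # Guards against silently writing a blank file.
        raise ValueError("server returned an empty body")
    with open(out_path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

Calling download_export({...}) with the real form fields should write res.csv and return the number of bytes saved.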
Launch it and it ... works. res.csv contains the proper result.
BUT THAT'S NOT ALL. Usually things are not so easy. To make our request look the same as the one sent by the browser, we should take a look at the request headers. To capture the HTTP request from the browser, we can open the "Network" tab:
Now let's press the download button on the web page and download the CSV file. In the requests table we can now see our POST request. Click on it and look at the "Request headers" section of the "Headers" tab.
There's a Cookie header, which for requests like this is mostly unimportant and can be omitted. But if you have issues with the request, look at the previous requests, find the one whose server response has a Set-Cookie header, and repeat it.
Let's improve our script and copy the important headers from the browser. (We don't include Host, Content-Length, or Connection, because the Python requests module adds them automatically; DNT and Upgrade-Insecure-Requests are not necessary at all.)
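One way this improved script might look, as a sketch under stated assumptions: the header values below are illustrative examples, not the ones captured from the site, so replace them with what your own browser shows in the "Request headers" section. The form_data argument is again a placeholder for the form's real fields, and visiting the landing page first to pick up cookies is an assumption about where Set-Cookie is sent. The names make_session and download_with_headers are hypothetical.

```python
import requests

BASE_URL = "https://www.vcsprojectdatabase.org"
EXPORT_URL = BASE_URL + "/services/publicViewServices/fetchProjectsExport"

# Example values only: copy the real ones from your browser's Network tab.
# Host, Content-Length and Connection are left out on purpose, because
# requests adds them itself. Accept-Encoding is also left to requests so
# it only advertises encodings it can decode.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
        "Gecko/20100101 Firefox/115.0"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Origin": BASE_URL,
    "Referer": BASE_URL + "/",
}


def make_session():
    """Create a session that sends the browser-like headers every time."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def download_with_headers(form_data, out_path="res.csv", timeout=30):
    """POST the form with browser-like headers and save the response."""
    with make_session() as session:
        # Optional first request: if the server answers with Set-Cookie,
        # the session stores the cookie and sends it with the POST.
        session.get(BASE_URL + "/", timeout=timeout)
        response = session.post(EXPORT_URL, data=form_data, timeout=timeout)
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

Using a Session rather than separate requests.get and requests.post calls is what lets cookies from the first response carry over to the download request.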
P.S. Don't forget to ask the website owner for permission.
The request type is POST, and you need to specify the appropriate headers, which you can see in the developer console. Also, you would need to provide the form data. The below code works.