I'd just written a Python crawler to download midi files from freemidi.org. Looking at the request headers in Chrome, I found that the "Referer" attribute had to be https://freemidi.org/download-20225 (referred to as "download-20225" later) if the download page was https://freemidi.org/getter-20225 (referred to as "getter-20225" later) in order to download the midi file properly. I did so in Python, setting the header like this:
headers = {
'Referer': 'https://freemidi.org/download-20225',
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}
which was exactly the same as the request header I had viewed in Chrome, and I tried to download the file using this line of code.
midi = requests.get(url, headers=headers).content
However, it did not work properly. Instead of downloading the midi file, it downloaded a html file of the site "download-20225". I later found that if I tried to access the site "getter-20225" directly, it takes me to "download-20225" as well. I think it probably indicates that the header was wrong, so it took me to the other website instead of starting the download.
I'm quite new to writing Python crawlers, so could someone help me find what went wrong with the program?