
Question:

I'm trying to get data from FlightRadar24 using the script below, which is based on this answer for handling cookies. When I type the URL into a browser, I get a long JSON document (a dictionary) that includes a list of lat/long/alt updates. When I run the code below, however, I get the error message shown after it.

What do I need to do to successfully read the JSON into Python?

NOTE: that link may stop working in a week or two - they don't make the data available forever.

import urllib2 
import cookielib

jar = cookielib.FileCookieJar("cookies")
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"

response = opener.open(url)
print response.headers
print "Got page"
print "Currently have %d cookies" % len(jar)
print jar

Traceback (most recent call last):
  File "[mypath]/test v00.py", line 8, in <module>
    response = opener.open(link)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

Answer 1:

I am not sure what you need cookies for, but the issue is that the web server is rejecting the request because of the User-Agent that urllib2 sends in the request header by default (something like 'Python-urllib/2.7').
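To see what urllib sends by default, the headers on a freshly built opener can be inspected without making any request. This is only a minimal sketch; the try/except import is an addition so that it also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 name for the same module

# A freshly built opener carries the headers urllib adds to every request.
default_headers = dict(urllib2.build_opener().addheaders)
print(default_headers["User-agent"])  # e.g. Python-urllib/2.7
```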

You should add a valid browser User-Agent to the header to get the correct data. For example:

import urllib2
url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
req = urllib2.Request(url, headers={"Connection":"keep-alive", "User-Agent":"Mozilla/5.0"})
response = urllib2.urlopen(req)
jsondata = response.read()
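If cookies turn out to be needed after all, the same header can probably be set on a cookie-aware opener like the one in the question. The sketch below is an assumption-laden variant, not something tested against FlightRadar24: make_opener and fetch are hypothetical helper names, it uses an in-memory CookieJar instead of the question's FileCookieJar, and the imports fall back to the Python 3 module names.

```python
try:  # Python 2, as in the question
    from urllib2 import build_opener, HTTPCookieProcessor, HTTPError
    from cookielib import CookieJar
except ImportError:  # Python 3 names for the same things
    from urllib.request import build_opener, HTTPCookieProcessor
    from urllib.error import HTTPError
    from http.cookiejar import CookieJar


def make_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): a cookie-storing opener with a browser-like User-Agent."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    # Replace the default ('User-agent', 'Python-urllib/x.y') header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        return opener.open(url).read()
    except HTTPError as err:
        print("Request failed with HTTP status %d" % err.code)
        return None


opener, jar = make_opener()
# body = fetch(opener, "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5")
```

The network call is left commented out because the link may no longer be live; after a successful fetch, len(jar) would show how many cookies the server set.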

Answer 2:

The first answer, by @AnandSKumar, is the accepted one, but here are a few more helpful lines, since jsondata = response.read() returns a string rather than a parsed object.

NOTE: that link may stop working in a week or two - they don't make the data available forever.

import urllib2
import json
import numpy as np
import matplotlib.pyplot as plt

# FROM this question: https://stackoverflow.com/a/32163003
# and THIS ANSWER: https://stackoverflow.com/a/32163003/3904031
# and a little from here: https://stackoverflow.com/a/6826511

url = "http://lhr.data.fr24.com/_external/planedata_json.1.3.php?f=72c5ef5"
req = urllib2.Request(url, headers={"Connection": "keep-alive", "User-Agent": "Mozilla/5.0"})
response = urllib2.urlopen(req)

# parse the JSON string into a Python dictionary
the_dict = json.loads(response.read())

# 'trail' is treated as a flat list: lat, lon, alt, lat, lon, alt, ...
trail = the_dict['trail']
trailarray = np.array(trail)

# reshape into rows of 3, dropping any incomplete trailing triple
s0, s1 = len(trailarray) // 3, 3
lat, lon, alt = trailarray[:s0*s1].reshape(s0, s1).T

alt *= 10.  # they drop the last zero


# plot raw data of the trail. Note there are gaps - no time information here

plt.figure()

plt.subplot(2,2,1)

plt.plot(lat)
# no hold call is needed: successive plot() calls draw on the same axes
plt.plot(lon)
plt.title('raw lat lon')

plt.subplot(2,2,3)
plt.plot(alt)
plt.title('raw alt')

plt.subplot(1,2,2)
plt.plot(lon, lat)
plt.title('raw lat vs lon')
plt.text(-40, 46, "this segment is")
plt.text(-40, 45.5, "transatlantic")
plt.text(-40, 45, "gap in data")

plt.savefig('raw lat lon alt')
plt.show()
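The de-interleaving step in the script above can also be sketched without NumPy, which may help when checking the logic by hand. This assumes, as the script does, that the trail is a flat list of repeating lat, lon, alt values; split_trail is a hypothetical helper name and the sample numbers are made up.

```python
def split_trail(trail):
    """Split a flat [lat, lon, alt, lat, lon, alt, ...] list into three lists."""
    n = len(trail) // 3 * 3  # ignore any incomplete trailing triple
    lat = trail[0:n:3]
    lon = trail[1:n:3]
    alt = [a * 10.0 for a in trail[2:n:3]]  # same x10 scaling as the script above
    return lat, lon, alt


# Made-up values, with one dangling element at the end to show the trimming.
sample = [51.5, -0.1, 3500, 51.6, -0.2, 3600, 51.7]
lat, lon, alt = split_trail(sample)
print(lat, lon, alt)
```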

To convert the time and date info to human-readable form:

import datetime

def humanize(seconds_since_epoch):
    """ from https://stackoverflow.com/a/15953715/3904031 """
    return datetime.datetime.fromtimestamp(seconds_since_epoch).strftime('%Y-%m-%d %H:%M:%S')

humanize(the_dict['arrival'])

returns