
Question:

I wish to know, for a given predicted commute journey duration in minutes, the range of actual commute times I might expect. For example, if Google Maps predicts my commute to be 20 minutes, what is the minimum and maximum commute I should expect (perhaps a 95% range)?

Let's import my data into pandas:

%matplotlib inline
import pandas as pd

commutes = pd.read_csv('https://raw.githubusercontent.com/blokeley/commutes/master/commutes.csv')
commutes.tail()

This displays the last five rows of the data.

We can create a plot easily which shows the scatter of raw data, a regression curve, and the 95% confidence interval on that curve:

import seaborn as sns

# Create a linear model plot
sns.lmplot(x='prediction', y='duration', data=commutes);

How do I now calculate and plot the 95% range of actual commute times versus predicted times?

Put another way, if Google Maps predicts my commute to take 20 minutes, it looks like it could actually take anywhere between something like 14 and 28 minutes. It would be great to calculate or plot this range.

Thanks in advance for any help.

Answer 1:

The relationship between actual duration of a commute and the prediction should be linear, so I can use quantile regression:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Import data
commutes = pd.read_csv('https://raw.githubusercontent.com/blokeley/commutes/master/commutes.csv')

# Create the quantile regression model
model = smf.quantreg('duration ~ prediction', commutes)

# Create a list of quantiles to calculate
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]

# Create a list of fits
fits = [model.fit(q=q) for q in quantiles]

# Create a new figure and axes
figure, axes = plt.subplots()

# Plot the scatter of data points
x = commutes['prediction']
axes.scatter(x, commutes['duration'], alpha=0.4)

# Create an array of predictions from the minimum to maximum to create the regression line
_x = np.linspace(x.min(), x.max())

for quantile, fit in zip(quantiles, fits):
    # Plot the fitted line for this quantile
    _y = fit.params['prediction'] * _x + fit.params['Intercept']
    axes.plot(_x, _y, label=quantile)

# Plot the line of perfect prediction
axes.plot(_x, _x, 'g--', label='Perfect prediction')
axes.legend()
axes.set_xlabel('Predicted duration (minutes)')
axes.set_ylabel('Actual duration (minutes)');

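This gives a scatter plot of the commutes overlaid with the five quantile lines and the dashed line of perfect prediction.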
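To turn the fitted quantile lines into a numeric answer for a single prediction (such as the 20 minute example in the question), something like the following sketch may help. Note that the 0.05 and 0.95 lines bracket roughly 90% of commutes, so a 95% range would use the 0.025 and 0.975 quantiles instead. The helper name commute_range and the synthetic stand-in data are assumptions made for illustration only; they are not part of the original answer, and the real commutes DataFrame can be passed in place of the fake one.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def commute_range(data, predicted, lower=0.025, upper=0.975):
    """Hypothetical helper: (low, high) actual duration for one predicted duration."""
    model = smf.quantreg('duration ~ prediction', data)
    new = pd.DataFrame({'prediction': [predicted]})
    # Fit one quantile regression per bound and evaluate it at the prediction
    low = np.asarray(model.fit(q=lower).predict(new))[0]
    high = np.asarray(model.fit(q=upper).predict(new))[0]
    return float(low), float(high)


# Synthetic stand-in data (an assumption, NOT the real commutes.csv)
rng = np.random.default_rng(0)
prediction = rng.uniform(15, 30, size=300)
duration = prediction + rng.normal(0, 3, size=300)
fake_commutes = pd.DataFrame({'prediction': prediction, 'duration': duration})

low, high = commute_range(fake_commutes, 20)
print(f'Predicted 20 min: expect roughly {low:.1f} to {high:.1f} min')
```

With the real data, commute_range(commutes, 20) would return the estimated 2.5% and 97.5% quantiles of actual duration at a 20 minute prediction. The estimates at such extreme quantiles can be noisy when there are few observations.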