matplotlib: disregard outliers when plotting

Posted 2019-06-26 21:58

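I'm plotting some data from various tests. Sometimes a test happens to contain one outlier (say 0.1), while all other values are three orders of magnitude smaller.

With matplotlib, I plot against the range [0, max_data_value].

How can I zoom in on my data and leave out the outliers, which would otherwise mess up the x-axis of my plot?

Should I simply take the 95th percentile and use the range [0, 95_percentile] on the x-axis?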

Answer 1:

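There's no single "best" test for an outlier. Ideally, you should incorporate a priori information (e.g. "This parameter shouldn't be over x because of blah...").

Most tests for outliers use the median absolute deviation rather than the 95th percentile or some other variance-based measurement. Otherwise, the calculated variance/stddev will be heavily skewed by the outliers themselves.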
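To make the point about skew concrete, here is a minimal sketch comparing the standard deviation with the median absolute deviation (MAD) on made-up data shaped like the question's: many small values plus a single outlier at 0.1. The data, the seed, and the variable names are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical data: 20 small values plus one outlier at 0.1
rng = np.random.default_rng(0)
data = np.r_[rng.uniform(1e-4, 2e-4, size=20), 0.1]

# Mean/stddev are pulled toward the outlier
mean, std = data.mean(), data.std()

# Median/MAD are barely affected by a single extreme value
median = np.median(data)
mad = np.median(np.abs(data - median))

# Classic z-score vs. modified z-score (0.6745 is the constant from
# Iglewicz and Hoaglin's modified z-score)
z = np.abs(data - mean) / std
modified_z = 0.6745 * np.abs(data - median) / mad

print("stddev:", std, "MAD:", mad)
print("z-score of the outlier:", z[-1])
print("modified z-score of the outlier:", modified_z[-1])
```

Because the outlier inflates the standard deviation it is measured against, its ordinary z-score stays modest, while the MAD-based score separates it clearly from the rest.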

Here's a function that implements one of the more common outlier tests.

import numpy as np

def is_outlier(points, thresh=3.5):
    """
    Returns a boolean array with True if points are outliers and False 
    otherwise.

    Parameters:
    -----------
        points : An numobservations by numdimensions array of observations
        thresh : The modified z-score to use as a threshold. Observations with
            a modified z-score (based on the median absolute deviation) greater
            than this value will be classified as outliers.

    Returns:
    --------
        mask : A numobservations-length boolean array.

    References:
    ----------
        Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and
        Handle Outliers", The ASQC Basic References in Quality Control:
        Statistical Techniques, Edward F. Mykytka, Ph.D., Editor. 
    """
    if len(points.shape) == 1:
        points = points[:,None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thresh

As an example of using it, you'd do something like the following:

import numpy as np
import matplotlib.pyplot as plt

# The function above... In my case it's in a local utilities module
from sci_utilities import is_outlier

# Generate some data
x = np.random.random(100)

# Append a few "bad" points
x = np.r_[x, -3, -10, 100]

# Keep only the "good" points
# "~" operates as a logical not operator on boolean numpy arrays
filtered = x[~is_outlier(x)]

# Plot the results
fig, (ax1, ax2) = plt.subplots(nrows=2)

ax1.hist(x)
ax1.set_title('Original')

ax2.hist(filtered)
ax2.set_title('Without Outliers')

plt.show()
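Since the question asks about zooming rather than dropping data, one possible variation is to keep the full array and use the outlier mask only to choose the plotted range. The sketch below is self-contained, so it uses a condensed 1-D helper, `mad_mask`, which is a hypothetical name and not part of the answer; returning an all-False mask when the MAD is zero is likewise an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

def mad_mask(values, thresh=3.5):
    """Condensed 1-D modified z-score test (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    abs_dev = np.abs(values - np.median(values))
    mad = np.median(abs_dev)
    if mad == 0:
        # Assumption: flag nothing when the MAD is zero
        return np.zeros(values.shape, dtype=bool)
    return 0.6745 * abs_dev / mad > thresh

# Same kind of data as above: 100 random values plus a few "bad" points
rng = np.random.default_rng(0)
x = np.r_[rng.random(100), -3, -10, 100]

# Use the inliers only to decide what range to display
inliers = x[~mad_mask(x)]
lo, hi = inliers.min(), inliers.max()

fig, ax = plt.subplots()
# Values outside `range` are ignored by hist, so x itself is left untouched
ax.hist(x, bins=20, range=(lo, hi))
ax.set_title('Zoomed to non-outlier range')

# plt.show()  # uncomment when running interactively
```

The data array is never modified here; only the histogram's bin range is restricted.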



Answer 2:

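If you aren't fussed about rejecting outliers as Joe describes, and your reasons for doing this are purely aesthetic, you could just set the limits of your plot's x-axis: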

plt.xlim(min_x_data_value,max_x_data_value)

Here the values are the limits you want displayed.

plt.ylim(min,max) works the same way to set limits on the y-axis.
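Combining this with the question's own idea, a rough sketch of percentile-based axis limits might look like the following. The data, the seed, and the choice of the 95th percentile are assumptions for illustration; the outlier stays in the data and is simply outside the visible range.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 200 small values plus one outlier at 0.1
rng = np.random.default_rng(1)
values = np.r_[rng.uniform(0, 1e-4, size=200), 0.1]

# Upper view limit taken from the 95th percentile, as the question suggests
upper = np.percentile(values, 95)

fig, ax = plt.subplots()
ax.plot(values, np.arange(values.size), 'o')
# Only the view changes; the outlier is still part of `values`
ax.set_xlim(0, upper)

# plt.show()  # uncomment when running interactively
```

Note that a fixed percentile always hides the top 5% of points, whether or not they are outliers, so this is a display choice rather than an outlier test.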



Source: matplotlib: disregard outliers when plotting