How to replace outliers with the 5th and 95th perc

2020-06-04 02:13发布

I'd like to replace all values in my relatively large R dataset which take values above the 95th and below the 5th percentile, with those percentile values respectively. My aim is to avoid simply cropping these outliers from the data entirely.

Any advice would be much appreciated, I can't find any information on how to do this anywhere else.

4条回答
欢心
2楼-- · 2020-06-04 02:34

I used this code to get what you need:

qn = quantile(df$value, c(0.05, 0.95), na.rm = TRUE)
df = within(df, { value = ifelse(value < qn[1], qn[1], value)
                  value = ifelse(value > qn[2], qn[2], value)})

where df is your data.frame, and value the column that contains your data.

查看更多
混吃等死
3楼-- · 2020-06-04 02:37

There is a better way to solve this problem. An outlier is not any point over the 95th percentile or below the 5th percentile. Instead, an outlier is considered so if it is below the first quartile – 1.5·IQR or above third quartile + 1.5·IQR.
This website will explain in more thoroughly

To know more about outlier treatment refer here

capOutlier <- function(x){
   qnt <- quantile(x, probs=c(.25, .75), na.rm = T)
   caps <- quantile(x, probs=c(.05, .95), na.rm = T)
   H <- 1.5 * IQR(x, na.rm = T)
   x[x < (qnt[1] - H)] <- caps[1]
   x[x > (qnt[2] + H)] <- caps[2]
   return(x)
}
df$colName=capOutlier(df$colName)
Do the above line over and over for all of the columns in your data frame
查看更多
兄弟一词,经得起流年.
4楼-- · 2020-06-04 02:54

This would do it.

fun <- function(x){
    quantiles <- quantile( x, c(.05, .95 ) )
    x[ x < quantiles[1] ] <- quantiles[1]
    x[ x > quantiles[2] ] <- quantiles[2]
    x
}
fun( yourdata )
查看更多
放我归山
5楼-- · 2020-06-04 02:57

You can do it in one line of code using squish():

d2 <- squish(d, quantile(d, c(.05, .95)))



In the scales library, look at ?squish and ?discard

#--------------------------------
library(scales)

pr <- .95
q  <- quantile(d, c(1-pr, pr))
d2 <- squish(d, q)
#---------------------------------

# Note: depending on your needs, you may want to round off the quantile, ie:
q <- round(quantile(d, c(1-pr, pr)))

example:

d <- 1:20
d
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20


d2 <- squish(d, round(quantile(d, c(.05, .95))))
d2
# [1]  2  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 19
查看更多
登录 后发表回答