I'd like to replace all values in my relatively large R dataset that lie above the 95th percentile or below the 5th percentile with those percentile values, respectively. My aim is to avoid simply dropping these outliers from the data entirely.
Any advice would be much appreciated; I can't find any information on how to do this anywhere else.
I used this code to get what you need:
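A minimal sketch of this kind of capping, assuming a data frame `df` with a numeric column `value` (the example data below is made up for illustration):

```r
# Hypothetical example data; replace with your own data frame.
set.seed(1)
df <- data.frame(value = rnorm(1000))

# 5th and 95th percentiles of the column (unnamed numeric vector of length 2).
bounds <- quantile(df$value, probs = c(0.05, 0.95), na.rm = TRUE, names = FALSE)

# Replace values outside the bounds with the nearest bound,
# so no rows are dropped.
df$value[df$value < bounds[1]] <- bounds[1]
df$value[df$value > bounds[2]] <- bounds[2]
```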
where `df` is your data.frame, and `value` the column that contains your data.

There is a better way to solve this problem. An outlier is not any point over the 95th percentile or below the 5th percentile. Instead, a point is commonly considered an outlier if it is below the first quartile − 1.5·IQR or above the third quartile + 1.5·IQR.
This website explains it more thoroughly.
To learn more about outlier treatment, refer here.
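As a rough illustration of the 1.5·IQR rule described above, here is a small sketch on a made-up vector `x`; the variable names are only for illustration:

```r
# Hypothetical data with one obviously extreme value.
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# First and third quartiles (R's default quantile type 7).
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag points outside the fences.
is_outlier <- x < lower | x > upper
x[is_outlier]
```

If you want to keep every observation, as in the question, the flagged points can be capped at `lower` and `upper` instead of removed.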
This would do it.
You can do it in one line of code using `squish()` from the scales library; look at `?squish` and `?discard`.
Example:
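A short sketch of `squish()` applied to the 5th and 95th percentiles, assuming the scales package is installed; the vector `x` is made-up data:

```r
library(scales)

# Hypothetical data.
set.seed(1)
x <- rnorm(1000)

# 5th and 95th percentiles, used as the range to squish into.
q <- quantile(x, probs = c(0.05, 0.95), names = FALSE)

# Values below q[1] become q[1]; values above q[2] become q[2].
y <- squish(x, range = q)
range(y)
```

`discard()` takes the same arguments but returns only the values inside the range, which is the cropping behaviour the question wants to avoid.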