Multivariate robust outlier detection using R

2019-04-13 02:11发布

What is the preferred way (in your opinion) to perform multivariate robust outlier detection in R in an automatic way, i.e. without manual inspection and plotting?

I have found the "dprep" package, but it seems discontinued. However, as outlier detection is a frequent and important task, a generic default method should be available, e.g. the MCD estimator (Rousseeuw and Van Driesen, 1999).

2条回答
三岁会撩人
2楼-- · 2019-04-13 02:59

Try covMcd in package robustbase.

查看更多
我想做一个坏孩纸
3楼-- · 2019-04-13 03:09

Use Cook's Distance enter image description here You could use cook's distance. Cook's distance is computed based on a linear regression model. That means, you will be able to include multiple X variables to compute the outlier (high influence observations, more precisely). This effectively gives you the flexibility to add or drop the variables on which you would want to determine the outliers. The way to compute it for every observation in R would look something like this:

mod <- lm(Y ~ X1 + X2 + X3, data=inputData)
cooksd <- cooks.distance(mod)

In general convention, those observations with a cook's distance > 4*mean(cooks distance) are considered outliers. For more information about the formula and interpretation of cook's distance refer to this example

Disclaimer: I am the author.

查看更多
登录 后发表回答