adjOutlyingness {robustbase} | R Documentation |
For an n × p data matrix (or data frame) x, compute the “outlyingness” of all n observations.
Outlyingness here is a generalization of the Donoho-Stahel outlyingness measure, where skewness is taken into account via the medcouple, mc().
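To make the idea concrete, here is a minimal base-R sketch of a skewness-adjusted outlyingness for a single variable. It is an illustration under stated assumptions, not the code used by adjOutlyingness(): the helper names mc_naive() and adj_out_1d() are hypothetical, the medcouple is computed by brute force and ignores observations tied with the median, and the way clower and cupper enter the exponents is assumed (larger constant on the side of the long tail, mirrored when the medcouple is negative).

```r
## Naive O(n^2) medcouple. Assumption: no observation is tied with the
## median; robustbase::mc() handles ties and is far more efficient.
mc_naive <- function(x) {
  m  <- median(x)
  lo <- x[x <= m]
  hi <- x[x >= m]
  ## kernel h(xi, xj) = ((xj - m) - (m - xi)) / (xj - xi), for xi <= m <= xj
  num <- outer(lo, hi, function(xi, xj) (xj - m) - (m - xi))
  den <- outer(lo, hi, function(xi, xj) xj - xi)
  keep <- den > 0                      # drop degenerate pairs (xi == xj)
  median(num[keep] / den[keep])
}

## Univariate skewness-adjusted outlyingness (hypothetical helper).
## Distances from the median are scaled by asymmetric boxplot-like fences.
adj_out_1d <- function(x, clower = 3, cupper = 4, coef = 1.5) {
  m   <- median(x)
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  s   <- mc_naive(x)
  if (s >= 0) {                        # right-skewed: stretch the upper fence
    up <- q[2] + coef * iqr * exp( cupper * s)
    lw <- q[1] - coef * iqr * exp(-clower * s)
  } else {                             # left-skewed: mirror image of the above
    up <- q[2] + coef * iqr * exp( clower * s)
    lw <- q[1] - coef * iqr * exp(-cupper * s)
  }
  ifelse(x >= m, (x - m) / (up - m), (m - x) / (m - lw))
}

x <- c(1, 2, 3, 4, 10, 20)             # small right-skewed sample
mc_naive(x)                            # positive for right-skewed data
adj_out_1d(x)
```

In the multivariate case, a Donoho-Stahel type measure takes, for each observation, the largest such value over many projection directions; the sketch above covers only the one-dimensional building block.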
adjOutlyingness(x, ndir = 250, clower = 3, cupper = 4, alpha.cutoff = 0.75, coef = 1.5, qr.tol = 1e-12)
x |
a numeric matrix or data frame. |
ndir |
positive integer specifying the number of directions that should be searched. |
clower, cupper |
the constants to be used for the lower and upper tails, respectively, in order to transform the data towards symmetry. |
alpha.cutoff |
number in (0,1) specifying the quantiles (α, 1-α) which determine the “outlier” cutoff. |
coef |
positive number specifying the factor with which the
interquartile range ( |
qr.tol |
positive tolerance to be used for |
FIXME: Details in the comment of the Matlab code; also in the reference(s).
The method as described can be useful as preprocessing in FASTICA (http://www.cis.hut.fi/projects/ica/fastica/); see also the R package fastICA.
a list with components
adjout |
numeric of |
cutoff |
cutoff for “outlier” with respect to the adjusted
outlyingnesses, and depending on |
nonOut |
logical of |
The result is random as it depends on the sample of ndir directions chosen.
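The dependence on the random directions can be illustrated with a small base-R sketch. It deliberately swaps in a simpler technique, the classical projection outlyingness standardised by median and MAD (no skewness adjustment), and draws directions as normalised Gaussian vectors; both choices are assumptions for illustration and not necessarily how adjOutlyingness() samples its directions. The helper name proj_out() is hypothetical.

```r
## Projection outlyingness over `ndir` random directions (hypothetical helper).
## Classical median/MAD standardisation, i.e. without skewness adjustment.
proj_out <- function(x, ndir = 250) {
  x <- as.matrix(x)
  p <- ncol(x)
  ## random unit vectors: normalised Gaussian draws, one direction per column
  dirs <- matrix(rnorm(p * ndir), nrow = p)
  dirs <- sweep(dirs, 2, sqrt(colSums(dirs^2)), "/")
  proj <- x %*% dirs                         # n x ndir matrix of projections
  ctr  <- apply(proj, 2, median)
  scl  <- apply(proj, 2, mad)
  z    <- abs(sweep(sweep(proj, 2, ctr, "-"), 2, scl, "/"))
  apply(z, 1, max)                           # worst case over all directions
}

## 50 Gaussian points in 4 dimensions plus one planted outlier (row 51)
set.seed(42)
x <- rbind(matrix(rnorm(200), ncol = 4), c(8, 8, 8, 8))

set.seed(1); o1 <- proj_out(x)
set.seed(1); o2 <- proj_out(x)               # same seed, same directions
set.seed(2); o3 <- proj_out(x)               # different seed, other directions

identical(o1, o2)
identical(o1, o3)
which.max(o1)
```

Fixing the seed immediately before the call makes a run reproducible; comparing results across a few seeds shows how sensitive borderline observations are to the directions drawn.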
Guy Brys; help page and improvements by Martin Maechler
Brys, G., Hubert, M., and Rousseeuw, P.J. (2005) A Robustification of Independent Component Analysis; Journal of Chemometrics, 19, 1–12.
For the up-to-date reference, please consult http://wis.kuleuven.be/stat/robust.html
the adjusted boxplot, adjbox, and the medcouple, mc.
library(robustbase)

## An Example with bad condition number and "border case" outliers

if(FALSE) { ## Not yet ok, because of bug in adjOutl
  dim(longley)
  set.seed(1) ## result is random, and there's a bug - FIXME! -- try set.seed(3)
  ao1 <- adjOutlyingness(longley)
  ## which are not outlying ?
  table(ao1$nonOut) ## all of them
  stopifnot(all(ao1$nonOut))
}

## An Example with outliers :

dim(hbk)
set.seed(1)
ao.hbk <- adjOutlyingness(hbk)
str(ao.hbk)
hist(ao.hbk$adjout)  ## really two groups
table(ao.hbk$nonOut) ## 14 outliers, 61 non-outliers:
## outliers are :
which(!ao.hbk$nonOut) # 1 .. 14 --- but not for all random seeds!

## here, they are the same as found by (much faster) MCD:
cc <- covMcd(hbk)
stopifnot(all(cc$mcd.wt == ao.hbk$nonOut))

## This is revealing (about 1--2 cases, where outliers are *not* == 1:14)
## but needs almost 1 [sec] per call:
if(interactive()) {
  for(i in 1:30) {
    print(system.time(ao.hbk <- adjOutlyingness(hbk)))
    if(!identical(iout <- which(!ao.hbk$nonOut), 1:14)) {
      cat("Outliers:\n"); print(iout)
    }
  }
}