distancefactor {fpc} | R Documentation |
Computes a factor that can be used to standardise ordinal categorical variables, and binary dummy variables coding the categories of nominally scaled variables, for Euclidean dissimilarity computation in mixed-type data. See Hennig and Liao (2010).
distancefactor(cat,n=NULL, catsizes=NULL,type="categorical", normfactor=2,qfactor=ifelse(type=="categorical",1/2, 1/(1+1/(cat-1))))
cat |
integer. Number of categories of the variable to be standardised.
Note that for |
n |
integer. Number of data points. |
catsizes |
vector of integers giving numbers of observations per
category. One of |
type |
|
normfactor |
numeric. Factor on which standardisation is based.
As a default, this is |
qfactor |
numeric. Factor q in Hennig and Liao (2010) to adjust for clumping effects due to discreteness. |
A factor by which to multiply the variable in order to make it
comparable to a unit variance continuous variable when aggregated in
Euclidean fashion for dissimilarity computation, so that the expected
effective difference between two realisations of the variable equals
qfactor*normfactor.
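As a rough illustration of the target value qfactor*normfactor, the following base R sketch evaluates the default qfactor expression from the usage line for the two variable types. The helper name default_qfactor is hypothetical and not part of fpc; the sketch only reproduces the default argument expression and does not compute the factor that distancefactor itself returns.

```r
# Hypothetical helper (not part of fpc): the default qfactor expression,
# copied from the usage line of distancefactor.
default_qfactor <- function(cat, type = "categorical") {
  ifelse(type == "categorical", 1/2, 1/(1 + 1/(cat - 1)))
}

normfactor <- 2  # default value from the usage line

# Target value qfactor*normfactor for a nominal variable with 5 categories
target_categorical <- default_qfactor(5, "categorical") * normfactor

# Target value qfactor*normfactor for an ordinal variable with 4 categories
target_ordinal <- default_qfactor(4, "ordinal") * normfactor

print(target_categorical)
print(target_ordinal)
```

With these defaults, the target for categorical variables does not depend on the number of categories, whereas the ordinal target grows with cat and approaches normfactor as cat gets large.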
Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche
Hennig, C. and Liao, T. (2010) Comparing latent class and dissimilarity based clustering for mixed type variables with application to social stratification. Research report no. 308, Department of Statistical Science, UCL. http://www.ucl.ac.uk/Stats/research/reports/psfiles/rr308.pdf
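set.seed(776655)
# Two discrete variables with 20 observations each:
# d1 has 5 categories (treated as nominal), d2 has 4 (treated as ordinal)
d1 <- sample(1:5,20,replace=TRUE)
d2 <- sample(1:4,20,replace=TRUE)
ldata <- cbind(d1,d2)
# Recode the first variable into binary dummy variables
lc <- cat2bin(ldata,categorical=1)$data
# Rescale the 5 dummy columns for d1 and the ordinal column for d2
lc[,1:5] <- lc[,1:5]*distancefactor(5,20,type="categorical")
lc[,6] <- lc[,6]*distancefactor(4,20,type="ordinal")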