cfa {cfa}                                                R Documentation

Analysis of configuration frequencies

Description

Determines the frequency of all combinations of variable values (their configurations), compares each with its expected frequency calculated from the marginals, and displays the configurations in order of decreasing chi squared. In addition, a global chi squared is calculated.

Usage

cfa(configmatrix, cntvector,
    descending=TRUE, sort.on.chisq=TRUE, sort.on.n=FALSE, ignore.na=FALSE,
    binom.test=FALSE, binom.test.limit=10,
    bonferroni.p.z=TRUE, bonferroni.alpha=0.05,
    lehmacher=FALSE, holm.alpha=0.01, verbose=FALSE)

Arguments

configmatrix      Data frame with the variables to be analyzed
cntvector         Vector of counts (1 if the data are not aggregated)
descending        Output in the order of decreasing chi squared
sort.on.chisq     Sort output on chi squared
sort.on.n         Sort output on the frequency of the configurations
ignore.na         Ignore (casewise) missing data in the configurations
binom.test        Perform a binomial test for each configuration
binom.test.limit  Maximum count (frequency) for which a binomial test is performed
bonferroni.p.z    Apply the Bonferroni adjustment in the significance test for each configuration
bonferroni.alpha  Alpha to be adjusted
lehmacher         Perform Lehmacher's test with Holm's correction
holm.alpha        Alpha to be adjusted according to Holm
verbose           Long output

Details

Each variable must have at least two different values and may have more (extension of the classical CFA).

configmatrix must consist of at least two variables (columns). Factors and numbers are both accepted (the numbers are internally converted to factors). cntvector must be numeric.

Counts should be at least 5 for the chi squared test to be reliable, but when CFA is used as a purely heuristic tool, counts of 0 are possible.

A z-approximation is used for the test of significance of configuration frequency.
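
As an illustration of the quantities described above, the following minimal sketch computes expected frequencies from the marginals, the chi squared component, and a z value for each configuration of a small made-up table. The data, the helper marginal_prop, and the particular z formula, (n - expected) / sqrt(expected), are assumptions chosen for illustration; the package may use a different approximation internally.

```r
# Hypothetical aggregated data: three binary variables, one count per configuration
configs <- expand.grid(a = c("1", "2"), b = c("1", "2"), c = c("1", "2"),
                       stringsAsFactors = TRUE)
counts <- c(20, 5, 8, 12, 6, 14, 10, 25)
total <- sum(counts)

# For one variable: marginal proportion of the level found in each row
marginal_prop <- function(f) {
  as.numeric(tapply(counts, f, sum)[as.character(f)]) / total
}
props <- sapply(configs, marginal_prop)

# Expected count under independence: N times the product of the marginal proportions
expected <- total * apply(props, 1, prod)

chisq <- (counts - expected)^2 / expected   # chi squared component per configuration
z <- (counts - expected) / sqrt(expected)   # assumed z-approximation; z^2 equals chisq

result <- data.frame(configs, n = counts, expected, chisq, z)
print(result[order(-result$chisq), ])       # decreasing chi squared, as in the default output
```

The expected counts sum to the total count, which is a quick sanity check on any table of this kind.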

If the data are not aggregated, i.e., there are several entries with the same configuration (the same contents of a row in configmatrix), the counts of these entries are added and a single entry with the summed count replaces the original entries.
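
The effect of this aggregation can be reproduced with base R. The sketch below uses a made-up data frame with one row per case and a count of 1 each; it only illustrates the idea and is not the package's internal code.

```r
# Hypothetical raw data: one row per case, each with a count of 1
raw <- data.frame(
  gender  = c("f", "m", "f", "f", "m"),
  married = c("yes", "no", "yes", "no", "no"),
  count   = c(1, 1, 1, 1, 1)
)

# Sum the counts of rows that share the same configuration
agg <- aggregate(count ~ gender + married, data = raw, FUN = sum)
print(agg)
```

Each distinct configuration now appears once, and the counts still add up to the number of original cases.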

Value

A list of class "cfa" containing the tabular results and the overall parameters:

Row names            Configuration
n                    Frequency (count) of this configuration
pct                  Relative frequency of this configuration
expected             Expected frequency (count) of this configuration calculated from the marginals
Q                    Coefficient of pronouncedness of the configuration; varies between 0 and 1
chisq                Chi squared for the given configuration
z                    z-approximation
p                    p(z)
sig(p(z))            1: significant, 0: not significant (the limit is Bonferroni-adjusted by default)
Overall chi squared  Overall chi squared for the entire table
p(chi squared)       p(chi squared) for the entire table
Degrees of freedom   Degrees of freedom for the chi squared test of the entire table
Total n              Sum of all counts
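
To make the significance flag and the overall test concrete, here is a minimal sketch on made-up observed and expected counts for a 2 x 2 x 2 table. The one-sided p value, the z formula, and the degrees of freedom for the independence model (number of cells minus the sum of (levels - 1) minus 1) are assumptions for illustration and may differ from what the package computes.

```r
# Hypothetical observed and expected counts for the eight configurations of a 2 x 2 x 2 table
n        <- c(20, 5, 8, 12, 6, 14, 10, 25)
expected <- c(8.91, 11.34, 10.89, 13.86, 10.89, 13.86, 13.31, 16.94)

z <- (n - expected) / sqrt(expected)      # assumed z-approximation
p <- pnorm(abs(z), lower.tail = FALSE)    # assumed one-sided p(z)

# Bonferroni: divide alpha by the number of configurations tested
alpha_adj <- 0.05 / length(n)
sig <- as.integer(p < alpha_adj)          # 1: significant, 0: not significant

# Overall test for the entire table under the independence model
levels_per_var <- c(2, 2, 2)
overall_chisq <- sum((n - expected)^2 / expected)
df <- prod(levels_per_var) - sum(levels_per_var - 1) - 1
overall_p <- pchisq(overall_chisq, df, lower.tail = FALSE)

print(data.frame(n, expected, z, p, sig))
print(c(chisq = overall_chisq, df = df, p = overall_p))
```

With eight configurations and alpha = 0.05, each single test is compared against 0.00625.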

WARNING

The program is implemented in R itself rather than in a compiled library and is therefore slow. In most cases the input is a pre-aggregated table and speed is not a problem because configmatrix is small. There are no hard-coded limits in the program, so even large tables can be processed, but this will take time and memory. The output table can be very wide if the levels of the factor variables are long strings, so options(width=..) may need to be adjusted.
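
A minimal sketch of adjusting the output width temporarily; the value 200 is an arbitrary choice for illustration, not a recommendation from the package.

```r
# Widen the console output; options() returns the previous setting
old <- options(width = 200)   # 200 is an arbitrary illustrative value

# ... print the wide result table here ...

# Restore the previous width afterwards
options(old)
```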

Note

CFA is very useful as a heuristic tool when large numbers of categorical variables are to be screened, because there is only one table of results instead of a multi-dimensional cross-tabulation or a large number of sub-tables generated from it.

Author(s)

Stefan Funke <funke@attglobal.net>

References

Krauth, J., Lienert, G. A. (1973, Reprint 1995) Die Konfigurationsfrequenzanalyse (KFA) und ihre Anwendung in Psychologie und Medizin. Beltz Psychologie Verlagsunion

Eye, A. von (1990) Introduction to configural frequency analysis. The search for types and anti-types in cross-classification. Cambridge

See Also

mcfa, hier.cfa, boot.cfa

Examples

library(cfa)
data(cfadat)
# Pass the counts as a numeric vector, as required for cntvector
cfa(cfadat[c("gender","married","children")], cfadat$count, verbose=TRUE)