Can't run correspondence analysis on two-way c

Posted 2019-02-25 14:04

Question:

Correspondence analysis does not appear to work on this table, named mytable:

              0      1      2      3      4      5      7
 Click_No  242854  91661    102     21     65     51    291
 Click_Yes  48274  20785     14      2     19      4    146

However, it works on this table:

          0      1      2      3      4      5      7
Row1      4      0      0      0      0      0     11
Row2     35      2      0      0      0      0      0
Row3  18364     14      0      0      0      0      0
Row4     13      0      0      0      0      0      7
Row5   1497   1521      6      0      0      0      0
Row6    686      2      0      0      0      0    393
Row7 270167 110512    110     23     84     54      0
Row8      1      0      0      0      0      0     26
Row9    361    395      0      0      0      1      0

I used the FactoMineR function:

 res.ca <- CA(mytable)

Does CA not work on specific types of contingency tables? I haven't read anything in the literature to suggest this, other than for very large sizes.

Error generated when running summary(res.ca):

Call:
CA(X = mytable) 

The chi square of independence between the two variables is equal to 297.3778 (p-value =  2.982623e-61 ).

Eigenvalues
                     Dim.1
Variance             1e-03
% of var.            1e+02
Cumulative % of var. 1e+02

Rows
Error in if (nrow(res$row$coord) > nbelements) cat(paste(" (the ", nbelements,  : 
  argument is of length zero
In addition: Warning message:
In max(nchar(rownames(res[aux[1]][[1]]$coord))) :
  no non-missing arguments to max; returning -Inf

Edit:

dput(mytable) output:

mytable <- structure(c(242854L, 48274L, 91661L, 20785L, 102L, 14L, 21L, 
2L, 65L, 19L, 51L, 4L, 291L, 146L), .Dim = c(2L, 7L), .Dimnames = structure(list(
    c("0", "1"), c("0", "1", "2", "3", "4", "5", "7")), .Names = c("", 
"")), class = "table")

Answer 1:

I think the problem is statistical rather than computational. A correspondence analysis produces at most min(i-1, j-1) dimensions, where i is the number of rows and j the number of columns (i.e. the number of modalities of the two variables). You are trying to run a CA on a contingency table with i=2 rows and j=7 columns, so it can only output a single axis. This is why you get this error: you should not use CA with a two-modality variable.
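The one-axis limit can be checked without FactoMineR by taking the SVD of the standardized residuals in base R, which is the decomposition that correspondence analysis rests on. The sketch below rebuilds the question's table as a plain matrix from the dput() output (done here only so the snippet is self-contained); the variable names are illustrative and not part of any package.

```r
# The table from the question, rebuilt as a plain matrix from its dput() output.
mytable <- matrix(
  c(242854, 48274, 91661, 20785, 102, 14, 21, 2, 65, 19, 51, 4, 291, 146),
  nrow = 2,
  dimnames = list(c("0", "1"), c("0", "1", "2", "3", "4", "5", "7"))
)

# Upper bound on the number of CA dimensions: min(rows - 1, columns - 1).
max_dims <- min(nrow(mytable) - 1, ncol(mytable) - 1)

# Correspondence matrix and its row/column masses.
P <- mytable / sum(mytable)
row_mass <- rowSums(P)
col_mass <- colSums(P)

# Standardized residuals from the independence model.
expected <- outer(row_mass, col_mass)
S <- (P - expected) / sqrt(expected)

# Singular values of S; squared, they are the CA eigenvalues.
sv <- svd(S)$d

# Count the singular values that are not numerically zero.
n_axes <- sum(sv > 1e-8)

# Total inertia equals the chi-square statistic divided by the grand total.
total_inertia <- sum(sv^2)

print(max_dims)
print(n_axes)
print(total_inertia * sum(mytable))
```

With only two rows, the residual matrix has rank one at most, so a single singular value carries all of the inertia. That matches the single Dim.1 column in the Eigenvalues block of the summary() output above.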

There are mathematical explanations of this on p. 84 of Benzécri's Correspondence Analysis Handbook, for instance. You may get a better explanation if you ask a question about this on Cross Validated (CV).

Here is an example with the children data set in FactoMineR:

library(FactoMineR)
data("children")
## Example from help("CA"), works fine
summary(CA(children, row.sup = 15:18, col.sup = 6:8))
## Example when we restrict the contingency table to the first two rows.
## Produces an error
summary(CA(children, row.sup = 3:18, col.sup = 6:8))
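If the coordinates on that single axis are still of interest for the two-row table from the question, they can be computed by hand. The following is a minimal base R sketch using the standard principal-coordinate formulas (singular vector times singular value, divided by the square root of the mass). It does not reproduce FactoMineR's internals, and the variable names are hypothetical.

```r
# The table from the question, rebuilt as a plain matrix from its dput() output.
mytable <- matrix(
  c(242854, 48274, 91661, 20785, 102, 14, 21, 2, 65, 19, 51, 4, 291, 146),
  nrow = 2,
  dimnames = list(c("0", "1"), c("0", "1", "2", "3", "4", "5", "7"))
)

# Correspondence matrix, masses, and standardized residuals.
P <- mytable / sum(mytable)
row_mass <- rowSums(P)
col_mass <- colSums(P)
expected <- outer(row_mass, col_mass)
S <- (P - expected) / sqrt(expected)

# Only the first singular triplet is meaningful for a two-row table.
dec <- svd(S)
sigma <- dec$d[1]

# Principal coordinates on the single axis.
# The sign of the axis is arbitrary, as with any SVD-based method.
row_coord <- dec$u[, 1] * sigma / sqrt(row_mass)
col_coord <- dec$v[, 1] * sigma / sqrt(col_mass)

print(row_coord)
print(col_coord)
```

The mass-weighted mean of each set of coordinates is zero and the mass-weighted variance equals the single eigenvalue, so a one-dimensional dot plot of col_coord conveys everything a CA map could show for this table.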