I am trying to create a contingency table from a particular type of data. This would be doable with loops, but because my final table would contain more than 10E5 cells, I am looking for a pre-existing function.
My initial data are as follows:
PLANT                   ANIMAL                           INTERACTIONS
----------------------  -------------------------------  ------------
Tragopogon_pratensis    Propylea_quatuordecimpunctata    1
Anthriscus_sylvestris   Rhagonycha_nigriventris          3
Anthriscus_sylvestris   Sarcophaga_carnaria              2
Heracleum_sphondylium   Sarcophaga_carnaria              1
Anthriscus_sylvestris   Sarcophaga_variegata             4
Anthriscus_sylvestris   Sphaerophoria_interrupta_Gruppe  3
Cerastium_holosteoides  Sphaerophoria_interrupta_Gruppe  1
I would like to create a table like this:
                        Propylea_quatuordecimpunctata  Rhagonycha_nigriventris  Sarcophaga_carnaria  Sarcophaga_variegata  Sphaerophoria_interrupta_Gruppe
----------------------  -----------------------------  -----------------------  -------------------  --------------------  -------------------------------
Tragopogon_pratensis    1                              0                        0                    0                     0
Anthriscus_sylvestris   0                              3                        2                    4                     3
Heracleum_sphondylium   0                              0                        1                    0                     0
Cerastium_holosteoides  0                              0                        0                    0                     1
That is, all plant species in rows, all animal species in columns, and a 0 wherever there is no interaction (my initial data only list the interactions that occur).
With dplyr / tidyr:
xtabs in base R should work, for example:
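A minimal sketch of such an xtabs call, assuming the question's data sit in a data frame named df (a hypothetical name) with the columns PLANT, ANIMAL and INTERACTIONS. The factor() line is optional and only keeps the plants in order of first appearance rather than alphabetical order:

```r
# Example data copied from the question; the name `df` is an assumption.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PLANT ANIMAL INTERACTIONS
Tragopogon_pratensis Propylea_quatuordecimpunctata 1
Anthriscus_sylvestris Rhagonycha_nigriventris 3
Anthriscus_sylvestris Sarcophaga_carnaria 2
Heracleum_sphondylium Sarcophaga_carnaria 1
Anthriscus_sylvestris Sarcophaga_variegata 4
Anthriscus_sylvestris Sphaerophoria_interrupta_Gruppe 3
Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe 1
")

# Optional: keep plants in order of first appearance instead of alphabetical.
df$PLANT <- factor(df$PLANT, levels = unique(df$PLANT))

# Left-hand side: values to sum per cell; right-hand side: row and column factors.
# Plant/animal pairs that never occur in the data get a 0.
tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = df)
tab
```

For very large tables, xtabs also accepts sparse = TRUE, which returns a sparse matrix from the Matrix package instead of a dense table; whether that helps depends on how many cells are actually non-zero.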
I think that should do what you're looking for fairly easily. I'm not sure how it scales up in terms of efficiency to a contingency table with 10E5 cells, but that might be a separate issue statistically.
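For the dplyr / tidyr route, one possible sketch uses tidyr's pivot_wider() on its own. It assumes the question's data are in a data frame named df (a hypothetical name) and that each PLANT/ANIMAL pair appears at most once; if pairs can repeat, adding values_fn = sum would be needed to aggregate them:

```r
library(tidyr)

# Example data copied from the question; the name `df` is an assumption.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PLANT ANIMAL INTERACTIONS
Tragopogon_pratensis Propylea_quatuordecimpunctata 1
Anthriscus_sylvestris Rhagonycha_nigriventris 3
Anthriscus_sylvestris Sarcophaga_carnaria 2
Heracleum_sphondylium Sarcophaga_carnaria 1
Anthriscus_sylvestris Sarcophaga_variegata 4
Anthriscus_sylvestris Sphaerophoria_interrupta_Gruppe 3
Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe 1
")

# One row per plant, one column per animal; missing pairs are filled with 0.
wide <- pivot_wider(
  df,
  names_from  = ANIMAL,
  values_from = INTERACTIONS,
  values_fill = 0
)
wide
```

The result is a data frame (tibble) with PLANT as its first column, not a table object, so it can be passed on to further dplyr steps.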
I'd like to point out that we can get the same results Andrie posted without using the function with:
R Base Package
Package gmodels:
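As a sketch of how both variants can be written without with(), assuming the question's data in a data frame named df (a hypothetical name): the base R functions take the data frame or its columns directly, and CrossTable() accepts two vectors. Note that table() counts rows per pair, whereas xtabs() with a left-hand side sums the INTERACTIONS values. The gmodels call is guarded because that package may not be installed.

```r
# Example data copied from the question; the name `df` is an assumption.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PLANT ANIMAL INTERACTIONS
Tragopogon_pratensis Propylea_quatuordecimpunctata 1
Anthriscus_sylvestris Rhagonycha_nigriventris 3
Anthriscus_sylvestris Sarcophaga_carnaria 2
Heracleum_sphondylium Sarcophaga_carnaria 1
Anthriscus_sylvestris Sarcophaga_variegata 4
Anthriscus_sylvestris Sphaerophoria_interrupta_Gruppe 3
Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe 1
")

# Base R: the `data` argument replaces with(); cells hold summed INTERACTIONS.
weighted <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = df)

# Base R: table() on two columns counts rows per pair (here 0 or 1 per cell).
counts <- table(df[, c("PLANT", "ANIMAL")])

# gmodels: pass the two columns directly; this also tabulates row counts.
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(df$PLANT, df$ANIMAL, prop.chisq = FALSE)
}
```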
The reshape package should do the trick.
I'm still figuring out how to fix the order issue, any suggestions?
In base R, use table or xtabs:
The gmodels package has a function CrossTable that gives output similar to what users of SPSS or SAS expect:
Simply use the dcast() function of the "reshape2" package:
Here "PLANT" will be in the left column and "ANIMAL" along the top row; the cells are filled from "INTERACTIONS", and missing ("NULL") combinations are filled with 0s.