I earlier asked "How to display two columns as binary (presence/absence) matrix?", which received two excellent answers. I would now like to take this a step further and add a third column to the original plot and species columns, giving the biomass of each species in each plot.
Column 1 (plot) holds codes for ~200 plots, column 2 (species) holds codes for ~1200 species, and column 3 (biomass) gives the dry weight. Each plot has more than one species, and each species can occur in more than one plot. There are ~2700 rows in total.
> head(dissim)
plot species biomass
1 a1f56r jactom 20.2
2 a1f56r zinunk 10.3
3 a1f56r mikcor 0.4
4 a1f56r rubcle 1.3
5 a1f56r sphoos 12.4
6 a1f56r nepbis1 8.2
> tail(dissim)
plot species biomass
2707 og100m562r selcup 4.7
2708 og100m562r pip139 30.5
2709 og100m562r stasum 0.1
2710 og100m562r artani 3.4
2711 og100m562r annunk 20.7
2712 og100m562r rubunk 22.6
I would like to create a plot by species matrix that displays the biomass of each species in each plot (rather than a binary presence/absence matrix), something of the form:
jactom rubcle chrodo uncgla
a1f56r 1.3 0 10.3 0
a1f17r 0 22.3 0 4
a1m5r 3.2 0 3.7 9.7
a1m5r 1 0 0 20.1
a1m17r 5.4 6.9 0 1
Any advice on how to add this additional level of complexity would be very much appreciated.
The xtabs() and tapply() functions both return a matrix-shaped result (xtabs() returns a table, tapply() a plain matrix):
# Using MrFlick's example
> xtabs(~a+b,dd)
b
a f g h i j
a 0 1 0 2 3
b 0 0 2 1 0
c 0 3 0 0 1
d 2 2 2 1 1
e 1 1 2 4 1
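For the biomass matrix itself, here is a minimal sketch: xtabs() accepts a numeric variable on the left-hand side of the formula and sums it within each cell, so the counting formula becomes biomass ~ plot + species. The data frame below is a toy stand-in for dissim, built from rows shown in head(dissim) and tail(dissim); its last row is invented so that one species occurs in two plots.

```r
# Toy stand-in for `dissim`: rows taken from head()/tail() above, except the
# last row (jactom in og100m562r), which is invented for illustration
dissim <- data.frame(
  plot    = c("a1f56r", "a1f56r", "a1f56r", "og100m562r", "og100m562r", "og100m562r"),
  species = c("jactom", "zinunk", "mikcor", "selcup", "pip139", "jactom"),
  biomass = c(20.2, 10.3, 0.4, 4.7, 30.5, 1.0)
)

# With a left-hand side, xtabs() sums that variable within each
# plot x species cell instead of counting rows; empty cells become 0
biomass_tab <- xtabs(biomass ~ plot + species, data = dissim)
print(biomass_tab)

# Convert to a plain plot-by-species data frame (plots as row names)
biomass_df <- as.data.frame.matrix(biomass_tab)
```

If a species is recorded more than once in the same plot, xtabs() adds those biomass values together, giving a total per plot.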
# --- the tapply solution is a bit less elegant
> dd$one <- 1
> with(dd, tapply(one, list(a,b), sum))
f g h i j
a NA 1 NA 2 3
b NA NA 2 1 NA
c NA 3 NA NA 1
d 2 2 2 1 1
e 1 1 2 4 1
# To turn the NAs into zeros:
> tbl <- with(dd, tapply(one, list(a,b), sum))
> tbl[is.na(tbl)] <- 0
> tbl
f g h i j
a 0 1 0 2 3
b 0 0 2 1 0
c 0 3 0 0 1
d 2 2 2 1 1
e 1 1 2 4 1
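The tapply() route works the same way with biomass in place of the column of ones. Below is a sketch using a toy stand-in for dissim (rows from head()/tail() above, last row invented). The default argument of tapply(), available in R 3.4.0 and later, fills empty cells directly, so the separate NA-to-zero step is not needed.

```r
# Toy stand-in for `dissim`; the last row (jactom in og100m562r) is invented
dissim <- data.frame(
  plot    = c("a1f56r", "a1f56r", "a1f56r", "og100m562r", "og100m562r", "og100m562r"),
  species = c("jactom", "zinunk", "mikcor", "selcup", "pip139", "jactom"),
  biomass = c(20.2, 10.3, 0.4, 4.7, 30.5, 1.0)
)

# Sum biomass in each plot x species cell; default = 0 fills cells
# that have no rows (requires R >= 3.4.0)
biomass_mat <- with(dissim, tapply(biomass, list(plot, species), sum, default = 0))
print(biomass_mat)
```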
Using this sample data:
set.seed(15)
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE)
)
If you know each combination appears only once, you can do
with(dd, table(a,b))
# b
# a f g h i j
# a 0 1 0 2 3
# b 0 0 2 1 0
# c 0 3 0 0 1
# d 2 2 2 1 1
# e 1 1 2 4 1
If combinations may be duplicated and you only want to track presence/absence, you can do
with(unique(dd), table(a,b))
# or
with(dd, (table(a,b)>0)+0)
# b
# a f g h i j
# a 0 1 0 1 1
# b 0 0 1 1 0
# c 0 1 0 0 1
# d 1 1 1 1 1
# e 1 1 1 1 1
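Once a plot-by-species biomass matrix exists, the same > 0 trick recovers the presence/absence matrix. The matrix below is a small hypothetical example with made-up values, standing in for a real biomass result.

```r
# Hypothetical plot-by-species biomass matrix (values are illustrative only)
biomass_mat <- matrix(c(20.2, 0,   0.4,
                        1.0,  4.7, 0),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("a1f56r", "og100m562r"),
                                      c("jactom", "selcup", "mikcor")))

# Any positive biomass counts as presence; "+ 0" turns TRUE/FALSE into 1/0
presence <- (biomass_mat > 0) + 0
print(presence)
```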
You also asked about a solution with three variables. Below are the two solutions you asked for.
First, let's set up the data:
set.seed(15)
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE),
  c = sample(letters[1:3], 30, replace = TRUE)
)
If you have three discrete variables and only want to count occurrences, here is a version of @MrFlick's solution:
by(dd, dd$c, function(x) with(x, table(a, b)))
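As an alternative to splitting with by(), xtabs() can count all three variables at once and return a single three-way table. This sketch reuses the sample data above. Each slice along c holds the same counts as the matching by() element, but it keeps every level of a and b, so levels absent from that group show up as zero rows or columns.

```r
set.seed(15)
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE),
  c = sample(letters[1:3], 30, replace = TRUE)
)

# One three-way contingency table (a x b x c) instead of a list of 2-way tables
counts3 <- xtabs(~ a + b + c, data = dd)

# The a x b counts for the rows where c == "a"
counts3[, , "a"]
```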
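And if the third variable is numeric (as biomass is) and you want its average in each cell, you can use reshape::cast(). Note that c in the sample data above holds letters, so mean() would only return NA; here a hypothetical numeric column w is added for illustration:
dd$w <- runif(30)
reshape::cast(dd, a ~ b, value = 'w', fun.aggregate = mean)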
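In base R, tapply() with mean gives the same per-cell averages without the reshape package. This sketch assumes a numeric column w (hypothetical, standing in for something like biomass), since a letter column cannot be averaged. Dividing an xtabs() sum table by an xtabs() count table is another route to the same means.

```r
set.seed(15)
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE),
  w = runif(30)   # hypothetical numeric column (e.g. biomass)
)

# Mean of w in each a x b cell; cells with no rows stay NA,
# because the mean of zero observations is undefined
mean_tab <- with(dd, tapply(w, list(a, b), mean))

# Same means from two xtabs() tables: cell sums / cell counts
sums    <- xtabs(w ~ a + b, data = dd)
counts  <- xtabs(~ a + b, data = dd)
mean_xt <- sums / counts   # empty cells give 0/0 = NaN
```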