I want to use rmarkdown to make a table where each cell has two values, for example 3.1 (0.05) or 78 ± 23.3. These kinds of tables are pretty common in scientific literature (like ones with bold values), where we want to compactly show a mean and standard deviation, or a value plus or minus some error term. So it would be useful to have a simple way to produce them when using rmarkdown. For example:
# my table
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
[snipped]
# my other table, that I want to combine with the first
# note: `:` binds tighter than `/`, so each sequence is ((min / 10):max) / 10
some_error_term_for_mtcars <- data.frame(sapply(seq_along(mtcars), function(i) {
  sample(x = ((min(mtcars[, i]) / 10):max(mtcars[, i])) / 10, nrow(mtcars), replace = TRUE)
}))
some_error_term_for_mtcars
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1 2.704 0.44 26.011 3.92 0.4276 0.21513 1.145 0.0 0.0 0.03 0.41
2 0.604 0.44 5.211 6.32 0.0276 0.01513 1.345 0.1 0.1 0.33 0.21
3 3.304 0.14 31.511 20.42 0.1276 0.51513 0.145 0.1 0.0 0.43 0.71
4 1.004 0.44 16.011 26.02 0.2276 0.11513 1.345 0.1 0.0 0.03 0.31
5 2.604 0.34 4.311 30.02 0.0276 0.31513 1.745 0.1 0.1 0.23 0.41
6 2.404 0.64 8.011 27.92 0.1276 0.21513 1.145 0.0 0.1 0.33 0.41
7 2.804 0.14 4.811 14.92 0.1276 0.01513 0.345 0.1 0.0 0.13 0.31
[snipped]
What is the simplest way to combine these two tables in rmarkdown to produce a single table in which each cell can contain something like 21 (0.904) or 21 ± 0.904?
We could do it like this, and then use knitr::kable to get the markdown:
two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " (", some_error_term_for_mtcars[ , i], ")" ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
mpg cyl disp hp drat wt qsec vs
1 21 (2.704) 6 (0.44) 160 (26.011) 110 (3.92) 3.9 (0.4276) 2.62 (0.21513) 16.46 (1.145) 0 (0)
2 21 (0.604) 6 (0.44) 160 (5.211) 110 (6.32) 3.9 (0.0276) 2.875 (0.01513) 17.02 (1.345) 0 (0.1)
3 22.8 (3.304) 4 (0.14) 108 (31.511) 93 (20.42) 3.85 (0.1276) 2.32 (0.51513) 18.61 (0.145) 1 (0.1)
4 21.4 (1.004) 6 (0.44) 258 (16.011) 110 (26.02) 3.08 (0.2276) 3.215 (0.11513) 19.44 (1.345) 1 (0.1)
5 18.7 (2.604) 8 (0.34) 360 (4.311) 175 (30.02) 3.15 (0.0276) 3.44 (0.31513) 17.02 (1.745) 0 (0.1)
6 18.1 (2.404) 6 (0.64) 225 (8.011) 105 (27.92) 2.76 (0.1276) 3.46 (0.21513) 20.22 (1.145) 1 (0)
am gear carb
1 1 (0) 4 (0.03) 4 (0.41)
2 1 (0.1) 4 (0.33) 4 (0.21)
3 1 (0) 4 (0.43) 1 (0.71)
4 0 (0) 3 (0.03) 1 (0.31)
5 0 (0.1) 3 (0.23) 2 (0.41)
6 0 (0.1) 3 (0.33) 1 (0.41)
knitr::kable(head(two_tables_into_one))
Or, for a plus-minus separator:
two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " ± ", some_error_term_for_mtcars[ , i] ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
mpg cyl disp hp
1 21 ± 2.704 6 ± 0.44 160 ± 26.011 110 ± 3.92
2 21 ± 0.604 6 ± 0.44 160 ± 5.211 110 ± 6.32
3 22.8 ± 3.304 4 ± 0.14 108 ± 31.511 93 ± 20.42
4 21.4 ± 1.004 6 ± 0.44 258 ± 16.011 110 ± 26.02
5 18.7 ± 2.604 8 ± 0.34 360 ± 4.311 175 ± 30.02
6 18.1 ± 2.404 6 ± 0.64 225 ± 8.011 105 ± 27.92
drat wt qsec
1 3.9 ± 0.4276 2.62 ± 0.21513 16.46 ± 1.145
2 3.9 ± 0.0276 2.875 ± 0.01513 17.02 ± 1.345
3 3.85 ± 0.1276 2.32 ± 0.51513 18.61 ± 0.145
4 3.08 ± 0.2276 3.215 ± 0.11513 19.44 ± 1.345
5 3.15 ± 0.0276 3.44 ± 0.31513 17.02 ± 1.745
6 2.76 ± 0.1276 3.46 ± 0.21513 20.22 ± 1.145
vs am gear carb
1 0 ± 0 1 ± 0 4 ± 0.03 4 ± 0.41
2 0 ± 0.1 1 ± 0.1 4 ± 0.33 4 ± 0.21
3 1 ± 0.1 1 ± 0 4 ± 0.43 1 ± 0.71
4 1 ± 0.1 0 ± 0 3 ± 0.03 1 ± 0.31
5 0 ± 0.1 0 ± 0.1 3 ± 0.23 2 ± 0.41
6 1 ± 0 0 ± 0.1 3 ± 0.33 1 ± 0.41
knitr::kable(head(two_tables_into_one))
But this as.data.frame(do.call(cbind, lapply... seems a bit awkward. Is there a neater way?
I used the following technique in my summarytools package (you can look at the source code for descr() and print.summarytools() to get all the details).
> install.packages("devtools")
> library(devtools)
> install_github('dcomtois/summarytools')
> library(summarytools)
> obs <- descr(iris)$observ
> obs
Sepal.Length Sepal.Width Petal.Length Petal.Width
Valid "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
<NA> "0 (0%)" "0 (0%)" "0 (0%)" "0 (0%)"
Total "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
The $observ data frame is constructed this way; it's part of a bigger loop, hence the i iterator. Note that the data frame is transposed later on in the code.
output$observ[i,] <- c(paste0(n.valid, " (", p.valid, "%)"),
paste0(n.NA, " (", p.NA, "%)"),
paste(n.valid + n.NA, "(100%)"))
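To make the pattern above easier to try outside the package, here is a minimal, self-contained sketch of the same idea in base R. The vector `x` and the rounding to one decimal are assumptions made for illustration; this is not the actual summarytools source, only the same `paste0()` technique applied to a single variable.

```r
# Hypothetical input: a numeric vector with some missing values
x <- c(4.2, NA, 5.1, 3.8, NA)

# Counts of valid and missing observations
n.valid <- sum(!is.na(x))
n.NA    <- sum(is.na(x))
n.total <- length(x)

# Percentages, rounded to one decimal (an assumption for this sketch)
p.valid <- round(100 * n.valid / n.total, 1)
p.NA    <- round(100 * n.NA / n.total, 1)

# Each cell is a single string holding both values: "count (percent%)"
observ <- c(
  Valid  = paste0(n.valid, " (", p.valid, "%)"),
  `<NA>` = paste0(n.NA, " (", p.NA, "%)"),
  Total  = paste(n.total, "(100%)")
)
print(observ)
```

Because each cell is already a plain character string, the result can be handed to any table printer without further processing.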
Then for generating an rmarkdown table using pander, we can simply do this:
> library(pander)
> pander(x = obs, style="rmarkdown")
| | Sepal.Length | Sepal.Width | Petal.Length |
|:-----------:|:--------------:|:-------------:|:--------------:|
| **Valid** | 150 (100%) | 150 (100%) | 150 (100%) |
| **<NA>** | 0 (0%) | 0 (0%) | 0 (0%) |
| **Total** | 150 (100%) | 150 (100%) | 150 (100%) |
Table: Table continues below
| | Petal.Width |
|:-----------:|:-------------:|
| **Valid** | 150 (100%) |
| **<NA>** | 0 (0%) |
| **Total** | 150 (100%) |
Here's the full output for the descr() function:
> descr(iris, style = "rmarkdown", plain.ascii = FALSE)
Non-numerical variable(s) ignored: Species
Descriptive Statistics
Dataframe: iris
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|------------------:|---------------:|--------------:|---------------:|--------------:|
| **Mean** | 5.84 | 3.06 | 3.76 | 1.2 |
| **Std.Dev** | 0.83 | 0.44 | 1.77 | 0.76 |
| **Min** | 4.3 | 2 | 1 | 0.1 |
| **Max** | 7.9 | 4.4 | 6.9 | 2.5 |
| **Median** | 5.8 | 3 | 4.35 | 1.3 |
| **mad** | 1.04 | 0.44 | 1.85 | 1.04 |
| **IQR** | 1.3 | 0.5 | 3.5 | 1.5 |
| **CV** | 7.06 | 7.01 | 2.13 | 1.57 |
| **Skewness** | 0.31 | 0.31 | -0.27 | -0.1 |
| **SE.Skewness** | 0.2 | 0.2 | 0.2 | 0.2 |
| **Kurtosis** | -0.61 | 0.14 | -1.42 | -1.36 |
Observations
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|------------:|---------------:|--------------:|---------------:|--------------:|
| **Valid** | 150 (100%) | 150 (100%) | 150 (100%) | 150 (100%) |
| **<NA>** | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) |
| **Total** | 150 (100%) | 150 (100%) | 150 (100%) | 150 (100%) |
Now, for combining data from 2 distinct datasets, a good old for loop can very well do the job:
names(some_error_term_for_mtcars) <- names(mtcars)
new.df <- mtcars
for (n in names(mtcars)) {
  new.df[, n] <- paste(mtcars[, n], "±", round(some_error_term_for_mtcars[, n], 2))
}
pander(new.df, style="rmarkdown")
Partial output:
| | mpg | cyl | disp |
|:-------------------------:|:----------:|:--------:|:-------------:|
| **Mazda RX4** | 21 ± 2 | 6 ± 0.04 | 160 ± 33.61 |
| **Mazda RX4 Wag** | 21 ± 0.8 | 6 ± 0.14 | 160 ± 26.11 |
| **Datsun 710** | 22.8 ± 0.1 | 4 ± 0.64 | 108 ± 45.81 |
| **Hornet 4 Drive** | 21.4 ± 1.7 | 6 ± 0.04 | 258 ± 33.81 |
| **Hornet Sportabout** | 18.7 ± 2.7 | 8 ± 0.54 | 360 ± 37.81 |
| **Valiant** | 18.1 ± 3.3 | 6 ± 0.14 | 225 ± 36.31 |
| **Duster 360** | 14.3 ± 0.1 | 8 ± 0.24 | 360 ± 2.01 |
| **Merc 240D** | 24.4 ± 2.3 | 4 ± 0.14 | 146.7 ± 8.81 |
| **Merc 230** | 22.8 ± 1.7 | 4 ± 0.04 | 140.8 ± 43.91 |
| **Merc 280** | 19.2 ± 1.5 | 6 ± 0.24 | 167.6 ± 6.91 |
| **Merc 280C** | 17.8 ± 3 | 6 ± 0.14 | 167.6 ± 27.11 |
| **Merc 450SE** | 16.4 ± 3 | 8 ± 0.34 | 275.8 ± 11.21 |
| **Merc 450SL** | 17.3 ± 2.8 | 8 ± 0.14 | 275.8 ± 32.21 |
| **Merc 450SLC** | 15.2 ± 0.3 | 8 ± 0.44 | 275.8 ± 11.61 |
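If you would rather avoid the explicit loop, the same combination can be sketched with `Map()`, which walks both data frames column by column. In the sketch below, `err` is a reproducible stand-in for `some_error_term_for_mtcars` (its values are made up, so they will not match the output above), and `combine_cols()` is a hypothetical helper, not a function from pander or knitr. Using `formatC()` with a fixed number of digits is one way to get cells like 21.00 ± 2.00 instead of 21 ± 2, assuming you want a uniform number of decimals.

```r
# Reproducible stand-in for the error terms (hypothetical values)
set.seed(1)
err <- as.data.frame(lapply(mtcars, function(col) runif(length(col), 0, sd(col))))

# Hypothetical helper: format both columns with a fixed number of
# decimals and join them as "value ± error"
combine_cols <- function(value, error, digits = 2) {
  paste(formatC(value, format = "f", digits = digits),
        "±",
        formatC(error, format = "f", digits = digits))
}

# Map() pairs up the columns of the two data frames; assigning into
# combined[] keeps the column names and row names of mtcars
combined <- mtcars
combined[] <- Map(combine_cols, mtcars, err)

head(combined[, 1:3])
```

The resulting `combined` data frame contains only character columns, so it can be passed to `pander(combined, style = "rmarkdown")` or `knitr::kable(combined)` exactly as in the examples above.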