Rmarkdown table with cells that have two values

2019-04-12 21:38发布

问题:

I want to use rmarkdown to make a table where each cell has two values, for example 3.1 (0.05) or 78 ± 23.3. These kinds of tables are pretty common in scientific literature (like ones with bold values), where we want to compactly show mean and standard deviation, or a value plus-minus some error term. So it would be useful to have a simple way to produce them when using Rmarkdown. For example:

# my table
mtcars

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
[snipped]

# my other table, that I want to combine with the first
some_error_term_for_mtcars <- data.frame(sapply(1:ncol(mtcars), function(i) sample(x = (min(mtcars[, i])/10):max(mtcars[, i])/10, nrow(mtcars), replace = TRUE)))

some_error_term_for_mtcars
      X1   X2     X3    X4     X5      X6    X7  X8  X9  X10  X11
1  2.704 0.44 26.011  3.92 0.4276 0.21513 1.145 0.0 0.0 0.03 0.41
2  0.604 0.44  5.211  6.32 0.0276 0.01513 1.345 0.1 0.1 0.33 0.21
3  3.304 0.14 31.511 20.42 0.1276 0.51513 0.145 0.1 0.0 0.43 0.71
4  1.004 0.44 16.011 26.02 0.2276 0.11513 1.345 0.1 0.0 0.03 0.31
5  2.604 0.34  4.311 30.02 0.0276 0.31513 1.745 0.1 0.1 0.23 0.41
6  2.404 0.64  8.011 27.92 0.1276 0.21513 1.145 0.0 0.1 0.33 0.41
7  2.804 0.14  4.811 14.92 0.1276 0.01513 0.345 0.1 0.0 0.13 0.31
[snipped]

What is the simplest way to combine these two tables in rmarkdown to produce a single where a single cells can contain things like 21 (0.904) or 21 ± 0.904?

回答1:

We could do it like this, and then use knitr::kable to get the markdown:

two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " (", some_error_term_for_mtcars[ , i], ")"  ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
           mpg      cyl         disp          hp          drat              wt          qsec      vs
1   21 (2.704) 6 (0.44) 160 (26.011)  110 (3.92)  3.9 (0.4276)  2.62 (0.21513) 16.46 (1.145)   0 (0)
2   21 (0.604) 6 (0.44)  160 (5.211)  110 (6.32)  3.9 (0.0276) 2.875 (0.01513) 17.02 (1.345) 0 (0.1)
3 22.8 (3.304) 4 (0.14) 108 (31.511)  93 (20.42) 3.85 (0.1276)  2.32 (0.51513) 18.61 (0.145) 1 (0.1)
4 21.4 (1.004) 6 (0.44) 258 (16.011) 110 (26.02) 3.08 (0.2276) 3.215 (0.11513) 19.44 (1.345) 1 (0.1)
5 18.7 (2.604) 8 (0.34)  360 (4.311) 175 (30.02) 3.15 (0.0276)  3.44 (0.31513) 17.02 (1.745) 0 (0.1)
6 18.1 (2.404) 6 (0.64)  225 (8.011) 105 (27.92) 2.76 (0.1276)  3.46 (0.21513) 20.22 (1.145)   1 (0)
       am     gear     carb
1   1 (0) 4 (0.03) 4 (0.41)
2 1 (0.1) 4 (0.33) 4 (0.21)
3   1 (0) 4 (0.43) 1 (0.71)
4   0 (0) 3 (0.03) 1 (0.31)
5 0 (0.1) 3 (0.23) 2 (0.41)
6 0 (0.1) 3 (0.33) 1 (0.41)

knitr::kable(head(two_tables_into_one))

or for a plus-minus separator:

two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " ± ", some_error_term_for_mtcars[ , i]  ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
           mpg      cyl         disp          hp
1   21 ± 2.704 6 ± 0.44 160 ± 26.011  110 ± 3.92
2   21 ± 0.604 6 ± 0.44  160 ± 5.211  110 ± 6.32
3 22.8 ± 3.304 4 ± 0.14 108 ± 31.511  93 ± 20.42
4 21.4 ± 1.004 6 ± 0.44 258 ± 16.011 110 ± 26.02
5 18.7 ± 2.604 8 ± 0.34  360 ± 4.311 175 ± 30.02
6 18.1 ± 2.404 6 ± 0.64  225 ± 8.011 105 ± 27.92
           drat              wt          qsec
1  3.9 ± 0.4276  2.62 ± 0.21513 16.46 ± 1.145
2  3.9 ± 0.0276 2.875 ± 0.01513 17.02 ± 1.345
3 3.85 ± 0.1276  2.32 ± 0.51513 18.61 ± 0.145
4 3.08 ± 0.2276 3.215 ± 0.11513 19.44 ± 1.345
5 3.15 ± 0.0276  3.44 ± 0.31513 17.02 ± 1.745
6 2.76 ± 0.1276  3.46 ± 0.21513 20.22 ± 1.145
       vs      am     gear     carb
1   0 ± 0   1 ± 0 4 ± 0.03 4 ± 0.41
2 0 ± 0.1 1 ± 0.1 4 ± 0.33 4 ± 0.21
3 1 ± 0.1   1 ± 0 4 ± 0.43 1 ± 0.71
4 1 ± 0.1   0 ± 0 3 ± 0.03 1 ± 0.31
5 0 ± 0.1 0 ± 0.1 3 ± 0.23 2 ± 0.41
6   1 ± 0 0 ± 0.1 3 ± 0.33 1 ± 0.41

knitr::kable(head(two_tables_into_one))

But this as.data.frame(do.call(cbind, lapply... seems a bit awkward. Is there a neater way?



回答2:

I used the following technique in my summarytools package (you can look at the source code for descr() and print.summarytools() to get all the details).

> install.packages("devtools")
> library(devtools)
> install_github('dcomtois/summarytools')
> library(summarytools)
> obs <- descr(iris)$observ
> obs
      Sepal.Length Sepal.Width  Petal.Length Petal.Width 
Valid "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
<NA>  "0 (0%)"     "0 (0%)"     "0 (0%)"     "0 (0%)"    
Total "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"

The $observ dataframe has been constructed this way - it's part of a bigger loop, hence the i iterator. Note that the dataframe is transposed later on in the code.

output$observ[i,] <- c(paste0(n.valid, " (", p.valid, "%)"),
                       paste0(n.NA, " (", p.NA, "%)"),
                       paste(n.valid + n.NA, "(100%)"))

Then for generating an rmarkdown table using pander, we can simply do this:

> library(pander)
> pander(x = obs, style="rmarkdown")    


|   &nbsp;    |  Sepal.Length  |  Sepal.Width  |  Petal.Length  |
|:-----------:|:--------------:|:-------------:|:--------------:|
|  **Valid**  |   150 (100%)   |  150 (100%)   |   150 (100%)   |
|  **<NA>**   |     0 (0%)     |    0 (0%)     |     0 (0%)     |
|  **Total**  |   150 (100%)   |  150 (100%)   |   150 (100%)   |

Table: Table continues below


|   &nbsp;    |  Petal.Width  |
|:-----------:|:-------------:|
|  **Valid**  |  150 (100%)   |
|  **<NA>**   |    0 (0%)     |
|  **Total**  |  150 (100%)   |

Here's the full output for the descr() function:

> descr(iris, style = "rmarkdown", plain.ascii = FALSE)
Non-numerical variable(s) ignored: Species

Descriptive Statistics

Dataframe: iris

|            &nbsp; |   Sepal.Length |   Sepal.Width |   Petal.Length |   Petal.Width |
|------------------:|---------------:|--------------:|---------------:|--------------:|
|          **Mean** |           5.84 |          3.06 |           3.76 |           1.2 |
|       **Std.Dev** |           0.83 |          0.44 |           1.77 |          0.76 |
|           **Min** |            4.3 |             2 |              1 |           0.1 |
|           **Max** |            7.9 |           4.4 |            6.9 |           2.5 |
|        **Median** |            5.8 |             3 |           4.35 |           1.3 |
|           **mad** |           1.04 |          0.44 |           1.85 |          1.04 |
|           **IQR** |            1.3 |           0.5 |            3.5 |           1.5 |
|            **CV** |           7.06 |          7.01 |           2.13 |          1.57 |
|      **Skewness** |           0.31 |          0.31 |          -0.27 |          -0.1 |
|   **SE.Skewness** |            0.2 |           0.2 |            0.2 |           0.2 |
|      **Kurtosis** |          -0.61 |          0.14 |          -1.42 |         -1.36 |

Observations

|      &nbsp; |   Sepal.Length |   Sepal.Width |   Petal.Length |   Petal.Width |
|------------:|---------------:|--------------:|---------------:|--------------:|
|   **Valid** |     150 (100%) |    150 (100%) |     150 (100%) |    150 (100%) |
|    **<NA>** |         0 (0%) |        0 (0%) |         0 (0%) |        0 (0%) |
|   **Total** |     150 (100%) |    150 (100%) |     150 (100%) |    150 (100%) |

Now for combining data from 2 distinct datasets, a good old for loop can very well do the job:

names(some_error_term_for_mtcars) <- names(mtcars)
new.df <- mtcars
for (n in names(mtcars)) {
  new.df[,n] <- paste(mtcars[,n], "±",round(some_error_term_for_mtcars[,n],2))
}
pander(new.df, style="rmarkdown")

Partial output:

|          &nbsp;           |    mpg     |   cyl    |     disp      |
|:-------------------------:|:----------:|:--------:|:-------------:|
|       **Mazda RX4**       |   21 ± 2   | 6 ± 0.04 |  160 ± 33.61  |
|     **Mazda RX4 Wag**     |  21 ± 0.8  | 6 ± 0.14 |  160 ± 26.11  |
|      **Datsun 710**       | 22.8 ± 0.1 | 4 ± 0.64 |  108 ± 45.81  |
|    **Hornet 4 Drive**     | 21.4 ± 1.7 | 6 ± 0.04 |  258 ± 33.81  |
|   **Hornet Sportabout**   | 18.7 ± 2.7 | 8 ± 0.54 |  360 ± 37.81  |
|        **Valiant**        | 18.1 ± 3.3 | 6 ± 0.14 |  225 ± 36.31  |
|      **Duster 360**       | 14.3 ± 0.1 | 8 ± 0.24 |  360 ± 2.01   |
|       **Merc 240D**       | 24.4 ± 2.3 | 4 ± 0.14 | 146.7 ± 8.81  |
|       **Merc 230**        | 22.8 ± 1.7 | 4 ± 0.04 | 140.8 ± 43.91 |
|       **Merc 280**        | 19.2 ± 1.5 | 6 ± 0.24 | 167.6 ± 6.91  |
|       **Merc 280C**       |  17.8 ± 3  | 6 ± 0.14 | 167.6 ± 27.11 |
|      **Merc 450SE**       |  16.4 ± 3  | 8 ± 0.34 | 275.8 ± 11.21 |
|      **Merc 450SL**       | 17.3 ± 2.8 | 8 ± 0.14 | 275.8 ± 32.21 |
|      **Merc 450SLC**      | 15.2 ± 0.3 | 8 ± 0.44 | 275.8 ± 11.61 |