The minimal reproducible example (RE) below is my attempt to figure out how I can use knitr for generating complex dynamic documents, where "complex" refers not to the document's elements and their layout, but to the non-linear logic of the underlying R code chunks. While the RE and its results show that a solution based on this approach might work well, I would like to know: 1) is this a correct way of using knitr in such situations; 2) can the approach be optimized, and if so, how; 3) what alternative approaches could reduce the granularity of code chunks?
EDA source code (file "reEDA.R"):
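```r
## @knitr CleanEnv
rm(list = ls(all.names = TRUE))

## @knitr LoadPackages
library(ggplot2)  # ggplot() and the scales used in generatePlot()
library(psych)    # assumed source of describe() used in performEDA()

## @knitr PrepareData
set.seed(100) # for reproducibility
data(diamonds, package = 'ggplot2') # use built-in data

## @knitr PerformEDA
# Build a log-scaled density histogram for one column.
# The axis label and log10 scale are tailored to 'price'; the ggplot objects
# created for other columns are only evaluated (and may fail) if printed.
generatePlot <- function (df, colName) {
  df$var <- df[[colName]]
  g <- ggplot(data.frame(df)) +
    scale_fill_continuous("Density", low = "#56B1F7", high = "#132B43") +
    scale_x_log10("Diamond Price [log10]") +
    scale_y_continuous("Density") +
    # after_stat() replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
    geom_histogram(aes(x = var, y = after_stat(density),
                       fill = after_stat(density)),
                   binwidth = 0.01)
  return (g)
}

# Store EDA results as global variables with prefixed names:
# d_<dataset> (dataset description), and t_<column> (summary table) plus
# g_<column> (plot) for every numeric or factor column.
performEDA <- function (data) {
  d_var <- paste0("d_", deparse(substitute(data)))
  assign(d_var, describe(data), envir = .GlobalEnv)
  for (colName in names(data)) {
    if (is.numeric(data[[colName]]) || is.factor(data[[colName]])) {
      t_var <- paste0("t_", colName)
      assign(t_var, summary(data[[colName]]), envir = .GlobalEnv)
      g_var <- paste0("g_", colName)
      assign(g_var, generatePlot(data, colName), envir = .GlobalEnv)
    }
  }
}

performEDA(diamonds)  # creates d_diamonds, t_price, g_price, ...
```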
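The report relies on knitr's external-code mechanism: `read_chunk()` splits a script at its `## @knitr <label>` lines and registers each piece under its label, so an empty chunk with the same label in the .Rmd runs that piece. The sketch below checks this outside of a knit. It assumes knitr is installed, and the tiny stand-in script written to a temporary file is illustrative only, not the full reEDA.R.

```r
library(knitr)

# Illustrative stand-in for reEDA.R (assumption: any file using
# "## @knitr <label>" separators is split the same way).
script <- c(
  "## @knitr PrepareData",
  "set.seed(100)",
  "data(diamonds, package = 'ggplot2')",
  "",
  "## @knitr PerformEDA",
  "performEDA(diamonds)"
)
path <- tempfile(fileext = ".R")
writeLines(script, path)

# Split the file at the label lines and register each piece with
# knitr's code manager under its label.
read_chunk(path)

# Inspect what an empty chunk labelled PrepareData / PerformEDA would run.
knit_code$get("PrepareData")
knit_code$get("PerformEDA")
```

Because the code only lives in reEDA.R, the .Rmd controls the order and visibility of execution through chunk labels and options alone.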
EDA report R Markdown document (file "reEDA.Rmd"):
````markdown
```{r KnitrSetup, echo=FALSE, include=FALSE}
library(knitr)
opts_knit$set(progress = TRUE, verbose = TRUE)
opts_chunk$set(
  echo = FALSE,
  include = FALSE,
  tidy = FALSE,
  warning = FALSE
)
```

```{r ReadChunksEDA, cache=FALSE}
read_chunk('reEDA.R')
```

```{r CleanEnv}
```

```{r LoadPackages}
```

```{r PrepareData}
```

Narrative: Data description

```{r PerformEDA}
```

Narrative: Intro to EDA results

Let's look at summary descriptive statistics for our dataset:

```{r DescriptiveDataset, include=TRUE}
print(d_diamonds)
```

Now, let's examine each variable of interest individually.

Variable Price is ... Descriptive statistics for 'Price':

```{r DescriptivePrice, include=TRUE}
print(t_price)
```

Finally, let's examine the price distribution across the dataset visually:

```{r VisualPrice, include=TRUE, fig.align='center'}
print(g_price)
```
````
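One possible optimization, offered as an untested assumption rather than part of the report above: instead of `assign()`-ing prefixed variables into `.GlobalEnv`, the EDA function could return a single named list that report chunks index directly. `performEDAList()` is a hypothetical name; this sketch uses base R only, with `summary()` standing in for `describe()` and `iris` as sample data, and its optional `plotFun` argument could receive `generatePlot`.

```r
# Hypothetical alternative to performEDA(): collect results in a named list
# instead of assign()-ing prefixed variables into .GlobalEnv.
performEDAList <- function(data, plotFun = NULL) {
  # Same column filter as performEDA(): numeric or factor columns only.
  keep <- vapply(data, function(col) is.numeric(col) || is.factor(col),
                 logical(1))
  cols <- names(data)[keep]
  list(
    dataset = summary(data),  # base-R stand-in for describe(data)
    tables  = setNames(lapply(cols, function(cn) summary(data[[cn]])), cols),
    plots   = if (is.null(plotFun)) NULL else
      setNames(lapply(cols, function(cn) plotFun(data, cn)), cols)
  )
}

eda <- performEDAList(iris)
eda$tables$Sepal.Length  # what performEDA() would store as t_Sepal.Length
names(eda$tables)
```

In reEDA.Rmd, the PerformEDA chunk would then run `eda <- performEDAList(diamonds, generatePlot)` and the DescriptivePrice chunk would print `eda$tables$price`. All results stay in one object, so nothing in the global environment can be silently overwritten by a column that happens to share a name.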
The result can be found here: