Importing common YAML in rstudio/knitr document

Posted 2019-03-16 03:37

Question:

I have a few Rmd documents that all have the same YAML frontmatter except for the title. How can I keep this frontmatter in one file and have it used for all the documents? It is getting rather large and I don't want to keep every file in step every time I tweak the frontmatter.

I still want to

  • use the Knit button/Ctrl+Shift+K shortcut in RStudio to do the compile
  • keep the whole setup portable: would like to avoid writing a custom output format or overriding rstudio.markdownToHTML (as this would require me to carry around a .Rprofile too)

Example

common.yaml:

author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
# many other options

an example document

---
title: On the Culinary Preferences of Anthropomorphic Cats
---

I do not like green eggs and ham. I do not like them, Sam I Am!

Desired output: The compiled example document (ie either HTML or PDF), which has been compiled with the metadata in common.yaml injected in. The R code in the YAML (in this case, the date) would be compiled as a bonus, but it is not necessary (I only use it for the date which I don't really need).

Options/Solutions?

I haven't quite got any of these working yet.

  • With rmarkdown one can create a _output.yaml to put common YAML metadata, but this will put all of that metadata under output: in the YAML so is only good for options under html_document: and pdf_document:, and not for things like author, date, ...
  • write a knitr chunk to import the YAML, e.g.

    ---
    title: On the Culinary Preferences of Anthropomorphic Cats
    ```{r echo=F, results='asis'}
    cat(readLines('common.yaml'), sep='\n')
    ```
    ---
    
    I do not like green eggs and ham. I do not like them, Sam I Am!
    

    This works if I knit('input.Rmd') and then run pandoc on the output, but not if I use the Knit button in RStudio (which I assume calls render), because render parses the metadata before running knitr, and the metadata is malformed until knitr has run.

  • Makefile: if I were clever enough I could write a Makefile or something to inject common.yaml into input.Rmd, then run rmarkdown::render(), and somehow hook it up to the Knit button of RStudio, and perhaps somehow save this RStudio configuration into the .Rproj file so that the whole thing is portable without me needing to edit .Rprofile too. But I'm not clever enough.

EDIT: I had a go at this last option and hooked up a Makefile to the Build command (Ctrl+Shift+B). However, this will build the same target every time I use it via Ctrl+Shift+B, and I want to build the target that corresponds with the Rmd file I currently have open in the editor [as for Ctrl+Shift+K].

Answer 1:

Have found two options to do this portably (ie no .Rprofile customisation needed, minimal duplication of YAML frontmatter):

  1. You can provide common yaml to pandoc on the command-line! d'oh!
  2. You can set the knit: property of the metadata to your own function to have greater control over what happens when you Ctrl+Shift+K.

Option 1: common YAML to command line.

Put all the common YAML in its own file

common.yaml:

---
author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
---

Note that it is a complete YAML block, i.e. the --- delimiters are needed.

Then in the document you can pass the YAML file to pandoc as an extra argument via pandoc_args, and pandoc will apply its metadata (see this github issue)

in example.rmd:

---
title: On the Culinary Preferences of Anthropomorphic Cats
output:
  html_document:
    pandoc_args: './common.yaml'
---

I do not like green eggs and ham. I do not like them, Sam I Am!

You could even put the html_document: stuff in an _output.yaml since rmarkdown will take that and place it under output: for all the documents in that folder. In this way there can be no duplication of YAML between all documents using this frontmatter.

Pros:

  • no duplication of YAML frontmatter.
  • very clean

Cons:

  • the common YAML is not passed through knitr, so the date field above will not be evaluated. You will get the literal string "r format(Sys.time(), format='%Y-%m-%d %H:%M:%S %z')" as your date.
  • from the same github issue:

    Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point.

Perhaps this could be a problem at some point depending on your setup.

Option 2: override the knit command

This allows for much greater control, though is a bit more cumbersome/tricky.

This link and this one mention an undocumented feature in rmarkdown: the knit: field of the YAML will be executed when one clicks the "Knit" button in RStudio.

In short:

  1. define a function myknit(inputFile, encoding) that reads the common YAML, inserts it into the Rmd and calls render on the result. Save it in its own file, myknit.r.
  2. in the YAML of example.rmd, add

     knit:  (function (...) { source('myknit.r'); myknit(...) })
    

    It seems to have to be on one line. The reason for source('myknit.r') instead of just putting the function definition in the YAML is portability: if I modify myknit.r I don't have to modify every document's YAML. This way, the only common YAML that all documents must repeat in their frontmatter is the knit line; all other common YAML can stay in common.yaml.

Then Ctrl+Shift+K works as I would hope from within Rstudio.

Further notes:

  • myknit could just be a system call to make if I had a makefile setup.
  • the injected YAML will be passed through rmarkdown and hence knitted, since it is injected before the call to render.
  • Preview window: so long as myknit produces a (single) message Output created: path/to/file.html, then the file will be shown in the preview window.

    I have found that there can be only one such message in the output [not multiple], or you get no preview window. So if you use render (which makes an "Output created: basename.extension") message and the final produced file is actually elsewhere, you will need to suppress this message via either render(..., quiet=T) or suppressMessages(render(...)) (the former suppresses knitr progress and pandoc output too), and create your own message with the correct path.
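The single-message requirement above could be handled roughly like this. It is only a sketch: render_and_announce, outDir and the render parameter are hypothetical names that do not appear in the original answer. The render parameter exists purely so the function can be tried without pandoc installed; by default it is rmarkdown::render.

```r
# Hypothetical wrapper (not part of the original myknit.r): render quietly,
# copy the result to its final location, then announce that location once.
render_and_announce <- function(input, outDir, render = rmarkdown::render, ...) {
    # quiet = TRUE suppresses render's own "Output created:" message
    # (and the knitr progress and pandoc output along with it).
    built <- render(input, quiet = TRUE, ...)

    # Assumes the rendered file is NOT already in outDir.
    final <- file.path(outDir, basename(built))
    file.copy(built, final, overwrite = TRUE)

    # Emit exactly one message, pointing at where the file really is.
    message("Output created: ", final)
    invisible(final)
}
```

Inside myknit this would replace the render and file.copy steps, for example render_and_announce(ofile, dirname(inputFile), encoding = encoding, envir = new.env()).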

Pros:

  • the YAML frontmatter is knitted
  • much more control than option 1 if you need to do custom pre- / post-processing.

Cons:

  • a bit more effort than option 1
  • the knit: line must be duplicated in each document (though by source('./myknit.r') at least the function definition may be stored in one central location)

Here is the setup for posterity. For portability, you only need to carry around myknit.r and common.yaml. No .Rprofile or project-specific config needed.

example.rmd:

---
title: On the Culinary Preferences of Anthropomorphic Cats
knit:  (function (...) { source('myknit.r'); myknit(...) })
---

I do not like green eggs and ham. I do not like them, Sam I Am!

common.yaml [for example]:

author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References

myknit.r:

myknit <- function (inputFile, encoding, yaml='common.yaml') {
    # read in the common YAML and the source Rmd
    yaml <- readLines(yaml)
    rmd <- readLines(inputFile)

    # insert the YAML after the first ---
    # I'm assuming all my RMDs have properly-formed YAML and that the first
    # occurrence of --- starts the YAML. You could do proper validation if you wanted.
    yamlHeader <- grep('^---$', rmd)[1]
    # put the yaml in
    rmd <- append(rmd, yaml, after=yamlHeader)

    # write out to a temp file
    ofile <- file.path(tempdir(), basename(inputFile))
    writeLines(rmd, ofile)

    # render with rmarkdown.
    message(ofile)
    ofile <- rmarkdown::render(ofile, encoding=encoding, envir=new.env())

    # copy the output back next to the input file.
    file.copy(ofile, file.path(dirname(inputFile), basename(ofile)), overwrite=TRUE)
}

Pressing Ctrl+Shift+K/Knit from the editor of example.rmd will compile the result and show a preview. I know it is using common.yaml, because the result includes the date and author whereas example.rmd on its own does not have a date or author.
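The comment in myknit.r says proper validation is possible but leaves it out. A minimal sketch of what that might look like follows. inject_yaml is a hypothetical helper, not part of the original answer, and it assumes the frontmatter is opened and closed by --- lines (pandoc also accepts ... as a closing line, which this sketch does not handle).

```r
# Hypothetical helper: splice the lines of a common YAML file into the
# frontmatter of an Rmd (both given as character vectors of lines).
inject_yaml <- function(rmd, yaml) {
    delims <- grep("^---\\s*$", rmd)

    # Require an opening --- on the first line and a closing --- further down.
    if (length(delims) < 2 || delims[1] != 1) {
        stop("expected YAML frontmatter delimited by '---' at the top of the document")
    }

    # Drop any --- lines from the common file, so it can be written with or
    # without its own delimiters.
    yaml <- yaml[!grepl("^---\\s*$", yaml)]

    # Insert the common YAML directly after the opening delimiter.
    append(rmd, yaml, after = delims[1])
}
```

In myknit, the line rmd <- append(rmd, yaml, after=yamlHeader) could then become rmd <- inject_yaml(rmd, yaml). A document with no frontmatter then stops with an error instead of rendering with the YAML in the wrong place.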