I am trying to produce a single PDF report, split into sections by a grouping variable, using brew and knitr. My grouping variable may contain special characters (umlauts), such as å æ ø.
When umlauts occur only in the document title, they are handled fine with \usepackage[utf8]{inputenc} (see examples below). However, umlauts in the grouping variable generate an error with \usepackage[utf8]{inputenc}.
On the other hand, when I tried \usepackage[T1]{fontenc} instead, umlauts in the grouping variable were handled properly, but the title was no longer encoded correctly.
I am struggling to get encoding right in both title and grouping variable.
Here is an example where I try to produce one PDF report with subsections of summary statistics per species in the iris dataset. I hope it may illustrate my problem.
R code to prepare data without umlauts
library(plyr)
library(xtable)
library(knitr)
library(brew)
library(stringr)
Create a summary table for each species in the built-in iris dataset. First, use the original Species names, without umlauts, so that the only umlauts are in the document \title (see code for the .rnw template file). Store the summary tables in a list.
data(iris)
iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))
Define the function brew_knit_pdf. The function brews a template LaTeX file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each item/group that is looped over. The xxx_out.rnw file from brew is then used as the input file to knit2pdf and converted to a PDF.
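brew_knit_pdf <- function(template, ...){
  # derive the output name: xxx.rnw -> xxx_out.rnw (dot escaped, anchored at the end)
  brew_out <- str_replace(string = template, pattern = "\\.rnw$", replacement = "_out.rnw")
  # expand the brew loop in the template into a plain .rnw file
  brew(file = template, output = brew_out)
  # knit the brewed file and compile it to PDF; ... is passed on to knit2pdf
  knit2pdf(input = brew_out, ...)
}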
brew_knit_pdf("iris_umlaut_tbl.rnw")
Code for the .rnw template file
In my example, I have named the template file for the following code iris_umlaut_tbl.rnw. This file is used as input in the brew_knit_pdf function in the R script.
\documentclass{article}
% \usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{geometry}
\geometry{tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}
\begin{document}
\begin{titlepage}
\title{Using brew and knitr to produce one PDF report split by a grouping variable.\\Problem with å æ ø in grouping variable}
\clearpage\maketitle
\thispagestyle{empty}
\tableofcontents
\end{titlepage}
\newpage
\section{Summary statistics for each species}
% R code loop wrapped in brew syntax, which brews the template file xxx.rnw to a new .rnw file xxx_out.rnw, which has one section for each group that is looped over, i.e. the names of the list iris_tbl produced in the R script.
<% for (Sp in names(iris_tbl)) { -%>
\subsection{<%= Sp %>}
<<sum-<%= Sp %>, echo=FALSE, results='asis'>>=
print(iris_tbl[["<%= Sp %>"]])
@
\newpage
<% } %>
\end{document}
R code to prepare data with umlauts
To mimic my real data, I replace the Species names in the iris data with (nonsensical) names that contain umlauts.
data(iris)
iris$Species <- as.character(iris$Species)
iris$Species[iris$Species == "setosa"] <- "åsetosa"
iris$Species[iris$Species == "versicolor"] <- "æversicolor"
iris$Species[iris$Species == "virginica"] <- "øvirginica"
# create a summary table for each species
iris_tbl <- dlply(.data = iris, .variables = .(Species), function(x) xtable(summary(x)))
When the 'umlaut version' of iris_tbl has been prepared, I run the brew_knit_pdf function on the same .rnw file as above, except that I use different encoding packages (inputenc and/or fontenc).
Result
Here is a summary of my four attempts so far, using datasets without or with umlauts and different encoding packages in the .rnw file.
- The R data: iris_tbl prepared with non-umlaut Species
- The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}
Output: umlauts in title OK
- The R data: iris_tbl prepared with umlaut version of Species
- The .rnw file: umlauts in \title{ }, \usepackage[utf8]{inputenc}
Output:
Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.
- The R data: iris_tbl prepared with umlaut version of Species
- The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}, \usepackage[utf8]{inputenc}
Output:
Error: running 'texi2dvi' on 'iris_umlaut_tbl_out.tex' failed LaTeX errors: ...Package inputenc Error: Unicode char \u8:æve not set up for use with LaTeX.
- The R data: iris_tbl prepared with umlaut version of Species
- The .rnw file: umlauts in \title{ }, \usepackage[T1]{fontenc}
Output: umlauts in title not OK, umlauts in grouping variable OK
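A diagnostic that may help narrow this down is to look at how R has stored the group names, since inputenc with the utf8 option expects UTF-8 bytes. The following is only a minimal base R sketch; the vector group_names is an illustrative stand-in for names(iris_tbl). The \u8:æve in the error message would be consistent with a single Latin-1 byte being read as the start of a multi-byte UTF-8 sequence, but that is an assumption, not something verified here.

```r
# Example group names written with \u escapes, so R marks them as UTF-8
group_names <- c("\u00e5setosa", "\u00e6versicolor", "\u00f8virginica")

# Declared encoding of each string: "UTF-8", "latin1" or "unknown"
print(Encoding(group_names))

# Raw bytes of the first name: UTF-8 stores the leading å as two bytes (c3 a5)
utf8_bytes <- charToRaw(group_names[1])

# The same name converted to Latin-1 stores å as a single byte (e5)
latin1_names <- iconv(group_names, from = "UTF-8", to = "latin1")
latin1_bytes <- charToRaw(latin1_names[1])

print(utf8_bytes)
print(latin1_bytes)
```

Running the same checks on names(iris_tbl) in the real session would show which of the two byte patterns ends up in the brewed file.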
Can anyone point me in the right direction to get the encoding right in both title and grouping variable? Thanks a lot in advance for taking your time.
Session info
Default text encoding in my R Studio 0.97.336: UTF-8
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252 LC_CTYPE=Norwegian (Bokmål)_Norway.1252
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] Hmisc_3.10-1 survival_2.37-4 pastecs_1.3-13 boot_1.3-9
[5] pspline_1.0-15 ggplot2_0.9.3.1 lubridate_1.2.0 stringr_0.6.2
[9] brew_1.0-6 knitr_1.1 xtable_1.7-1 plyr_1.8
[13] PerformanceAnalytics_1.1.0 xts_0.9-3 zoo_1.7-9 gdata_2.12.0.2
loaded via a namespace (and not attached):
[1] cluster_1.14.4 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 evaluate_0.4.3 formatR_0.7
[7] grid_3.0.0 gtable_0.1.2 gtools_2.7.1 labeling_0.1 lattice_0.20-15 MASS_7.3-26
[13] memoise_0.1 munsell_0.4 proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2 scales_0.2.3
[19] tools_3.0.0
> getOption("encoding")
[1] "native.enc"
Update:
I am very grateful for 'off-SO' input from the brew package maintainer, Jeffrey Horner. He had no encoding problems when running my script with Ubuntu and command-line R. This gave me some renewed hope. I have no opportunity to run Ubuntu myself, but today I updated RStudio (0.97.449) and set the default encoding to ISO8859-1 (thanks Yihui!). Now the special characters are encoded correctly both in the title and in the grouping variable with \usepackage[latin1]{inputenc} in the .rnw file. \usepackage[ansinew]{inputenc} also works. I am not sure what went wrong in my original attempt. Possibly, when I re-opened the script files, RStudio did not apply to them the default encoding I had set in Options following Yihui's advice. But that is just speculation.
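One way to test that speculation would be to inspect the bytes of the saved files directly rather than relying on the editor setting. This is a minimal base R sketch; file_is_valid_utf8 is a hypothetical helper name, validUTF8() needs R 3.3.0 or later (newer than the session shown above), and the two temporary files only stand in for real script or .rnw files.

```r
# Hypothetical helper: TRUE if the bytes of the file form valid UTF-8.
# Pure ASCII files also return TRUE, because ASCII is a subset of UTF-8.
file_is_valid_utf8 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  validUTF8(rawToChar(bytes))
}

# Two temporary files that both hold the word "åsetosa"
utf8_file   <- tempfile(fileext = ".rnw")
latin1_file <- tempfile(fileext = ".rnw")
writeBin(charToRaw("\u00e5setosa"), utf8_file)               # å stored as c3 a5
writeBin(c(as.raw(0xe5), charToRaw("setosa")), latin1_file)  # å stored as e5

utf8_result   <- file_is_valid_utf8(utf8_file)
latin1_result <- file_is_valid_utf8(latin1_file)
print(c(utf8 = utf8_result, latin1 = latin1_result))
```

A file containing å æ ø that fails this check was most likely saved in a single-byte encoding such as ISO8859-1 or Windows-1252.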
Since you are using UTF-8, which is not the native encoding of your OS, you need to explicitly tell knitr the encoding of your input document, for example by passing it in your knit2pdf() call. But I'm not sure if brew can handle non-native character encodings. If not, I suggest you use your system default encoding (should be ISO8859-1 in this case) and the matching inputenc option in the .rnw file. Or do everything in knitr if you have to use UTF-8 (this also enables you to click the button to compile the document); see 075-knit-expand.Rnw for an example.
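To illustrate the "do everything in knitr" route, here is a minimal sketch that uses knit_expand() in place of the brew loop. The template text, the <% %> delimiters (chosen so they do not collide with LaTeX braces), and the use of a numeric index instead of the group name in chunk labels are assumptions for illustration; they are not taken from 075-knit-expand.Rnw.

```r
library(knitr)

# Hypothetical per-group template; <% and %> are custom delimiters
section_template <- paste(
  "\\subsection{<%sp%>}",
  "<<sum-<%i%>, echo=FALSE, results='asis'>>=",
  "print(iris_tbl[[<%i%>]])",
  "@",
  "\\newpage",
  sep = "\n"
)

# Example group names, written with \u escapes so they are UTF-8 strings
group_names <- c("\u00e5setosa", "\u00e6versicolor", "\u00f8virginica")

# Expand the template once per group: sp fills the heading, i the chunk label
sections <- vapply(seq_along(group_names), function(i) {
  knit_expand(text = section_template, sp = group_names[i], i = i,
              delim = c("<%", "%>"))
}, character(1))

cat(sections, sep = "\n\n")
```

The expanded text could then be knitted from within the main document, for example via knit(text = ...), so that a single UTF-8 .Rnw file drives the whole report. Using the index in the chunk label keeps non-ASCII characters out of labels and any file names derived from them, which is a design choice rather than a requirement.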