In R, some packages (e.g. haven) attach a label attribute to variables, which gives the substantive name of the variable. For example, gdppc may have the label GDP per capita.
This is extremely useful, especially when importing data from Stata. However, I still struggle to know how to use this in my workflow.
How to quickly browse the variable and the variable label? Right now I have to do attributes(df$var), but this is hardly convenient to get a glimpse (à la names(df)).
How to use these labels in plots? Again, I can use attr(df$var, "label") to access the string label. However, it seems cumbersome.
Is there any official way to use these labels in a workflow? I can certainly write a custom function that wraps around attr, but it may break in the future when packages implement the label attribute differently. Thus, ideally I'd want an official way supported by haven (or other major packages).
A solution with the purrr package from the tidyverse:
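The answer's own code is not shown here, so the following is only a minimal sketch of one way to do this with purrr. It assumes the labels live in each column's label attribute (as haven stores them) and uses a hand-built toy data frame; df and var_labels are illustrative names.

```r
library(purrr)

# Toy data frame with "label" attributes set by hand (an assumption standing
# in for data imported with haven).
df <- data.frame(gdppc = c(1200, 3400), pop = c(5.1, 8.3))
attr(df$gdppc, "label") <- "GDP per capita"
attr(df$pop, "label") <- "Population (millions)"

# Named character vector mapping variable name to label;
# NA is returned for columns that carry no label.
var_labels <- map_chr(df, function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
})

var_labels
```

Because the result is a named vector, a single label can be looked up with var_labels[["gdppc"]].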
A simple solution with the labelled package (designed to work with the tidyverse's haven):
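As a hedged illustration (not the original answer's code), labelled::var_label() can both set and read variable labels. The toy data frame and the names df and x_label are assumptions made for this example.

```r
library(labelled)

# Toy data frame standing in for data imported with haven (an assumption).
df <- data.frame(gdppc = c(1200, 3400), pop = c(5.1, 8.3))

# Set labels on individual variables.
var_label(df$gdppc) <- "GDP per capita"
var_label(df$pop) <- "Population (millions)"

# Read all labels at once: a named list, one entry per variable.
all_labels <- var_label(df)
all_labels

# Read a single label, e.g. to reuse it as an axis title.
x_label <- var_label(df$gdppc)
x_label
```

The string in x_label can then be passed wherever a plot title or axis label is expected, for example as the xlab argument of plot().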
This is one of the innovations addressed in rio (full disclosure: I wrote this package). Basically, it provides various ways of importing variable labels, including haven's way of doing things and foreign's. Here's a trivial example:
Start by making a reproducible example:
Import using foreign::read.dta() (via rio::import()):
Read in using haven::read_dta() using its native variable attributes because the attributes are stored at the data.frame level rather than the variable level:
Read in using haven::read_dta() using an alternative that we (the rio developers) have found more convenient:
By moving the attributes to be at the level of the data.frame, they're much easier to access using attr(data, "label.var"), etc. rather than digging through each variable's attributes.
Note: the values of attributes will be NULL because I'm just writing a native R dataset to a local file in order to make this reproducible.
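The idea of keeping labels at the data.frame level can be illustrated in base R. This is a sketch only and not rio's implementation: collect_labels is a hypothetical helper, the toy data frame is made up, and the attribute name label.var is simply taken from the text above.

```r
# Hypothetical helper: copy each column's "label" attribute into a single
# data.frame-level attribute named "label.var".
collect_labels <- function(data) {
  labs <- vapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
  attr(data, "label.var") <- labs
  data
}

# Toy data: only one of the two columns is labelled.
df <- data.frame(gdppc = c(1200, 3400), pop = c(5.1, 8.3))
attr(df$gdppc, "label") <- "GDP per capita"

df <- collect_labels(df)

# All labels are now available in one place; unlabelled columns give NA.
attr(df, "label.var")
```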
Using sapply in a simple function to return a variable list as in Stata's Variable Window:
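The original function is not shown here; a minimal sketch of the approach, assuming labels are stored in each column's label attribute, might look like the following. The name variable_list and the toy data are hypothetical.

```r
# Hypothetical helper: one row per variable with its name and its label,
# similar in spirit to Stata's Variables window.
variable_list <- function(data) {
  labels <- sapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    # Use an empty string for variables without a label.
    if (is.null(lab)) "" else as.character(lab)
  })
  data.frame(
    variable = names(data),
    label = unname(labels),
    stringsAsFactors = FALSE
  )
}

# Toy data frame with one labelled and one unlabelled variable.
df <- data.frame(gdppc = c(1200, 3400), pop = c(5.1, 8.3))
attr(df$gdppc, "label") <- "GDP per capita"

vl <- variable_list(df)
vl
```

Since the helper only relies on attr(), it would need adjusting if a package stored its labels under a different attribute name.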
The purpose of the labelled package is to provide convenient functions to manipulate variable and value labels as imported with haven. In addition, the functions lookfor and describe from the questionr package are also useful to display variable and value labels.