I have a bunch of data frames with different variables. I want to read them into R and add columns to those that are short of a few variables so that they all have a common set of standard variables, even if some are unobserved.
In other words, is there a way to add columns of NA in the tidyverse when a column does not exist? My current attempt works for adding new variables where the column doesn't exist (top_speed) but fails when the column already exists (mpg) (it sets all observations to the first value, Mazda RX4).
library(tidyverse)
mtcars %>%
  tbl_df() %>%
  rownames_to_column("car") %>%
  mutate(top_speed = ifelse("top_speed" %in% names(.), top_speed, NA),
         mpg = ifelse("mpg" %in% names(.), mpg, NA)) %>%
  select(car, top_speed, mpg, everything())
# # A tibble: 32 x 13
# car top_speed mpg cyl disp hp drat wt qsec vs am gear carb
# <chr> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Mazda RX4 NA 21 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag NA 21 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 Datsun 710 NA 21 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 4 Hornet 4 Drive NA 21 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 5 Hornet Sportabout NA 21 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# 6 Valiant NA 21 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 7 Duster 360 NA 21 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 8 Merc 240D NA 21 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 9 Merc 230 NA 21 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# 10 Merc 280 NA 21 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Try the following,
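I think the problem is that ifelse() takes the shape of its result from the test, not from the object: the test here is a single TRUE or FALSE, so ifelse() returns only the first element of the column, which mutate() then recycles down every row.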
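To make the ifelse() behaviour concrete, here is a minimal sketch rather than a definitive fix. It first shows the length-1 result on a small made-up vector x, then swaps ifelse() for a plain if/else inside the question's mutate() call. The name fixed and the vector x are illustrative assumptions, not part of the original post.

```r
library(dplyr)
library(tibble)

# ifelse() returns a result shaped like its test, so a length-1 test
# gives back a length-1 result.
x <- c(21.0, 22.8, 18.7)   # illustrative values
ifelse(TRUE, x, NA)        # 21: only the first element
if (TRUE) x else NA        # 21.0 22.8 18.7: the whole vector

# Same pipeline as in the question, with if/else instead of ifelse().
# if/else only evaluates the branch it takes, so the missing top_speed
# column is never looked up.
fixed <- mtcars %>%
  rownames_to_column("car") %>%
  as_tibble() %>%
  mutate(
    top_speed = if ("top_speed" %in% names(.)) top_speed else NA,
    mpg       = if ("mpg" %in% names(.)) mpg else NA
  ) %>%
  select(car, top_speed, mpg, everything())
```

The NA column created this way is logical; use NA_real_ or NA_character_ in the else branch if a specific type is needed.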
You can use the rowwise function like this:
If you already have a data frame with all the required columns, say df_with_required_columns, then you can simply use bind_rows after filtering out all of its rows:
Note that missing columns will take the type from df_with_required_columns.
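The bind_rows idea above could look roughly like the sketch below. The contents of df_with_required_columns are invented for illustration (the answer only names the object), and cars and result are hypothetical names.

```r
library(dplyr)
library(tibble)

# Hypothetical complete data frame; only its column names and types matter.
df_with_required_columns <- tibble(
  car       = "Example car",
  top_speed = 200,
  mpg       = 30
)

cars <- rownames_to_column(mtcars, "car")

# filter(FALSE) drops every row but keeps the columns and their types,
# so bind_rows() fills top_speed with NA of type double.
result <- bind_rows(
  filter(df_with_required_columns, FALSE),
  cars
)
```

Because the template comes first, its columns also lead the column order of the result.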
Another option that does not require creating a helper function (or an already complete data.frame) is to use tibble's add_column:
You can bind the columns of the new data.frame to a fake complete data.frame filled with NA, rename the duplicated columns, and then keep only the original names.
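For the add_column option mentioned above, one possible sketch follows. The list required and the names to_add and result are assumptions made for illustration; the original code for that answer is not shown here.

```r
library(tibble)

# Required columns with a typed NA as the default for each (illustrative).
required <- list(top_speed = NA_real_, mpg = NA_real_)

# Keep only the defaults whose names are not already in the data.
to_add <- required[setdiff(names(required), names(mtcars))]

# Splice the remaining defaults in as new columns; each length-1 NA is
# recycled to the number of rows.
result <- add_column(mtcars, !!!to_add)
```

Existing columns such as mpg are left untouched because they never reach add_column().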
If you had an empty data frame that contains all the names to check for, you can use bind_rows to add columns.
I used purrr::map_dfr to make the empty tibble with the appropriate column names.
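A rough sketch of that empty-tibble approach, assuming the required names are held in a character vector (cols, empty and result are illustrative names):

```r
library(dplyr)
library(purrr)
library(tibble)

cols <- c("top_speed", "mpg")   # names to check for (illustrative)

# One zero-row tibble per name, row-bound into a single zero-row tibble
# that has every required column. map_dfr() is superseded in recent
# purrr releases but is still available.
empty <- map_dfr(cols, function(nm) tibble(!!nm := logical()))

# Binding the real data underneath adds any column it lacks as NA.
result <- bind_rows(empty, mtcars)
```

The placeholder columns are logical, so an existing column such as mpg keeps its own type after binding, while a truly missing column comes out as logical NA.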