I have two dataframes, like those:
and
I want to get something like that:
Here are the data frames for reproducibility:
df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento","Prestaciones", "Servicios"), anualidad = rep(2014, 5), valor = c(11, 20, 8, 9))
A dataframe in which missed positions are filled with 0, because in some cases I am not getting data frames with the same amount of rows, and in those cases rbind fails, and I get an error.
Which instruccion should use to combine those dataframes?
Thanks
PS: I know I can erase the repeated rows once the data frames are together.
Try using left_join in the dplyr package.
library(dplyr)
# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))
# use left_join
df3 <- left_join(df1, df2, by = "id")
# rename and set NAs to 0
names(df3) <- c("id", "val", "val")
df3[is.na(df3)] <- 0
First of all it is not a good practice to have 2 variables with the same name or the same variable in two columns. It is best to have duplicate observations (e.g. Consmos twice in this case).
Based on that it is as simple as row binding or merging the two data frames:
df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(11, 20, 8, 9))
df <- merge(df1, df2, all = TRUE)
which gives:
descripcion_cuenta_N2 anualidad valor
1 Consumos 2014 10
2 Consumos 2014 11
3 Costes Personal 2014 11
4 Costes Personal 2014 20
5 Fungible Equipamiento 2014 8
6 Fungible Equipamiento 2014 12
7 Servicios 2014 9
8 Servicios 2014 13
This way is better as stated above.
If you insist on what you're asking you simply specify the variables to use in the merging:
df <- merge(df1, df2, by = c("descripcion_cuenta_N2", "anualidad"))
which results in:
descripcion_cuenta_N2 anualidad valor.x valor.y
1 Consumos 2014 10 11
2 Costes Personal 2014 11 20
3 Fungible Equipamiento 2014 12 8
4 Servicios 2014 13 9
PS: It would make it easier to answer if your data frames where given in R. That is make your question reproducible and easy to answer. See How to make a great R reproducible example?
You probably want to use something like merge()
:
merge(df1, df2, by=c("descripcion_cuenta_N2", "anualidad"))
In SQL lingo, you are trying to join together the two tables on the descripcion_cuenta_N2
and anualidad
columns (presumably both of them).