How can I combine two dataframes with different le

2019-07-15 02:14发布

问题:

I have two dataframes, like those:

and

I want to get something like that:

Here are the data frames for reproducibility:

df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento","Prestaciones", "Servicios"), anualidad = rep(2014, 5), valor = c(11, 20, 8, 9))

A dataframe in which missed positions are filled with 0, because in some cases I am not getting data frames with the same amount of rows, and in those cases rbind fails, and I get an error.

Which instruccion should use to combine those dataframes?

Thanks

PS: I know I can erase the repeated rows once the data frames are together.

回答1:

Try using left_join in the dplyr package.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join
df3 <- left_join(df1, df2, by = "id")

# rename and set NAs to 0
names(df3) <- c("id", "val", "val")
df3[is.na(df3)] <- 0


回答2:

First of all it is not a good practice to have 2 variables with the same name or the same variable in two columns. It is best to have duplicate observations (e.g. Consmos twice in this case).

Based on that it is as simple as row binding or merging the two data frames:

df1 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(10, 11, 12, 13))
df2 <- data.frame(descripcion_cuenta_N2 = c("Consumos", "Costes Personal", "Fungible Equipamiento", "Servicios"), anualidad = rep(2014, 4), valor = c(11, 20, 8, 9))
df <- merge(df1, df2, all = TRUE)

which gives:

  descripcion_cuenta_N2 anualidad valor
1              Consumos      2014    10
2              Consumos      2014    11
3       Costes Personal      2014    11
4       Costes Personal      2014    20
5 Fungible Equipamiento      2014     8
6 Fungible Equipamiento      2014    12
7             Servicios      2014     9
8             Servicios      2014    13

This way is better as stated above.

If you insist on what you're asking you simply specify the variables to use in the merging:

df <- merge(df1, df2, by = c("descripcion_cuenta_N2", "anualidad"))

which results in:

  descripcion_cuenta_N2 anualidad valor.x valor.y
1              Consumos      2014      10      11
2       Costes Personal      2014      11      20
3 Fungible Equipamiento      2014      12       8
4             Servicios      2014      13       9

PS: It would make it easier to answer if your data frames where given in R. That is make your question reproducible and easy to answer. See How to make a great R reproducible example?



回答3:

You probably want to use something like merge():

merge(df1, df2, by=c("descripcion_cuenta_N2", "anualidad"))

In SQL lingo, you are trying to join together the two tables on the descripcion_cuenta_N2 and anualidad columns (presumably both of them).