tolower function and merging two dataframes

Posted 2019-09-02 03:20

I have 3 dataframes called respectively: barometre2013, barometre2016, barometre2018.

I've already merged barometre2018 and barometre2016 like this:

baro1618 <- merge(barometre2016, barometre2018, all = TRUE)

All was good: I have all rows of the two dataframes, and columns with the same name are merged into one containing the rows of both dataframes. Exactly what I wanted.

The merged table looks like this:

names(baro1618)
    [1] "q0qc"           "regio"          "sexe"           "age"            "langu"          "q1a_1"          "q1a_2"          "q1a_3"          "q1a_4"          "q1a_5"         
    [11] "q1a_6"          "q1a_7"          "q1a_8"          "q1a_9"          "q1a_10"         "q1b_1"          "q1b_2"          "q1b_3"          "q1b_4"          "q1b_5"         
    [21] "q1b_6"          "q1b_7"          "q1b_8"          "q1b_9"          "q1b_10"

NOW, my problem starts here.

I want to merge baro1618 with barometre2013, but first I have to lowercase all the column names: when I tried to merge without doing this, the uppercase columns of barometre2013 weren't merged with the lowercase columns of the same name in baro1618.

The df barometre2013 looks like this:

names(barometre2013)
    [229] "POND"        "Q1A_1"       "Q1A_2"       "Q1A_3"       "Q1A_4"       "Q1A_5"       "Q1A_6"       "Q1A_7"       "Q1A_8"       "Q1A_9"       "Q1A_10"      "Q1B_1"      
    [241] "Q1B_2"       "Q1B_3"       "Q1B_4"       "Q1B_5"       "Q1B_6"       "Q1B_7"       "Q1B_8"       "Q1B_9"       "Q1B_10"      "Q5A_1"       "Q5A_2"       "Q5A_3"  

So I've tried these two solutions to lowercase the names (both work):

barometre2013 <- setnames(barometre2013, tolower(names(barometre2013)))

colnames(barometre2013) <- tolower(colnames(barometre2013))

The result:

[229] "pond"        "q1a_1"       "q1a_2"       "q1a_3"       "q1a_4"       "q1a_5"       "q1a_6"       "q1a_7"       "q1a_8"       "q1a_9"       "q1a_10"      "q1b_1"      
[241] "q1b_2"       "q1b_3"       "q1b_4"       "q1b_5"       "q1b_6"       "q1b_7"       "q1b_8"       "q1b_9"       "q1b_10"      "q5a_1"       "q5a_2"       "q5a_3"  

BUT, when I tried to merge like this:

baro1118 <- merge(baro1618, barometre2013, all = TRUE)

It gives me this error:

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

I don't understand why it worked in the first example and not in the second one. I can't specify the columns by hand because TOO many column names match and a lot do not.

It should be possible not to specify them, right?

Also, I want to keep all the columns of both dfs, the ones that match and the ones that don't.

Sorry for this long explanation, but I really need an answer; I've read a lot of Q/A on SO and didn't find one.

1 answer
仙女界的扛把子
#2 · 2019-09-02 03:59

Maybe worth a try:

baro1118 <- merge(baro1618, barometre2013, all = TRUE, by = intersect(names(baro1618), names(barometre2013)))

This merges only on the columns common to both data frames (which is also what merge uses by default when by is not given).
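
If the error persists, one possible cause (an assumption, since the full name lists aren't shown) is that a key column name is not unique in one of the data frames, for example when two columns differ only in case before tolower(). A minimal sketch on toy data, where x and y are hypothetical stand-ins for baro1618 and barometre2013:

```r
# Toy stand-ins (hypothetical) for baro1618 and barometre2013
x <- data.frame(q1 = 1:2, sexe = c("f", "m"))
y <- data.frame(Q1 = 1:2, q1 = 3:4, POND = c(0.8, 1.2))

# Lowercasing makes "Q1" and "q1" collide
names(y) <- tolower(names(y))

# Names that occur more than once in y
dup_names <- unique(names(y)[duplicated(names(y))])
print(dup_names)

# merge() stops here because the key "q1" is ambiguous in y
msg <- tryCatch(
  { merge(x, y, all = TRUE); NA_character_ },
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible fix: make the names unique before merging
names(y) <- make.unique(names(y))
merged <- merge(x, y, all = TRUE)
print(names(merged))
```

Running the same duplicated() check on names(baro1618) and names(barometre2013) would show whether this applies to the real data. make.unique() only renames the duplicates; which of the colliding columns should be kept as the key is a judgment call.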

That being said, your hunch of using rbind for this is probably more correct. If this is data from different time periods and the periods don't overlap, rbind will simply stack one on top of the other. This doesn't always go smoothly, but here's a crude hack:

# columns present in baro1618 but absent from barometre2013: add them, filled with NA
missing.column.names <- setdiff(names(baro1618), names(barometre2013))
barometre2013[, missing.column.names] <- NA

# columns present in barometre2013 but absent from baro1618: add them, filled with NA
missing.column.names <- setdiff(names(barometre2013), names(baro1618))
baro1618[, missing.column.names] <- NA
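
Once both data frames have the same set of columns, the stacking itself is a single rbind call. A minimal sketch on toy data; a and b are hypothetical stand-ins for baro1618 and barometre2013, and the sketch assumes column names are already unique and consistently cased:

```r
# Toy stand-ins (hypothetical) with partly different columns
a <- data.frame(id = 1:2, q1a_1 = c(3, 4))
b <- data.frame(id = 3:4, pond = c(0.5, 1.5))

# Pad each data frame with the columns it lacks, filled with NA
b[, setdiff(names(a), names(b))] <- NA
a[, setdiff(names(b), names(a))] <- NA

# Put b's columns in the same order as a's, then stack the rows
stacked <- rbind(a, b[, names(a)])
print(stacked)
```

Every row of both inputs is kept, and columns that exist in only one of them are NA for the rows coming from the other.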