tolower function and merging two dataframes

I have 3 dataframes called respectively: barometre2013, barometre2016, barometre2018.

I've already merge barometre2018 and barometre2016 like this:

baro1618 <- merge(barometre2016, barometre2018, all = TRUE)

All was good, I have all rows of the two dataframes and the columns names that are the same are merged in one with all rows of the tow dataframes. Exactly what I wanted.

The merged table looks like this:

names(baro1618)
    [1] "q0qc"           "regio"          "sexe"           "age"            "langu"          "q1a_1"          "q1a_2"          "q1a_3"          "q1a_4"          "q1a_5"         
    [11] "q1a_6"          "q1a_7"          "q1a_8"          "q1a_9"          "q1a_10"         "q1b_1"          "q1b_2"          "q1b_3"          "q1b_4"          "q1b_5"         
    [21] "q1b_6"          "q1b_7"          "q1b_8"          "q1b_9"          "q1b_10"

NOW, my problem start here.

I want to merge baro1618 with barometre2013, but before doing that I have to lower case all the columns names because when I tried to merge without doing this, the columns in uppercase of barometre2013 that have the same name in lower case baro1618 weren't merged.

The df barometre2013 looks like this:

names(barometre2013)
    [229] "POND"        "Q1A_1"       "Q1A_2"       "Q1A_3"       "Q1A_4"       "Q1A_5"       "Q1A_6"       "Q1A_7"       "Q1A_8"       "Q1A_9"       "Q1A_10"      "Q1B_1"      
    [241] "Q1B_2"       "Q1B_3"       "Q1B_4"       "Q1B_5"       "Q1B_6"       "Q1B_7"       "Q1B_8"       "Q1B_9"       "Q1B_10"      "Q5A_1"       "Q5A_2"       "Q5A_3"

So I've tried this two solutions to lower case (both works):

barometre2013 <- setnames(barometre2013, tolower(names(barometre2013)))

colnames(barometre2013) <- tolower(colnames(barometre2013))

The result:

[229] "pond"        "q1a_1"       "q1a_2"       "q1a_3"       "q1a_4"       "q1a_5"       "q1a_6"       "q1a_7"       "q1a_8"       "q1a_9"       "q1a_10"      "q1b_1"      
[241] "q1b_2"       "q1b_3"       "q1b_4"       "q1b_5"       "q1b_6"       "q1b_7"       "q1b_8"       "q1b_9"       "q1b_10"      "q5a_1"       "q5a_2"       "q5a_3"

BUT, when I've tried to merge like this :

baro1118 <- merge(baro1618, barometre2013, all = TRUE)

It give me this error :

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

I don't understand why it was working in the first example and not in this second one. I can't specify any columns because I have TOO much name columns that match and a lot that do not match.

It should be possible not to specify right ?

Also, I want to keep all the columns names that match and the ones that don't match of both df.

Sorry for this long explanation, but I really need answer and I've read a lot of Q/A on SO and didn't find my answer.

标签： r merge tolower merging-data

1条回答

仙女界的扛把子

2楼-- · 2019-09-02 03:59

Maybe worth a try:

baro1118 <- merge(baro1618, barometre2013, all = TRUE, by=intersect(names(baro1618), names(barometre2013))

This merges only by common columns.

That being said, your hunch of using rbind for this is probably more correct. If this is data from differentt time periods, and they don't overlap, rbind will simply stack one on top of the other. This doesn't always go smoothly, but here's a crude hack:

# maybe barometre2013 has missing column names
missing.column.names <- setdiff(names(baro1618), names(barometre2013))
barometre2013[, missing.column.names] <- NA

# maybe baro1618 has missing column names
missing.column.names <- setdiff(names(barometre2013), names(baro1618))
baro1618[, missing.column.names] <- NA

0人赞添加讨论(0) 举报

tolower function and merging two dataframes

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间