Dictionary style replace multiple items-第2页回答

I have a large data.frame of character data that I want to convert based on what is commonly called a dictionary in other languages.

Currently I am going about it like so:

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), snp2 = c("AA", "AT", "AG", "AA"), snp3 = c(NA, "GG", "GG", "GC"), stringsAsFactors=FALSE)
foo <- replace(foo, foo == "AA", "0101")
foo <- replace(foo, foo == "AC", "0102")
foo <- replace(foo, foo == "AG", "0103")

This works fine, but it is obviously not pretty and seems silly to repeat the replace statement each time I want to replace one item in the data.frame.

Is there a better way to do this since I have a dictionary of approximately 25 key/value pairs?

标签： r dataframe bioinformatics

8条回答

公子世无双

2楼-- · 2019-01-02 21:34

Here is a quick solution

dict = list(AA = '0101', AC = '0102', AG = '0103')
foo2 = foo
for (i in 1:3){foo2 <- replace(foo2, foo2 == names(dict[i]), dict[i])}

0人赞添加讨论(0) 举报

路过你的时光

3楼-- · 2019-01-02 21:37

Note this answer started as an attempt to solve the much simpler problem posted in How to replace all values in data frame with a vector of values?. Unfortunately, this question was closed as duplicate of the actual question. So, I'll try to suggest a solution based on replacing factor levels for both cases, here.

In case there is only a vector (or one data frame column) whose values need to be replaced and there are no objections to use factor we can coerce the vector to factor and change the factor levels as required:

x <- c(1, 1, 4, 4, 5, 5, 1, 1, 2)
x <- factor(x)
x
#[1] 1 1 4 4 5 5 1 1 2
#Levels: 1 2 4 5
replacement_vec <- c("A", "T", "C", "G")
levels(x) <- replacement_vec
x
#[1] A A C C G G A A T
#Levels: A T C G

Using the forcatspackage this can be done in a one-liner:

x <- c(1, 1, 4, 4, 5, 5, 1, 1, 2)
forcats::lvls_revalue(factor(x), replacement_vec)
#[1] A A C C G G A A T
#Levels: A T C G

In case all values of multiple columns of a data frame need to be replaced, the approach can be extended.

foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"), 
                  snp2 = c("AA", "AT", "AG", "AA"), 
                  snp3 = c(NA, "GG", "GG", "GC"), 
                  stringsAsFactors=FALSE)

level_vec <- c("AA", "AC", "AG", "AT", "GC", "GG")
replacement_vec <- c("0101", "0102", "0103", "0104", "0302", "0303")
foo[] <- lapply(foo, function(x) forcats::lvls_revalue(factor(x, levels = level_vec), 
                                                       replacement_vec))
foo
#  snp1 snp2 snp3
#1 0101 0101 <NA>
#2 0103 0104 0303
#3 0101 0103 0303
#4 0101 0101 0302

Note that level_vec and replacement_vec must have equal lengths.

More importantly, level_vec should be complete , i.e., include all possible values in the affected columns of the original data frame. (Use unique(sort(unlist(foo))) to verify). Otherwise, any missing values will be coerced to <NA>. Note that this is also a requirement for Martin Morgans's answer.

So, if there are only a few different values to be replaced you will be probably better off with one of the other answers, e.g., Ramnath's.

0人赞添加讨论(0) 举报

上一页 1 2

Dictionary style replace multiple items

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间