Replacing occurrences of a number in multiple colu

2019-01-18 03:02发布

问题:

ETA: the point of the below, by the way, is to not have to iterate through my entire set of column vectors, just in case that was a proposed solution (just do what is known to work once at a time).


There's plenty of examples of replacing values in a single vector of a data frame in R with some other value.

  • Replace a value in a data frame based on a conditional (if) statement in R
  • replace numbers in data frame column in r [duplicate]

And also how to replace all values of NA with something else:

  • How to replace all values in a data.frame with another ( not 0) value

What I'm looking for is analogous to the last question, but basically trying to replace one value with another. I'm having trouble generating a data frame of logical values mapped to my actual data frame for cases where multiple columns meet a criteria, or simply trying to do the actions from the first two questions on more than one column.

An example:

data <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep(1:9), var2 = rep(3:5, each = 3))

data
  name var1 var2
1    a    1    3
2    a    2    3
3    a    3    3
4    b    4    4
5    b    5    4
6    b    6    4
7    c    7    5
8    c    8    5
9    c    9    5

And say I want all of the values of 4 in var1 and var2 to be 10.

I'm sure this is elementary and I'm just not thinking through it properly. I have been trying things like:

data[data[, 2:3] == 4, ]

That doesn't work, but if I do the same with data[, 2] instead of data[, 2:3], things work fine. It seems that logical test (like is.na()) work on multiple rows/columns, but that numerical comparisons aren't playing as nicely?

Thanks for any suggestions!

回答1:

you want to search through the whole data frame for any value that matches the value you're trying to replace. the same way you can run a logical test like replacing all missing values with 10..

data[ is.na( data ) ] <- 10

you can also replace all 4s with 10s.

data[ data == 4 ] <- 10

at least i think that's what you're after?

and let's say you wanted to ignore the first row (since it's all letters)

# identify which columns contain the values you might want to replace
data[ , 2:3 ]

# subset it with extended bracketing..
data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
# ..those were the values you're going to replace

# now overwrite 'em with tens
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10

# look at the final data
data


回答2:

Basically data[, 2:3]==4 gave you the index for data[,2:3] instead of data:

R > data[, 2:3] ==4
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

So you may try this:

R > data[,2:3][data[, 2:3] ==4]
[1] 4 4 4 4


回答3:

Just to provide a different answer, I thought I would write up a vector-math approach:

You can create a transformation matrix (really a data frame here, but will work the same), using a the vectorized 'ifelse' statement and multiply the transformation matrix and your original data, like so:

df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
   .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns]==.search_Value,.sub_Value/.search_Value,1) * .data_Frame[, .search_Columns]
    return(.data_Frame)
}

To replace all values 4 with 10 in the data frame 'data' in columns 2 through 3, you would use the function like so:

# Either of these will work.  I'm just showing options.
df.Rep(data, 2:3, 4, 10)
df.Rep(data, c("var1","var2"), 4, 10)

#   name var1 var2
# 1    a    1    3
# 2    a    2    3
# 3    a    3    3
# 4    b   10   10
# 5    b    5   10
# 6    b    6   10
# 7    c    7    5
# 8    c    8    5
# 9    c    9    5


回答4:

Just for continuity

    data[,2:3][ data[,2:3] == 4 ] <- 10

But it looks ugly, So do it in 2 steps is better.