Conditional replacement of a string with another string

Posted 2019-02-25 16:57

Question:

I have data with the following structure, where all variables are strings:

    v1      v2      c1      c2           c1c2
00035A  943567  00088E  63968E  00088E;63968E
00088E  63968E  00088E  63968E  00088E;63968E
00088E  925524  00088E  63968E  00088E;63968E
000361  237924  00088E  63968E  00088E;63968E
000361  83367A  00088E  63968E  00088E;63968E
00055X  49328R  00088E  63968E  00088E;63968E
00056N  87885Q  00088E  63968E  00088E;63968E
000794  69911G  00088E  63968E  00088E;63968E
23792A  001674  00088E  63968E  00088E;63968E
63968E  17275R  00088E  63968E  00088E;63968E

What I'd like to do is replace the value of v1 with c1c2 if v1 equals c1, and the value of v2 with c1c2 if v2 equals c2, using a general command in R, i.e. one that does not hard-code the specific values of c1, c2, and c1c2.

Would be grateful for your help.

Answer 1:

There are several ways to do this:

1: with ifelse statements in base R:

df$v1 <- ifelse(df$v1==df$c1, df$c1c2, df$v1)
df$v2 <- ifelse(df$v2==df$c2, df$c1c2, df$v2)

2: or with subsetting assignments:

df[df$v1==df$c1,"v1"] <- df[df$v1==df$c1,"c1c2"]
df[df$v2==df$c2,"v2"] <- df[df$v2==df$c2,"c1c2"]

3: or with the data.table package:

library(data.table)
setDT(df)[v1==c1, v1 := c1c2][v2==c2, v2 := c1c2]

Each of these solutions gives the following result (shown here as printed by data.table):

> df
               v1            v2     c1     c2          c1c2
 1:        00035A        943567 00088E 63968E 00088E;63968E
 2: 00088E;63968E 00088E;63968E 00088E 63968E 00088E;63968E
 3: 00088E;63968E        925524 00088E 63968E 00088E;63968E
 4:        000361        237924 00088E 63968E 00088E;63968E
 5:        000361        83367A 00088E 63968E 00088E;63968E
 6:        00055X        49328R 00088E 63968E 00088E;63968E
 7:        00056N        87885Q 00088E 63968E 00088E;63968E
 8:        000794        69911G 00088E 63968E 00088E;63968E
 9:        23792A        001674 00088E 63968E 00088E;63968E
10:        63968E        17275R 00088E 63968E 00088E;63968E
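For anyone who wants to try the base R variants without the original file, here is a minimal sketch that rebuilds the example data from the question and checks that approaches 1 and 2 agree. The names res1, res2, hit1 and hit2 are made up for illustration. The subsetting variant is written with which(), which is a small change from the code above: it assumes you want rows where the comparison is NA to be left untouched rather than raising an error.

```r
# Rebuild the example data as character columns
# (stringsAsFactors = FALSE keeps them as strings on older R versions too).
df <- data.frame(
  v1 = c("00035A", "00088E", "00088E", "000361", "000361",
         "00055X", "00056N", "000794", "23792A", "63968E"),
  v2 = c("943567", "63968E", "925524", "237924", "83367A",
         "49328R", "87885Q", "69911G", "001674", "17275R"),
  c1 = "00088E",
  c2 = "63968E",
  stringsAsFactors = FALSE
)
df$c1c2 <- paste(df$c1, df$c2, sep = ";")

# Approach 1: row-wise comparison with ifelse()
res1 <- df
res1$v1 <- ifelse(res1$v1 == res1$c1, res1$c1c2, res1$v1)
res1$v2 <- ifelse(res1$v2 == res1$c2, res1$c1c2, res1$v2)

# Approach 2: subsetting assignment; which() skips NA comparisons
res2 <- df
hit1 <- which(res2$v1 == res2$c1)
hit2 <- which(res2$v2 == res2$c2)
res2$v1[hit1] <- res2$c1c2[hit1]
res2$v2[hit2] <- res2$c1c2[hit2]

# Both approaches should produce the same data frame
print(identical(res1, res2))
print(res1)
```

Note that if the columns were factors instead of strings, ifelse() would return integer codes, so converting with as.character() first would be needed.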


Answer 2:

There is also an alternative approach that uses an update in a self-join:

library(data.table)
#coerce to data.table
setDT(df)[
  # 1st self join & update
  df, on = .(v1 = c1), v1 := c1c2][
    # 2nd self join & update
    df, on = .(v2 = c2), v2 := c1c2][]
               v1            v2     c1     c2          c1c2
 1:        00035A        943567 00088E 63968E 00088E;63968E
 2: 00088E;63968E 00088E;63968E 00088E 63968E 00088E;63968E
 3: 00088E;63968E        925524 00088E 63968E 00088E;63968E
 4:        000361        237924 00088E 63968E 00088E;63968E
 5:        000361        83367A 00088E 63968E 00088E;63968E
 6:        00055X        49328R 00088E 63968E 00088E;63968E
 7:        00056N        87885Q 00088E 63968E 00088E;63968E
 8:        000794        69911G 00088E 63968E 00088E;63968E
 9:        23792A        001674 00088E 63968E 00088E;63968E
10:        63968E        17275R 00088E 63968E 00088E;63968E
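One thing to keep in mind with a join is that v1 is matched against the c1 values of all rows, not only the c1 in the same row. The lookup idea can be sketched in base R with match() instead of data.table's join; this is a swapped-in technique for illustration, not the code from this answer, and lookup_replace is a hypothetical helper name. It takes the first matching row when a key occurs more than once, which may differ from what the join does with duplicates.

```r
# Small subset of the example data, as character columns
df <- data.frame(
  v1 = c("00035A", "00088E", "63968E"),
  v2 = c("943567", "63968E", "17275R"),
  c1 = "00088E",
  c2 = "63968E",
  c1c2 = "00088E;63968E",
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace every element of x that occurs anywhere
# in `key` by the `value` taken from the first matching row.
lookup_replace <- function(x, key, value) {
  idx <- match(x, key)      # NA where x has no match in key
  found <- !is.na(idx)
  x[found] <- value[idx[found]]
  x
}

df$v1 <- lookup_replace(df$v1, df$c1, df$c1c2)
df$v2 <- lookup_replace(df$v2, df$c2, df$c1c2)
print(df)
```

With constant c1 and c2, as in the example data, this gives the same result as the row-wise comparison. If c1 or c2 vary by row, the two can differ, because a value in v1 may match the c1 of a different row.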

Caveat



Tags: r replace