Selecting rows on basis of the string length of tw

Posted 2019-04-13 02:24

Question:

I want to select the rows of a data frame in which the length of the string in the column v3 is equal to the length of the string of the column v4. My dataframe 'df' looks like:

    v1   v2  v3     v4
1   456  .   C      T
2   462  .   C      T
3   497  .   C      T
4   499  .   GC     AC
5   499  .   GC     G
6   499  .   GC     CC
7   513  .   GCACA  GCA
8   513  .   GCACA  GCACACA
9   513  .   GCACA  ACACA
10  513  .   GCACA  GCACACACA
11  513  .   GCACA  GCACACACACA
12  513  .   GCACA  GACCACA
13  513  .   GCACA  G
14  521  .   ACN    A
15  522  .   CNN    C

The output should be:

    v1   v2  v3     v4
1   456  .   C      T
2   462  .   C      T
3   497  .   C      T
4   499  .   GC     AC
6   499  .   GC     CC
9   513  .   GCACA  ACACA

I have tried:
new_df = df[nchar(str_sub(df$v3))==nchar(str_sub(df$v4))]

Answer 1:

@agstudy got the most important part. I would add that str_sub (from the stringr package, I assume) does nothing useful here: called without a start or end, it returns each string unchanged. Finally, you can use subset to avoid repeating df$. So you can write:

df[nchar(df$v3) == nchar(df$v4), ]

or

subset(df, nchar(v3) == nchar(v4))
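
For anyone who wants to try this end to end, here is a minimal sketch on a few rows copied from the question. The choice of rows and the names same_len and new_df are only for illustration, and it assumes v3 and v4 are character columns. If they were read in as factors, wrapping them in as.character() before nchar() should help.

```r
# Rebuild a handful of rows from the question's data frame (illustrative subset).
df <- data.frame(
  v1 = c(456, 499, 499, 513, 513),
  v2 = ".",
  v3 = c("C", "GC", "GC", "GCACA", "GCACA"),
  v4 = c("T", "AC", "G", "GCA", "ACACA"),
  stringsAsFactors = FALSE  # keep v3/v4 as character so nchar() applies directly
)

# Logical vector: TRUE where both strings have the same number of characters.
same_len <- nchar(df$v3) == nchar(df$v4)

# The trailing comma makes this a row selection; without it,
# the logical vector would be used to index columns instead.
new_df <- df[same_len, ]
print(new_df)
```

subset(df, nchar(v3) == nchar(v4)) should give the same rows. It is handy interactively, while the bracket form is usually the safer choice inside functions.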