I want to select the rows of a data frame in which the length of the string in column v3 equals the length of the string in column v4. My data frame 'df' looks like:
   v1  v2 v3    v4
1  456 .  C     T
2  462 .  C     T
3  497 .  C     T
4  499 .  GC    AC
5  499 .  GC    G
6  499 .  GC    CC
7  513 .  GCACA GCA
8  513 .  GCACA GCACACA
9  513 .  GCACA ACACA
10 513 .  GCACA GCACACACA
11 513 .  GCACA GCACACACACA
12 513 .  GCACA GACCACA
13 513 .  GCACA G
14 521 .  ACN   A
15 522 .  CNN   C
The output should be:
  v1  v2 v3    v4
1 456 .  C     T
2 462 .  C     T
3 497 .  C     T
4 499 .  GC    AC
9 513 .  GCACA ACACA
I have tried:
new_df = df[nchar(str_sub(df$v3))==nchar(str_sub(df$v4))]
@agstudy got the most important part. I would add that str_sub (from the stringr package, I assume) is not doing anything useful here. Last, you could use subset to avoid the repetitive use of df$. So there are two ways to write the filter: plain row indexing with nchar, or subset.
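The code that originally accompanied this answer did not survive, so what follows is only a sketch of the two forms it describes, not the author's exact code. It assumes that v3 and v4 are character columns (if they are factors, convert them with as.character() first), and it rebuilds the example data from the question so that it runs on its own. The names new_df and new_df2 are just illustrative.

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  v1 = c(456, 462, 497, 499, 499, 499, 513, 513, 513, 513, 513, 513, 513, 521, 522),
  v2 = ".",
  v3 = c("C", "C", "C", "GC", "GC", "GC", rep("GCACA", 7), "ACN", "CNN"),
  v4 = c("T", "T", "T", "AC", "G", "CC", "GCA", "GCACACA", "ACACA",
         "GCACACACA", "GCACACACACA", "GACCACA", "G", "A", "C"),
  stringsAsFactors = FALSE  # assumption: the columns are plain character vectors
)

# Form 1: logical row index. The comma before the closing bracket is what
# makes this select rows; without it, R tries to select columns instead.
new_df <- df[nchar(df$v3) == nchar(df$v4), ]

# Form 2: subset() evaluates the condition inside df, so no df$ is needed.
new_df2 <- subset(df, nchar(v3) == nchar(v4))
```

Both forms keep the original row names. With the data exactly as posted, row 6 (GC, CC) also has strings of equal length, so it is returned along with rows 1, 2, 3, 4 and 9, even though the expected output in the question leaves it out.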