Selecting rows on the basis of the string length of two columns

Posted 2019-04-13 02:24


祖国的老花朵


Question:

I want to select the rows of a data frame in which the string in column v3 has the same length as the string in column v4. My data frame 'df' looks like:

    v1   v2  v3     v4
1   456  .   C      T
2   462  .   C      T
3   497  .   C      T
4   499  .   GC     AC
5   499  .   GC     G
6   499  .   GC     CC
7   513  .   GCACA  GCA
8   513  .   GCACA  GCACACA
9   513  .   GCACA  ACACA
10  513  .   GCACA  GCACACACA
11  513  .   GCACA  GCACACACACA
12  513  .   GCACA  GACCACA
13  513  .   GCACA  G
14  521  .   ACN    A
15  522  .   CNN    C
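To make the example reproducible, here is a sketch that rebuilds the table above as an R data frame, with the values copied from the table. Passing stringsAsFactors = FALSE is an assumption on my part: it keeps v3 and v4 as character vectors even on R versions before 4.0.0, since nchar() does not accept factors. The last line lists the rows where v3 and v4 have equal length.

```r
# Rebuild the question's table; values are copied row by row.
# stringsAsFactors = FALSE keeps v3/v4 as character vectors (only the
# default since R 4.0.0), because nchar() rejects factors.
df <- data.frame(
  v1 = c(456, 462, 497, 499, 499, 499, 513, 513, 513, 513, 513, 513, 513, 521, 522),
  v2 = ".",  # recycled to all 15 rows
  v3 = c("C", "C", "C", "GC", "GC", "GC", "GCACA", "GCACA", "GCACA",
         "GCACA", "GCACA", "GCACA", "GCACA", "ACN", "CNN"),
  v4 = c("T", "T", "T", "AC", "G", "CC", "GCA", "GCACACA", "ACACA",
         "GCACACACA", "GCACACACACA", "GACCACA", "G", "A", "C"),
  stringsAsFactors = FALSE
)

# Row numbers where v3 and v4 contain the same number of characters.
which(nchar(df$v3) == nchar(df$v4))  # returns 1 2 3 4 6 9
```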

The output should be:

    v1   v2  v3     v4
1   456  .   C      T
2   462  .   C      T
3   497  .   C      T
4   499  .   GC     AC
6   499  .   GC     CC
9   513  .   GCACA  ACACA

I have tried:
new_df = df[nchar(str_sub(df$v3))==nchar(str_sub(df$v4))]
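A minimal base-R sketch of the row selection, using only rows 4 to 9 of the table so the block stays short (the excerpt and the name same_length are illustrative). The key detail is the comma in df[same_length, ]: a logical vector placed before the comma selects rows, whereas a single index with no comma, as in the attempt above, is treated as a column index. nchar() is applied to the columns directly.

```r
# Rows 4-9 of the question's table, keeping the original row names.
df <- data.frame(
  v1 = c(499, 499, 499, 513, 513, 513),
  v2 = ".",
  v3 = c("GC", "GC", "GC", "GCACA", "GCACA", "GCACA"),
  v4 = c("AC", "G", "CC", "GCA", "GCACACA", "ACACA"),
  row.names = 4:9,
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where v3 and v4 have the same number of characters.
same_length <- nchar(df$v3) == nchar(df$v4)

# The trailing comma makes same_length a row index (all columns kept);
# df[same_length] without the comma would index columns instead.
new_df <- df[same_length, ]
new_df
```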

Answer 1:

@agstudy got the most important part. I would add that str_sub (from the stringr package, I assume) is not doing anything useful here. Lastly, you could use subset to avoid the repetitive use of df$. So you can do: