Remove part of string after “.”

2019-01-02 18:35发布

I am working with NCBI Reference Sequence accession numbers like variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")  

To get information from the biomart package I need to remove the .1, .2 etc. after the accession numbers. I normally do this with this code:

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this isn't the correct way for this variable. Can anyone help me with this?

3条回答
姐姐魅力值爆表
2楼-- · 2019-01-02 18:51

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155" 
查看更多
泪湿衣
3楼-- · 2019-01-02 19:05

We can pretend they are filenames and remove extensions:

tools::file_path_sans_ext(a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
查看更多
残风、尘缘若梦
4楼-- · 2019-01-02 19:11

You could do:

sub("*\\.[0-9]", "", a)

or

library(stringr)
str_sub(a, start=1, end=-3)
查看更多
登录 后发表回答