Truncating the end of a string in R after a charac

I have the following data:

temp<-c("AIR BAGS:FRONTAL" ,"SERVICE BRAKES HYDRAULIC:ANTILOCK",
    "PARKING BRAKE:CONVENTIONAL",
    "SEATS:FRONT ASSEMBLY:POWER ADJUST",
    "POWER TRAIN:AUTOMATIC TRANSMISSION",
    "SUSPENSION",
    "ENGINE AND ENGINE COOLING:ENGINE",
    "SERVICE BRAKES HYDRAULIC:ANTILOCK",
    "SUSPENSION:FRONT",
    "ENGINE AND ENGINE COOLING:ENGINE",
    "VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES")

I would like to create a new vector that retains only the text before the first ":" in the cases where a ":" is present, and the whole word when ":" is not present.

I have tried to use:

temp=data.frame(matrix(unlist(str_split(temp,pattern=":",n=2)), 
+                        ncol=2, byrow=TRUE))

but it does not work in the cases where there is no ":"

I know this question is very similar to: truncate string from a certain character in R, which used:

sub("^[^.]*", "", x)

But I am not very familiar with regular expressions and have struggled to reverse that example to retain only the beginning of the string.

标签： string r truncate

5条回答

傲

2楼-- · 2019-02-06 23:27

sorry to add this as an answer. In response to times taken:

> yy<-rep("foo1:bar1",times=100000)
> system.time(yy1<-sapply(strsplit(yy,":"),'[',1))
   user  system elapsed 
   0.26    0.00    0.27 
> 
> system.time(yy2<-sub("(.*?):.*", "\\1", yy))
   user  system elapsed 
    0.1     0.0     0.1 
> 
> system.time(yy3 <- sub(":.*$", "", yy ))
   user  system elapsed 
   0.08    0.00    0.07 
> 
> system.time(yy4<-gsub("([^:]*).*","\\1",yy))
   user  system elapsed 
   0.09    0.00    0.09

The regex are roughly equivalent the strsplit takes a bit longer

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

3楼-- · 2019-02-06 23:33

You can solve this with a simple regex:

sub("(.*?):.*", "\\1", x)
 [1] "AIR BAGS"                  "SERVICE BRAKES HYDRAULIC"  "PARKING BRAKE"             "SEATS"                    
 [5] "POWER TRAIN"               "SUSPENSION"                "ENGINE AND ENGINE COOLING" "SERVICE BRAKES HYDRAULIC" 
 [9] "SUSPENSION"                "ENGINE AND ENGINE COOLING" "VISIBILITY"

How the regex works:

"(.*?):.*" Look for a repeated set of any characters .* but modify it with ? to not be greedy. This should be followed by a colon and then any character (repeated)
Substitute the entire string with the bit found inside the parentheses - "\\1"

The bit to understand is that any regex match is greedy by default. By modifying it to be non-greedy, the first pattern match can not include the colon, since the first character after the parentheses is a colon. The regex after the colon is back to the default, i.e. greedy.

0人赞添加讨论(0) 举报

手持菜刀，她持情操

4楼-- · 2019-02-06 23:34

Another approach is to look for the first ":" and replace it and anything after it with nothing:

yy <- sub(":.*$", "", yy )

If no ":" is found then nothing is substituted and you get the whole of the original string. If there is a ":" then the first one is matched along with everything after it, this is then replace with nothing ("") which deletes it and leaves everything up to that first colon.

0人赞添加讨论(0) 举报

迷人小祖宗

5楼-- · 2019-02-06 23:38

in this case

yy<-c("AIR BAGS:FRONTAL",
"SERVICE BRAKES HYDRAULIC:ANTILOCK",
"PARKING BRAKE:CONVENTIONAL",
"SEATS:FRONT ASSEMBLY:POWER ADJUST",
"POWER TRAIN:AUTOMATIC TRANSMISSION",
"SUSPENSION",
"ENGINE AND ENGINE COOLING:ENGINE",
"SERVICE BRAKES HYDRAULIC:ANTILOCK",
"SUSPENSION:FRONT",
"ENGINE AND ENGINE COOLING:ENGINE",
"VISIBILITY:WINDSHIELD WIPER/WASHER:LINKAGES")
yy<-gsub("([^:]*).*","\\1",yy)
yy

may work for you

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

6楼-- · 2019-02-06 23:41

Does this work (assuming your data is in a character vector):

x <- c('foobar','foo:bar','foo1:bar1 foo:bar','foo bar')
> sapply(str_split(x,":"),'[',1)
[1] "foobar"  "foo"     "foo1"    "foo bar"

0人赞添加讨论(0) 举报

Truncating the end of a string in R after a charac

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间