Extract string before “|” [duplicate]

Posted 2019-01-15 11:33

Question:

This question already has an answer here:

  • Remove part of string after “.” 3 answers

I have a data set wherein a column looks like this:

ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL  

.... and so on

I need to extract the characters that appear before the first | symbol.

In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. R has substr().

The syntax is substr(x, <start>, <stop>).

In my case, start will always be 1. For stop, we need to search by |. How can we achieve this? Are there alternate ways to do this?

Answer 1:

Another option is the word() function from the stringr package:

library(stringr)
word(df1$V1,1,sep = "\\|")

Data

df1 <- read.table(text = "ABC|DEF|GHI,  
ABCD|EFG|HIJK,  
ABCDE|FGHI|JKL,  
DEF|GHIJ|KLM,  
GHI|JKLM|NO|PQRS,  
BCDE|FGHI|JKL")


Answer 2:

We can use sub

sub("\\|.*", "", str1)
#[1] "ABC"

Or with strsplit

strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"
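The strsplit call above indexes only the first element of the result, so it handles a single string. For a whole column, a minimal sketch could take the first piece of every element instead. The vector `x` and the name `first_field` are hypothetical, chosen here for illustration:

```r
# Hypothetical sample vector, modeled on the column in the question
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# fixed = TRUE treats "|" as a literal character, not regex alternation
parts <- strsplit(x, "|", fixed = TRUE)

# Keep the first piece of each split; vapply enforces one string per input
first_field <- vapply(parts, function(p) p[1], character(1))
first_field
#[1] "ABC"  "ABCD" "GHI"
```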

Update

If we use the data from @hrbrmstr

sub("\\|.*", "", df$V1)
#[1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE" 

These are all base R methods. No external packages used.
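The substr() approach from the question can also be made to work in base R by locating the first "|" with regexpr(). The following is only a sketch: the vector `x` and the names `pos`, `end_pos`, and `before` are assumptions made for illustration, and strings without any "|" are kept whole, which matches what sub() does when nothing matches.

```r
# Hypothetical sample vector; the last element has no "|" at all
x <- c("ABC|DEF|GHI", "GHI|JKLM|NO|PQRS", "NOPIPE")

# Position of the first literal "|" in each string; -1 when there is none
# (as.integer drops the extra attributes that regexpr attaches)
pos <- as.integer(regexpr("|", x, fixed = TRUE))

# Stop one character before the delimiter, or at the end if no "|" exists
end_pos <- ifelse(pos > 0, pos - 1L, nchar(x))

before <- substr(x, 1, end_pos)
before
#[1] "ABC"    "GHI"    "NOPIPE"
```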

Data

str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"


Answer 3:

With stringi:

library(stringi)

df <- read.table(text="ABC|DEF|GHI,1
ABCD|EFG|HIJK,2
ABCDE|FGHI|JKL,3  
DEF|GHIJ|KLM,4
GHI|JKLM|NO|PQRS,5
BCDE|FGHI|JKL,6", sep=",", header=FALSE, stringsAsFactors=FALSE)

stri_match_first_regex(df$V1, "(.*?)\\|")[,2]
## [1] "ABC"   "ABCD"  "ABCDE" "DEF"   "GHI"   "BCDE" 


Tags: r extract substr