I have a data set wherein a column looks like this:
ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL
.... and so on
I need to extract the characters that appear before the first | symbol.
In Excel, we would use a combination of MID and SEARCH, or LEFT and SEARCH. R has substr().
The syntax is substr(x, <start>, <stop>).
In my case, start will always be 1. For stop, we need to find the position of the first |. How can we achieve this? Are there alternative ways to do this?
Another option is the word() function from the stringr package:
library(stringr)
word(df1$V1,1,sep = "\\|")
Data
df1 <- read.table(text = "ABC|DEF|GHI,
ABCD|EFG|HIJK,
ABCDE|FGHI|JKL,
DEF|GHIJ|KLM,
GHI|JKLM|NO|PQRS,
BCDE|FGHI|JKL")
We can use sub():
sub("\\|.*", "", str1)
#[1] "ABC"
Or with strsplit():
strsplit(str1, "[|]")[[1]][1]
#[1] "ABC"
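The strsplit() call above picks the first piece of a single string. For a whole column, a minimal sketch (assuming a character vector, here the hypothetical x) is to split every element and take the first piece of each with vapply():

```r
# x is a hypothetical character vector standing in for the column
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "GHI|JKLM|NO|PQRS")

# fixed = TRUE treats "|" as a literal character, so no regex escaping is needed
parts <- strsplit(x, "|", fixed = TRUE)

# take the first piece of each split; vapply() guarantees a character vector back
first_parts <- vapply(parts, `[`, character(1), 1)
first_parts
# first_parts is c("ABC", "ABCD", "GHI")
```

This splits every string in full, so for long strings the sub() approach is likely the lighter option.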
Update
If we use the data from @hrbrmstr's answer (the df data frame):
sub("\\|.*", "", df$V1)
#[1] "ABC" "ABCD" "ABCDE" "DEF" "GHI" "BCDE"
These are all base R methods; no external packages are needed.
data
str1 <- "ABC|DEF|GHI ABCD|EFG|HIJK ABCDE|FGHI|JKL DEF|GHIJ|KLM GHI|JKLM|NO|PQRS BCDE|FGHI|JKL"
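The substr() idea from the question can also be completed in base R by letting regexpr() supply the stop position. This is only a sketch: the vector x and the value "NOPIPE" are made up for illustration, and the fallback for strings without a | is an assumption about the desired behaviour.

```r
# x is a hypothetical character vector; "NOPIPE" is an invented value with no "|"
x <- c("ABC|DEF|GHI", "ABCD|EFG|HIJK", "NOPIPE")

# regexpr() gives the position of the first match, or -1 if there is none;
# as.integer() drops the extra match attributes it attaches
pos <- as.integer(regexpr("|", x, fixed = TRUE))

# keep everything before the first "|"; return the string unchanged if no "|" is found
before_pipe <- ifelse(pos > 0, substr(x, 1, pos - 1), x)
before_pipe
# before_pipe is c("ABC", "ABCD", "NOPIPE")
```

Without the ifelse() guard, a string with no | would come back as an empty string, because substr() would be called with a negative stop.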
With stringi:
library(stringi)
df <- read.table(text="ABC|DEF|GHI,1
ABCD|EFG|HIJK,2
ABCDE|FGHI|JKL,3
DEF|GHIJ|KLM,4
GHI|JKLM|NO|PQRS,5
BCDE|FGHI|JKL,6", sep=",", header=FALSE, stringsAsFactors=FALSE)
stri_match_first_regex(df$V1, "(.*?)\\|")[,2]
## [1] "ABC" "ABCD" "ABCDE" "DEF" "GHI" "BCDE"