I have this character vector, mystring, with the delimiter _. If there are two or more delimiters, I want to split at the second delimiter; if there is only one delimiter, I want to split at ".ReCal". In both cases I want to keep the first part and get the result shown below.
mystring<-c("MODY_60.2.ReCal.sort.bam","MODY_116.21_C4U.ReCal.sort.bam","MODY_116.3_C2RX-1-10.ReCal.sort.bam","MODY_116.4.ReCal.sort.bam")
result
"MODY_60.2" "MODY_116.21" "MODY_116.3" "MODY_116.4"
You can simply do this using gsub, without any complex regex. Just replace the match with \\1. See the demo: https://regex101.com/r/wL4aB6/1
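A minimal sketch of the gsub idea, assuming the goal is to keep everything before the second _ when there is one and everything before .ReCal otherwise. The pattern below is my own illustration and not necessarily the one from the linked demo; the variable names pattern and result are also just for this example.

```r
mystring <- c("MODY_60.2.ReCal.sort.bam",
              "MODY_116.21_C4U.ReCal.sort.bam",
              "MODY_116.3_C2RX-1-10.ReCal.sort.bam",
              "MODY_116.4.ReCal.sort.bam")

# Group 1 captures the text up to (not including) the second "_" if one
# exists, otherwise the text up to ".ReCal". The rest of the string is
# matched and dropped, because the whole match is replaced by \\1.
# (Assumed pattern; perl = TRUE is used for the lazy quantifier and the
# non-capturing group.)
pattern <- "^([^_]*_[^_]*?)(?:_.*|\\.ReCal.*)$"

result <- gsub(pattern, "\\1", mystring, perl = TRUE)
print(result)
```

Strings that contain no _ at all do not match the pattern and are returned unchanged.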
This can also be done with the stringr package, and that approach also works with more than two delimiters.
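A possible stringr version, as a sketch only: it uses str_extract with a lookahead, which may differ from the code originally posted. It assumes the stringr package is installed; the pattern and the extra test string with three delimiters are made up for illustration.

```r
library(stringr)

mystring <- c("MODY_60.2.ReCal.sort.bam",
              "MODY_116.21_C4U.ReCal.sort.bam",
              "MODY_116.3_C2RX-1-10.ReCal.sort.bam",
              "MODY_116.4.ReCal.sort.bam")

# Match from the start through the first "_", then as few non-"_"
# characters as possible until the next thing is either a second "_"
# or the literal ".ReCal" (assumed pattern, not from the original answer).
pattern <- "^[^_]+_[^_]+?(?=_|\\.ReCal)"

result <- str_extract(mystring, pattern)
print(result)

# Hypothetical input with more than two delimiters:
# the match still stops at the second "_".
extra <- str_extract("MODY_116.3_C2RX_extra.ReCal.sort.bam", pattern)
print(extra)
```

Unlike the gsub approach, str_extract returns NA for strings that do not match at all.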
You can do this using gsubfn. This allows for cases when you have more than two "_" and you want to split on the second one.

In the function f, x is the original string, and y and z are the next matches. So, if z is not a "_", the function proceeds with splitting by the alternative string.

gregexpr can search for a pattern in strings and give the location.

First, we use gregexpr to find the location of every _ in each element of mystring. Then, we loop through that output and extract the index of the second _ within each element of mystring. If there is no second _, it'll return an NA (check inds in the example below).

After that, we can either extract the relevant part using substr based on the extracted index or, if there is an NA, split the string at .ReCal and keep only the first part.

A little longer, but needs less regular expression knowledge: