I have table that the first column is:
chr10:100002872-100002872
chr10:100003981-100003981
chr10:100004774-100004774
chr10:100005285-100005285
chr10:100007123-100007123
I want to convert it to 3 separate columns but I couldn't define ":" and "-" to used strsplit command.
What should I do?
Here's one way:
library(data.table)
DF[, paste0("V1.",1:3) ] <- tstrsplit(DF$V1, ":|-")
# V1 V1.1 V1.2 V1.3
# 1 chr10:100002872-100002872 chr10 100002872 100002872
# 2 chr10:100003981-100003981 chr10 100003981 100003981
# 3 chr10:100004774-100004774 chr10 100004774 100004774
# 4 chr10:100005285-100005285 chr10 100005285 100005285
# 5 chr10:100007123-100007123 chr10 100007123 100007123
strsplit
accepts regular expressions involving the "or" operator, |
, as @AnandaMahto said. tstrsplit
is just a convenience function added by the data.table package.
If you convert your data.frame to a data.table (which has many advantages and no disadvantages except a slight learning curve), you would do:
setDT(DF)[, paste0("V1.",1:3) := tstrsplit(V1, ":|-")]
# V1 V1.1 V1.2 V1.3
# 1: chr10:100002872-100002872 chr10 100002872 100002872
# 2: chr10:100003981-100003981 chr10 100003981 100003981
# 3: chr10:100004774-100004774 chr10 100004774 100004774
# 4: chr10:100005285-100005285 chr10 100005285 100005285
# 5: chr10:100007123-100007123 chr10 100007123 100007123
Alternatives. There are (cumbersome) ways to get the same thing in base R, like
DF[, paste0("V1.",1:3) ] <- do.call(rbind, strsplit(DF$V1, ":|-"))
And @AnandaMahto's package also has a convenience function for this:
library(splitstackshape)
cSplit(DF, "V1", ":|-")
# V1.1 V1.2 V1.3 V1_1
# 1: chr10 100002872 100002872 chr10:100002872-100002872
# 2: chr10 100003981 100003981 chr10:100003981-100003981
# 3: chr10 100004774 100004774 chr10:100004774-100004774
# 4: chr10 100005285 100005285 chr10:100005285-100005285
# 5: chr10 100007123 100007123 chr10:100007123-100007123
Also similarly with tidyr
. If you want to keep the original column you
can add , remove = FALSE
and convert = TRUE
if you want to set corresponding classes to the new columns. separate
has a default regex to split on non character/numeric values, thus you don't need to specify your condition. If some of the rows have missing components add , extra = "merge"
library(tidyr)
separate(DF, "V1", paste0("V1.",1:3))
# V1.1 V1.2 V1.3
# 1 chr10 100002872 100002872
# 2 chr10 100003981 100003981
# 3 chr10 100004774 100004774
# 4 chr10 100005285 100005285
# 5 chr10 100007123 100007123