Is there a way to split camel case strings in R?
I have attempted:
string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s" "ome" "amel" "ase"
Is there a way to split camel case strings in R?
I have attempted:
string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s" "ome" "amel" "ase"
string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z])", " \\1", string.to.split)
# [1] "this Is Some Camel Case"
strsplit(gsub("([A-Z])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
Looking at Ramnath's and mine I can say that my initial impression that this was an underspecified question has been supported.
And give Tommy and Ramanth upvotes for pointing out [:upper:]
strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
Here is one way to do it
split_camelcase <- function(...){
strings <- unlist(list(...))
strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
return(strsplit(tolower(strings), " ")[[1]])
}
split_camelcase("thisIsSomeGood")
# [1] "this" "is" "some" "good"
Here's an approach using a single regex (a Lookahead and Lookbehind):
strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
## [[1]]
## [1] "this" "Is" "Some" "Camel" "Case"
Here is a one-liner using the gsubfn
package's strapply
. The regular expression matches the beginning of the string (^
) followed by one or more lower case letters ([[:lower:]]+
) or (|
) an upper case letter ([[:upper:]]
) followed by zero or more lower case letters ([[:lower:]]*
) and processes the matched strings with c
(which concatenates the individual matches into a vector). As with strsplit
it returns a list so we take the first component ([[1]]
) :
library(gsubfn)
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this" "Is" "Camel" "Case"
The beginnings of an answer is to split all the characters:
sp.x <- strsplit(string.to.split, "")
Then find which string positions are upper case:
ind.x <- lapply(sp.x, function(x) which(!tolower(x) == x))
Then use that to split out each run of characters . . .
I think my other answer is better than the follwing, but if only a oneliner to split is needed...here we go:
library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this" "Is" "Some" "Camel" "Case"
Here an easy solution via snakecase + some tidyverse helpers:
install.packages("snakecase")
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)
string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>%
str_split(pattern = "_") %>%
purrr::flatten_chr()
#> [1] "this" "Is" "Some" "Camel" "Case"
Githublink to snakecase: https://github.com/Tazinho/snakecase