I have data that looks like this:
vector = c("hello I like to code hello","Coding is fun", "fun fun fun")
I want to remove duplicate words (space-delimited), i.e., the output should look like:
vector_cleaned
[1] "hello I like to code"
[2] "Coding is fun"
[3] "fun"
Split it up (strsplit on spaces), use unique (in lapply), and paste it back together:
vapply(lapply(strsplit(vector, " "), unique), paste, character(1L), collapse = " ")
# [1] "hello I like to code" "Coding is fun"        "fun"
## OR
vapply(strsplit(vector, " "), function(x) paste(unique(x), collapse = " "), character(1L))
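To see what each stage contributes, the one-liner can be unrolled into its three steps. This is only an illustrative sketch: the intermediate names words and uniq are made up here and are not part of the original answer.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# Step 1: split each string on single spaces; the result is a list of character vectors
words <- strsplit(vector, " ")
words[[1]]
# [1] "hello" "I"     "like"  "to"    "code"  "hello"

# Step 2: drop repeated words within each element, keeping the first occurrence
uniq <- lapply(words, unique)
uniq[[1]]
# [1] "hello" "I"     "like"  "to"    "code"

# Step 3: paste each element back into a single string
vector_cleaned <- vapply(uniq, paste, character(1L), collapse = " ")
vector_cleaned
# [1] "hello I like to code" "Coding is fun"        "fun"
```

Note that unique compares strings exactly, so "Coding" and "coding" would count as different words.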
Update based on comments
You can always write a custom function to use with your vapply call. For instance, here's a function that takes a split string, keeps only the words with more than minLen characters, and makes the "unique" step a user choice.
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}
Compare the output of the following to see how it would work.
vapply(strsplit(vector, " "), myFun, character(1L))
vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
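For reference, here is roughly what those three calls should return on the sample data. The snippet repeats myFun so it runs on its own; the res_* names are hypothetical and only used for illustration.

```r
vector <- c("hello I like to code hello", "Coding is fun", "fun fun fun")

# myFun as defined above, repeated so this snippet is self-contained
myFun <- function(x, minLen = 3, onlyUnique = TRUE) {
  a <- if (isTRUE(onlyUnique)) unique(x) else x
  paste(a[nchar(a) > minLen], collapse = " ")
}

# Defaults: unique words with more than 3 characters
res_default <- vapply(strsplit(vector, " "), myFun, character(1L))
# "hello like code", "Coding", ""

# Keep duplicates, still dropping words of 3 characters or fewer
res_dups <- vapply(strsplit(vector, " "), myFun, character(1L), onlyUnique = FALSE)
# "hello like code hello", "Coding", ""

# minLen = 0 keeps every non-empty word, so only duplicates are removed
res_all <- vapply(strsplit(vector, " "), myFun, character(1L), minLen = 0)
# "hello I like to code", "Coding is fun", "fun"
```

The third element is an empty string in the first two cases because "fun" has exactly 3 characters and the filter is a strict nchar(a) > minLen.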