Replace multiple letters with accents with gsub

Posted 2019-01-01 09:12

Of course, I could replace specific characters one at a time like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help to be very comprehensive on this.

Tags: r regex gsub
10 answers
低头抚发
#2 · 2019-01-01 09:25

This is very similar to @kith's answer, but in function form, and covering the most common diacritic cases:

removeDiacritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiacritics("test áéíóú")

"test aeiou"

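chartr() translates character by character: each character in its first argument is replaced by the character at the same position in its second argument, so the two strings have to stay aligned. Applied directly to the vector from the question, a minimal sketch (assuming a UTF-8 session) would be:

```r
mydata <- c("á", "é", "ó")

# each character of the first string is mapped to the character
# at the same position in the second string
result <- chartr("áéó", "aeo", mydata)
result
# [1] "a" "e" "o"
```

This only works for one-to-one character mappings; replacing a character with several characters (for example "ß" with "ss") needs gsub() or another approach.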
人气声优
#3 · 2019-01-01 09:29

Not so elegant, but it works and does what you want:

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"
何处买醉
#4 · 2019-01-01 09:32

A problem with some of the implementations above (e.g., Theodore Lytras's) is that when the patterns are more than one character long, they may conflict if one pattern is a substring of another. One way to solve this is to create a copy of the object and perform the replacements in that copy, while always matching against the original. This is implemented in my package bayesbio, available on CRAN.

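mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  # work on a copy so earlier replacements cannot be matched by later patterns
  result = x
  for (i in seq_len(n)) {
    # match against the original x, then overwrite each matching element
    result[grep(pattern[i], x, ...)] = replacement[i]
  }
  return(result)
}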

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
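To make the conflict concrete, the sketch below compares a chain of gsub() calls, where each pass sees the output of the previous one, with the copy-based mgsub() from this answer. The helper name sequential_gsub is made up for this illustration, and mgsub() is repeated only so the block runs on its own. Note that this mgsub() overwrites the whole matching element rather than just the matched substring.

```r
# mgsub() as in the answer, repeated so this block is self-contained
mgsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  result <- x
  for (i in seq_along(pattern)) {
    # always match against the original x, never against result
    result[grep(pattern[i], x, ...)] <- replacement[i]
  }
  result
}

# Hypothetical helper for comparison: chained gsub() calls,
# so the "1" pattern also hits the "10" produced one step earlier
sequential_gsub <- function(pattern, replacement, x) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x)
  }
  x
}

asdf <- c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)
pat  <- c("0", "1", "2")
repl <- c("10", "11", "12")

chained <- sequential_gsub(pat, repl, asdf)
copied  <- mgsub(pat, repl, asdf)

chained[2]  # "110": the 0 became "10", then its "1" became "11"
copied[2]   # "10"
```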
像晚风撩人
#5 · 2019-01-01 09:33

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"
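The output of //TRANSLIT depends on the platform's iconv implementation, so the result shown above may differ between systems. A cautious sketch, using a hypothetical wrapper named to_ascii (not part of the answer), normalises the input encoding first and warns about anything that could not be converted:

```r
# Hypothetical helper, not from the answer above.
to_ascii <- function(x) {
  # convert the input to UTF-8 first so that `from` is accurate
  out <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII//TRANSLIT")
  # iconv() returns NA for elements it cannot convert
  failed <- is.na(out) & !is.na(x)
  if (any(failed)) {
    warning(sum(failed), " element(s) could not be converted")
  }
  out
}

converted <- to_ascii(c("plain", "áéóÁÉÓçã"))
converted
```

No expected output is given for the accented element because it is exactly the part that can vary by platform.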
皆成旧梦
#6 · 2019-01-01 09:33

Another mgsub implementation, this time using Reduce:

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring){
  # apply a single (pattern, replacement) pair to x
  gsub2 <- function(l, x){
    do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  # fold over the pairs, starting from mystring
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}
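Reduce() threads the string through the list of pairs, applying one gsub() per pair to the running result. The answer folds from the right, so the last pair is applied first; the sketch below is a left-fold variant of the same idea (not the answer's exact code), which applies the pairs in the order they are listed:

```r
mystring <- "This is good"
myrepl <- list(c("o", "a"), c("i", "n"))

# left fold: acc is the running string, l is one (pattern, replacement) pair
folded <- Reduce(function(acc, l) gsub(l[1], l[2], acc), myrepl, mystring)
folded
# [1] "Thns ns gaad"
```

Because every step works on the previous step's output, the order of the pairs matters whenever one replacement can be matched by a later pattern.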
旧人旧事旧时光
#7 · 2019-01-01 09:36

You can use the stringi package to replace these characters:

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")

[1] "a" "e" "o"