gsub return an empty string when no match is found

Posted 2020-03-14 02:00

I'm using the gsub function in R to return occurrences of my pattern (reference numbers) on a list of text. This works great unless no match is found, in which case I get the entire string back, instead of an empty string. Consider the example:

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")

sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))

Returns:

[1] "Ref. 12"                            "another sentence without reference"

But I'd like to get

[1] "Ref. 12"                            ""

Thanks!

Tags: regex r grep
6 answers
老娘就宠你
Reply #2 · 2020-03-14 02:33

According to the documentation, this is a feature of gsub: it returns the input string unchanged if there are no matches to the supplied pattern.
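A quick way to check this behaviour, using the pattern and the two sentences from the question (the variable names `no_match` and `has_match` are only for illustration):

```r
pattern <- ".*(Ref. (\\d+)).*"

# No match: gsub() has nothing to substitute, so the input comes back as-is
no_match <- gsub(pattern, "\\1", "another sentence without reference")

# Match: the whole string is replaced by the first capture group
has_match <- gsub(pattern, "\\1", "a sentence with citation (Ref. 12)")

print(no_match)
print(has_match)
```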

Here, I first use grepl to get a logical vector indicating whether the pattern is present in each string:

ifelse(grepl(".*(Ref. (\\d+)).*", data), 
      gsub(".*(Ref. (\\d+)).*", "\\1", data), 
      "")

embedding this in a function:

mygsub <- function(x){
     ans <- ifelse(grepl(".*(Ref. (\\d+)).*", x), 
              gsub(".*(Ref. (\\d+)).*", "\\1", x), 
              "")
     return(ans)
}

mygsub(data)
仙女界的扛把子
Reply #3 · 2020-03-14 02:44

Try strapplyc in the gsubfn package:

library(gsubfn)

L <- fn$sapply(unlist(data), ~ strapplyc(x, "Ref. \\d+"))
unlist(fn$sapply(L, ~ ifelse(length(x), x, "")))

which gives this:

a sentence with citation (Ref. 12) another sentence without reference 
                         "Ref. 12"                                 "" 

If you don't mind list output then you could just use L and forget about the last line of code. Note that the fn$ prefix turns the formula arguments of the function it's applied to into functions, so the first line of code could be written without fn as sapply(unlist(data), function(x) strapplyc(x, "Ref. \\d+")).

三岁会撩人
Reply #4 · 2020-03-14 02:48

Based on @joran's answer.

function:

extract_matches <- function(x, pattern, replacement, replacement_nomatch = "") {
    # Test the original input; a logical mask also stays correct when nothing matches
    matched <- grepl(pattern, x)
    x <- gsub(pattern, replacement, x)
    x[!matched] <- replacement_nomatch
    x
}

usage:

data <- list("with citation (Ref. 12)", "without reference", "")
extract_matches(data,  ".*(Ref. (\\d+)).*", "\\1")
Evening l夕情丶
Reply #5 · 2020-03-14 02:50

I'd probably go a different route, since the sapply doesn't seem necessary to me as these functions are vectorized already:

fun <- function(x){
    # grepl() gives a logical mask, which stays correct even when nothing matches
    matched <- grepl(".*(Ref. (\\d+)).*", x)
    x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
    x[!matched] <- ""
    x
}

fun(data)
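
As an aside, negating the indices returned by grep() behaves differently from a logical mask from grepl() when no element matches at all. A small sketch with made-up inputs (the names `by_index` and `by_mask` are only for illustration):

```r
x <- c("no reference here", "none here either")

# grep() returns integer(0) because nothing matches
ind <- grep("Ref\\. \\d+", x)

# -integer(0) is still integer(0), so this assignment touches nothing
by_index <- x
by_index[-ind] <- ""

# A logical mask blanks every non-matching element as intended
by_mask <- x
by_mask[!grepl("Ref\\. \\d+", x)] <- ""

print(by_index)
print(by_mask)
```

With at least one match in the input, both forms give the same result; they only diverge in the all-miss case.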
放荡不羁爱自由
Reply #6 · 2020-03-14 02:52

You might try embedding grep( ..., value = T) in that function.

data <- list("a sentence with citation (Ref. 12)",
         "another sentence without reference")

unlist( sapply(data, function(x) { 
  x <- gsub(".*(Ref. (\\d+)).*", "\\1", x)
  grep( "Ref\\.", x, value = T )
  } ) )

Kind of bulky, but it works? Note that it drops the non-matching second element from the result rather than returning an empty string for it.

够拽才男人
Reply #7 · 2020-03-14 02:53
xs <- sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))
xs[xs==data] <- ""
xs
#[1] "Ref. 12" ""       
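
Not taken from any of the answers above: another option that stays in base R is to extract the match directly with regexpr() and regmatches() instead of substituting. This is only a sketch, assuming one reference per string at most; the pattern is simplified to "Ref\\. \\d+" and the name `out` is illustrative.

```r
data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
x <- unlist(data)

# regexpr() gives the position of the first match per element, or -1 for none
m <- regexpr("Ref\\. \\d+", x)

# Start from empty strings, then fill in only the elements that matched;
# regmatches() returns one string per matching element, in order
out <- rep("", length(x))
out[m != -1] <- regmatches(x, m)

print(out)
```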