I'm using the gsub function in R to return occurrences of my pattern (reference numbers) on a list of text. This works great unless no match is found, in which case I get the entire string back instead of an empty string. Consider the example:
data <- list("a sentence with citation (Ref. 12)",
"another sentence without reference")
sapply(data, function(x) gsub(".*(Ref. (\\d+)).*", "\\1", x))
Returns:
[1] "Ref. 12" "another sentence without reference"
But I'd like to get
[1] "Ref. 12" ""
Thanks!

According to the documentation, this is a feature of gsub: it returns the input string unchanged if there are no matches to the supplied pattern. Here, I use the function grepl first to return a logical vector of the presence/absence of the pattern in each string, and then embed this in a function.
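
The code that went with this answer is not shown here, so the following is only a minimal sketch of the grepl-then-gsub idea, not the answerer's original. The helper name extract_ref is hypothetical; the pattern and data are the ones from the question.

```r
# Hypothetical helper: returns the matched reference, or "" when there is none.
extract_ref <- function(x, pattern = ".*(Ref. (\\d+)).*") {
  x <- unlist(x)                  # accept a list or a character vector
  found <- grepl(pattern, x)      # TRUE where the pattern occurs
  out <- character(length(x))     # pre-filled with empty strings
  out[found] <- gsub(pattern, "\\1", x[found])  # substitute only where matched
  out
}

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
extract_ref(data)
```

Because grepl and gsub are both vectorized, this sketch needs no sapply, and a string without a reference stays as the empty string it was initialised to.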

Try strapplyc in the gsubfn package.
If you don't mind list output then you could just use L and forget about the last line of code. Note that the fn$ prefix turns the formula arguments of the function it's applied to into function calls, so the first line of code could be written without fn as sapply(unlist(data), function(x) strapplyc(x, "Ref. \\d+")).
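
The code for this answer did not survive, so here is a rough sketch of how the strapplyc approach might look, written without the fn$ prefix. It assumes the gsubfn package is installed; the names L and refs are illustrative, with L standing in for the list the answer refers to.

```r
library(gsubfn)  # third-party package; assumed to be installed

data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")

# One list element per input string; an element is empty when nothing matched.
L <- strapplyc(unlist(data), "Ref. \\d+")

# Collapse the list to a character vector, using "" where there was no match.
refs <- sapply(L, function(m) if (length(m)) m[1] else "")
refs
```

If list output is acceptable, L can be used directly and the final sapply step skipped.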

Based on @joran's answer, written as a function with a usage example.

I'd probably go a different route, since the sapply doesn't seem necessary to me, as these functions are vectorized already.

You might try embedding grep( ..., value = TRUE) in that function. Kind of bulky but it works? It also removes the empty 2nd reference.
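
As a sketch of the grep(..., value = TRUE) suggestion (the original code is not shown, so the variable names here are assumptions), the non-matching strings can be filtered out before the substitution runs:

```r
data <- list("a sentence with citation (Ref. 12)",
             "another sentence without reference")
pattern <- ".*(Ref. (\\d+)).*"

# grep(..., value = TRUE) keeps only the strings that contain the pattern,
# so gsub never sees a string without a reference.
matched <- grep(pattern, unlist(data), value = TRUE)
refs <- gsub(pattern, "\\1", matched)
refs
```

Note that this drops the non-matching element entirely rather than returning "" for it, so the result is shorter than the input and no longer lines up with it position by position.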