I want to count how often certain strings occur within a data frame.
strings <- c("pi","pie","piece","pin","pinned","post")
df <- as.data.frame(strings)
I would then like to count how many entries contain each of the following substrings:
counts <- c("pi", "in", "pie", "ie")
To give me something like:
string freq
pi 5
in 2
pie 2
ie 2
I have experimented with grepl and table, but I don't see how to specify which strings I want to search for.
You can use sapply() to loop over counts and match every item in it against the strings column in df using grepl(). This returns a logical vector (TRUE for a match, FALSE for a non-match), which you can sum to get the number of matches.
sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x))
  })
})
This will return:
strings
pi 5
in 2
pie 2
ie 2
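If you want the two-column layout from the question rather than a matrix, the same idea can be sketched as below. This is only a minimal variant, not part of the answer above: it works on the strings vector directly, uses vapply() so the result type is checked, and assumes the search terms are literal substrings (hence fixed = TRUE). The names freq and result are placeholders chosen here.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
counts  <- c("pi", "in", "pie", "ie")

# For each search term, count how many elements of `strings` contain it.
# fixed = TRUE treats the term as a literal substring, not a regular expression.
freq <- vapply(counts, function(p) sum(grepl(p, strings, fixed = TRUE)), integer(1))

# Assemble the string/freq table asked for in the question.
result <- data.frame(string = counts, freq = unname(freq))
print(result)
```

Dropping fixed = TRUE lets the entries of counts be regular expressions, at the cost of having to escape characters such as . or (.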
You can use adist, which ships with base R (in the utils package):
# With partial = TRUE the distance is 0 wherever a pattern occurs as a substring
data.frame(counts, freq = rowSums(!adist(counts, strings, partial = TRUE)))
counts freq
1 pi 5
2 in 2
3 pie 2
4 ie 2
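To see why this works, it can help to look at the intermediate distance matrix. The sketch below just unpacks the one-liner into steps; the names d, contains and freq are placeholders, and the dimnames are added only to make the matrix readable.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
counts  <- c("pi", "in", "pie", "ie")

# Rows are the search terms, columns are the strings being searched.
# partial = TRUE compares each term against substrings of each string.
d <- utils::adist(counts, strings, partial = TRUE)
dimnames(d) <- list(counts, strings)

# A distance of 0 means the term occurs verbatim inside the string.
contains <- d == 0

# Summing each row gives the number of strings containing that term.
freq <- rowSums(contains)
print(freq)
```

The !adist(...) in the one-liner is shorthand for the d == 0 step: negating a number gives TRUE only when it is zero.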
If you are comfortable with regular expressions then you can do:
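# Replace each string with the pattern it matched, or with "" when nothing matched
a <- sapply(paste0(".*(", counts, ").*|.*"), sub, "\\1", strings)
# Keep only the non-empty results and tabulate them
table(grep("\\w", a, value = TRUE))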
ie in pi pie
2 2 5 2
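You can also create a frequency table with qgrams from the stringdist package. Note that with q = 2 it tabulates every two-character substring, so three-character strings such as pie are not covered: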
library(stringdist)
strings <- c("pi","pie","piece","pin","pinned","post")
frequency <- data.frame(t(stringdist::qgrams(freq = strings, q = 2)))
frequency
freq
pi 5
po 1
st 1
ie 2
in 2
nn 1
os 1
ne 1
ec 1
ed 1
ce 1
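For comparison, the same bigram table can be approximated in base R without stringdist. This is a swapped-in sketch, not what qgrams does internally; the names bigrams and tab are placeholders. It splits every string into its overlapping two-character pieces and tabulates them, after which you can pick out only the bigrams you care about.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")

# Split each string into its overlapping two-character substrings.
bigrams <- unlist(lapply(strings, function(s) {
  n <- nchar(s)
  if (n < 2) return(character(0))  # strings shorter than 2 have no bigrams
  substring(s, 1:(n - 1), 2:n)
}))

# Tabulate all bigrams, then keep only the ones of interest.
tab <- table(bigrams)
print(tab[c("pi", "in", "ie")])
```

Like the qgrams table, this counts occurrences of each bigram rather than the number of strings containing it; the two coincide here because no bigram repeats within a single string.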