Count the frequency of strings in a dataframe in R

Posted 2019-08-11 11:25

Question:

I want to count the frequencies of certain strings within a dataframe.

strings  <- c("pi","pie","piece","pin","pinned","post")
df <- as.data.frame(strings)

I would then like to count how often each of the following substrings occurs in it:

counts <- c("pi", "in", "pie", "ie")

To give me something like:

string  freq
 pi       5
 in       2
 pie      2
 ie       2

I have experimented with grepl and table, but I don't see how to specify the strings I want to search for.

Answer 1:

You can use sapply() to loop over counts and match each item against the strings column of df with grepl(). This returns a logical vector (TRUE for a match, FALSE otherwise), and summing that vector gives the number of strings that contain the item.

sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x))
  })
})

This will return:

    strings
pi        5
in        2
pie       2
ie        2
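
If the two-column layout from the question is wanted, the same grepl() idea can be wrapped in a data frame. This is only a minimal sketch: the column names string and freq are taken from the question, while fixed = TRUE (literal matching instead of regex) and the variable names freq and result are my own assumptions.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
df <- data.frame(strings, stringsAsFactors = FALSE)
counts <- c("pi", "in", "pie", "ie")

# For each pattern, count how many entries of df$strings contain it.
# fixed = TRUE (an assumption) treats the pattern as a literal string, not a regex.
freq <- vapply(
  counts,
  function(p) sum(grepl(p, df$strings, fixed = TRUE)),
  integer(1)
)

# Assemble the result in the string/freq layout asked for in the question.
result <- data.frame(string = counts, freq = unname(freq), stringsAsFactors = FALSE)
print(result)
```

Using vapply() instead of sapply() fixes the return type to one integer per pattern, which is a bit safer if counts is ever empty.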


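Answer 2:

You can use adist() from base R. With partial = TRUE it returns a distance of 0 wherever an element of counts occurs as a substring of an element of strings, so negating the distance matrix and summing each row counts the matches:

data.frame(counts, freq = rowSums(!adist(counts, strings, partial = TRUE)))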
  counts freq
1     pi    5
2     in    2
3    pie    2
4     ie    2

If you are comfortable with regular expressions then you can do:

a <- sapply(paste0(".*(", counts, ").*|.*"), sub, "\\1", strings)
table(grep("\\w", a, value = TRUE))
 ie  in  pi pie 
  2   2   5   2 


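Answer 3:

You can build a frequency table with qgrams() from the stringdist package. With q = 2 it counts every two-character substring, so it covers "pi", "in" and "ie" but not the three-character "pie". The argument name freq becomes the column name after transposing: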

library(stringdist)
strings  <- c("pi","pie","piece","pin","pinned","post")
frequency <- data.frame(t(stringdist::qgrams(freq = strings, q = 2)))

   freq
pi    5
po    1
st    1
ie    2
in    2
nn    1
os    1
ne    1
ec    1
ed    1
ce    1
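
Note that q-gram counts are occurrence counts, whereas the grepl() approach counts how many strings contain a pattern. The two agree on this data because no pattern repeats within a string. A hedged sketch of occurrence counting for patterns of any length, using base R rather than stringdist (the helper name count_occurrences and the test string "pipi" are hypothetical, introduced only for illustration):

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
counts <- c("pi", "in", "pie", "ie")

# Hypothetical helper: count every non-overlapping literal occurrence
# of pattern p across all elements of x.
count_occurrences <- function(p, x) {
  sum(lengths(regmatches(x, gregexpr(p, x, fixed = TRUE))))
}

occ <- vapply(counts, count_occurrences, integer(1), x = strings)
print(data.frame(string = counts, freq = unname(occ)))

# Illustration of the difference on a made-up string:
# "pipi" contains "pi" twice, but is a single matching string.
n_occurrences <- count_occurrences("pi", "pipi")
n_strings <- sum(grepl("pi", "pipi", fixed = TRUE))
```

Which of the two counts is the right one depends on whether repeated hits inside one string should be counted once or every time.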


Tags: r grepl