Splitting Strings and Generating Frequency Tables

I have a column of firm names in an R dataframe that goes something like this:

"ABC Industries"  
"ABC Enterprises"  
"123 and 456 Corporation"  
"XYZ Company"

And so on. I'm trying to generate frequency tables of every word that appears in this column, so for example, something like this:

Industries   10  
Corporation  31  
Enterprise   40  
ABC          30  
XYZ          40

I'm relatively new to R, so I was wondering what a good way to approach this would be. Should I be splitting the strings and placing every distinct word into a new column? Is there a way to split a multi-word row into multiple rows with one word each?

Tags: r string split frequency

3 answers

虎瘦雄心在

Reply #2 · 2019-02-18 07:16

You can use the tidytext and dplyr packages:

set.seed(42)

text <- c("ABC Industries", "ABC Enterprises", 
       "123 and 456 Corporation", "XYZ Company")

data <- data.frame(category = sample(text, 100, replace = TRUE),
                   stringsAsFactors = FALSE)

library(tidytext)
library(dplyr)

data %>%
  unnest_tokens(word, category) %>%
  group_by(word) %>%
  count()

#> # A tibble: 9 x 2
#> # Groups:   word [9]
#>          word     n
#>         <chr> <int>
#> 1         123    29
#> 2         456    29
#> 3         abc    45
#> 4         and    29
#> 5     company    26
#> 6 corporation    29
#> 7 enterprises    21
#> 8  industries    24
#> 9         xyz    26


骚年 ilove

Reply #3 · 2019-02-18 07:29

If you wanted to, you could do it in a one-liner:

R> text <- c("ABC Industries", "ABC Enterprises", 
+            "123 and 456 Corporation", "XYZ Company")
R> table(do.call(c, lapply(text, function(x) unlist(strsplit(x, " ")))))

        123         456         ABC         and     Company 
          1           1           2           1           1 
Corporation Enterprises  Industries         XYZ 
          1           1           1           1 
R>

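Here I use strsplit() to break each entry into components; this returns a list, which unlist() flattens into a character vector. Because lapply() does this for every entry, the result is a list of vectors. I use do.call() to simply concatenate all of those vectors into one, which table() summarises.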
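As a rough sketch of the same idea applied to a data frame column rather than a bare vector: the names df and firm below are illustrative assumptions, not taken from the question. Since strsplit() is already vectorised over its input, the lapply() step can be dropped, and the resulting table can be sorted and converted into an ordinary two-column data frame:

```r
# Hypothetical data frame; `df` and `firm` are illustrative names only
df <- data.frame(firm = c("ABC Industries", "ABC Enterprises",
                          "123 and 456 Corporation", "XYZ Company"),
                 stringsAsFactors = FALSE)

# Split every entry on single spaces and flatten into one character vector
words <- unlist(strsplit(df$firm, " ", fixed = TRUE))

# Tabulate, with the most frequent words first
freq <- sort(table(words), decreasing = TRUE)

# Convert to a regular data frame with one row per distinct word
freq_df <- as.data.frame(freq, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "n")
freq_df
```

This gives one row per word, which is close to the "multiple rows with one word" layout asked about in the question.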


骚年 ilove

Reply #4 · 2019-02-18 07:40

Here is another one-liner. It uses paste() to combine all of the column entries into a single long text string, which it then splits apart and tabulates:

text <- c("ABC Industries", "ABC Enterprises", 
         "123 and 456 Corporation", "XYZ Company")

table(strsplit(paste(text, collapse=" "), " "))
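Note that the base R approaches keep the original case, so "ABC" and "abc" would be counted separately, whereas the tidytext answer lowercases its tokens. If case-insensitive counts are wanted, one possible variant (a sketch, assuming that splitting on any run of whitespace is acceptable for the data) is:

```r
text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

# Lowercase first, then split on one or more whitespace characters
tokens <- unlist(strsplit(tolower(text), "\\s+"))

# Drop empty strings, which leading whitespace in an entry would produce
tokens <- tokens[nzchar(tokens)]

# Count occurrences of each lowercased word
counts <- table(tokens)
counts
```

Splitting on "\\s+" rather than a single space also avoids empty tokens when entries contain doubled spaces or tabs.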

