Count the number of times (frequency) a string occ

Posted 2019-09-06 13:11

I have a column in my dataframe as follows

   Col1
   ----------------------------------------------------------------------------
   Center for Animal Control, Division of Hypertension, Department of Medicine
   Department of Surgery, Division of Primary Care, Center for Animal Control
   Department of Internal Medicine, Division of Hypertension, Center for Animal Control

How do I count how often each comma-separated string occurs across the whole column? In other words, I am trying to produce something like this:

    Affiliation                         Freq
    ------------------------------------------
    Center for Animal Control           3
    Division of Hypertension            2
    Department of Medicine              1
    Department of Surgery               1
    Division of Primary Care            1
    Department of Internal Medicine     1  

Could someone help me to figure this out?

5 Answers
孤傲高冷的网名
#2 · 2019-09-06 13:24

Assumption: each line is the value of one row, i.e. "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value for row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" for row 2, and so on.

df is the data frame.

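# Split every row on commas, flatten to one vector, and trim whitespace.
# The column is named Col1 in the question; as.character() guards against factors.
aff_val <- trimws(unlist(strsplit(as.character(df$Col1), ",")))

# Tabulate the values; the result has the columns aff_val and Freq
ans <- data.frame(table(aff_val))

# Rename the first column to match the requested output
colnames(ans)[1] <- 'Affiliation'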
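
For a self-contained check of the approach above, here is a minimal base R sketch. It rebuilds the three rows from the question (the data frame construction is an assumption about how the data is stored) and additionally sorts by descending frequency, which the requested output seems to want but the answer does not do.

```r
# Rebuild the question's data; the column name Col1 is taken from the question.
df <- data.frame(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split each row on literal commas, flatten, and strip surrounding whitespace.
aff_val <- trimws(unlist(strsplit(df$Col1, ",", fixed = TRUE)))

# Naming the argument makes table() label the first column "Affiliation".
ans <- as.data.frame(table(Affiliation = aff_val), stringsAsFactors = FALSE)

# Sort by descending frequency; tied rows keep their existing order.
ans <- ans[order(-ans$Freq), ]
rownames(ans) <- NULL

print(ans)
```

Passing a named argument to table() avoids the separate colnames() step; either way works.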
聊天终结者
#3 · 2019-09-06 13:25
library(stringr)
a<-"Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
con<-textConnection(a)
tbl<-read.table(con,sep=",")
vec<-str_trim(unlist(tbl))
as.data.frame(table(vec))

The result is:

1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
干净又极端
#4 · 2019-09-06 13:26

I use scan and trimws for these text processing tasks.

inp <- "    Center for Animal Control, Division of Hypertension, Department of Medicine
    Department of Surgery, Division of Primary Care, Center for Animal Control
    Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items

      Center for Animal Control Department of Internal Medicine 
                              3                               1 
         Department of Medicine           Department of Surgery 
                              1                               1 
       Division of Hypertension        Division of Primary Care 
                              2                               1 

You can also wrap as.data.frame around that result:

> as.data.frame(table(  trimws(scan(text=inp, what="", sep=","))))
Read 9 items
                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
Evening l夕情丶
#5 · 2019-09-06 13:28
text = "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

library(stringi)
library(dplyr)
library(tidyr)

tibble(text = text) %>%
  mutate(line = text %>% stri_split_fixed("\n") ) %>%
  unnest(line) %>%
  mutate(phrase = line %>% stri_split_fixed(", ") ) %>%
  unnest(phrase) %>%
  count(phrase)
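
If the text already lives in a data frame column, as in the question, the same idea can probably be written more directly. The following is only a sketch, assuming a column named Col1 and that tidyr's separate_rows() and the name/sort arguments of dplyr's count() are available in your installed versions.

```r
library(dplyr)
library(tidyr)

# Assumed layout: one row per record, affiliations comma-separated in Col1.
df <- data.frame(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

freq <- df %>%
  # One output row per comma-separated piece.
  separate_rows(Col1, sep = ",\\s*") %>%
  # Remove any whitespace left at either end of a piece.
  mutate(Col1 = trimws(Col1)) %>%
  # Count each distinct value, most frequent first.
  count(Affiliation = Col1, name = "Freq", sort = TRUE)

print(freq)
```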
做个烂人
#6 · 2019-09-06 13:32

Here is one approach. It first replaces each newline ('\n') with a comma, since your text contains some line breaks.

df <- data.frame(col1 = rep("Center for Animal Control, Division of Hypertension, Department of Medicine, Department of Surgery, Division of Primary Care, Center for Animal Control, Department of Internal Medicine, Division of Hypertension, Center for Animal Control", 1), stringsAsFactors = FALSE)
df$col1 <- gsub('\\n', ', ', df$col1)
as.data.frame(table(unlist(strsplit(df$col1, ', '))))

Output as follows (on original data):

                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
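
The gsub step can also be folded into the split itself. As a hedged sketch in base R, assuming the input is a single string with newline-separated rows (the variable names x, parts and freq are made up for this illustration), one regular expression can treat both commas and newlines as separators and absorb the whitespace around them:

```r
# Assumed input: a single string whose rows are separated by newlines.
x <- "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

# Split on a comma or a newline, swallowing whitespace on either side.
parts <- unlist(strsplit(trimws(x), "\\s*[,\n]\\s*"))

# Drop empty pieces, which can appear if two separators are adjacent.
parts <- parts[nzchar(parts)]

freq <- as.data.frame(table(Affiliation = parts), stringsAsFactors = FALSE)
print(freq)
```

Splitting on ', ' as above only works when every separator is exactly a comma plus one space; the pattern here is more tolerant of irregular spacing.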