Question:
I have a column in my dataframe as follows
Col1
----------------------------------------------------------------------------
Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control
How do I count how often each comma-separated string occurs? In other words, I am trying to produce something like this:
Affiliation Freq
------------------------------------------
Center for Animal Control 3
Division of Hypertension 2
Department of Medicine 1
Department of Surgery 1
Division of Primary Care 1
Department of Internal Medicine 1
Could someone help me to figure this out?
Answer 1:
Assumption: "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value in row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" is the value in row 2, and so on. df is the data frame.
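# Split each row on commas, flatten to one vector, and trim whitespace.
# The column is named Col1 in the question; as.character() guards against a factor column.
aff_val <- trimws(unlist(strsplit(as.character(df$Col1), ",")))
# Count each distinct affiliation and label the first column.
ans <- data.frame(table(aff_val))
colnames(ans)[1] <- 'Affiliation'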
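For a version that can be pasted and run as is, here is a minimal sketch of the same split/trim/tabulate idea. The data frame below is a hypothetical reconstruction of the question's data (three rows in a column named Col1), and the final sort by descending frequency is an addition, not part of the answer above.

```r
# Hypothetical reconstruction of the question's data; only the column name
# Col1 and the three strings come from the question.
df <- data.frame(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split each row on literal commas, flatten, and trim surrounding whitespace.
affiliations <- trimws(unlist(strsplit(df$Col1, ",", fixed = TRUE)))

# Naming the table() argument makes it the column name in the data frame.
freq <- as.data.frame(table(Affiliation = affiliations), stringsAsFactors = FALSE)

# Sort by descending frequency; ties keep their alphabetical order.
freq <- freq[order(-freq$Freq), ]
rownames(freq) <- NULL

print(freq)
```

Using fixed = TRUE treats the comma as a literal string rather than a regular expression, which makes no difference for this separator but avoids surprises with separators such as "." or "|".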
Answer 2:
Here is one approach. It also replaces '\n' with a comma, since your text contains some newlines.
df <- data.frame(col1 = rep("Center for Animal Control, Division of Hypertension, Department of Medicine, Department of Surgery, Division of Primary Care, Center for Animal Control, Department of Internal Medicine, Division of Hypertension, Center for Animal Control", 1), stringsAsFactors = FALSE)
df$col1 <- gsub('\\n', ', ', df$col1)
as.data.frame(table(unlist(strsplit(df$col1, ', '))))
Output as follows (on original data):
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
Answer 3:
I use scan and trimws for these text processing tasks.
inp <- " Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items
      Center for Animal Control Department of Internal Medicine
                              3                               1
         Department of Medicine           Department of Surgery
                              1                               1
       Division of Hypertension        Division of Primary Care
                              2                               1
You can also wrap as.data.frame around that result:
> as.data.frame(table( trimws(scan(text=inp, what="", sep=","))))
Read 9 items
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
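The same scan approach can be applied to a character vector such as a data frame column, since scan(text = ...) reads each element as a separate line. The sketch below assumes the column has been extracted into a hypothetical vector named col; it also uses scan's own strip.white argument in place of trimws, and quiet = TRUE to suppress the "Read 9 items" message.

```r
# Hypothetical stand-in for df$Col1: one element per row of the question's data.
col <- c(
  "Center for Animal Control, Division of Hypertension, Department of Medicine",
  "Department of Surgery, Division of Primary Care, Center for Animal Control",
  "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
)

# sep = "," splits fields on commas (line ends also terminate a field);
# strip.white = TRUE removes leading and trailing blanks from each field.
items <- scan(text = col, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)

# Tabulate the cleaned fields.
counts <- table(items)
print(counts)
```

Whether to prefer strip.white over trimws is mostly a matter of taste; strip.white keeps the cleanup inside the one scan call.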
Answer 4:
library(stringr)
a<-"Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
con<-textConnection(a)
tbl<-read.table(con,sep=",")
vec<-str_trim(unlist(tbl))
as.data.frame(table(vec))
The result is:
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
Answer 5:
text = "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
library(stringi)
library(dplyr)
library(tidyr)
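# tibble() replaces the deprecated data_frame(); dplyr re-exports it.
tibble(text = text) %>%
  # One row per line of the input.
  mutate(line = text %>% stri_split_fixed("\n")) %>%
  unnest(line) %>%
  # One row per comma-separated phrase.
  mutate(phrase = line %>% stri_split_fixed(", ")) %>%
  unnest(phrase) %>%
  count(phrase)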
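If the data is already in a data frame with one string per row, as in the question, a shorter tidyverse route is possible. This is a sketch under that assumption, using tidyr's separate_rows rather than the split-and-unnest steps above; the tibble df is a hypothetical reconstruction of the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical reconstruction of the question's data frame.
df <- tibble(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  )
)

freq <- df %>%
  # sep is a regular expression: a comma plus any whitespace after it.
  separate_rows(Col1, sep = ",\\s*") %>%
  # Count each affiliation and sort by descending frequency.
  count(Col1, name = "Freq", sort = TRUE) %>%
  rename(Affiliation = Col1)

print(freq)
```

The regular-expression separator absorbs the space after each comma, so no separate trimming step is needed for this input.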