
Question:

I'm performing text analysis on multiple resumes to generate a word cloud in R, using the wordcloud package together with the tm package to preprocess the corpus of documents.

The problems I'm facing are:

1. Checking whether a word in the corpus is meaningful, i.e. whether it belongs to an English dictionary.
2. How to mine / process multiple resumes together.
3. Checking for tech terms such as r, java, eclipse, etc.

Appreciate the help.

Answer 1:

I've faced similar issues before, so here are solutions to your problems:

1. The qdapDictionaries package is a collection of dictionaries and word lists for use with the 'qdap' package.

library(qdapDictionaries)

#create custom function
is.word  <- function(x) x %in% GradyAugmented # or use any dataset from package

#use this function to filter words, df = dataframe from corpus
df <- df[which(is.word(df$terms)),]
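
A minimal base-R sketch of the same filter, using a tiny stand-in word list so it runs without qdapDictionaries installed. The names word_list and df_words and the sample values are illustrative assumptions; the lowercasing step assumes the dictionary entries are stored in lowercase.

```r
# Stand-in for GradyAugmented (assumption: a lowercase character vector)
word_list <- c("data", "analysis", "experience", "java")

# Term table as it might come out of a term-document matrix (made-up values)
df <- data.frame(
  terms = c("Data", "analysis", "xqzt", "experience", "jva"),
  freq  = c(12, 7, 1, 5, 1),
  stringsAsFactors = FALSE
)

# Lowercase before matching so "Data" is not rejected on case alone
is.word <- function(x) tolower(x) %in% word_list

# Logical indexing keeps only the rows whose term is in the word list
df_words <- df[is.word(df$terms), ]
print(df_words)
```

Indexing with the logical vector directly gives the same rows as wrapping it in which(), as long as the terms column contains no NA values.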

2. Use VCorpus(DirSource(...)) to create your corpus from a directory containing all the resumes.

library(tm)

#each file in the directory becomes one document in the corpus
resumeDir <- "path/all_resumes/"
myCorpus <- VCorpus(DirSource(resumeDir))
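
To get from a corpus to the df of terms used in step 1, one common route is a term-document matrix. The sketch below is an assumption about how df could be built, not necessarily how it was built in this answer; it uses VectorSource with two made-up strings so it runs without a resume directory. Note that tm discards terms shorter than three characters by default, which would drop a term like "r", so the wordLengths control is set to keep them.

```r
library(tm)

# Two made-up "resumes" standing in for files read via DirSource (illustrative only)
docs <- c("Experienced in R and Java, using Eclipse daily.",
          "Java developer; some R for data analysis.")
corpus <- VCorpus(VectorSource(docs))

# Basic cleaning: lowercase, strip punctuation, collapse whitespace
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# Keep one-letter terms such as "r" (the default minimum length is 3)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Sum counts across all documents to get one frequency per term
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
df <- data.frame(terms = names(freq), freq = as.numeric(freq),
                 stringsAsFactors = FALSE)
print(head(df))
```

Be aware that removePunctuation also alters terms such as "c++" or "c#", so tech terms containing symbols may need separate handling before this step.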

3. Create a custom dictionary file, e.g. my_dict.csv, containing the tech terms.

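#read custom dictionary
tech_dict <- read.csv("path/to/my_dict.csv", stringsAsFactors = FALSE)
#create tech function; read.csv returns a data frame, so match against
#its first column (assumed here to hold the terms), not the whole data frame
is.tech <- function(x) x %in% tech_dict[[1]]
#filter
tech_df <- df[which(is.tech(df$terms)),]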
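
One detail to watch in the tech-term check: read.csv returns a data frame, and %in% against a data frame generally does not compare against the individual values in a column. A minimal sketch, assuming the dictionary file has a single column named term (a hypothetical header; the inline text stands in for my_dict.csv):

```r
# Inline text stands in for my_dict.csv; the "term" header is an assumption
tech_dict <- read.csv(text = "term\nr\njava\neclipse\npython",
                      stringsAsFactors = FALSE)

# Pull the column out as a lowercase character vector before matching
tech_terms <- tolower(trimws(tech_dict$term))
is.tech <- function(x) tolower(x) %in% tech_terms

# Made-up term table, shaped like the df used for the dictionary filter
df <- data.frame(terms = c("java", "experience", "R", "team"),
                 freq  = c(4, 6, 2, 3),
                 stringsAsFactors = FALSE)

# Keep only the rows whose term appears in the tech dictionary
tech_df <- df[is.tech(df$terms), ]
print(tech_df)
```

Running the tech-term filter on the unfiltered term table, rather than on the output of the English-dictionary filter, avoids losing terms such as "eclipse" or "java" that a general word list may not contain.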

Hope this helps.

Answer 2:

You can also add new words or merge two dictionaries in the following manner:

library(qdapDictionaries)

#create custom function
is.word  <- function(x) x %in% c(GradyAugmented, Dictionary2, "new_word1", "new_word2")
