Tf-idf of strings from csv file

My test.csv file (no header):

very good, very bad, you are great
very bad, good restaurent, nice place to visit

I want my corpus to be split on the commas, so that my final DocumentTermMatrix becomes:

       terms
docs    very good   very bad   you are great   good restaurent   nice place to visit
  doc1  tf-idf      tf-idf     tf-idf          0                 0
  doc2  0           tf-idf     0               tf-idf            tf-idf

I can produce the above DTM correctly if I do not load the documents from the csv file, as shown below:

library(tm)
docs <- c(D1 = "very good, very bad, you are great", 
    D2 = "very bad, good restaurent, nice place to visit")

dd <- Corpus(VectorSource(docs))
dd <- tm_map(dd, function(x) {
    PlainTextDocument(
       gsub("\\s+","~",strsplit(x,",\\s*")[[1]]), 
       id=ID(x)
     )
})
inspect(dd)

# A corpus with 2 text documents
# 
# The metadata consists of 2 tag-value pairs and a data frame
# Available tags are:
#   create_date creator 
# Available variables in the data frame are:
#   MetaID 

# $D1
# very~good
# very~bad
# you~are~great
# 
# $D2
# very~bad
# good~restaurent
# nice~place~to~visit

dtm <- DocumentTermMatrix(dd, control = list(weighting = weightTfIdf))
as.matrix(dtm)

This produces

# Docs good~restaurent nice~place~to~visit very~bad very~good you~are~great
#   D1       0.0000000           0.0000000        0 0.3333333     0.3333333
#   D2       0.3333333           0.3333333        0 0.0000000     0.0000000
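
The numbers above can be checked by hand. The following is a minimal base R sketch, not the tm implementation: it assumes that weightTfIdf with its default settings divides each term count by the number of terms in the document and multiplies by log2(number of docs / number of docs containing the term). The names terms, vocab, counts, tf, idf and tfidf are only illustrative.

```r
docs <- c(D1 = "very good, very bad, you are great",
          D2 = "very bad, good restaurent, nice place to visit")

# Split each document on commas and join the words of each phrase with "~"
terms <- lapply(strsplit(docs, ",\\s*"),
                function(p) gsub("\\s+", "~", trimws(p)))

# All distinct phrases across both documents
vocab <- sort(unique(unlist(terms)))

# Raw counts: one row per document, one column per phrase
counts <- t(vapply(terms,
                   function(p) as.numeric(table(factor(p, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

# Assumed weighting: tf normalised by document length, idf = log2(N / df)
tf    <- counts / rowSums(counts)
idf   <- log2(nrow(counts) / colSums(counts > 0))
tfidf <- sweep(tf, 2, idf, `*`)

round(tfidf, 7)
```

Under that assumption, very~bad gets a weight of 0 in both documents because it occurs in every document (log2(2/2) = 0), while each phrase unique to one document gets (1/3) * log2(2/1) = 0.3333333.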

However, if I load the documents from the csv file, only the first term of each document gets joined, like below:

> file_loc <- "testdata.csv"
> require(tm)
  Loading required package: tm
> x <- read.csv(file_loc, header = FALSE)
> x <- data.frame(lapply(x, as.character), stringsAsFactors=FALSE)
> dd <- Corpus(DataframeSource(x))
> dd <- tm_map(dd, stripWhitespace)
> dd <- tm_map(dd, tolower)
>  dd <- tm_map(dd, function(x) {
            PlainTextDocument(
            gsub("\\s+","~",strsplit(x,",\\s*")[[1]]), 
            id=ID(x)
            )
          })
> inspect(dd)

Only the first term is joined, like this:

# $D1
# very~good

# 
# $D2
# very~bad

How can I join all the terms and create a DocumentTermMatrix like the one above?

You are not reading the data correctly. I used scan to read it instead. The following works:

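library(tm)

# Read each line of the file as one whole string, so the commas stay inside the document
docs <- scan("testdata.csv", "character", sep = "\n")

dd <- Corpus(VectorSource(docs))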
dd <- tm_map(dd, function(x) {
  PlainTextDocument(
    gsub("\\s+","~",strsplit(x,",\\s*")[[1]]), 
    id=ID(x)
  )
})
inspect(dd)

dtm <- DocumentTermMatrix(dd, control = list(weighting = weightTfIdf))
as.matrix(dtm)
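
A likely reason the read.csv version kept only the first term: read.csv already splits every line on the commas, so each document reaches strsplit as several separate fields, and [[1]] then picks only the result for the first field. Here is a small base R sketch of that effect without tm. The variable names csv_text, fields, row1, first_only and all_terms are just for illustration, and the text is inlined rather than read from testdata.csv.

```r
csv_text <- c("very good, very bad, you are great",
              "very bad, good restaurent, nice place to visit")

# read.csv splits every line on commas, so a row arrives as three fields
fields <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
row1 <- unlist(fields[1, ], use.names = FALSE)

# strsplit returns one list element per field; [[1]] keeps only the first
first_only <- strsplit(row1, ",\\s*")[[1]]

# A whole line kept as a single string still contains the commas,
# so the same call yields every phrase
all_terms <- strsplit(csv_text[1], ",\\s*")[[1]]

print(first_only)
print(all_terms)
```

Reading the file line by line, for example with scan(..., sep = "\n") as above, keeps each document as one string, which is what the strsplit step expects.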