Vectorization in Apache Mahout

Posted 2019-02-10 16:10

I am new to Mahout. I need to convert a text file into vectors so that I can run classification on them at a later stage.

Could anybody shed some light on the questions below?

  1. How do I convert a text file to a vector in Mahout? Each line has the format "username|comment about item|rating".
  2. The data will be a few TB. Which classification algorithm implementation can I use on the vectors I create?

Thanks, Arun

1 answer
地球回转人心会变
#2 · 2019-02-10 16:46

You can check these two examples, which also show and, to some extent, explain how to use the SequenceFile API: here and here.

And you should definitely read this intro to text analysis.
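To make the first question a bit more concrete, here is a minimal sketch of the idea in plain Java. It is not Mahout's API: the class `CommentVectorizer`, its methods, and the choice of 1024 features are all hypothetical names and values picked for illustration. It splits one "username|comment about item|rating" line into its fields and turns the comment into a sparse term-count vector with the hashing trick, which is roughly the kind of vector a text classifier consumes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, JDK only. It illustrates the parsing and hashing idea
// and does not use or mirror any Mahout class.
public class CommentVectorizer {

    /** One parsed record from a "username|comment about item|rating" line. */
    public static final class Record {
        public final String username;
        public final String comment;
        public final double rating;

        Record(String username, String comment, double rating) {
            this.username = username;
            this.comment = comment;
            this.rating = rating;
        }
    }

    /** Splits on the first and last '|' so the comment itself may contain '|'. */
    public static Record parse(String line) {
        int first = line.indexOf('|');
        int last = line.lastIndexOf('|');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("Expected username|comment|rating: " + line);
        }
        String username = line.substring(0, first);
        String comment = line.substring(first + 1, last);
        double rating = Double.parseDouble(line.substring(last + 1).trim());
        return new Record(username, comment, rating);
    }

    /** Sparse term counts: feature index -> count, using the hashing trick. */
    public static Map<Integer, Double> hashComment(String comment, int numFeatures) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String token : comment.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            // floorMod keeps the index non-negative even for negative hash codes.
            int index = Math.floorMod(token.hashCode(), numFeatures);
            vector.merge(index, 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        Record r = parse("arun|great phone, great battery|4");
        System.out.println(r.username + " " + r.rating + " " + hashComment(r.comment, 1024));
    }
}
```

Different tokens can collide on the same index, which is the usual trade-off of hashing: no dictionary has to be built or kept in memory, which matters at the TB scale mentioned in the question. For the real job you would write the parsed records to SequenceFiles and let Mahout's own vectorization tools do this step on the cluster.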
