Java Open Source Text Mining Frameworks [closed]

Posted 2019-03-08 11:41

I want to know what the best open source Java-based framework for text mining is, one that supports both machine learning and dictionary methods.

I'm using Mallet, but there is not much documentation and I do not know if it will fit all my requirements.

7 answers
乱世女痞
#2 · 2019-03-08 12:17

We use Lucene to process live streams from the internet. It has a native Java API.

http://lucene.apache.org/java/docs/

You can then use Mahout, which is a collection of machine learning algorithms that operate on top of Lucene.

http://lucene.apache.org/mahout/

Rolldiameter
#3 · 2019-03-08 12:19

I once built a maximum entropy named entity recognizer for CoNLL data using OpenNLP MaxEnt (http://sourceforge.net/projects/maxent/) for a course.

It required a lot of data preprocessing with custom Perl scripts to get all the features extracted into nice, neat numerical vectors, though.
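To give an idea of what that preprocessing step involves, here is a minimal Java sketch of turning per-token features into index vectors. It is not the Perl code described above and does not use the OpenNLP MaxEnt API; the class name `TokenFeatureSketch` and the four features (word, capitalization, previous word, next word) are hypothetical choices made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical token-feature extractor; the feature set is illustrative only. */
final class TokenFeatureSketch {

    /** Maps each distinct feature string to a stable integer index. */
    private final Map<String, Integer> featureIndex = new LinkedHashMap<>();

    /** String-valued features for the token at position i (tokens must be non-empty strings). */
    static List<String> features(String[] tokens, int i) {
        String tok = tokens[i];
        List<String> feats = new ArrayList<>();
        feats.add("word=" + tok.toLowerCase(Locale.ROOT));
        feats.add("isCapitalized=" + Character.isUpperCase(tok.charAt(0)));
        feats.add("prev=" + (i > 0 ? tokens[i - 1].toLowerCase(Locale.ROOT) : "<s>"));
        feats.add("next=" + (i + 1 < tokens.length ? tokens[i + 1].toLowerCase(Locale.ROOT) : "</s>"));
        return feats;
    }

    /** Encodes one token as the indices of its active features (a sparse binary vector). */
    int[] encode(String[] tokens, int i) {
        List<String> feats = features(tokens, i);
        int[] indices = new int[feats.size()];
        for (int k = 0; k < feats.size(); k++) {
            String f = feats.get(k);
            Integer idx = featureIndex.get(f);
            if (idx == null) {
                // First time this feature is seen: assign the next free index.
                idx = featureIndex.size();
                featureIndex.put(f, idx);
            }
            indices[k] = idx;
        }
        return indices;
    }

    /** Number of distinct features seen so far. */
    int vocabularySize() {
        return featureIndex.size();
    }

    public static void main(String[] args) {
        TokenFeatureSketch encoder = new TokenFeatureSketch();
        String[] sentence = {"John", "lives", "in", "Paris"};
        for (int i = 0; i < sentence.length; i++) {
            System.out.println(sentence[i] + " -> " + Arrays.toString(encoder.encode(sentence, i)));
        }
        System.out.println("distinct features: " + encoder.vocabularySize());
    }
}
```

Each token ends up as a short list of integer indices, which is the kind of numerical input a maximum entropy trainer typically expects. A real CoNLL setup would likely add more features (part-of-speech tags, suffixes, gazetteer hits).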

一夜七次
#4 · 2019-03-08 12:22

Although it is not a specialized text mining framework, Weka has a number of classifiers commonly used in text mining tasks, including SVM, kNN, and multinomial Naive Bayes.

It also has a few filters for working with textual data, such as the StringToWordVector filter, which can perform a TF/IDF transformation.

Check out the Weka wiki website for more information.
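For readers unfamiliar with the TF/IDF transformation mentioned above, here is a minimal plain-Java sketch of the textbook weighting, weight = tf * ln(N / df). It does not call Weka and is not the StringToWordVector implementation, whose exact formula and options may differ; the class name `TfIdfSketch` and the simple letter-only tokenizer are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of textbook TF-IDF weighting; not Weka's implementation. */
final class TfIdfSketch {

    /** Lowercases the text and splits it on runs of characters outside a-z. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns one term-to-weight map per document, with weight = tf * ln(N / df). */
    static List<Map<String, Double>> tfIdf(List<String> docs) {
        List<Map<String, Integer>> termCounts = new ArrayList<>();
        Map<String, Integer> docFreq = new TreeMap<>();
        for (String doc : docs) {
            // Raw term frequency within this document.
            Map<String, Integer> tf = new TreeMap<>();
            for (String tok : tokenize(doc)) {
                tf.merge(tok, 1, Integer::sum);
            }
            termCounts.add(tf);
            // Document frequency: each term counts once per document it appears in.
            for (String term : tf.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Integer> tf : termCounts) {
            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Text mining with Java",
                "Machine learning for text",
                "Java Java everywhere");
        List<Map<String, Double>> vectors = tfIdf(docs);
        for (int i = 0; i < docs.size(); i++) {
            System.out.println(docs.get(i) + " -> " + vectors.get(i));
        }
    }
}
```

Terms that occur in every document get a weight of zero, while terms concentrated in a few documents are weighted up, which is why this transformation is a common first step before training the classifiers listed above.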

何必那么认真
#5 · 2019-03-08 12:23

You may already know about GATE: http://gate.ac.uk/

...but that's what we've used (at my day job) for lots of different text mining problems. It's pretty flexible and open.

够拽才男人
#6 · 2019-03-08 12:26

I honestly think the answers presented here are very good. However, to fulfill my requirements I have chosen to use Apache UIMA with ClearTK. It supports several ML methods and I do not have any licensing problems. Plus, I can write wrappers for other ML methods, and I get the advantage of the UIMA framework, which is very well organized and fast.

Thank you all for your interesting answers.

Best Regards, ukrania
