I am a graduate CS student (data mining and machine learning) and have good exposure to core Java (>4 years). I have also read a fair amount about Hadoop and MapReduce.
I would now like to do a project in this area (in my free time, of course) to get a better understanding.
Any good project ideas would be really appreciated. I just want to do this to learn, so I don't really mind re-inventing the wheel. Also, anything related to data mining/machine learning would be a bonus (it fits with my research), but it is absolutely not necessary.
You haven't written anything about your interests. I know that graph mining algorithms have been implemented on top of the Hadoop framework. This software, http://www.cs.cmu.edu/~pegasus/, and the paper "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations" may give you a starting point.
Further, this link discusses something similar to your question: http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/ (though the example is in Python). There is also a very good paper co-authored by Andrew Ng, "Map-Reduce for Machine Learning on Multicore".
There was a NIPS 2009 workshop on a similar topic, "Large-Scale Machine Learning: Parallelism and Massive Datasets". You can browse some of the papers to get ideas.
Edit: There is also Apache Mahout (http://mahout.apache.org/): "Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm."
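If you want a feel for what "implemented on top of Hadoop using the map/reduce paradigm" looks like in code, here is the classic word-count job in Java. This is the canonical minimal Hadoop MapReduce example, not Mahout code, just the paradigm Mahout builds on:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Mahout's clustering and classification jobs follow the same mapper/reducer structure, just with vectors and model state instead of words and counts.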
See http://www.quora.com/Machine-Learning/What-are-some-good-class-projects-for-machine-learning-using-MapReduce
And here are some good toy projects to start with: http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science
Why don't you contribute to Apache Hadoop/Mahout by helping them implement additional algorithms?
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
That page lists a number of algorithms marked as "open". To my understanding, they could use help implementing these, and there are hundreds of algorithms not even on the list.
In any case, since you want to do something with Hadoop, why not ask them what they need instead of asking on some random internet site?
Trying to come up with an efficient way to implement Hierarchical Agglomerative Clustering (HAC) on Hadoop would be a nice project. It involves not only algorithmic work but also optimizations tied to the Hadoop core framework.
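As a concrete starting point, here is a minimal sketch of one building block a naive HAC approach needs: a Hadoop job in Java that scans precomputed pairwise distances and returns the globally closest pair, i.e. the next pair of clusters to merge. The class names and the tab-separated "idA, idB, distance" input format are assumptions for illustration, not an established implementation:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClosestPairJob {

  // Each mapper keeps only its local minimum-distance line and emits it once
  // in cleanup(), so very little data crosses the network.
  public static class MinDistanceMapper
      extends Mapper<Object, Text, NullWritable, Text> {
    private double bestDistance = Double.POSITIVE_INFINITY;
    private String bestLine = null;

    @Override
    protected void map(Object key, Text value, Context context) {
      String[] parts = value.toString().split("\t");
      if (parts.length != 3) {
        return; // skip malformed lines
      }
      double d = Double.parseDouble(parts[2]);
      if (d < bestDistance) {
        bestDistance = d;
        bestLine = value.toString();
      }
    }

    @Override
    protected void cleanup(Context context)
        throws IOException, InterruptedException {
      if (bestLine != null) {
        context.write(NullWritable.get(), new Text(bestLine));
      }
    }
  }

  // A single reducer picks the global minimum among the per-mapper minima.
  public static class MinDistanceReducer
      extends Reducer<NullWritable, Text, Text, NullWritable> {
    @Override
    protected void reduce(NullWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      double bestDistance = Double.POSITIVE_INFINITY;
      String bestLine = null;
      for (Text value : values) {
        double d = Double.parseDouble(value.toString().split("\t")[2]);
        if (d < bestDistance) {
          bestDistance = d;
          bestLine = value.toString();
        }
      }
      if (bestLine != null) {
        context.write(new Text(bestLine), NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "closest pair");
    job.setJarByClass(ClosestPairJob.class);
    job.setMapperClass(MinDistanceMapper.class);
    job.setReducerClass(MinDistanceReducer.class);
    job.setNumReduceTasks(1);
    job.setMapOutputKeyClass(NullWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The interesting part of the project is everything this sketch leaves out: maintaining and updating the distance matrix across merges without recomputing it from scratch, and avoiding one full MapReduce job per merge step.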