Where do I start with distributed computing?

I'm interested in learning techniques for distributed computing. As a Java developer, I'd probably like to start with Hadoop. Could you please recommend some books, tutorials, or articles to begin with?

Tags: hadoop mapreduce distributed-computing

7 answers

Melony?

Reply #2 · 2019-03-08 21:27

Here are some resources from the Yahoo! Developer Network:

a tutorial:

http://developer.yahoo.com/hadoop/tutorial/

an introductory course (requires Silverlight, sigh):

http://yahoo.hosted.panopto.com/CourseCast/Viewer/Default.aspx?id=281cbf37-eed1-4715-b158-0474520014e6

骚年 ilove

Reply #3 · 2019-03-08 21:30

Currently, book-wise, I would check out Hadoop: The Definitive Guide. It's written by Tom White, who has worked on Hadoop for a good while now and works at Cloudera with Doug Cutting (Hadoop's creator).

Also, on the free side, Jimmy Lin from UMD has written a book called Data-Intensive Text Processing with MapReduce. Here's a link to the final pre-production version (link provided by the author on his website).

看我几分像从前

Reply #4 · 2019-03-08 21:30

The All Things Hadoop Podcast (http://allthingshadoop.com/podcast) has some good content and good guests. A lot of it is geared toward getting started with distributed computing.

够拽才男人

Reply #5 · 2019-03-08 21:30

MIT 6.824 is the best resource. Reading only the Google papers related to Hadoop is not enough; you need a systematic course if you want to go deeper.

我只想做你的唯一

Reply #6 · 2019-03-08 21:37

Hadoop is not necessarily the best tool for all distributed computing problems. Despite its power, it also has a pretty steep learning curve and cost of ownership. You might want to clarify your requirements and look for suitable alternatives in the Java world, such as HTCondor, JPPF or GridGain (my apologies to those I do not mention).

Luminary・发光体

Reply #7 · 2019-03-08 21:41

Maybe you can read some papers on MapReduce and distributed computing first, to gain a better understanding of the field. Here are some I would recommend:

MapReduce: Simplified Data Processing on Large Clusters, http://www.usenix.org/events/osdi04/tech/full_papers/dean/dean_html/
Bigtable: A Distributed Storage System for Structured Data, http://www.usenix.org/events/osdi06/tech/chang/chang_html/
Dryad: Distributed data-parallel programs from sequential building blocks, http://pdos.csail.mit.edu/6.824-2007/papers/isard-dryad.pdf
The Landscape of Parallel Computing Research: A View from Berkeley, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.8705&rep=rep1&type=pdf

On the other hand, if you want to understand Hadoop itself better, you could start by reading the source code of the Hadoop MapReduce framework.
