Is MapReduce just a generalisation of another prog

I'm getting into parallel programming and I'm studying MapReduce and other distributed algorithms. Is it best just to learn MapReduce, or is there a more general algorithm that will serve me better?

Tags: algorithm mapreduce

4 answers

叛逆

Post #2 · 2019-07-01 11:01

It depends what you intend to use the algorithm(s) for.

MapReduce is a generalised and very useful programming model (Google bases many of its internal indexing processes on it). Learning it certainly won't do you any harm.

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
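To make the model concrete, here is a minimal single-process sketch of word counting, the usual introductory example. It is not Google's or Hadoop's API: the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and the "shuffle" step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    # Map: turn one input key/value pair into intermediate (word, 1) pairs.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # Reduce: merge all intermediate values that share the same key.
    return sum(counts)


def run_mapreduce(inputs, mapper, reducer) -> dict:
    # Hypothetical driver; a real framework distributes these phases.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            # Shuffle: group intermediate values by intermediate key.
            groups[ikey].append(ivalue)
    # One reducer call per distinct intermediate key.
    return {ikey: reducer(ikey, ivalues) for ikey, ivalues in groups.items()}


docs = [("d1", "the quick fox"), ("d2", "the lazy dog the end")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

Because each `map_fn` call and each `reduce_fn` call is independent, a framework is free to run them on different machines; the only coordination point is the grouping by key.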

The most important parallel processing concept to learn is quite simple: synchronisation is what you need to minimise if you want to attain effective speedup.

Strive for:

Large granularity of work chunks
Work chunks of similar size
As few synchronisation steps as possible
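The three guidelines above can be sketched as follows: split the work into one large chunk per worker, keep the chunks within one element of each other in size, and synchronise only once when combining the partial results. This is an illustrative sketch with hypothetical function names; it uses threads only to stay self-contained, and in CPython the GIL means CPU-bound work like this would need processes (e.g. `ProcessPoolExecutor`) for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, workers: int) -> list[tuple[int, int]]:
    # Split range(n) into `workers` contiguous chunks whose sizes differ by at most 1.
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partial_sum_of_squares(bounds: tuple[int, int]) -> int:
    # Each worker processes one large chunk with no communication.
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n: int, workers: int = 4) -> int:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Coarse granularity: exactly one task per worker.
        partials = pool.map(partial_sum_of_squares, chunk_bounds(n, workers))
        # Single synchronisation step: combine the partial results.
        return sum(partials)


print(parallel_sum_of_squares(1_000_000))
```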


成全新的幸福

Post #3 · 2019-07-01 11:04

To really get a good appreciation of parallel programming, you should study several models of parallel programming and not just one parallel programming framework. You should study both shared memory (e.g. pthreads) and message passing (e.g. MPI and MapReduce) approaches to parallel programming.
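As a small illustration of the shared-memory model, here is a sketch in which several threads update one shared counter and must synchronise with a lock. Python's `threading` module stands in for pthreads here; the function name `shared_counter` and its parameters are hypothetical.

```python
import threading


def shared_counter(n_threads: int = 4, increments: int = 10_000) -> int:
    counter = 0
    lock = threading.Lock()

    def add_many() -> None:
        nonlocal counter
        for _ in range(increments):
            # All threads write the same variable, so each update is guarded by the lock.
            with lock:
                counter += 1

    threads = [threading.Thread(target=add_many) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(shared_counter())  # 40000
```

In the message-passing model the workers would instead keep private state and send their results to a coordinator.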

MPI is a very general-purpose tool for creating message-passing applications. If you use MPI extensively, you will find that some elements of MPI programs recur over and over again, such as setting up a "master" process that partitions work among "worker" processes and aggregates the results. MapReduce is a particular implementation of a message-passing framework and provides a simpler programming model than MPI. It takes care of code that occurs very frequently in parallel applications and, more importantly, handles issues such as failure recovery and data locality. The open-source Hadoop project is modelled on Google's MapReduce.
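The recurring master/worker pattern described above can be sketched without MPI by simulating message passing with queues. This is an assumption-laden stand-in: threads and `queue.Queue` take the place of MPI processes and send/receive calls, and the names `master`, `worker` and `STOP` are hypothetical.

```python
import queue
import threading

STOP = None  # sentinel message telling a worker to exit


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    # Receive work messages until the STOP sentinel arrives.
    while True:
        msg = tasks.get()
        if msg is STOP:
            break
        chunk_id, chunk = msg
        # Send the partial result back to the master as a message.
        results.put((chunk_id, sum(chunk)))


def master(data: list[int], n_workers: int = 3, chunk_size: int = 4) -> int:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Partition: the master splits the data and sends one message per chunk.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        tasks.put((chunk_id, chunk))
    for _ in threads:
        tasks.put(STOP)
    # Aggregate: receive exactly one result message per chunk.
    total = sum(results.get()[1] for _ in chunks)
    for t in threads:
        t.join()
    return total


print(master(list(range(1, 11))))  # 55
```

MapReduce essentially packages this partition/aggregate boilerplate, adding the fault tolerance and data placement that this sketch ignores.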

I think you will be better able to appreciate what MapReduce does and how it might be implemented by writing several MPI programs of your own. It can't hurt to learn Hadoop, but when it comes to general knowledge of parallel programming, it is good to be familiar with the basics like pthreads, OpenMP, and MPI.


等我变得足够好

Post #4 · 2019-07-01 11:07

If you want to learn something about parallel processing, I do not believe that picking a single algorithm will provide you with significant insights.

MapReduce is a composition of a map and a reduce operation. These are typical higher-order functions that functional languages provide.
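A minimal sketch of the two higher-order functions, using Python's built-in `map` and `functools.reduce`, which play the same role as `map` and `reduce`/fold in Scheme or Clojure. Note that Google's MapReduce also inserts a group-by-key (shuffle) step between the two phases, so it is not exactly a plain map followed by a fold.

```python
from functools import reduce

words = ["map", "reduce", "fold", "filter"]

# map: apply a function to every element independently (trivially parallel).
lengths = list(map(len, words))  # [3, 6, 4, 6]

# reduce: combine the mapped values with an associative binary operator.
total = reduce(lambda acc, x: acc + x, lengths, 0)  # 19

print(lengths, total)
```

Because addition is associative, the reduction could be split into partial sums on different workers and combined afterwards, which is what makes this shape attractive for parallelism.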

I would recommend first learning a functional language, for example Scheme or Clojure. For Scheme, "Structure and Interpretation of Computer Programs" seems to be all the rage.


可以哭但决不认输i

Post #5 · 2019-07-01 11:13

For many "regular" serial algorithms there are parallel versions, some of which can be modelled with MapReduce. Certainly learn MapReduce, as it is new and exciting, but it's just one tool in your toolbox. MapReduce has limitations (and you'll learn about them), so it is worth learning other approaches as well.

