I'm getting into parallel programming and I'm studying MapReduce and other distributed algorithms. Is it best just to learn MapReduce, or is there a more general algorithm that will serve me better?
It depends what you intend to use the algorithm(s) for.
MapReduce is a generalised and very useful programming model. (Google bases many of its internal indexing processes on it.) Learning it certainly won't do you any harm.
The most important parallel processing concept to learn is quite simple: synchronisation is what you need to minimise if you want to attain effective speedup.
Strive for:
- tasks that are as independent of one another as possible;
- as little shared mutable state as you can manage;
- infrequent, coarse-grained communication rather than per-item synchronisation.
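To make this concrete, here is a small Python sketch (the workload and names are my own, purely illustrative) contrasting per-element locking with per-worker local accumulation. Both compute the same answer, but the second form synchronises only once per worker instead of once per element. (CPython's GIL means plain threads won't show real speedup here; the point is the synchronisation pattern itself.)

```python
import threading

# Hypothetical workload: count the even numbers in a range, split across workers.
DATA = list(range(100_000))
N_WORKERS = 4

def chunks(data, n):
    """Split data into n roughly equal slices."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_with_lock():
    """Fine-grained synchronisation: every update acquires the lock."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        for x in chunk:
            if x % 2 == 0:
                with lock:              # one lock acquisition per element
                    total[0] += 1

    threads = [threading.Thread(target=worker, args=(c,))
               for c in chunks(DATA, N_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

def count_with_local_totals():
    """Coarse-grained synchronisation: accumulate locally, merge once."""
    results = [0] * N_WORKERS

    def worker(i, chunk):
        local = 0                       # no shared state inside the loop
        for x in chunk:
            if x % 2 == 0:
                local += 1
        results[i] = local              # one write per worker

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks(DATA, N_WORKERS))]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(results)
```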
To really get a good appreciation of parallel programming, you should study several models of parallel programming and not just one parallel programming framework. You should study both shared memory (e.g. pthreads) and message passing (e.g. MPI and MapReduce) approaches to parallel programming.
MPI is a very general-purpose tool for creating message-passing applications. If you use MPI extensively, you will find that certain elements of MPI programs recur over and over again, such as setting up a "master" process that partitions work among "worker" processes and aggregates the results. MapReduce is a particular implementation of a message-passing framework and provides a simpler programming model than MPI. It takes care of code that occurs quite frequently in parallel applications and, more importantly, handles issues such as failure recovery and data locality. The open-source Hadoop project attempts to mimic MapReduce.
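As an illustrative sketch of that recurring master/worker structure (using Python threads and queues as a stand-in for real MPI ranks and send/recv calls; the function names here are hypothetical): the master partitions the work onto a task queue, workers pull tasks and push back results, and the master aggregates.

```python
import queue
import threading

def square(x):
    return x * x  # stand-in for a real computation

def master_worker(task_fn, inputs, n_workers=3):
    """Master partitions work; workers compute; master aggregates.
    In MPI this would be send/recv between a master rank and worker ranks."""
    tasks, results = queue.Queue(), queue.Queue()
    for i, x in enumerate(inputs):
        tasks.put((i, x))                 # master partitions the work

    def worker():
        while True:
            try:
                i, x = tasks.get_nowait() # pull the next unit of work
            except queue.Empty:
                return                    # no work left: worker exits
            results.put((i, task_fn(x)))  # send the result back

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()

    out = [None] * len(inputs)            # master aggregates, restoring order
    while not results.empty():
        i, r = results.get()
        out[i] = r
    return out
```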
I think you will be better able to appreciate what MapReduce does and how it might be implemented by writing several MPI programs of your own. It can't hurt to learn Hadoop, but when it comes to general knowledge of parallel programming, it is good to be familiar with the basics like pthreads, OpenMP, and MPI.
If you want to learn something about parallel processing, I do not believe that picking a single algorithm will provide you with significant insights.
MapReduce is a composition of a map operation and a reduce operation. These are typical higher-order functions that functional languages provide. I would recommend first learning a functional language, for example Scheme or Clojure. For Scheme, "Structure and Interpretation of Computer Programs" seems to be all the rage.
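As a rough illustration of those two phases as ordinary higher-order functions (in Python rather than Scheme, and with made-up example data), here is a tiny word count: the map phase turns each document into per-document counts, and the reduce phase merges them.

```python
from collections import Counter
from functools import reduce

docs = ["the cat sat", "the cat ran", "a dog sat"]

# map phase: each document -> a Counter of its word frequencies
mapped = map(lambda doc: Counter(doc.split()), docs)

# reduce phase: merge the per-document counts into a single result
word_counts = reduce(lambda a, b: a + b, mapped, Counter())
```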
For many "regular" serial algorithms there are parallel versions, some of which can be modelled with MapReduce. Certainly learn MapReduce, as it is new and exciting, but it is just one tool in your toolbox: MapReduce has limitations (you will learn about them), so keep adding others.