Difference Between typical Hadoop Architecture and

Posted 2019-02-20 06:14

Question:

I know that Hadoop is based on a master/slave architecture:

HDFS works with NameNodes and DataNodes,

and MapReduce works with JobTrackers and TaskTrackers.

But I can't find any of these services on MapR; I found out that it has its own architecture with its own services.

I'm a little confused. Could anyone please tell me what the difference is between using Hadoop on its own and using it with MapR?

Answer 1:

MapR and Apache Hadoop do NOT have the same architecture at the storage level. MapR uses its own filesystem, MapR-FS, which is completely different from HDFS in both concept and implementation. You can find a more detailed comparison here: https://www.mapr.com/blog/comparing-mapr-fs-and-hdfs-nfs-and-snapshots#.VfGwwxG6eUk https://www.mapr.com/resources/videos/comparison-mapr-fs-and-hdfs



Answer 2:

You should refer to the latest Hadoop 2.x architecture, since YARN (Yet Another Resource Negotiator) and High Availability were introduced in version 2.x.

The JobTracker and TaskTracker are replaced by the ResourceManager, the NodeManager and a per-application ApplicationMaster.
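As a rough illustration of that replacement, the lookup below maps the Hadoop 1.x daemons named in the question to the Hadoop 2.x daemons that took over their roles. The names `MRV1_TO_YARN`, `HDFS_DAEMONS` and `yarn_equivalents` are made up for this sketch and are not part of any Hadoop API; only the daemon names themselves come from Hadoop.

```python
from typing import Dict, List

# Hypothetical lookup table (not a Hadoop API): which Hadoop 2.x (YARN)
# daemons took over the duties of each Hadoop 1.x MapReduce daemon.
MRV1_TO_YARN: Dict[str, List[str]] = {
    # Cluster-wide scheduling went to the ResourceManager; per-job
    # coordination went to a per-application ApplicationMaster.
    "JobTracker": ["ResourceManager", "ApplicationMaster"],
    # Per-node task execution went to the NodeManager.
    "TaskTracker": ["NodeManager"],
}

# The HDFS daemons keep their names in Hadoop 2.x.
HDFS_DAEMONS: List[str] = ["NameNode", "DataNode"]


def yarn_equivalents(daemon: str) -> List[str]:
    """Return the Hadoop 2.x daemons covering the given 1.x daemon's role.

    Unknown names yield an empty list.
    """
    if daemon in HDFS_DAEMONS:
        return [daemon]
    return list(MRV1_TO_YARN.get(daemon, []))


if __name__ == "__main__":
    for name in ["NameNode", "DataNode", "JobTracker", "TaskTracker"]:
        print(name, "->", ", ".join(yarn_equivalents(name)))
```

The table only covers stock Apache Hadoop; it says nothing about which services a MapR cluster runs.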

Hadoop 2.x YARN & High Availability

For the MapR architecture, refer to the MapR article.

For a comparison between different distributors, refer to this image.

A detailed comparison is available in the Data-magnum article by Bill Vorhies.



Answer 3:

MapR uses most of the Apache big data projects as its baseline. MapR is a provider of a Hadoop (and big data technology stack) distribution, with certain add-ons and technical support for its clients.

Under the hood, MapR is built entirely on the same architecture as Apache Hadoop, including all the core libraries. However, the MapR distribution is more like a bundle: a complete and compatible big data technology package.

The main benefit of MapR is that its distributions of various technologies such as Hive, HBase, Spark, etc. are compatible with core Hadoop and with each other. This is particularly important because big data technologies evolve at different paces, so new releases become incompatible with one another very quickly.

So vendors like MapR, Cloudera, etc. provide their own versions of the Hadoop distribution, plus support, so that end users can concentrate on building their product without worrying about compatibility issues. But almost all of them use the Apache distribution under the hood.

In the future, they might come up with certain variations and additional features in an attempt to keep clients from switching to other vendors, but as of now that is not the case.



Tags: hadoop mapr