I know that Hadoop
is based on Master/Slave architecture
HDFS
works with NameNodes
and DataNodes
and MapReduce
works with jobtrackers
and Tasktrackers
But I can't find all these services on MapR
, I find out that it has its own Architecture with its own services
I'm a little bit confused, could any one please tell me what is the difference between using Hadoop only and using it with MapR !
MapR and apache Hadoop DO NOT have same architecture at storage level. MapR uses its own filesystem MaRFS which is completely different from HDFS in terms of concept and implemenation . you can find more detailed comparision here : https://www.mapr.com/blog/comparing-mapr-fs-and-hdfs-nfs-and-snapshots#.VfGwwxG6eUk
https://www.mapr.com/resources/videos/comparison-mapr-fs-and-hdfs
You have to refer to Hadoop 2.x
latest architecture since YARN
( Yet Another Resource Negotiator) & High Availability
have been introduced in 2.x version.
Job tracker and Task tracker are replaced with Resource Manager, Node Manager and Applications Manager.
Hadoop 2.x YARN & High Availability
For MapR
architecture, refer to MapR article
For comparison between different distributors, refer to this image
Detailed comparison is available at Data-magnum article by Bill Vorhies
Mapr uses most of Apache bigdata distributions as their baseline.
Mapr is a hadoop (and bigdata technology stacks) distribution provider with certain add-ons and technical support to its client.
Underline the mapr is entirely on the same architecture as of apache hadoop including all the core library distribution. However mapr distribution is more like a bundle of a complete and compatible bigdata technology package.
The main benefit of mapr is that it's distribution of various technologies like hive, hbase, spark etc will be compatible with core hadoop and among each other. This I'd particularly important because the bigdata technologies are evolving in different pace and hence news releases becomes incompatible very soon.
So, the vendors like mapr, cloudera etc are providing their version of hadoop didtribution and support such that end users can concentrate on the product building without worrying about the compatibility issues. But almost all of them are using apache distribution under the carpet.
In future, they might come up certain variation and additional features in an attempt to prevent client's switch to other vendors, but as of now is not the case.