Which Hadoop version should I choose among 1.x, 2.2 and 0.23?

Posted 2020-02-26 13:59

Hello, I am new to Hadoop and pretty confused by the version names. Which one should I use: 1.x (great support and learning resources), 2.2, or 0.23?

I have read that Hadoop is moving completely to YARN starting with v0.23 (link1).
But at the same time, it's all over the web that Hadoop v2.0 is moving to YARN (link2), and I can see the YARN configuration files in Hadoop 2.2 itself.

  • Since 0.23 seems to be the latest version to me, does 2.2 also support YARN? (Link 1 says Hadoop will support YARN from v0.23.)
  • As a beginner, which version should I go for, 1.x or 2.x, from a learning perspective?
  • Are other technologies that work with Hadoop, like Pig, Hive, etc., available for the latest version of Hadoop?

Thanks.

UPDATE
Thank you all for replying. I ended up using Hadoop 2.2. The well-known tutorials and resources are all outdated, but I found one good book to get started with v2.2.

"Hadoop: The Definitive Guide, Third Edition" by Tom White supports Hadoop v2.2.

The source code is available on GitHub: https://github.com/tomwhite/hadoop-book

As the GitHub README puts it:

This version of the code has been tested with:
 * Hadoop 1.2.1/0.22.0/0.23.x/2.2.0
 * Avro 1.5.4
 * Pig 0.9.1
 * Hive 0.8.0
 * HBase 0.90.4/0.94.15
 * ZooKeeper 3.4.2
 * Sqoop 1.4.0-incubating
 * MRUnit 0.8.0-incubating

Hope it helps!

2 answers
爱情/是我丢掉的垃圾
Reply #2 · 2020-02-26 14:09

I recommend starting with hadoop-2.2.0, since that is the knowledge worth building. The industry prefers YARN, and in production only 2.x is used.

Melony?
Reply #3 · 2020-02-26 14:18

There are a few active release series. The 1.x release series is a continuation of the 0.20 release series. A few weeks after 0.23 was released, the 0.20 branch, formerly known as 0.20.205, was renumbered 1.0. There is next to no functional difference between 0.20.205 and 1.0; it is just a renumbering.

The 0.23 series includes several major new features, including a new MapReduce runtime called MapReduce 2, implemented on a new system called YARN (Yet Another Resource Negotiator), which is a general resource management system for running distributed applications. The 2.x release series is in turn a continuation of the 0.23 series, so 2.2 also supports YARN.
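To make the YARN point concrete: in a 2.x install, MapReduce jobs are routed to YARN by setting `mapreduce.framework.name` to `yarn` in `mapred-site.xml`. The Python sketch below only renders that one property in the `<configuration>`/`<property>` layout that Hadoop's `*-site.xml` files use. The helper name `hadoop_site_xml` is hypothetical, and a working cluster needs more settings (for example in `yarn-site.xml`) than this single property.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties: dict[str, str]) -> str:
    """Hypothetical helper: render name/value pairs in Hadoop's *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")


# In 2.x, this mapred-site.xml property selects the YARN-based MapReduce 2 runtime.
mapred_site = hadoop_site_xml({"mapreduce.framework.name": "yarn"})
print(mapred_site)
```

On a 1.x install there is no YARN to point at, which is one practical way the two release lines differ.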

According to the Hadoop 2.2 release notes:

  • 1.2.X - current stable version, 1.2 release

  • 2.2.X - current stable 2.x version

  • 0.23.X - similar to 2.X.X but missing NameNode HA (high availability).
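The lineage described above can be summarized as a small lookup. This is a hypothetical Python helper (`classify` and `SeriesInfo` are made-up names, not part of any Hadoop API) that encodes only the claims in this answer: 1.x continues 0.20, 0.23 introduced YARN without NameNode HA, and 2.x continues 0.23 and adds NameNode HA. Versions the answer does not cover return `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeriesInfo:
    lineage: str           # release line the version belongs to
    has_yarn: bool         # MapReduce 2 running on YARN
    has_namenode_ha: bool  # NameNode high availability


def classify(version: str) -> Optional[SeriesInfo]:
    """Hypothetical helper encoding the release lineage described in this answer.

    Returns None for versions the answer does not cover (e.g. 0.21, 0.22).
    """
    m = re.match(r"(\d+)(?:\.(\d+))?", version)
    if m is None:
        return None
    major = int(m.group(1))
    minor = int(m.group(2)) if m.group(2) is not None else None

    if major == 1 or (major == 0 and minor == 20):
        # 1.0 is a renumbering of 0.20.205, so 1.x continues the 0.20 line (no YARN)
        return SeriesInfo("0.20 / 1.x", has_yarn=False, has_namenode_ha=False)
    if major == 0 and minor == 23:
        # 0.23 introduced YARN but, per the release notes, lacks NameNode HA
        return SeriesInfo("0.23 / 2.x", has_yarn=True, has_namenode_ha=False)
    if major == 2:
        # 2.x continues the 0.23 line and adds NameNode HA
        return SeriesInfo("0.23 / 2.x", has_yarn=True, has_namenode_ha=True)
    return None


for v in ["0.20.205", "1.2.1", "0.23.10", "2.2.0"]:
    print(v, classify(v))
```

A lookup like this makes the main takeaway explicit: despite the lower number, 0.23 sits on the same line as 2.x, not before 1.x.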

I would suggest starting with the Cloudera distribution, since you are just starting to learn. CDH 4.5 includes the YARN feature you are looking for. You can also try the Hortonworks distribution. The advantage of going with these vendors is that you do not need to worry about which versions of components such as Hive and Pig work with your Hadoop installation.
