Experience with Hadoop?

2020-06-03 06:22发布

Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a Share-nothing architecture? Would that make sense?

I'm also interested into any performance results you have...

9条回答
smile是对你的礼貌
2楼-- · 2020-06-03 06:59

Hadoop MapReduce can run ontop of any number of file systems or even more abstract data sources such as databases. In fact there are a couple of built-in classes for non-HDFS filesystem support, such as S3 and FTP. You could easily build your own input format as well by extending the basic InputFormat class.

Using HDFS brings certain advantages, however. The most potent advantage is that the MapReduce job scheduler will attempt to execute maps and reduces on the physical machines that are storing the records in need of processing. This brings a performance boost as data can be loaded straight from the local disk instead of transferred over the network, which depending on the connection may be orders of magnitude slower.

查看更多
一纸荒年 Trace。
3楼-- · 2020-06-03 06:59

Parallel/ Distributed computing = SPEED << Hadoop makes this really really easy and cheap since you can just use a bunch of commodity machines!!!

Over the years disk storage capacities have increased massively but the speeds at which you read the data have not kept up. The more data you have on one disk, the slower the seeks.

Hadoop is a clever variant of the divide an conquer approach to problem solving. You essentially break the problem into smaller chunks and assign the chunks to several different computers to perform processing in parallel to speed things up rather than overloading one machine. Each machine processes its own subset of data and the result is combined in the end. Hadoop on a single node isn't going to give you the speed that matters.

To see the benefit of hadoop, you should have a cluster with at least 4 - 8 commodity machines (depending on the size of your data) on a the same rack.

You no longer need to be a super genius parallel systems engineer to take advantage of distributed computing. Just know hadoop with Hive and your good to go.

查看更多
Melony?
4楼-- · 2020-06-03 07:00

Great theoretical answers above.

To change your hadoop file system to local, you can change it in "core-site.xml" configuration file like below for hadoop versions 2.x.x.

 <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>

for hadoop versions 1.x.x.

 <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
查看更多
登录 后发表回答