Experience with Hadoop?-第2页回答

Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a Share-nothing architecture? Would that make sense?

I'm also interested into any performance results you have...

标签： performance distributed hadoop shared-nothing

9条回答

smile是对你的礼貌

2楼-- · 2020-06-03 06:59

Hadoop MapReduce can run ontop of any number of file systems or even more abstract data sources such as databases. In fact there are a couple of built-in classes for non-HDFS filesystem support, such as S3 and FTP. You could easily build your own input format as well by extending the basic InputFormat class.

Using HDFS brings certain advantages, however. The most potent advantage is that the MapReduce job scheduler will attempt to execute maps and reduces on the physical machines that are storing the records in need of processing. This brings a performance boost as data can be loaded straight from the local disk instead of transferred over the network, which depending on the connection may be orders of magnitude slower.

0人赞添加讨论(0) 举报

一纸荒年 Trace。

3楼-- · 2020-06-03 06:59

Parallel/ Distributed computing = SPEED << Hadoop makes this really really easy and cheap since you can just use a bunch of commodity machines!!!

Over the years disk storage capacities have increased massively but the speeds at which you read the data have not kept up. The more data you have on one disk, the slower the seeks.

Hadoop is a clever variant of the divide an conquer approach to problem solving. You essentially break the problem into smaller chunks and assign the chunks to several different computers to perform processing in parallel to speed things up rather than overloading one machine. Each machine processes its own subset of data and the result is combined in the end. Hadoop on a single node isn't going to give you the speed that matters.

To see the benefit of hadoop, you should have a cluster with at least 4 - 8 commodity machines (depending on the size of your data) on a the same rack.

You no longer need to be a super genius parallel systems engineer to take advantage of distributed computing. Just know hadoop with Hive and your good to go.

0人赞添加讨论(0) 举报

Melony?

4楼-- · 2020-06-03 07:00

Great theoretical answers above.

To change your hadoop file system to local, you can change it in "core-site.xml" configuration file like below for hadoop versions 2.x.x.

 <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>

for hadoop versions 1.x.x.

 <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>

0人赞添加讨论(0) 举报

上一页 1 2

Experience with Hadoop?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间