Are there any dependencies between Spark and Hadoop?
If not, are there any features I'll miss when I run Spark without Hadoop?
Spark can run without Hadoop, but some of its functionality relies on Hadoop's code (e.g. handling of Parquet files). We're running Spark on Mesos with S3, which was a little tricky to set up but works really well once done (you can read a summary of what was needed to set it up properly here).
Yes, you can install Spark without Hadoop, though it is a little tricky. You can refer to Arnon's post on using Parquet with S3 as the data store: http://arnon.me/2015/08/spark-parquet-s3/
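As a rough illustration, here is a minimal Scala sketch of that kind of setup. It assumes a Spark 2.x+ build with the hadoop-aws module (and its AWS SDK dependency) on the classpath, e.g. via `--packages org.apache.hadoop:hadoop-aws:<version>`; the bucket names and paths are placeholders, not anything from the linked post.

```scala
// Minimal sketch: reading and writing Parquet on S3 with no HDFS cluster.
// Assumes hadoop-aws (plus its AWS SDK dependency) is on the classpath.
// Bucket names and paths are placeholders.
import org.apache.spark.sql.SparkSession

object ParquetOnS3 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-on-s3")
      // Spark still goes through Hadoop's FileSystem API under the hood,
      // so S3 access is configured via Hadoop properties. Credentials are
      // read here from environment variables (fails if they are unset).
      .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
      .getOrCreate()

    // Read Parquet directly from an s3a:// path and write the result back.
    val df = spark.read.parquet("s3a://my-bucket/input/")
    println(df.count())
    df.write.parquet("s3a://my-bucket/output/")

    spark.stop()
  }
}
```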
Spark only does processing; it uses dynamic memory to perform its tasks, but to store the data you need some data storage system. This is where Hadoop comes in alongside Spark: it provides the storage for Spark. Another reason for using Hadoop with Spark is that both are open source and integrate with each other more easily than other data storage systems do. For other storage such as S3, the configuration is trickier, as described in the link above.
But Hadoop also has its own processing engine, called MapReduce. Want to know the difference between the two? Check this article: https://www.dezyre.com/article/hadoop-mapreduce-vs-apache-spark-who-wins-the-battle/83
I think this article will help you understand what to use, when to use it, and how to use it!
No. It requires a full-blown Hadoop installation to start working: https://issues.apache.org/jira/browse/SPARK-10944
Yes, Spark can run with or without a Hadoop installation. For more details, see https://spark.apache.org/docs/latest/
Yes, Spark can run without Hadoop. All core Spark features will continue to work, but you'll miss things like easily distributing all your files (code as well as data) to all the nodes in the cluster via HDFS, etc.
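For illustration, a minimal sketch of what "without Hadoop" looks like in practice: Spark in local mode running a word count against a plain local file. The input path is a placeholder; note that the Spark distribution still bundles Hadoop client jars on its classpath even though no Hadoop installation or cluster is present.

```scala
// Minimal sketch: core Spark in local mode, no HDFS, no YARN, no Hadoop
// cluster. The input path is a placeholder local file.
import org.apache.spark.sql.SparkSession

object LocalWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("local-word-count")
      .master("local[*]") // all work happens inside this JVM
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("file:///tmp/input.txt") // file:// scheme: plain local filesystem
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)
    spark.stop()
  }
}
```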
By default, Spark does not come with a storage mechanism.
To store data, it needs a fast and scalable file system. You can use S3, HDFS, or any other file system. Hadoop is an economical option due to its low cost.
Additionally, if you use Tachyon (now Alluxio), it will boost performance with Hadoop. Hadoop is highly recommended as the storage layer for Apache Spark processing.
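As a rough sketch of how that choice plays out in code: Spark picks the storage backend per path from its URI scheme, so the same job can target HDFS, S3, or the local disk. The hostname, bucket, and paths below are placeholders, and the s3a:// path additionally assumes the hadoop-aws setup shown in the earlier sketch.

```scala
// Sketch: the storage backend is selected per path by its URI scheme.
// Hostnames, bucket names, and paths are placeholders.
import org.apache.spark.sql.SparkSession

object StorageBackends {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("storage-backends").getOrCreate()

    val fromHdfs  = spark.read.parquet("hdfs://namenode:8020/data/events/") // Hadoop HDFS
    val fromS3    = spark.read.parquet("s3a://my-bucket/data/events/")      // Amazon S3
    val fromLocal = spark.read.parquet("file:///data/events/")              // local filesystem

    println(fromHdfs.count() + fromS3.count() + fromLocal.count())
    spark.stop()
  }
}
```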