Can Apache Spark run without Hadoop?

Posted 2019-01-21 03:24

Are there any dependencies between Spark and Hadoop?

If not, are there any features I'll miss when I run Spark without Hadoop?

9 Answers
戒情不戒烟
Answer 2 · 2019-01-21 03:30

Spark can run without Hadoop, but some of its functionality relies on Hadoop's code (e.g. handling of Parquet files). We're running Spark on Mesos with S3, which was a little tricky to set up but works really well once done (you can read a summary of what was needed to set it up properly here).
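A minimal sketch (Scala) of what that setup looks like from the application side, assuming the hadoop-aws module is on the classpath; the bucket name, paths and credentials below are placeholders, and local mode is used here rather than Mesos for brevity:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-on-s3")
      .master("local[*]")   // no Hadoop cluster needed, only Hadoop's client libraries
      .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
      .getOrCreate()

    // Reading Parquet still goes through Hadoop's FileSystem/Parquet code,
    // even though no Hadoop cluster is installed.
    val df = spark.read.parquet("s3a://my-bucket/events/")   // hypothetical bucket
    df.printSchema()
    spark.stop()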

ゆ 、 Hurt°
Answer 3 · 2019-01-21 03:30

Yes, you can install Spark without Hadoop, although it can be a little tricky. You can refer to Arnon's post on using Parquet with S3 as the data store: http://arnon.me/2015/08/spark-parquet-s3/

Spark only does processing, and it uses memory dynamically while performing its tasks, but to store data you need some storage system. This is where Hadoop comes in with Spark: it provides the storage layer (HDFS). Another reason for using Hadoop with Spark is that both are open source and integrate with each other more easily than other storage systems do. For other storage such as S3, the configuration is a bit trickier, as described in the link above.
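As a rough illustration (not the exact configuration from the link), the storage layer is pluggable: the same write call works against local disk, HDFS, or S3 just by changing the path URI. All paths below are hypothetical:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("storage-demo").master("local[*]").getOrCreate()
    val df = spark.range(1000).toDF("id")

    df.write.parquet("file:///tmp/demo/ids")               // plain local file system
    // df.write.parquet("hdfs://namenode:8020/demo/ids")   // HDFS, if a Hadoop cluster exists
    // df.write.parquet("s3a://my-bucket/demo/ids")        // S3, with hadoop-aws configured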

But Hadoop also has its own processing engine, called MapReduce.

Want to know the difference between the two? (A short Spark word-count sketch after the list below also gives a feel for how the two APIs compare.)

Check this article: https://www.dezyre.com/article/hadoop-mapreduce-vs-apache-spark-who-wins-the-battle/83

I think this article will help you understand:

  • what to use,

  • when to use, and

  • how to use.
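For the Spark side of that comparison, here is word count in Spark (Scala); the input path is a placeholder, and the equivalent classic MapReduce job would need separate Mapper, Reducer and driver classes:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("wordcount").master("local[*]").getOrCreate()

    val counts = spark.sparkContext
      .textFile("file:///tmp/input.txt")   // hypothetical input file
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println)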

祖国的老花朵
Answer 4 · 2019-01-21 03:32

No. It requires a full-blown Hadoop installation to start working - https://issues.apache.org/jira/browse/SPARK-10944

仙女界的扛把子
Answer 5 · 2019-01-21 03:33

Yes, Spark can run with or without a Hadoop installation. For more details, see https://spark.apache.org/docs/latest/
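A quick way to verify this yourself, sketched in Scala: start a local-mode session (no HDFS, no YARN, no running Hadoop daemons) and run a trivial job:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("no-hadoop-check")
      .master("local[*]")   // local mode: nothing but Spark itself
      .getOrCreate()

    println(spark.range(100).count())   // should print 100
    spark.stop()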

仙女界的扛把子
Answer 6 · 2019-01-21 03:38

Yes, Spark can run without Hadoop. All core Spark features will continue to work, but you'll miss things like easily distributing all your files (code as well as data) to all the nodes in the cluster via HDFS, etc.
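One common workaround, sketched here under the assumption that the file exists on the driver machine (the path is a placeholder): SparkContext.addFile ships a side file to every executor, and SparkFiles.get resolves the local copy inside a task:

    import org.apache.spark.SparkFiles
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("addfile-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    sc.addFile("/path/on/driver/lookup.csv")   // Spark copies this file to each node

    val localPaths = sc.parallelize(1 to 4).map { _ =>
      SparkFiles.get("lookup.csv")   // absolute path of the local copy on that executor
    }
    localPaths.collect().foreach(println)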

Emotional °昔
Answer 7 · 2019-01-21 03:42

By default, Spark does not come with a storage mechanism.

To store data, it needs a fast and scalable file system. You can use S3, HDFS, or any other file system. Hadoop is an economical option due to its low cost.

Additionally, if you use Tachyon (since renamed Alluxio) as an in-memory layer, it can boost performance on top of Hadoop. Hadoop is highly recommended for Apache Spark processing.
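A minimal sketch of pointing Spark at Alluxio (formerly Tachyon), assuming an Alluxio master is running at localhost:19998 and the Alluxio client jar is on Spark's classpath; the file path is a placeholder:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("alluxio-demo").master("local[*]").getOrCreate()

    // Data is served from Alluxio's in-memory tier, regardless of where it is persisted underneath
    val lines = spark.read.textFile("alluxio://localhost:19998/data/input.txt")
    println(lines.count())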
