
Question:

Are there any dependencies between Spark and Hadoop?

If not, are there any features I'll miss when I run Spark without Hadoop?

Answer 1:

Spark can run without Hadoop, but some of its functionality relies on Hadoop's code (e.g. handling of Parquet files). We're running Spark on Mesos and S3, which was a little tricky to set up but works really well once done (you can read a summary of what was needed to set it up properly here).
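To make the "tricky to set up" part a bit more concrete, here is a minimal sketch of how S3 credentials are commonly passed to Spark. It assumes the S3A connector, which ships in Hadoop's hadoop-aws module, so it also illustrates the point that Spark leans on Hadoop code even without a Hadoop cluster. The helper name `s3a_conf_args` and the placeholder credentials are hypothetical; the `spark.hadoop.fs.s3a.*` keys are the standard S3A settings, but this is not necessarily the setup the answer above used.

```python
def s3a_conf_args(access_key, secret_key, endpoint=None):
    """Build spark-submit --conf arguments for the S3A connector.

    Hypothetical helper for illustration. Spark forwards any setting
    prefixed with "spark.hadoop." to the underlying Hadoop configuration.
    """
    conf = {
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }
    if endpoint:
        # Only needed for non-default or S3-compatible endpoints.
        conf["spark.hadoop.fs.s3a.endpoint"] = endpoint

    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder credentials; real keys should come from a secure source.
    print(" ".join(["spark-submit"] + s3a_conf_args("MY_KEY", "MY_SECRET")))
```

The hadoop-aws jar and a matching AWS SDK still have to be on Spark's classpath for `s3a://` paths to resolve; which versions are compatible depends on the Spark build.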

Answer 2:

Spark is an in-memory distributed computing engine.

Hadoop is a framework for distributed storage (HDFS) and distributed processing (YARN).

Spark can run with or without Hadoop components (HDFS/YARN).

Distributed Storage:

Since Spark does not have its own distributed storage system, it has to depend on one of these storage systems for distributed computing.

S3 – Non-urgent batch jobs. S3 fits very specific use cases when data locality isn’t critical.

Cassandra – Perfect for streaming data analysis and an overkill for batch jobs.

HDFS – Great fit for batch jobs without compromising on data locality.
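For the file-based options above, the storage system is usually selected by the URI scheme of the path handed to Spark. The following is a rough sketch of that mapping, not Spark's own logic: the function name `storage_backend` and the example paths are made up for illustration, and Cassandra is left out because it is reached through a connector rather than a file path.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the storage systems discussed above.
_SCHEME_TO_BACKEND = {
    "": "local file system",      # bare paths such as /data/events
    "file": "local file system",
    "hdfs": "HDFS",
    "s3": "S3",
    "s3a": "S3",
    "s3n": "S3",
}


def storage_backend(path):
    """Guess which storage system a Spark input path refers to."""
    scheme = urlparse(path).scheme.lower()
    return _SCHEME_TO_BACKEND.get(scheme, "unknown")


if __name__ == "__main__":
    for p in ("hdfs://namenode:8020/data", "s3a://my-bucket/logs/", "/tmp/local.csv"):
        print(p, "->", storage_backend(p))
```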

Distributed processing:

You can run Spark in three different modes: Standalone, YARN and Mesos.

Have a look at the SE question below for a detailed explanation of both distributed storage and distributed processing.
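The mode is chosen through the master URL given to Spark. Below is a small sketch that builds that URL for each mode, plus a single-machine `local[*]` option. The helper `master_url` and the host name are hypothetical; 7077 and 5050 are the usual default ports for a Standalone master and a Mesos master, which a given cluster may override.

```python
def master_url(mode, host=None, port=None):
    """Return a Spark master URL for the given cluster manager.

    Hypothetical helper for illustration only.
    """
    mode = mode.lower()
    if mode == "local":
        return "local[*]"   # single machine, all cores, no cluster manager
    if mode == "yarn":
        return "yarn"       # cluster location comes from the Hadoop config

    # (URL scheme, usual default port) for the host-based managers.
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if mode not in defaults:
        raise ValueError(f"unknown mode: {mode}")
    if host is None:
        raise ValueError(f"{mode} mode needs a master host")

    scheme, default_port = defaults[mode]
    return f"{scheme}://{host}:{port or default_port}"


if __name__ == "__main__":
    for m in ("local", "standalone", "yarn", "mesos"):
        print(m, "->", master_url(m, host="master.example.com"))
```

The resulting string is what would be passed as `spark-submit --master <url>`. Only the `yarn` value depends on a Hadoop installation.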

Which cluster type should I choose for Spark?

Answer 3:

By default, Spark does not have a storage mechanism.

To store data, it needs a fast and scalable file system. You can use S3, HDFS, or any other file system. Hadoop is an economical option due to its low cost.

Additionally, if you use Tachyon, it will boost performance with Hadoop. Hadoop is highly recommended for Apache Spark processing.

Answer 4:

Yes, Spark can run without Hadoop. All core Spark features will continue to work, but you'll miss things like easily distributing all your files (code as well as data) to all the nodes in the cluster via HDFS.

Answer 5:

Yes, you can install Spark without Hadoop, though that is a little tricky. You can refer to Arnon's post on using Parquet with S3 as the data storage: http://arnon.me/2015/08/spark-parquet-s3/

Spark only does processing, and it uses dynamic memory to perform the task, but to store the data you need some data storage system. This is where Hadoop comes in: it provides the storage for Spark. One more reason for using Hadoop with Spark is that both are open source and integrate with each other easily compared to other data storage systems. Other storage such as S3 is trickier to configure, as mentioned in the link above.

But Hadoop also has its own processing unit, called MapReduce.

Want to know the difference between the two?

Check this article: https://www.dezyre.com/article/hadoop-mapreduce-vs-apache-spark-who-wins-the-battle/83

I think this article will help you understand what to use, when to use it, and how to use it.

Answer 6:

As per the Spark documentation, Spark can run without Hadoop.

You may run it in Standalone mode without any resource manager.

But if you want to run in a multi-node setup, you need a resource manager like YARN or Mesos and a distributed file system like HDFS, S3, etc.

Answer 7:

Yes, of course. Spark is an independent computation framework. Hadoop is a distributed storage system (HDFS) with the MapReduce computation framework. Spark can get data from HDFS, as well as from any other data source such as a traditional database (JDBC), Kafka, or even local disk.

Answer 8:

Yes, Spark can run with or without a Hadoop installation. For more details, see https://spark.apache.org/docs/latest/

Answer 9:

No. It requires a full-blown Hadoop installation to start working: https://issues.apache.org/jira/browse/SPARK-10944

Can apache spark run without hadoop?
