I am going to use the spark-sql CLI to replace the Hive CLI shell, and I launch it with the following command (we are on a YARN Hadoop cluster, and hive-site.xml has already been copied to /conf):
./spark-sql
The shell opens and works fine. Then I execute a query such as:
spark-sql> select devicetype, count(*) from mytable group by devicetype;
The query executes successfully and the result is correct, but I notice the performance is very slow.
From the Spark job UI at http://myhost:4040, I can see that only 1 executor is marked as used, which may be the reason.
I tried modifying the spark-sql script and adding --num-executors 500 to the exec command, but it did not help.
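For reference, this is roughly how I would pass those options directly on the spark-sql command line instead of editing the script; the option names are the standard spark-submit ones, and the values here are just examples:
./bin/spark-sql \
  --master yarn \
  --num-executors 500 \
  --executor-memory 4g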
Could anyone help and explain why?
Thanks.
Refer to the documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html
spark-sql is a SQL CLI tool that works only in local mode; that is why you see only one executor.
If you want a cluster version of SQL, you should start the Thrift server and connect to it via JDBC, for example with the beeline tool that ships with Spark. You can find the description in the chapter "Running the Thrift JDBC/ODBC server" of the official documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html
To start:
export HIVE_SERVER2_THRIFT_PORT=<listening-port>
export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
./sbin/start-thriftserver.sh \
  --master <master-uri> \
  ...
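As a concrete sketch, on a YARN cluster the call could look roughly like the following; the port, bind host, and executor settings are only example values that you would replace with your own:
export HIVE_SERVER2_THRIFT_PORT=10002
export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0
./sbin/start-thriftserver.sh \
  --master yarn \
  --num-executors 4 \
  --executor-memory 4g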
To connect:
./bin/beeline
beeline> !connect jdbc:hive2://<listening-host>:<listening-port>
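You can also pass the connection URL, and even a query, directly on the beeline command line instead of typing !connect at the prompt; something along these lines should work (the user name is a placeholder):
./bin/beeline \
  -u "jdbc:hive2://<listening-host>:<listening-port>/default" \
  -n <user> \
  -e "select devicetype, count(*) from mytable group by devicetype;"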
beeline> !connect jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=cliservice
10002 is my port for the Spark Thrift Server; change it to yours. You can find your Thrift port in the Thrift server log.
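If you are not sure where that log lives, the Thrift server normally writes it under the Spark logs directory; a grep along these lines should surface the line that mentions the port (the exact file name pattern depends on your installation and user name, so treat it as a sketch):
# the file name usually contains the user, the HiveThriftServer2 class name, and the host
grep -i "port" $SPARK_HOME/logs/spark-*HiveThriftServer2*.out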