spark - application returns different results base

Posted 2019-06-03 10:30

I am noticing some peculiar behaviour. I have a Spark job which reads data, does some grouping, ordering and a join, and writes an output file.

The issue is this: when I run the same job on YARN and request more memory than the environment has (e.g. the cluster has 50 GB and I call spark-submit with close to 60 GB for the executors plus 4 GB of driver memory), my result contains fewer rows. It seems like one of the data partitions or tasks is lost during processing.

--driver-memory 4g --executor-memory 4g --num-executors 12

I also notice this warning message on the driver:

WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.

But when I run with fewer executors and less memory (e.g. 15 GB in total), it works: I get exactly the expected rows/data and no warning message.

--driver-memory 2g --executor-memory 2g --num-executors 4
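To see how the two flag sets translate into a total YARN request, here is a rough Python sketch of the arithmetic. It rests on assumptions that are not stated in the question: that Spark 2.2 adds a default per-container memory overhead of max(384 MB, 10% of the heap) (the `spark.yarn.executor.memoryOverhead` / `spark.yarn.driver.memoryOverhead` defaults), that the job runs in cluster deploy mode so the driver also occupies a YARN container, and that YARN's rounding of each request up to `yarn.scheduler.minimum-allocation-mb` is ignored. The helper names are hypothetical.

```python
# Rough estimate of the memory a spark-submit request asks YARN for.
# Assumptions (not from the question):
#  - per-container overhead defaults to max(384 MB, 10% of the heap), as in Spark 2.2
#  - cluster deploy mode, so the driver also runs in a YARN container
#  - rounding to yarn.scheduler.minimum-allocation-mb is ignored
MIN_OVERHEAD_MB = 384
OVERHEAD_FRACTION = 0.10


def container_mb(heap_mb):
    """Heap plus default overhead for one YARN container (hypothetical helper)."""
    return heap_mb + max(MIN_OVERHEAD_MB, int(OVERHEAD_FRACTION * heap_mb))


def total_request_mb(executor_mb, num_executors, driver_mb):
    """All executor containers plus the driver container (hypothetical helper)."""
    return num_executors * container_mb(executor_mb) + container_mb(driver_mb)


# --driver-memory 4g --executor-memory 4g --num-executors 12
big = total_request_mb(4096, 12, 4096)
# --driver-memory 2g --executor-memory 2g --num-executors 4
small = total_request_mb(2048, 4, 2048)
print(big, round(big / 1024, 1))
print(small, round(small / 1024, 1))
```

Under these assumptions the first configuration asks for 58565 MB (about 57 GB), which is consistent with "close to 60 GB" and above the 50 GB the cluster has. The second asks for 12160 MB (about 12 GB). Rounding each container up to the YARN minimum allocation would push both figures higher.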

Any suggestions? Are we missing some setting on the cluster, or anything else? Please note that my job completes successfully in both cases. I am using Spark version 2.2.

Tags: apache-spark yarn

1 answer

叛逆

Reply #2 · 2019-06-03 11:15

This warning is meaningless (except maybe for debugging). The plan is larger when there are more executors involved, and the warning only says that it is too big to be converted into a string. If you need the full string, you can set spark.debug.maxToStringFields to a larger number, as the warning message suggests.
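To illustrate why the warning is only cosmetic, here is a small Python sketch of the kind of truncation it refers to. It is a hypothetical re-implementation written for illustration, not a Spark API. It assumes, based on our reading of Spark 2.x's `Utils.truncatedString`, that a list of plan fields longer than `spark.debug.maxToStringFields` (assumed default: 25) is cut to `max - 1` entries followed by "... N more fields". Only the printed text changes. The data the job processes is untouched.

```python
import logging

log = logging.getLogger("plan_string_sketch")

# Assumption: default value of spark.debug.maxToStringFields in Spark 2.x.
DEFAULT_MAX_FIELDS = 25


def truncated_string(fields, start, sep, end, max_fields=DEFAULT_MAX_FIELDS):
    """Render fields as start + sep.join(fields) + end, truncating long lists.

    Hypothetical helper mirroring (as we understand it) Spark's plan-string
    truncation; it affects only the rendered string, never the data.
    """
    if len(fields) > max_fields:
        log.warning("Truncated the string representation of a plan since it was too large.")
        keep = max(0, max_fields - 1)
        head = sep.join(str(f) for f in fields[:keep])
        return start + head + sep + "... " + str(len(fields) - keep) + " more fields" + end
    return start + sep.join(str(f) for f in fields) + end


columns = [f"c{i}" for i in range(30)]
print(truncated_string(columns, "[", ", ", "]"))                  # truncated form
print(truncated_string(columns, "[", ", ", "]", max_fields=100))  # full form
```

With 30 fields and the assumed default of 25, the first call keeps `c0` through `c23` and appends "... 6 more fields". Raising the limit, for example with `spark-submit --conf spark.debug.maxToStringFields=100`, prints the full list.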
