spark - application returns different results based on memory settings

Posted 2019-06-03 10:30

I am noticing some peculiar behaviour. I have a Spark job which reads the data, does some grouping, ordering, and a join, and creates an output file.

The issue is when I run the same job on YARN with more memory than the environment has, e.g. the cluster has 50 GB and I submit spark-submit with close to 60 GB of total executor memory and 4 GB of driver memory. My result set gets smaller; it seems like one of the data partitions or tasks is lost while processing.

--driver-memory 4g --executor-memory 4g --num-executors 12
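
For reference, a complete spark-submit invocation along these lines might look like the sketch below (the class name and jar are placeholders, not from the original job). The arithmetic is worth spelling out: 12 executors x (4 GB heap + the default ~10% YARN memory overhead, ~0.4 GB) plus a 4 GB driver comes to roughly 57 GB requested, which is more than the 50 GB the cluster actually has.

# class and jar below are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 4g \
  --num-executors 12 \
  --class com.example.MyJob \
  my-job.jar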

I also notice the warning message on driver -

WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf. 

But when I run with fewer executors and less memory, for example 15 GB in total, it works and I get the exact rows/data, with no warning message.

--driver-memory 2g --executor-memory 2g --num-executors 4

Any suggestions? Are we missing some settings on the cluster, or anything else? Please note that my job completes successfully in both cases. I am using Spark version 2.2.

1 answer

叛逆
#2 · 2019-06-03 11:15

The warning is meaningless (except maybe for debugging): the plan's string representation is larger when more executors are involved, and the warning simply says it is too big to be converted into a string in full. If you need the full plan, you can set spark.debug.maxToStringFields to a larger number (as the warning message suggests).
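
For example, assuming the job is launched through spark-submit, the setting can be passed as an ordinary Spark configuration property (the value 100 is arbitrary; the default in Spark 2.x is 25, and my-job.jar is a placeholder):

# raise the limit on how many plan fields are rendered in the string output
spark-submit \
  --conf spark.debug.maxToStringFields=100 \
  --driver-memory 4g \
  --executor-memory 4g \
  --num-executors 12 \
  my-job.jar

This only affects how much of the plan is printed for debugging; it has no effect on the computation itself.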
