I heard about Whole-Stage Code Generation for SQL, which is used to optimize queries,
through p539-neumann.pdf & sparksql-sql-codegen-is-not-giving-any-improvemnt.
Unfortunately, no one answered that question.
I'm curious about the scenarios in which this Spark 2.0 feature is useful, but I couldn't find a proper use case after googling.
Can we use this feature whenever we use SQL? If so, is there a proper use case that shows it working?
When you are using Spark 2.0, code generation is enabled by default, so most DataFrame queries can take advantage of the performance improvements. There are some potential exceptions, such as Python UDFs, which may slow things down.
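To see this on a concrete query, here is a minimal PySpark sketch, assuming `pyspark` is installed and a local session is acceptable. The function name `compare_plans` and the sample query are made up for illustration; `spark.sql.codegen.wholeStage` is the setting that toggles the feature. In the printed physical plan, operators that were compiled together by whole-stage code generation are marked with an asterisk (`*` in Spark 2.x, `*(1)`-style stage ids in later versions).

```python
# Setting that turns whole-stage code generation on or off.
WHOLE_STAGE_KEY = "spark.sql.codegen.wholeStage"


def compare_plans():
    """Print the physical plan of one query with and without whole-stage codegen.

    Hypothetical helper for illustration; it needs pyspark at call time.
    """
    # Imported lazily so that the module loads even where pyspark is absent.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("wholestage-codegen-demo")
        .getOrCreate()
    )
    try:
        # Arbitrary sample query: scan, filter, then aggregate.
        df = spark.range(1000 * 1000).filter("id % 2 = 0").selectExpr("sum(id)")

        spark.conf.set(WHOLE_STAGE_KEY, "true")
        print("whole-stage codegen enabled:")
        df.explain()  # operators fused into generated code carry a '*' marker

        spark.conf.set(WHOLE_STAGE_KEY, "false")
        print("whole-stage codegen disabled:")
        df.explain()  # same operators, without the '*' marker
    finally:
        spark.stop()
```

Calling `compare_plans()` prints both plans. Timing the same query under each setting is one way to check whether the feature helps a particular workload.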
Code generation is one of the primary components of the Spark SQL engine's Catalyst Optimizer. In brief, the Catalyst Optimizer engine does the following:
(1) analysis of a logical plan to resolve references,
(2) logical plan optimization,
(3) physical planning, and
(4) code generation.
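The idea behind step (4) can be sketched in plain Python. This is a toy illustration, not Spark's implementation: every function name here is hypothetical, and Spark generates Java source rather than Python. It contrasts an operator-at-a-time pipeline, where each row passes through a chain of iterators, with a single fused loop that is generated as source text and then compiled, which is the approach the Neumann paper mentioned in the question describes.

```python
# Toy model of "filter even ids, multiply by 10, sum". Not Spark code.

def scan(rows):
    for row in rows:
        yield row


def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row


def project_op(child, fn):
    for row in child:
        yield fn(row)


def sum_op(child):
    total = 0
    for value in child:
        total += value
    return total


def run_operator_at_a_time(rows):
    # Each operator pulls rows from its child through a separate iterator,
    # so every row pays for several function calls.
    plan = project_op(
        filter_op(scan(rows), lambda r: r % 2 == 0),
        lambda r: r * 10,
    )
    return sum_op(plan)


def generate_fused_source():
    # "Code generation": emit one loop that does the work of all operators.
    return (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        "        if r % 2 == 0:\n"
        "            total += r * 10\n"
        "    return total\n"
    )


def compile_fused():
    # Compile the generated source and return the resulting function.
    namespace = {}
    exec(generate_fused_source(), namespace)
    return namespace["fused"]


rows = range(10)
fused = compile_fused()
# Both strategies compute the same answer; only the execution shape differs.
assert run_operator_at_a_time(rows) == fused(rows)
```

The fused version keeps intermediate values in local variables instead of passing them between operators, which is where the speedup in the real engine is expected to come from.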
A great reference to all of this are the blog posts
HTH!