I use Spark to read from Elasticsearch, like:
select col from index limit 10;
The problem is that the index is very large; it contains 100 billion rows, and Spark generates thousands of tasks to finish the job.
All I need is 10 rows; even a single task returning 10 rows could finish the job. I don't need so many tasks.
The limit is very slow, even with limit 1.
Code:
val sql = "select col from index limit 10"
sqlExecListener.sparkSession.sql(sql).createOrReplaceTempView(tempTable)
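For context, a minimal, self-contained version of what I am doing looks roughly like the sketch below. The Elasticsearch host/port, and the use of a standalone SparkSession in place of sqlExecListener, are assumptions for illustration; the index name "index" and column "col" are from the query above.

import org.apache.spark.sql.SparkSession

object EsLimitExample {
  def main(args: Array[String]): Unit = {
    // Assumed connection settings; in my code the session comes from sqlExecListener.
    val spark = SparkSession.builder()
      .appName("es-limit-example")
      .config("es.nodes", "localhost")   // assumed Elasticsearch host
      .config("es.port", "9200")         // assumed Elasticsearch port
      .getOrCreate()

    // Register the Elasticsearch index as a view via the es-spark connector.
    spark.read
      .format("org.elasticsearch.spark.sql")
      .load("index")
      .createOrReplaceTempView("index")

    // Even with limit 10, this scan spawns thousands of tasks over the huge index.
    val sql = "select col from index limit 10"
    spark.sql(sql).createOrReplaceTempView("tempTable")
    spark.table("tempTable").show()
  }
}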