- What are the limitations of Apache Drill?
- Where does it fall short compared to Apache Hive/Impala?
Question:
Answer 1:
My overall view of Drill:
One of the main advantages of Apache Drill is that you can query across multiple data sources. You just configure the sources and query them directly. That's its biggest advantage. It has also compared well against several other query technologies (see reference 2).
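To make the cross-source point concrete, here is a rough sketch of what such a query could look like when sent through Drill's REST API (`POST /query.json`). The storage plugin names (`mongo`, `dfs`), the tables, the file path and the helper `build_query_payload` are all made up for illustration; they have to match whatever is configured in your own installation.

```python
import json

# Hypothetical names: the `mongo` and `dfs` storage plugins, the `shop.users`
# collection and the Parquet path are assumptions for illustration only.
CROSS_SOURCE_SQL = """
SELECT u.name, o.total
FROM mongo.shop.`users` AS u
JOIN dfs.`/data/orders.parquet` AS o
  ON u.user_id = o.user_id
"""


def build_query_payload(sql: str) -> str:
    """Return the JSON body expected by Drill's REST endpoint POST /query.json."""
    return json.dumps({"queryType": "SQL", "query": sql.strip()})


if __name__ == "__main__":
    # The body would be POSTed to http://<drillbit-host>:8047/query.json
    # (8047 is the default web port; adjust for your setup).
    print(build_query_payload(CROSS_SOURCE_SQL))
```

The point is only that one SQL statement can join a document store with a file on a file system, as long as both are registered as sources.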
I wouldn't call this a limitation as such, but since Drill is purely a query engine, it just takes the SQL query, parses it using Calcite, and executes it on the nodes. It won't take care of a failed or cancelled query execution for you. Your application needs to handle that.
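As a minimal sketch of what "your application needs to handle that" might mean in practice, the wrapper below retries a failing query a few times before giving up. Nothing here is Drill-specific: `run_query` is a hypothetical callable that submits the query however you normally do (JDBC, REST, ...), and `run_with_retries` and `QueryFailed` are names invented for this example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class QueryFailed(Exception):
    """Raised when a query still fails after all retry attempts."""


def run_with_retries(
    run_query: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query(), retrying on any exception with a growing delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:  # assumption: every failure is worth a retry
            last_error = exc
            if attempt < attempts:
                sleep(delay_s * attempt)  # simple linear backoff
    raise QueryFailed(f"query failed after {attempts} attempts") from last_error
```

In a real application you would probably retry only on errors you know to be transient and add a timeout as well; that is left out to keep the sketch short.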
Since it is still evolving, it has a number of limitations, such as:
- Many functions and operators familiar from Oracle/MySQL are missing or only minimally supported, e.g. MINUS, DECODE, TO_TIMESTAMP (very minimal), GREATEST, LEAST.
- User-defined functions are also quite limited in what they let you do.
- No hierarchical query support (CONNECT BY PRIOR in Oracle).
- It cannot read XML data (only JSON, CSV, Parquet, ...).
- No single-row subquery support.
- Limitations on joins.
- It has no schema, which can cause some confusion.
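Some of the missing functions in the list above can be worked around in the application. As one hedged example, an Oracle-style DECODE can usually be rewritten as a standard SQL CASE expression before the query is sent. The helper `decode_to_case` below is hypothetical and only builds the SQL text; it assumes CASE is available in the Drill version you run.

```python
from typing import Optional, Sequence, Tuple


def decode_to_case(
    expr: str,
    pairs: Sequence[Tuple[str, str]],
    default: Optional[str] = None,
) -> str:
    """Build a CASE expression that mimics DECODE(expr, s1, r1, ..., default).

    All arguments are SQL fragments inserted verbatim, so they must come
    from trusted code, never from user input.
    """
    if not pairs:
        raise ValueError("at least one (search, result) pair is required")
    parts = [f"CASE {expr}"]
    parts += [f"WHEN {search} THEN {result}" for search, result in pairs]
    if default is not None:
        parts.append(f"ELSE {default}")
    parts.append("END")
    return " ".join(parts)


if __name__ == "__main__":
    # prints: CASE status WHEN 1 THEN 'active' WHEN 2 THEN 'closed' ELSE 'unknown' END
    print(decode_to_case("status", [("1", "'active'"), ("2", "'closed'")], "'unknown'"))
```

One caveat: Oracle's DECODE treats two NULLs as a match, while a simple CASE does not, so NULL handling needs a separate `IS NULL` branch if it matters for your data.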
Apache Drill is still evolving, and these issues/limitations are expected to be addressed in future versions of Drill.
Hope it helps.
References:
- https://issues.apache.org/jira/browse/DRILL/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
- http://allegro.tech/2015/06/fast-data-hackathon.html
- https://drill.apache.org/docs/compiling-drill-from-source/
- https://drill.apache.org/docs/nested-data-limitations/
- http://www.dbta.com/BigDataQuarterly/Articles/The-Importance-of-Apache-Drill-to-the-Big-Data-Ecosystem-103000.aspx
- https://www.mapr.com/blog/top-10-reasons-using-apache-drill-now-part-mapr-distribution-including-hadoop