What are the limitations of apache drill?

Posted 2019-08-14 17:34

Question:

  • What are the limitations of Apache Drill?
  • Where does it fall short compared to Apache Hive/Impala?

Answer 1:

My overall view of Drill:

One of the main advantages of Apache Drill is that you can query across multiple databases and other data sources. You just configure the sources (as storage plugins) and query them directly with SQL. That is Drill's biggest advantage. It has been shown to be one of the best query engines among many other technologies (see reference 2).
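As a rough illustration of "configure the sources and query them directly", here is a minimal Python sketch that submits one SQL statement joining two different sources through Drill's REST endpoint (`POST /query.json` with a `queryType`/`query` JSON body). It assumes a local Drillbit on the default web port 8047; the file path, the `hive.sales.customers` table and the column names are hypothetical placeholders, and the `dfs` and `hive` storage plugins must already be configured and enabled.

```python
import json
import urllib.request

# Assumption: a Drillbit running locally with its web server on the default port 8047.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical cross-source query: a JSON file read through the `dfs` plugin
# joined with a table exposed by the `hive` plugin. Paths, table and column
# names are placeholders for illustration only.
CROSS_SOURCE_SQL = """
SELECT c.name, o.total
FROM dfs.`/data/orders.json` AS o
JOIN hive.sales.customers AS c ON o.customer_id = c.id
"""


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, timeout: float = 60.0) -> dict:
    """Send the query and decode the JSON response (requires a running Drillbit)."""
    with urllib.request.urlopen(build_drill_request(sql), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running cluster, `run_query(CROSS_SOURCE_SQL)` returns the decoded response, whose result rows are under the `rows` key. The point is that the join across two unrelated stores is written as a single SQL statement; Drill does the federation.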

I wouldn't call it a limitation, but because Drill is just a query engine, it simply takes the SQL query, parses it using Apache Calcite, and executes it on the nodes. It does not handle failures or cancellations during query execution (a failed query is not retried for you), so your application needs to take care of this.
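Since recovery is left to the application, a common pattern is to wrap query submission in a retry loop. The following is a minimal, generic Python sketch, not a Drill API: `with_retries` is a hypothetical helper, and it assumes transient failures surface as `OSError` (network errors, timeouts and `urllib` errors all derive from it). It retries with exponential backoff and re-raises the last error once the attempts are exhausted.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Run `call`, retrying with exponential backoff on the given exception types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure to the caller
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Usage would look like `with_retries(lambda: submit_query(sql))`, where `submit_query` stands for whatever (hypothetical) function your application uses to send SQL to Drill. Only retry read-only queries this way; blindly re-running statements with side effects (such as CREATE TABLE AS) can duplicate work.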

Since it is still evolving, it has many limitations, such as:

  1. Compared with Oracle/MySQL, there are few built-in functions and operators, e.g. MINUS, DECODE, TO_TIMESTAMP (very minimal), GREATEST, LEAST.
  2. Even with user-defined functions, there is very little you can do.
  3. No hierarchical query support (CONNECT BY PRIOR in Oracle).
  4. It cannot read XML data (only JSON, CSV, Parquet, etc.).
  5. No single-row subquery support.
  6. Limitations on joins.
  7. It has no fixed schema, which can cause some confusion.

Apache Drill is still evolving, and these issues/limitations are expected to be addressed in future versions of Drill.

Hope it helps.

References:

  1. https://issues.apache.org/jira/browse/DRILL/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
  2. http://allegro.tech/2015/06/fast-data-hackathon.html
  3. https://drill.apache.org/docs/compiling-drill-from-source/
  4. https://drill.apache.org/docs/nested-data-limitations/
  5. http://www.dbta.com/BigDataQuarterly/Articles/The-Importance-of-Apache-Drill-to-the-Big-Data-Ecosystem-103000.aspx
  6. https://www.mapr.com/blog/top-10-reasons-using-apache-drill-now-part-mapr-distribution-including-hadoop