How to get table names from SQL query?

2019-02-25 03:57发布

问题:

I want to get all the tables names from a sql query in Spark using Scala.

Lets say user sends a SQL query which looks like:

select * from table_1 as a left join table_2 as b on a.id=b.id

I would like to get all tables list like table_1 and table_2.

Is regex the only option ?

回答1:

Thanks a lot @Swapnil Chougule for the answer. That inspired me to offer an idiomatic way of collecting all the tables in a structured query.

scala> spark.version
res0: String = 2.3.1

def getTables(query: String): Seq[String] = {
  val logicalPlan = spark.sessionState.sqlParser.parsePlan(query)
  import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
  logicalPlan.collect { case r: UnresolvedRelation => r.tableName }
}

val query = "select * from table_1 as a left join table_2 as b on a.id=b.id"
scala> getTables(query).foreach(println)
table_1
table_2


回答2:

Hope it will help you

Parse the given query using spark sql parser (spark internally does same). You can get sqlParser from session's state. It will give Logical plan of query. Iterate over logical plan of query & check whether it is instance of UnresolvedRelation (leaf logical operator to represent a table reference in a logical query plan that has yet to be resolved) & get table from it.

def getTables(query: String) : Seq[String] ={
    val logical : LogicalPlan = localsparkSession.sessionState.sqlParser.parsePlan(query)
    val tables = scala.collection.mutable.LinkedHashSet.empty[String]
    var i = 0
    while (true) {
      if (logical(i) == null) {
        return tables.toSeq
      } else if (logical(i).isInstanceOf[UnresolvedRelation]) {
        val tableIdentifier = logical(i).asInstanceOf[UnresolvedRelation].tableIdentifier
        tables += tableIdentifier.unquotedString.toLowerCase
      }
      i = i + 1
    }
    tables.toSeq
}


回答3:

If you want to list all the columns in your dataframe, this should do

val mydf_cols = df.describe()


回答4:

Since you need to list all the columns names listed in table1 and table2, what you can do is to show tables in db.table_name in your hive db.

val tbl_column1 = sqlContext.sql("show tables in table1");
val tbl_column2 = sqlContext.sql("show tables in table2");

You will get list of columns in both the table.

tbl_column1.show

name      
id  
data    


回答5:

unix did the trick, grep 'INTO\|FROM\|JOIN' .sql | sed -r 's/.?(FROM|INTO|JOIN)\s?([^ ])./\2/g' | sort -u

grep 'overwrite table' .txt | sed -r 's/.?(overwrite table)\s?([^ ])./\2/g' | sort -u