How to pass variables in spark SQL, using python?

2020-08-12 18:05发布

问题:

I am writing spark code in python. How do I pass a variable in a spark.sql query?

    q25 = 500
    Q1 = spark.sql("SELECT col1 from table where col2>500 limit $q25 , 1")

Currently the above code does not work? How do we pass variables?

I have also tried,

    Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='{}' , 1".format(q25))

回答1:

You need to remove single quote and q25 in string formatting like this:

Q1 = spark.sql("SELECT col1 from table where col2>500 limit {}, 1".format(q25))

Update:

Based on your new queries:

spark.sql("SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))

Note that the SparkSQL does not support OFFSET, so the query cannot work.

If you need add multiple variables you can try this way:

q25 = 500
var2 = 50
Q1 = spark.sql("SELECT col1 from table where col2>{0} limit {1}".format(var2,q25))


回答2:

All you need to do is add s (String interpolator) to the string. This allows the usage of variable directly into the string.

val q25 = 10
Q1 = spark.sql(s"SELECT col1 from table where col2>500 limit $q25)


回答3:

Another option if you're doing this sort of thing often or want to make your code easier to re-use is to use a map of configuration variables and the format option:

configs = {"q25":10,
           "TABLE_NAME":"my_table",
           "SCHEMA":"my_schema"}
Q1 = spark.sql("""SELECT col1 from {SCHEMA}.{TABLE_NAME} 
                  where col2>500 
                  limit {q25}
               """.format(**configs))