I just started working with the new Google Cloud Datalab and IPython last week (though I've been using BigQuery for a few months). The tutorials and samples in github are very helpful, but as my scripts and queries become more complex I'm wondering a few things. The first one is this: can I refer to queries defined in one %%sql module in a later %%sql module? The other, somewhat related question is can I somehow store the results from one %%sql module and then put that information into something like an IN clause in a subsequent %%sql module?
问题:
回答1:
Here's some things to try and see if they meet your needs. If they don't, I welcome you to file issues in github, as I think both of your scenarios are things we want to make sure work well.
For the first, it requires a combination of sql cells and code cells [for now]
Cell 1
%%sql --module m1
DEFINE QUERY q1
SELECT ...
Cell 2
%%sql --module m2
DEFINE QUERY q2
SELECT ... FROM $src ...
Cell 3
import gcp.bigquery as bq
compositequery = bq.Query(m2.q2, src = m1.q1)
Essentially, %%sql modules are turned into auto-imported python modules behind the scenes.
I used to split out queries per %%sql cell myself, but since the introduction of modules, I also depending on the scenario, define multiple queries within a single module, where you don't need a bit of python code stitching together. Depends on your scenario, which is better.
For your second question, again, if the queries are split across cells, you'll need some python glue in the middle. Execute one query, get its result, and use that as a parameter for the next query. This would work for general scalar values, but for IN clauses and tuples/lists of values, we have this issue we need to address: https://github.com/GoogleCloudPlatform/datalab/issues/615
For more ideas on how you can use JOINs in BigQuery to produce scalar results in one query that you consume in the next query, you can also see the query under Step 3 in the BigQuery tutorial notebook titled "SQL Query Composition".
Hope that helps.
As mentioned, if you hit specific issues where something didn't work as you expected, please do file an issue, and we can see if it makes sense to address, and possibly you or someone else might even step up to make a contribution. :)