I have a BigQuery table - day partitioned, and clustered. However, it still uses a lot of data when I run queries over it. How is this possible?
相关问题
- H2 database: clustered indexes support
- How to export crash-free users from firebase?
- Updating a value in an ARRAY in a BigQuery table
- Import json data with null values
- How to transform IP addresses into geolocation in
相关文章
- BigQuery - Concatenate multiple rows into a single
- Where do you get Google Bigquery usage info (mainl
- How do I configure Google BigQuery command line to
- In Azure Cosmos DB, can we change partition key la
- How can I make integration tests with google cloud
- How to get gcloud auth activate-service-account pe
-
Error in Google BigQuery
- CLI “bq load” - how to use non-printable character
Sometimes no partitions, or weekly/monthly/yearly partitions will work way better than having a daily partitioned table + clustering.
This because each cluster of data in BigQuery has a minimum size. If each day of data in a daily partitioned table has less than that amount of data, you won't see any benefits at all from clustering your table.
For example, let's create a table with 30+ years of weather. I will partition this table by month (to fit multiple years into one table):
Now, let's run a query over it - using the clustering field
name
:Now, let's do this over an identical table - partitioned by a fake date (so no partitioning really), and clustered by the same column:
Only 62.8 MB of data (vs 3.1GB) were processed!
This because clustering without partitions is much more efficient on tables that don't have a lot of GB per day.
Bonus: Clustered by geo: