Would it be possible to automatically split a table into several files based on column values if I don't know how many different key values the table contains? Is it possible to put the key value into the filename?
There's a new feature in public preview:
You can add it at the beginning of the script, and the output data can be partitioned by the key you choose:
Another example can be found in the article "Process more files than ever and use Parquet with Azure Data Lake Analytics", in the section "Putting it all together in a simple end-to-end example".
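The following small Python sketch makes the idea concrete outside of U-SQL. It is not U-SQL and not the preview feature itself. It shows what key-partitioned output does: rows are grouped by a key column whose distinct values are not known in advance, and each group goes into its own file whose name contains the key value. The function name `partition_rows` and the `/output/{}.csv` naming pattern are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict


def partition_rows(rows, key, name_pattern="/output/{}.csv"):
    """Group rows by the value of `key` and render each group as CSV text.

    The distinct key values do not have to be known in advance: a new
    group is created the first time a value is seen. Returns a dict that
    maps file name (key value substituted into `name_pattern`) to the CSV
    content for that file. Assumes every row is a dict with the same columns.
    """
    rows = list(rows)
    if not rows:
        return {}
    fieldnames = list(rows[0])  # column order taken from the first row
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    files = {}
    for value, group in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(group)
        files[name_pattern.format(value)] = buf.getvalue()
    return files
```

For example, rows whose `name` values are `a` and `b` produce `/output/a.csv` and `/output/b.csv`. The function returns file contents instead of writing to disk, so you can try it without a data lake.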
Great question! I'll be interested to see what Mr Rys responds with.
Apologies, but this is only half an answer.
My first thought is to partition an ADL table on your key value. But then I'm not sure how you'd deal with the separate outputs if a potential WHERE clause isn't deterministic. Maybe CROSS JOIN in every result and ... pass!
It would be nice to have a WHILE loop with some dynamic code!
Just as an FYI, check out this post on the MS forums about dynamic input datasets:
https://social.msdn.microsoft.com/Forums/en-US/aa475035-2d57-49b8-bdff-9cccc9c8b48f/usql-loading-a-dynamic-set-of-files?forum=AzureDataLake
This is our top ask (and it has been asked on Stack Overflow before, too :). We are currently working on it and hope to have it available by summer.
Until then, you have to write a script generator. I tend to use U-SQL to generate the script, but you could also do it with PowerShell, T4, etc.
Here is an example:
Let's assume you want to write files for the column `name` in the following table/rowset `@x`:
You would write a script to generate the script like the following:
Then you take `genscript.usql`, prepend the calculation of `@x`, and submit it to get the data partitioned into the two files.
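Because the generator does not have to be written in U-SQL, here is a hedged Python sketch of the same idea. Given the distinct values of the `name` column (obtained however you like, for example from the output of a first job), it emits one `SELECT ... WHERE` statement and one `OUTPUT` statement per value. The helper names, the `@partN` variable names, the `/output` folder, and the assumption that `name` is a string column are illustrative and do not come from the original answer. Key values are inserted into file paths unchanged, so values containing `/` or other characters that are not valid in paths would need sanitizing first.

```python
def usql_string(value):
    """Render a Python str as a U-SQL (C#-style) double-quoted string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def generate_partition_script(rowset, column, keys, out_dir="/output"):
    """Return U-SQL text with one SELECT and one OUTPUT statement per distinct key.

    rowset  -- name of the rowset variable to split, e.g. "@x"
    column  -- string column whose values choose the output file
    keys    -- the key values found in that column (duplicates are ignored)
    out_dir -- hypothetical output folder; each file is named <key>.csv
    """
    lines = []
    for i, key in enumerate(sorted(set(keys))):
        # Generated rowset name; key values themselves may not be valid identifiers.
        var = f"@part{i}"
        lines.append(f"{var} = SELECT * FROM {rowset} WHERE {column} == {usql_string(key)};")
        lines.append(f"OUTPUT {var} TO {usql_string(out_dir + '/' + key + '.csv')} USING Outputters.Csv();")
    return "\n".join(lines) + "\n" if lines else ""
```

Writing the returned text to `genscript.usql` and prepending the statements that compute `@x` gives a script of the shape described above, with one output file per key value.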