I am trying to generate a dynamic FROM clause in U-SQL so that we can extract data from different files based on a previous query outcome. That's something like this:
@filesToExtract = SELECT whatevergeneratesthepaths from @foo; <-- this query generates a rowset with all the file we want to extract like: [/path/file1.csv, /path/file2.csv]
SELECT * FROM @filesToExtract; <-- here we want to extract the data from file1 and file2
I'm afraid that this kind of dynamics queries are not supported yet, but can someone help pointing me out the way to achieve this? It seems that the only feasible approach is to generate another U-SQL script and execute it afterwards.
Thanks in advance.
It is not fully clear from your question if you want the file names to be dynamically retrieved and passed to an EXTRACT statement, or the name of tables/rowsets and passed to a SELECT's FROM clause. Or both.
In general, you cannot dynamically generate source names from your U-SQL expression. You may want to file a feature request here http://aka.ms/adlfeedback for dynamically or statically parameterizable sources.
Having said that, depending on your exact requirements, there may be some ways to achieve your goals without the work-around you describe.
For example, you could write your code as a parameterized table-valued function and then pass the different rowsets with different scripts, or - if you statically can decide which rowset to choose - you can use the IF statement.
Here is a pseudo-code example:
DECLARE EXTERNAL @someconditionparameter Boolean = true;
IF (@someconditionparameter) THEN
@data = EXTRACT a int, b string FROM @fileset1 USING Extractors.Csv();
ELSE
@data = EXTRACT a int, b string FROM @file2 USING ...;
END;
@results = MyTableValuedFunction (@data);
...
If your files are schematized differently, you may be able to use flexible column sets (currently in preview, see release notes) in the TVF to handle the variability of the rowset schema.