I need to get a list of files to then filter this set
DECLARE @input_file string = @"\data\{*}\{*}\{*}.avro";
@filenames = SELECT filename
FROM @input_file;
@filtered = SELECT filename FROM @filenames WHERE {condition}
Something like this if it's possible...
The way to do that is define virtual columns in your fileset. You can then extract and manipulate these virtual columns like they were data fields extracted from your file. Example:
DECLARE @input_file string = "/data/{_partition1}/{_partition2}/{filename}.avro";
@rowset =
EXTRACT column1 string,
column2 int,
columnN string,
_partition1 string,
_partition2 int,
_filename string
FROM @input_file
USING <Avro extractor>
@filtered =
SELECT column1, column2, columnN, _partition1, _partition2, _filename
WHERE filename <your condition>
U-SQL will also not even read files that don't match the WHERE clause, saving you some time. (Also, the underscore in the virtual column name is not necessary, but a useful way to remember which columns came from the file and which from the path). Hope this helps!