I am quite new to u-sql, trying to solve
str1=\global\europe\Moscow\12345\File1.txt
str2=\global.bee.com\europe\Moscow\12345\File1.txt
str3=\global\europe\amsterdam\54321\File1.Rvt str4=\global.bee.com\europe\amsterdam\12345\File1.Rvt
case1: how do i get just "\europe\Moscow\12345\File1.txt" from the strings variable str1 & str2, i want to just take ("\europe\Moscow\12345\File1.txt") from str1 and str2 then "Groupby(\global\europe\Moscow\12345)" and take the count of distinct files from the path (""\europe\Moscow\12345\")
so the output would be something like this:
distinct_filesby_Location_Date
to solve the above case i tried the below u-sql code but not quite sure whether i am writing the right script or not:
@inArray = SELECT new SQL.ARRAY<string>(
filepath.Contains("\\europe")) AS path
FROM @t;
@filesbyloc =
SELECT [ID],
path.Trim() AS path1
FROM @inArray
CROSS APPLY
EXPLODE(path1) AS r(location);
OUTPUT @filesbyloc
TO "/Outputs/distinctfilesbylocation.tsv"
USING Outputters.Tsv();
any help would you greatly appreciated.
Taking the liberty to format your input file as TSV file, and not knowing all the column semantics, here is a way to write your query. Please note that I made the assumptions as provided in the comments.
UPDATE AFTER RECEIVING SOME EXAMPLE DATA IN PRIVATE EMAIL:
Based on the data you sent me (after the extraction and counting of the cities that you either could do with the join as outlined in Bob's answer where you need to know your cities in advance, or with the taking the string from the city location in the path as in my example, where you do not need to know the cities in advance), you want to pivot the rowset
city, count, date
into the rowsetdate, city1, city2, ...
were each row contains the date and the counts for each city.You could easily adjust my example above by changing the calculations of
@res2
in the following way:Note that as in my example, you will need to enumerate all cities in the pivot statement by looking it up in the SQL.MAP column. If that is not known apriori, you will have to first submit a script that creates the script for you. For example, assuming your
city, count, date
rowset is in a file (or you could just duplicate the statements to generate the rowset in the generation script and the generated script), you could write it as the following script. Then take the result and submit it as the actual processing script.Now download the file and submit it.
One approach to this is to put all the strings you want to work with in a file, eg
strings.txt
and save it in your U-SQL input folder. Also have a file with the cities in you want to match, eg cities.txt. Then try the following U-SQL script:My results:
HTH