I would like to know if there are any tips and tricks for finding errors in Data Lake Analytics jobs. Most of the time, the error messages are not very detailed.
When trying to extract from a CSV file, I often get an error like this:
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0] with error: Vertex user code error.
Vertex failed with a fail-fast error
It seems that these errors occur when trying to convert the columns to the specified types.
The technique I found is to extract all columns as string and then run a SELECT that tries to convert the columns to the expected types. Doing this column by column helps find the specific column that causes the error.
@data =
    EXTRACT ClientID string,
            SendID string,
            FromName string
    FROM "wasb://..."
    USING Extractors.Csv();
// Convert some columns to int; the WHERE condition skips the header row
@clean =
    SELECT Int32.Parse(ClientID) AS ClientID,
           Int32.Parse(SendID) AS SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");
Is it also possible to use something like a TryParse to return null or default values in case of a parsing error, instead of the whole job failing?
Thanks
Here is a solution that does not require code-behind (although code-behind will make your code a bit more readable):
Also, the cryptic error messages you see are caused by a bug in returning so-called inner error messages, which should be fixed soon. The workaround today is to do the following:
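A minimal sketch of what such an inline expression could look like, assuming the @data rowset from the question (every column extracted as string). The lambda is ordinary C# cast to a Func so that U-SQL can call it inline; a value that fails to parse becomes null instead of failing the vertex. Treat it as an illustration rather than a definitive implementation.

```usql
// Sketch: inline TryParse without code-behind.
// Assumes @data was extracted with every column as string, as in the question.
@clean =
    SELECT ((Func<string, int?>)(v => { int res; return int.TryParse(v, out res) ? (int?) res : (int?) null; }))(ClientID) AS ClientID,
           ((Func<string, int?>)(v => { int res; return int.TryParse(v, out res) ? (int?) res : (int?) null; }))(SendID) AS SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");   // still skips the header row
```

Rows where ClientID or SendID comes back as null can then be filtered out, or selected on their own to inspect the values that did not parse.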
That should give you the exact error message.
Yes, you can use TryParse through a U-SQL user-defined function. You can do it like this:
In code behind:
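A minimal sketch of such a helper, assuming a hypothetical namespace `MyNamespace` and a hypothetical static class `Helpers` (neither name comes from the original post). It wraps `int.TryParse` and returns null when the text is not a valid integer.

```csharp
using System;

namespace MyNamespace            // hypothetical namespace, pick your own
{
    public static class Helpers  // hypothetical class name
    {
        // Returns the parsed value, or null when the text is not a valid Int32.
        public static int? TryParseInt(string value)
        {
            int result;
            return int.TryParse(value, out result) ? (int?)result : null;
        }
    }
}
```

Returning `int?` rather than a default such as 0 keeps bad rows distinguishable from rows that genuinely contain 0.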
In U-SQL Script:
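A sketch of the calling side, assuming the code-behind exposes a static method `MyNamespace.Helpers.TryParseInt(string)` that returns `int?` (a hypothetical name used for illustration) and that @data is the all-string rowset from the question. U-SQL calls C# methods by their fully qualified name.

```usql
// Sketch: call a code-behind helper by its fully qualified name.
// MyNamespace.Helpers.TryParseInt is a hypothetical helper returning int?.
@clean =
    SELECT MyNamespace.Helpers.TryParseInt(ClientID) AS ClientID,
           MyNamespace.Helpers.TryParseInt(SendID) AS SendID,
           FromName
    FROM @data
    WHERE !ClientID.StartsWith("ClientID");   // skip the header row
```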
It looks like you have some other issue, as I always get an appropriate error message when there is a conversion problem, something like: