Scenario
I have a source file that contains a block of JSON on each new line.
I then have a simple U-SQL EXTRACT as follows, where [RawString] represents each line in the file and [FileName] is derived from the @SourceFile path.
@BaseExtract =
    EXTRACT
        [RawString] string,
        [FileName] string
    FROM
        @SourceFile
    USING
        Extractors.Text(delimiter : '\b', quoting : false);
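For completeness, the @SourceFile declaration looks roughly like the line below; the folder and extension are placeholders, and I'm assuming the standard file-set syntax is what surfaces [FileName] as a virtual column.
// Placeholder path; {FileName} is the file-set token that the EXTRACT above projects as [FileName].
DECLARE @SourceFile string = "/Raw/MyFeed/{FileName}.json";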
This executes without failure for the majority of my data, and I'm able to parse [RawString] as JSON further down in my script without any problems.
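For reference, the downstream parsing is along these lines, using the JSON helpers from the U-SQL samples (Microsoft.Analytics.Samples.Formats, which sit on top of Newtonsoft.Json). The assembly registrations actually sit at the top of the script, and the property names here are illustrative rather than my real schema.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@Parsed =
    SELECT
        JsonFunctions.JsonTuple([RawString]) AS JsonMap,  // string -> SqlMap of top-level properties
        [FileName]
    FROM @BaseExtract;

@Result =
    SELECT
        JsonMap["id"] AS Id,            // example property
        JsonMap["payload"] AS Payload,  // example property
        [FileName]
    FROM @Parsed;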
However, a recent file contains one extremely long row of data that cannot be extracted.
Errors
Executing this both locally in Visual Studio and against my Data Lake Analytics service in Azure, I get the following error.
E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_TOO_LONG
Value too long failure when attempting to convert column data.
Can not convert string to proper type. The resulting data length is too long.
See screenshots below.
Having checked this with other tools, I can confirm that the longest line in the source file is 189,943 characters.
Questions
So, my questions for you, my friends...
- Has anyone else hit this limit?
- What is the defined character limit per line?
- What is the best way to work around this?
- Is a custom extractor going to be required? (See the sketch after this list for what I mean.)
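To clarify the last question, this is the sort of thing I mean; the assembly and class names are placeholders for something I have not written yet, and the IExtractor implementation itself would live in C# code-behind or a separate class library.
// Hypothetical usage of a custom extractor; [LongLine.Extractors] and LineExtractor are placeholder names.
REFERENCE ASSEMBLY [LongLine.Extractors];

@BaseExtract =
    EXTRACT
        [RawString] string,
        [FileName] string
    FROM
        @SourceFile
    USING
        new LongLine.Extractors.LineExtractor();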
Other Things
A couple of other thoughts...
- As each line in the file is a self-contained JSON block of data, I can't split the line.
- If I manually copy the single long line into a separate file and format the JSON, U-SQL handles it as expected with the Newtonsoft.Json libraries.
- Currently I'm using VS2015 with Data Lake Tools version 2.2.7.
Thank you in advance for your support with this.