How can I log something in a U-SQL UDO?

Posted 2019-04-11 17:47

Question:

I have a custom extractor, and I'm trying to log some messages from it.

I've tried obvious things like Console.WriteLine, but I cannot find where the output goes. I did, however, find some system logs in adl://<my_DLS>.azuredatalakestore.net/system/jobservice/jobs/Usql/.../<my_job_id>/.

How can I log something? Is it possible to specify a log file somewhere in Data Lake Store or a Blob Storage account?

Answer 1:

A recent release of U-SQL has added diagnostic logging for UDOs; see the U-SQL release notes for details. The script below enables the feature and calls the extractor:

// Enable the diagnostics preview feature
SET @@FeaturePreviews = "DIAGNOSTICS:ON";


// Extract as one column
@input =
    EXTRACT col string
    FROM "/input/input42.txt"
    USING new Utilities.MyExtractor();


@output =
    SELECT *
    FROM @input;


// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);

This was my diagnostic line from the UDO:

Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));

This is the whole UDO:

using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;

namespace Utilities
{
    [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
    public class MyExtractor : IExtractor
    {
        // Encoding and delimiters used to split the input
        private readonly Encoding _encoding;
        private readonly byte[] _row_delim;
        private readonly char _col_delim;

        public MyExtractor()
        {
            _encoding = Encoding.UTF8;
            _row_delim = _encoding.GetBytes("\n\n");
            _col_delim = '|';
        }

        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            string s = string.Empty;
            string x = string.Empty;
            int i = 0;

            foreach (var current in input.Split(_row_delim))
            {
                using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
                {
                    while ((s = streamReader.ReadLine()) != null)
                    {
                        // ReadLine() already strips the line terminator,
                        // so no extra line-feed handling is needed here

                        // Concatenate the lines
                        x += s;
                        i += 1;

                    }

                    // Write to the diagnostic stream; i is never reset,
                    // so this is a running total across all rows
                    Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));

                    //Create the output
                    output.Set<string>(0, x);
                    yield return output.AsReadOnly();

                    // Reset
                    x = string.Empty;

                }
            }
        }
    }
}

The resulting diagnostic output was written to the following directory:

/system/jobservice/jobs/Usql/2017/10/20.../diagnosticstreams



Answer 2:

Good question. I have been asking myself the same thing. This is theoretical, but I think it would work (I'll update this if I find otherwise).

One very hacky way is to insert rows into a table, with your log messages in a string column. You can then select them back out, filtering on something like a log_producer_id column. You also get the benefit of logging when part of the script works but later parts do not, assuming the failure does not roll back the earlier inserts. The table can also be dumped to a file at the end.

For the error cases, you can use the Job Manager in ADLA to open the job graph and then view the job output. The errors often include detailed information for data-related problems (e.g. the row number in the file and an octal/hex/ASCII dump of the row, with the issue marked with ###).

Hope this helps,

J

P.S. This isn't really a comment or an answer, since I don't have working code. Please provide feedback if the above ideas are wrong.