Data Factory v2 - Generate a json file per row

Published 2020-05-03 11:50

Question:

I'm using Data Factory v2. I have a copy activity that has an Azure SQL dataset as input and an Azure Storage Blob as output. I want to write each row in my SQL dataset as a separate blob, but I don't see how I can do this.

I see a copyBehavior in the copy activity, but that only works from a file based source.

Another possible setting is the filePattern in my dataset:

Indicate the pattern of data stored in each JSON file. Allowed values are: setOfObjects and arrayOfObjects.

setOfObjects - Each file contains single object, or line-delimited/concatenated multiple objects. When this option is chosen in an output dataset, copy activity produces a single JSON file with each object per line (line-delimited).

arrayOfObjects - Each file contains an array of objects.
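For reference, here is a sketch of how that filePattern setting appears in a legacy JSON-format dataset definition (the dataset name, linked service name, and folder path are placeholders, not from the original post):

```json
{
  "name": "BlobJsonOutput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "folderPath": "output/",
      "format": {
        "type": "JsonFormat",
        "filePattern": "setOfObjects"
      }
    }
  }
}
```

Note that either allowed value controls how rows are laid out *within* a file; a single copy run still produces one output file, not one file per row.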

The description talks about "each file", so initially I thought it would be possible, but now that I've tested them it seems that setOfObjects creates a line-delimited file, where each row is written to a new line. The arrayOfObjects setting creates a single file containing a JSON array, with each row added as a new element of that array.

I'm wondering if I'm missing a configuration somewhere, or is it just not possible?

Answer 1:

What I did for now is load the rows into a SQL table and iterate over each record in that table. I use a Lookup activity to get an array to loop over in a ForEach activity, and the ForEach activity writes each row to the blob store.
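A sketch of what that Lookup over the staging table might look like in pipeline JSON (the table, dataset, and activity names are placeholders):

```json
{
  "name": "LookupRows",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT * FROM dbo.StagedRows"
    },
    "dataset": {
      "referenceName": "StagingTable",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```

Keep in mind the Lookup activity caps its output (on the order of a few thousand rows), so this pattern suits modest row counts.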

For Olga's documentDb question, it would look like this:

In the lookup, you get a list of the document ids you want to copy:
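The original screenshot isn't included; a sketch of what that Lookup might look like against DocumentDb (activity, dataset, and query are assumptions):

```json
{
  "name": "LookupDocumentIds",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "DocumentDbCollectionSource",
      "query": "SELECT c.id FROM c"
    },
    "dataset": {
      "referenceName": "CosmosDbDocuments",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```

Setting firstRowOnly to false makes the Lookup output a value array with one entry per id, which is what the ForEach needs.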

You use that set in your ForEach activity:
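A sketch of the ForEach wiring, assuming the Lookup activity is named LookupDocumentIds (the inner Copy activity goes inside the activities array):

```json
{
  "name": "ForEachDocument",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "LookupDocumentIds", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('LookupDocumentIds').output.value",
      "type": "Expression"
    },
    "activities": []
  }
}
```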

Then you copy the files using a copy activity within the foreach activity. You query a single document in your source:
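A sketch of the Copy activity's source inside the ForEach, assuming each item exposes an id property (names are placeholders):

```json
{
  "name": "CopySingleDocument",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "DocumentDbCollectionSource",
      "query": {
        "value": "SELECT * FROM c WHERE c.id = '@{item().id}'",
        "type": "Expression"
      }
    },
    "sink": { "type": "BlobSink" }
  }
}
```

Because the query is an Expression, ADF substitutes the current item's id on each iteration, so each copy run fetches exactly one document.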

And you can use the id to dynamically name your file in the sink (you'll have to define the parameter in your dataset too):
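One way to parameterize the sink dataset (the parameter name documentId and dataset name are assumptions):

```json
{
  "name": "BlobPerDocument",
  "properties": {
    "type": "AzureBlob",
    "parameters": {
      "documentId": { "type": "String" }
    },
    "typeProperties": {
      "folderPath": "output/",
      "fileName": {
        "value": "@{dataset().documentId}.json",
        "type": "Expression"
      },
      "format": { "type": "JsonFormat", "filePattern": "setOfObjects" }
    }
  }
}
```

The Copy activity then passes the current item's id when it references the dataset:

```json
"outputs": [
  {
    "referenceName": "BlobPerDocument",
    "type": "DatasetReference",
    "parameters": { "documentId": "@item().id" }
  }
]
```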