Azure Data Factory Incremental Blob Copy

Published 2019-08-17 13:27

I've made a pipeline to copy data from one blob storage to another. I'd like the copy to be incremental if possible, but I haven't found a way to specify it. The reason is that I want to run this on a schedule and copy only the data that is new since the last run.

2 Answers
Bombasti · 2019-08-17 13:31
  1. If your blob names are well named with a timestamp, you can follow this doc to copy partitioned data. You can use the Copy Data tool to set up the pipeline: select a tumbling window, then in the file path field enter {year}/{month}/{day}/fileName and choose the right pattern. It will help you construct the parameters; a dataset sketch follows after this list.
  2. If your blob names are not timestamped, you can use a Get Metadata activity to check the last modified time; please reference this post. A sketch of that activity also follows below.
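For illustration, here is a minimal sketch of what the parameterized source dataset for option 1 could look like. The names (`PartitionedBlobSource`, `SourceBlobLinkedService`, `mycontainer`) and the daily `yyyy/MM/dd` layout are my assumptions, not from the original question:

```json
{
  "name": "PartitionedBlobSource",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": {
      "referenceName": "SourceBlobLinkedService",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "windowStart": { "type": "String" }
    },
    "typeProperties": {
      "format": { "type": "TextFormat" },
      "folderPath": {
        "value": "@concat('mycontainer/', formatDateTime(dataset().windowStart, 'yyyy/MM/dd'))",
        "type": "Expression"
      }
    }
  }
}
```

Each trigger run then reads only the folder for its own time window, which is what makes the copy incremental.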
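For option 2, a sketch of the Get Metadata activity (the activity and dataset names are hypothetical; `lastModified` is a standard Get Metadata field for blob datasets):

```json
{
  "name": "CheckLastModified",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "SourceBlobDataset",
      "type": "DatasetReference"
    },
    "fieldList": [ "lastModified" ]
  }
}
```

A downstream If Condition can compare the result against a stored watermark, e.g. `@greater(ticks(activity('CheckLastModified').output.lastModified), ticks(pipeline().parameters.lastRunTime))`, and run the copy only when the blob is newer.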

An event trigger is just one way to control when the pipeline should run. You could also use a tumbling window trigger or a schedule trigger in your scenario.
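As a rough sketch (the pipeline name and the daily window size are assumptions), a tumbling window trigger that hands its window start to the pipeline could look like this:

```json
{
  "name": "DailyTumblingWindowTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 24,
      "startTime": "2019-08-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "IncrementalCopyPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime"
      }
    }
  }
}
```

The `windowStart` value is what you would feed into the partitioned folder path shown above.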

Aperson · 2019-08-17 13:37

I'm going to presume that by 'incremental' you mean new blobs added to a container. There is no easy way to copy only the changes within a specific blob.

So, this is not possible automatically when running on a schedule since 'new' is not something the scheduler can know.

Instead, you can use a Blob Created event trigger and cache the result (the blob name) somewhere else. Then, when your schedule runs, it can read those names and copy only those blobs.
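A minimal sketch of such a trigger, with hypothetical names (`NewBlobTrigger`, `CacheBlobNamePipeline`) and placeholder scope values:

```json
{
  "name": "NewBlobTrigger",
  "properties": {
    "type": "BlobEventsTrigger",
    "typeProperties": {
      "blobPathBeginsWith": "/mycontainer/blobs/",
      "ignoreEmptyBlobs": true,
      "events": [ "Microsoft.Storage.BlobCreated" ],
      "scope": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Storage/storageAccounts/<storageAccount>"
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "CacheBlobNamePipeline",
          "type": "PipelineReference"
        },
        "parameters": {
          "folderPath": "@triggerBody().folderPath",
          "fileName": "@triggerBody().fileName"
        }
      }
    ]
  }
}
```

`@triggerBody().folderPath` and `@triggerBody().fileName` are the built-in outputs of a blob event trigger; the invoked pipeline can append them to whatever cache you choose.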

You have many options for the cache: a SQL table, another blob.

Note: the complication here is trying to do this on a schedule. If you can adjust the requirements to merely copy every new file as it arrives, it's very, very easy, because you can just copy the blob that fired the trigger.

Another option is to use the trigger to copy the blob on creation to a temporary/staging container, then use a schedule to move those files to the ultimate destination.
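A sketch of that second, scheduled pipeline, assuming hypothetical dataset names: a Copy activity drains the staging container into the destination, then a Delete activity clears staging so the next run sees only new files:

```json
{
  "name": "MoveFromStagingPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyStagedBlobs",
        "type": "Copy",
        "inputs": [ { "referenceName": "StagingDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "DestinationDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BlobSource", "recursive": true },
          "sink": { "type": "BlobSink" }
        }
      },
      {
        "name": "ClearStaging",
        "type": "Delete",
        "dependsOn": [
          { "activity": "CopyStagedBlobs", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "dataset": { "referenceName": "StagingDataset", "type": "DatasetReference" },
          "recursive": true
        }
      }
    ]
  }
}
```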
