I am trying to copy data from Cosmos DB to Data Lake Store using Data Factory.
However, the performance is poor: about 100 KB/s, while the data volume is 100+ GB and keeps growing. At this rate the copy will take more than 10 days, which is not acceptable.
The Microsoft documentation https://docs.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-performance states that the maximum throughput from Cosmos DB to Data Lake Store is 1 MB/s. Even at that rate, the performance would still be too slow for us.
The Cosmos DB data migration tool doesn't work: no data is exported and no error log is produced.
Data Lake Analytics U-SQL can extract from external sources, but currently only Azure SQL Database/Data Warehouse and SQL Server are supported, not Cosmos DB.
What tools or settings can improve the copy performance?
Based on your description, I suggest you try setting a higher cloudDataMovementUnits value to improve the performance.
Note: a setting of 8 and above currently works only when you copy multiple files from Blob storage/Data Lake Store/Amazon S3/cloud FTP/cloud SFTP to Blob storage/Data Lake Store/Azure SQL Database.
So for this scenario (Cosmos DB to Data Lake Store) the maximum DMU you can set is 4.
If that speed still doesn't meet your requirement, I suggest writing your own logic to copy the DocumentDB data to Data Lake.
You could create multiple WebJobs that copy from DocumentDB to Data Lake in parallel.
Partition the documents by index range or partition key and have each WebJob copy a different part. In my opinion, this will be faster.
As far as I know, you can set the DMU directly by adding the cloudDataMovementUnits value to the copy activity's JSON definition, as below:
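A minimal sketch of where the property goes in a Data Factory (v1) copy activity definition; other required properties are elided and the source/sink type names shown are the ones I believe apply here:

```json
"activities": [
    {
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "DocumentDbCollectionSource" },
            "sink": { "type": "AzureDataLakeStoreSink" },
            "cloudDataMovementUnits": 4
        }
    }
]
```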
A WebJob can run programs or scripts in your Azure App Service web app in three ways: on demand, continuously, or on a schedule.
That means you could write a C# program (or use another language) to copy the data from DocumentDB to Data Lake; all of the copy logic would need to be written by yourself.
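The partition-and-parallel-copy idea above can be sketched as follows. This is Python for brevity, and the fetch/write helpers are hypothetical placeholders: in a real WebJob they would query DocumentDB (e.g. by partition key range) and append to Data Lake Store via the respective SDKs.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4  # one logical partition per WebJob/worker

def fetch_partition(partition_id, source):
    """Placeholder: query DocumentDB for the documents in one partition."""
    return [doc for doc in source if doc["id"] % NUM_WORKERS == partition_id]

def write_to_lake(docs, sink):
    """Placeholder: append documents to this worker's Data Lake Store file."""
    sink.extend(docs)

def parallel_copy(source):
    # Each worker writes to its own sink (its own output file), so the
    # workers never contend with each other.
    sinks = [[] for _ in range(NUM_WORKERS)]

    def copy_one(pid):
        write_to_lake(fetch_partition(pid, source), sinks[pid])

    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        list(pool.map(copy_one, range(NUM_WORKERS)))

    # Flatten only to verify completeness; real output stays as separate files.
    return [doc for sink in sinks for doc in sink]
```

Each WebJob would run one `copy_one(pid)` call for its assigned partition id; the key point is that the partitions are disjoint and cover the whole collection, so the copies can run fully in parallel.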