30 MB limit when uploading to Azure Data Lake using DataL

Posted 2020-02-14 02:54

Question:

I get an error when I use

    _adlsFileSystemClient.FileSystem.Create(_adlsAccountName, destFilePath, stream, overwrite)

to upload files to Data Lake Store. The error occurs only for files larger than 30 MB; smaller files upload fine.

The error is:

    at Microsoft.Azure.Management.DataLake.Store.FileSystemOperations.d__16.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.d__23.MoveNext()
    --- End of stack trace from previous location where exception was thrown ---
    at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
    at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at Microsoft.Azure.Management.DataLake.Store.FileSystemOperationsExtensions.Create(IFileSystemOperations operations, String accountName, String directFilePath, Stream streamContents, Nullable`1 overwrite, Nullable`1 syncFlag)
    at AzureDataFunctions.DataLakeController.CreateFileInDataLake(String destFilePath, Stream stream, Boolean overwrite) in F:\GitHub\ZutoDW\ADF_ProcessAllFiles\ADF_ProcessAllFiles\DataLakeController.cs:line 122

Has anyone else run into this or seen similar behaviour? For now I am working around it by splitting my files into 30 MB pieces and uploading each piece separately.

However, this is impractical in the long term: the original file is 380 MB and could become considerably larger. I do not want to end up with 10-15 fragment files in my Data Lake Store; I would like to upload it as a single file.

I can upload the exact same file to Data Lake Store through the Azure portal.

Answer 1:

Try using DataLakeStoreUploader to upload a file or directory to Data Lake Store; for more demo code, see the GitHub sample. I tested the demo and it works correctly for me. The Microsoft.Azure.Management.DataLake.Store and Microsoft.Azure.Management.DataLake.StoreUploader SDKs are available from NuGet. These are my detailed steps:

  1. Create a C# console application
  2. Add the following code

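     using Microsoft.Azure.Management.DataLake.Store;          // DataLakeStoreFileSystemManagementClient
     using Microsoft.Azure.Management.DataLake.StoreUploader;  // UploadParameters, DataLakeStoreFrontEndAdapter, DataLakeStoreUploader
     using Microsoft.Rest.Azure.Authentication;                // ApplicationTokenProvider

     // Service principal (Azure AD application) credentials
     var applicationId = "your application Id";
     var secretKey = "secret Key";
     var tenantId = "Your tenantId";
     var adlsAccountName = "adls account name";

     // Authenticate and create the file-system client (.Result blocks, which is acceptable in a console demo)
     var creds = ApplicationTokenProvider.LoginSilentAsync(tenantId, applicationId, secretKey).Result;
     var adlsFileSystemClient = new DataLakeStoreFileSystemManagementClient(creds);

     var inputFilePath = @"c:\tom\ForDemoCode.zip";        // local file to upload
     var targetStreamPath = "/mytempdir/ForDemoCode.zip";  // target path within the Data Lake Store account, starting with '/'
     // The default maxSegmentLength is 256 MB (268435456 bytes); here it is raised to 512 MB.
     var parameters = new UploadParameters(inputFilePath, targetStreamPath, adlsAccountName, isOverwrite: true, maxSegmentLength: 268435456 * 2);
     var frontend = new DataLakeStoreFrontEndAdapter(adlsAccountName, adlsFileSystemClient);
     var uploader = new DataLakeStoreUploader(parameters, frontend);
     uploader.Execute();  // run the upload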
    
  3. Run the application in the debugger.

  4. Check the uploaded file in the Azure portal.

The SDK versions I used are listed in packages.config:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Azure.Management.DataLake.Store" version="1.0.2-preview" targetFramework="net452" />
  <package id="Microsoft.Azure.Management.DataLake.StoreUploader" version="1.0.0-preview" targetFramework="net452" />
  <package id="Microsoft.IdentityModel.Clients.ActiveDirectory" version="3.13.8" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime" version="2.3.2" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime.Azure" version="3.3.2" targetFramework="net452" />
  <package id="Microsoft.Rest.ClientRuntime.Azure.Authentication" version="2.2.0-preview" targetFramework="net452" />
  <package id="Newtonsoft.Json" version="9.0.2-beta1" targetFramework="net452" />
</packages>


Answer 2:

This was answered here.

There is currently a size limit of 30,000,000 bytes per call. You can work around it by creating the file with an initial chunk and then appending the remaining chunks, keeping every stream you send below that limit.
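A minimal sketch of the create-then-append workaround follows, under stated assumptions: it is not an SDK helper, and `UploadInChunks`, `MaxRequestBytes`, and the `create`/`append` callbacks are hypothetical names. The limit reuses the 30,000,000-byte figure quoted above. The first chunk goes to `create` and every later chunk goes to `append`, so the destination ends up as one file. It is written as a C# 9+ top-level program, with an in-memory demo standing in for a real account.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Per-call limit quoted in the answer (assumption: it applies to each Create/Append call).
const int MaxRequestBytes = 30_000_000;

// Hypothetical helper: sends `source` as one logical file. The first chunk is passed
// to `create`, each later chunk to `append`; no chunk exceeds `chunkSize` bytes.
// Returns the number of calls made.
int UploadInChunks(Stream source, int chunkSize, Action<Stream> create, Action<Stream> append)
{
    if (chunkSize <= 0 || chunkSize > MaxRequestBytes)
        throw new ArgumentOutOfRangeException(nameof(chunkSize));

    var buffer = new byte[chunkSize];
    int requests = 0;
    while (true)
    {
        // Stream.Read may return fewer bytes than asked for, so keep reading
        // until the buffer is full or the stream ends.
        int filled = 0;
        int read;
        while (filled < chunkSize && (read = source.Read(buffer, filled, chunkSize - filled)) > 0)
            filled += read;

        // Nothing left to send; an empty source still gets one create call below.
        if (filled == 0 && requests > 0)
            break;

        using (var chunk = new MemoryStream(buffer, 0, filled, writable: false))
        {
            if (requests == 0) create(chunk); else append(chunk);
        }
        requests++;

        if (filled < chunkSize)
            break; // a short chunk means the stream is exhausted
    }
    return requests;
}

// Demo: a 25-byte in-memory "file" with a 10-byte chunk size so it runs instantly.
var data = new byte[25];
for (int i = 0; i < data.Length; i++) data[i] = (byte)i;

var uploaded = new MemoryStream(); // stands in for the remote file
var sizes = new List<int>();
int callCount = UploadInChunks(
    new MemoryStream(data),
    chunkSize: 10,
    create: s => { uploaded.SetLength(0); s.CopyTo(uploaded); sizes.Add((int)s.Length); },
    append: s => { s.CopyTo(uploaded); sizes.Add((int)s.Length); });

Console.WriteLine($"{callCount} calls, chunk sizes: {string.Join(", ", sizes)}");
// prints "3 calls, chunk sizes: 10, 10, 5"
```

Against Data Lake Store, `create` would wrap the `FileSystem.Create(accountName, path, stream, true)` call from the question, and `append` would wrap the SDK's `FileSystem.Append` call. The exact `Append` overload in your SDK version is an assumption here, so check its signature. A smaller chunk size (for example a few MB) also works and keeps the buffer small.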