Databricks: Download a dbfs:/FileStore File to my Local Machine

Posted 2019-07-23 15:13

I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.

I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all files to my local machine.

I have tried to use cURL, but I can't find the REST API command to download a dbfs:/FileStore file.

Question: How can I download a dbfs:/FileStore file to my Local Machine?

I am using Databricks Community Edition to teach an undergraduate module in Big Data Analytics in college. I have Windows 7 installed on my local machine. I have checked that cURL and the _netrc file are properly installed and configured, as I can successfully run some of the commands provided by the REST API.

Thank you very much in advance for your help! Best regards, Nacho

2 answers
chillily
#2 · 2019-07-23 15:18

Using a browser, you can access individual files in the FileStore, but you cannot browse or even list directories. So you first have to put a file into the FileStore. If you have a file "example.txt" at "/FileStore/example_directory/", you can download it via the following URL:

https://community.cloud.databricks.com/files/example_directory/example.txt?o=###

In that URL, replace "###" with the long number at the end of your Community Edition URL (visible after you log in to your Community Edition account).
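As a rough illustration of how that URL is put together, here is a small Python sketch. The helper name `filestore_url` and the sample number `1234567890` are made up for this example; the mapping from `/FileStore/...` to `/files/...` and the `o=` parameter follow the description above. The resulting URL is meant to be opened in a browser that is already signed in.

```python
from urllib.parse import quote, urlencode

def filestore_url(dbfs_path, workspace_id,
                  host="https://community.cloud.databricks.com"):
    """Map a /FileStore/... path to its browser download URL (hypothetical helper)."""
    prefix = "/FileStore/"
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." spellings.
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    if not path.startswith(prefix):
        raise ValueError("only files under /FileStore/ are served at /files/")
    relative = path[len(prefix):]
    # quote() keeps "/" as-is and escapes characters such as spaces.
    return f"{host}/files/{quote(relative)}?{urlencode({'o': workspace_id})}"

# "1234567890" stands in for the number at the end of your own URL.
print(filestore_url("dbfs:/FileStore/example_directory/example.txt", "1234567890"))
```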

Anthone
#3 · 2019-07-23 15:40

There are a few options for downloading FileStore files to your local machine.

Easier options:

  • Install the Databricks CLI, configure it with your Databricks credentials, and use the CLI's dbfs cp command. For example: dbfs cp dbfs:/FileStore/test.txt ./test.txt. If you want to download an entire folder of files, you can use dbfs cp -r.
  • From a browser signed into Databricks, navigate to https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com/files/. If you are using Databricks Community Edition then you may need to use a slightly different path. This download method is described in more detail in the FileStore docs.
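To automate the CLI route, the dbfs cp call can be wrapped in a short script. This is only a sketch: it assumes the Databricks CLI is already installed and configured so that `dbfs` is on the PATH, and the function names `dbfs_cp_command` and `download_folder` are invented for this example. The folder is the one from the question.

```python
import shutil
import subprocess

def dbfs_cp_command(src, dest, recursive=False):
    """Build the argument list for `dbfs cp`, adding -r for whole folders."""
    cmd = ["dbfs", "cp"]
    if recursive:
        cmd.append("-r")
    return cmd + [src, dest]

def download_folder(src, dest):
    """Run `dbfs cp -r src dest`; raises if the CLI is missing or the copy fails."""
    if shutil.which("dbfs") is None:
        raise RuntimeError("the dbfs command was not found on PATH")
    subprocess.run(dbfs_cp_command(src, dest, recursive=True), check=True)

# Show the command that would be executed for the folder from the question.
print(" ".join(dbfs_cp_command("dbfs:/FileStore/my_result", "./my_result", recursive=True)))

# To actually copy the files (requires a configured CLI):
# download_folder("dbfs:/FileStore/my_result", "./my_result")
```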

Advanced options:

  • Use the DBFS REST API. You can access file contents using the read API call. To download a large file, you may need to issue multiple read calls to access chunks of the full file.
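The chunked download could look roughly like the Python sketch below. It relies on the `read` call of the DBFS API 2.0 (`GET /api/2.0/dbfs/read` with `path`, `offset` and `length`, returning `bytes_read` and base64-encoded `data`, at most 1 MB per call). Several things here are assumptions rather than part of the answer: the function names, authentication with a bearer token (the question uses a _netrc file with cURL instead), and that a read at the end of the file reports `bytes_read` of 0.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK = 1024 * 1024  # the read call returns at most 1 MB per request

def make_http_reader(host, token):
    """Return a function that issues one read call (bearer-token auth is assumed)."""
    def read_chunk(path, offset, length):
        query = urllib.parse.urlencode(
            {"path": path, "offset": offset, "length": length})
        request = urllib.request.Request(
            f"{host}/api/2.0/dbfs/read?{query}",
            headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return read_chunk

def download(read_chunk, dbfs_path, local_path, chunk_size=CHUNK):
    """Copy one DBFS file to local_path with repeated read calls.

    read_chunk(path, offset, length) must return a dict with the keys
    "bytes_read" (int) and "data" (base64 string).
    Returns the total number of bytes written.
    """
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = read_chunk(dbfs_path, offset, chunk_size)
            if reply["bytes_read"] == 0:  # assumed end-of-file signal
                break
            out.write(base64.b64decode(reply["data"]))
            offset += reply["bytes_read"]
    return offset

# Usage (host and token are placeholders; the API path has no "dbfs:" prefix):
# reader = make_http_reader("https://<YOUR_DATABRICKS_INSTANCE_NAME>.cloud.databricks.com", "<TOKEN>")
# download(reader, "/FileStore/test.txt", "test.txt")
```

Passing the reader in as a function keeps the loop independent of how the request is authenticated, so it can be swapped for one that uses other credentials.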