-->

Using Kaggle datasets in Google Colab

Published 2019-01-17 02:21

Question:

Is it possible to use any dataset available on Kaggle directly in Google Colab? I see the Kaggle API in this link, but apparently it only covers specific datasets, and it's a bit confusing to me.

Answer 1:

Step by step:

  1. Create an API key in Kaggle.

    To do this, go to kaggle.com/ and open your user settings page.

  2. Next, scroll down to the API access section and click generate to download an API key. This will download a file called kaggle.json to your computer. You'll use this file in Colab to access Kaggle datasets and competitions.

  3. Navigate to https://colab.research.google.com/.

  4. Upload your kaggle.json file using the following snippet in a code cell:

    from google.colab import files
    files.upload()

  5. Install the Kaggle API client using !pip install -q kaggle

  6. Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:

    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/

  7. Now you can access datasets using the client, e.g., !kaggle datasets list.

Here's a complete example notebook of the Colab portion of this process: https://colab.research.google.com/drive/1DofKEdQYaXmDWBzuResXWWvxhLgDeVyl

This example shows uploading the kaggle.json file, installing the Kaggle API client, and using the Kaggle client to download a dataset.
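
Step 6 above can also be done from Python instead of shell commands. The following is only a sketch: the function name install_kaggle_token and its parameters are made up for illustration, and the final permission change is an extra step that is not part of the answer (the Kaggle client may warn when the token file is readable by other users).

```python
import shutil
from pathlib import Path


def install_kaggle_token(token_path="kaggle.json", config_dir=None):
    """Copy an uploaded kaggle.json into the Kaggle config directory.

    Hypothetical helper: mirrors `mkdir -p ~/.kaggle` and
    `cp kaggle.json ~/.kaggle/` from step 6.
    """
    src = Path(token_path)
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; upload it first (step 4)")

    # Default to ~/.kaggle, the location named in step 6.
    dest_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)

    # Assumption, not in the answer: restrict the token to the owner only.
    dest.chmod(0o600)
    return dest
```

In a Colab cell, calling install_kaggle_token() after files.upload() should leave the token at ~/.kaggle/kaggle.json, after which !kaggle datasets list can be run as in step 7.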



Answer 2:

You should be able to access any dataset on Kaggle via the API. In this example, only the datasets for competitions are being listed. You can see the datasets you can access with this command:

kaggle datasets list

You can also search for datasets by adding the -s option followed by the search term you're interested in. So this would give you a list of datasets about dogs:

kaggle datasets list -s dogs
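
If you would rather run the same search from Python code in a notebook cell, one option is to call the CLI through subprocess. This is a rough sketch, not part of the Kaggle client: the helper names dataset_search_command and search_datasets are invented here, and the second one assumes the kaggle CLI is installed and a token is already in place.

```python
import subprocess


def dataset_search_command(term=None):
    """Build the argument list for `kaggle datasets list [-s TERM]`."""
    cmd = ["kaggle", "datasets", "list"]
    if term:
        cmd += ["-s", term]
    return cmd


def search_datasets(term=None):
    """Run the search and return the CLI's text output.

    Assumes the kaggle CLI is on PATH and an API token is configured;
    raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        dataset_search_command(term),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(search_datasets("dogs")) would run the same command as above and print whatever table the CLI returns.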

You can find more information on the API and how to use it in the documentation here.

Hope that helps! :)



Answer 3:

I wrote this tutorial on using the Kaggle API directly in Google Colab, without downloading the dataset to your local machine and uploading it again: Kaggle API + Colaboratory



Answer 4:

Have a look at this.

It uses the official Kaggle API behind the scenes, but automates the process so you don't have to re-download manually every time your VM is taken away. Another issue I faced when using the Kaggle API directly on Colab was the hassle of transferring the Kaggle API token via Google Drive. The above method automates that as well.

Disclaimer: I am one of the creators of Clouderizer.



Answer 5:

After steps 1-6 in Answer 1, to use the dataset from a particular competition in Colab, you can use the command:

!kaggle competitions download -c elo-merchant-category-recommendation

(elo-merchant-category-recommendation is the name of the competition.)
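
The download can also be scripted from Python, with the files unpacked afterwards. The sketch below is illustrative only: the helper names and the data directory are made up, it assumes the files arrive as .zip archives (which may not hold for every competition), and it assumes the kaggle CLI and token from the earlier steps are already set up.

```python
import subprocess
import zipfile
from pathlib import Path


def competition_download_command(competition, dest="data"):
    """Build `kaggle competitions download -c NAME -p DEST` as an argument list."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", str(dest)]


def extract_archives(dest="data"):
    """Unzip every .zip file found directly in dest; return the extracted names."""
    dest = Path(dest)
    extracted = []
    for archive in sorted(dest.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted


def download_competition(competition, dest="data"):
    """Download a competition's files into dest, then unpack any archives.

    Assumes the kaggle CLI is installed and authenticated.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(competition_download_command(competition, dest), check=True)
    return extract_archives(dest)
```

Calling download_competition("elo-merchant-category-recommendation") would then place the files under data/. If the command is rejected, it is worth checking on the Kaggle site that you have joined the competition.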