Extract Google Drive zip from Google Colab notebook

Posted 2020-02-26 08:04

I already have a zip of a dataset (2K images) on Google Drive. I need to use it in an ML training algorithm. The code below fetches the content as a string:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))

But I need to extract it and store it in a separate directory, since that would make the dataset easier to process (and to understand).

I tried to extract it further, but I get a "Not a zipfile" error:

dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()

Google Drive Dataset

Note: The dataset link is just for reference. I have already downloaded this zip to my Google Drive, and I'm referring only to the file in my drive.

9 answers
叛逆
#2 · 2020-02-26 08:20

Use GetContentFile() instead of GetContentString(). It saves the content to a file rather than returning it as a string.

downloaded.GetContentFile('images.zip') 

Then you can unzip it later with unzip.
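
If you would rather stay in Python than shell out to unzip, the standard-library zipfile module can do the extraction once GetContentFile() has written images.zip to disk. The following is a minimal sketch under that assumption; the helper name extract_zip and the 'images' target directory are illustrative and not part of the original answer.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's member names."""
    # Fail early with a clear message if the file is not a real zip archive.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# In Colab, after downloaded.GetContentFile('images.zip'):
# names = extract_zip('images.zip', 'images')  # 'images' is a hypothetical folder
```

The is_zipfile check is there because a download that was decoded as text somewhere along the way would no longer be a valid archive, and it is easier to catch that before extraction starts.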

Lonely孤独者°
#3 · 2020-02-26 08:21

To extract a Google Drive zip from a Google Colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

# The context manager closes the archive automatically.
with zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r') as zip_ref:
    zip_ref.extractall("/tmp")
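
If the archive holds more than the images, extractall() also accepts a members list, so only the files you want are written out. This is a rough sketch, not part of the original answer: the function name extract_images and the extension tuple are assumptions chosen for illustration.

```python
import zipfile

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def extract_images(zip_path, dest_dir, exts=IMAGE_EXTS):
    """Extract only members whose names end with one of exts; return their names."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        # str.endswith accepts a tuple, so exts must be a tuple of suffixes.
        wanted = [n for n in zf.namelist() if n.lower().endswith(exts)]
        zf.extractall(dest_dir, members=wanted)
    return wanted
```

With the mount from above, a call might look like extract_images("/content/drive/My Drive/ML/DataSet.zip", "/tmp").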
劫难
#4 · 2020-02-26 08:23

SIMPLE WAY TO CONNECT

1) You'll have to verify authentication

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()

2) Install google-drive-ocamlfuse to mount Google Drive through FUSE

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

3) To verify credentials

import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

4) Create a mount point to use in Colab ('gdrive') and check that it's working

!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
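# "!cd" runs in a subshell and has no lasting effect; "%cd" changes the notebook's working directory
%cd gdrive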
何必那么认真
#5 · 2020-02-26 08:26

To unzip a file to a directory:

!unzip path_to_file.zip -d path_to_directory
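
For a pure-Python counterpart of the command above, shutil.unpack_archive from the standard library takes the same two pieces of information, the archive and the target directory. A minimal sketch; the wrapper name unzip_to is an illustrative assumption.

```python
import shutil


def unzip_to(zip_path, dest_dir):
    """Python counterpart of `unzip zip_path -d dest_dir`."""
    # The format is given explicitly so the file extension does not matter.
    shutil.unpack_archive(zip_path, dest_dir, format="zip")
```

This is handy when the extraction has to live inside a script or function rather than in a notebook cell with a leading "!".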
家丑人穷心不美
#6 · 2020-02-26 08:32

Mount GDrive:

from google.colab import drive
drive.mount('/content/gdrive')

Open the link -> copy authorization code -> paste that into the prompt and press "Enter"

Check GDrive access:

!ls "/content/gdrive/My Drive"

Unzip the file from GDrive (-q stands for "quiet"):

!unzip -q "/content/gdrive/My Drive/dataset.zip"
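
Because -q suppresses the file listing, it can be worth checking afterwards that the expected number of images actually landed on disk. Without -d, unzip writes into the current working directory. The sketch below counts files by extension; the function name count_by_extension is an assumption, and the directory to pass in is wherever the archive was extracted.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Count regular files under root, grouped by lower-cased file extension."""
    return Counter(
        p.suffix.lower() for p in Path(root).rglob("*") if p.is_file()
    )
```

Calling count_by_extension on the extraction directory returns a Counter keyed by suffix, which can be compared against the roughly 2K images the dataset is supposed to contain.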
Lonely孤独者°
#7 · 2020-02-26 08:34

You can simply use this:

!unzip file_location