I am working on an image segmentation machine learning project and I would like to test it out on Google Colab.
For the training dataset, I have 700 images, mostly 256x256, that I need to load into a Python numpy array for my project. I also have thousands of corresponding mask files to upload. They currently exist in a variety of subfolders on Google Drive, but I have been unable to upload them to Google Colab for use in my project.
So far I have attempted using Google Fuse, which seems to have very slow upload speeds, and PyDrive, which has given me a variety of authentication errors. I have been using the Google Colab I/O example code for the most part.
How should I go about this? Would PyDrive be the way to go? Is there code somewhere for uploading a folder structure or many files at a time?
You may want to try the kaggle-cli module, as discussed here.

You can put all your data into your Google Drive and then mount the drive. This is how I have done it; let me explain in steps.
Step 1: Transfer your data into your Google Drive.
Step 2: Run the following code to mount your Google Drive.
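```python
# Mount Google Drive into the Colab filesystem and follow the authorization prompt.
from google.colab import drive

drive.mount('/content/drive')
```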
Step 3: Run the following line to check whether you can see your desired data in the mounted drive.
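For example, assuming your data sits in a folder named my_data (a placeholder name) at the top level of your Drive:

```python
# List the contents of the mounted folder to confirm the data is visible.
!ls "/content/drive/My Drive/my_data"
```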
Step 4:
Now load your data into a numpy array as follows. I had Excel files containing my train, CV, and test data.
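A sketch of that step, using pandas (available in the Colab runtime) and placeholder paths and file names such as train.xlsx:

```python
import numpy as np
import pandas as pd

# Base path and file names are placeholders -- adjust them to your own Drive layout.
base = "/content/drive/My Drive/my_data/"

# Read each Excel file into a DataFrame, then convert it to a numpy array.
train = np.asarray(pd.read_excel(base + "train.xlsx"))
cv = np.asarray(pd.read_excel(base + "cv.xlsx"))
test = np.asarray(pd.read_excel(base + "test.xlsx"))

print(train.shape, cv.shape, test.shape)
```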
I hope this helps.
Edit
To download the data into your Drive from the Colab notebook environment, you can run the following code.
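A sketch with wget and a placeholder URL, writing straight into the mounted Drive (the target folder name is also a placeholder):

```python
# Download a file directly into a folder on the mounted Drive.
!wget -P "/content/drive/My Drive/my_data" "https://example.com/dataset.zip"
```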
Here are a few steps to upload a large dataset to Google Colab:

1. Upload your dataset to free cloud storage like Dropbox, Openload, etc. (I used Dropbox).
2. Create a shareable link of your uploaded file and copy it.
3. Open your notebook in Google Colab and run this command in one of the cells:
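```python
# Replace the URL with your own shareable link; appending ?dl=1 makes
# Dropbox serve the raw file instead of a preview page.
!wget -O dataset.zip "https://www.dropbox.com/s/xxxxxxxx/dataset.zip?dl=1"
```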
That's it!
You can compress your dataset into a zip or rar file and later unzip it after downloading it in Google Colab by using this command:
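```python
# Extract the archive; assumes it was downloaded as dataset.zip into the current directory.
!unzip -q dataset.zip
```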
Zip your file first, then upload it to Google Drive.
See this simple command to unzip:
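```python
# Unzip an uploaded archive; the file name is a placeholder.
!unzip file.zip
```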
Example:
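```python
# A more concrete sketch (paths are placeholders): extract a zip stored on the
# mounted Drive into a local Colab folder.
!unzip -q "/content/drive/My Drive/dataset.zip" -d "/content/data"
```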
Step 1: Mount the Drive by running the following command:
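```python
# Mount Google Drive; running this triggers the authorization flow described below.
from google.colab import drive

drive.mount('/content/drive')
```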
This will output a link. Click on the link, hit Allow, copy the authorization code, and paste it into the box in the Colab cell with the text "Enter your authorization code:" written above it. This process just gives Colab permission to access your Google Drive.
Step 2: Upload your folder (zipped or unzipped, depending on its size) to Google Drive.
Step 3: Now work your way through the Drive directories and files to locate your uploaded folder or zipped file.
This process may look something like this: the current working directory in Colab when you start off will be /content/. Just to make sure, run the following command in a cell:
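```python
# Print the current working directory (it should be /content right after startup).
!pwd
```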
It will show you the current directory you are in (pwd stands for "print working directory"). Then use a command like:
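```python
# List the directories and files in the current directory.
!ls
```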
to list the directories and files in the directory you are in and the command:
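```python
# Change directory; %cd (rather than !cd) persists across cells in Colab.
# The path below is only an example -- point it at your own uploaded folder.
%cd "/content/drive/My Drive/my_uploaded_folder"
```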
to move into the directories to locate your uploaded folder or the uploaded .zip file.
And just like that, you are ready to get your hands dirty with your Machine Learning model! :)
Hopefully, these simple steps will keep you from spending too much unnecessary time figuring out how Colab works, so you can spend the majority of your time on the machine learning model itself: its hyperparameters, pre-processing, and so on.