Google Colab: Import data from Google Drive as a pandas DataFrame

Upasana | December 07, 2019 | 3 min read | 4,234 views

In this article, we will learn how to read a file from Google Drive in Google Colab:

Load data from Google Drive in a Colab notebook using PyDrive
Import the data as a pandas DataFrame using read_csv

Here, we assume that you are familiar with Jupyter Notebook. If you are not, you can follow this article.

Introduction to Google Colab Notebook

Google Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is a YouTube video link that gives an overview of Google Colaboratory.

Perk of using Google Colab: you can work on a GPU for free.

Introduction to PyDrive

PyDrive is a wrapper library of google-api-python-client that simplifies many common Google Drive API tasks.

Here, we will use PyDrive to authenticate and then read data directly from Google Drive.

Step 1: Importing libraries & Google Authentication

Import the libraries needed to read the data and to authenticate with Google.

Importing the libraries necessary for reading the CSV

import pandas as pd  # (1)
from pydrive.auth import GoogleAuth  # (2)
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate the Colab user, then hand the credentials to PyDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

1	Importing Pandas for reading csv
2	Importing Pydrive and related libraries

After running the cell containing the above code, Google will prompt you to visit a link. After clicking that link:

Choose the Google account whose Google Drive you want to access.
Copy the verification code and paste it into the notebook. Press Enter after pasting the verification code.

Set the logging level to avoid the discovery_cache error

import logging
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)

Step 2: Loading data from google drive

Now, upload your data to Google Drive and copy the folder id. You can see the id of the folder in its link, as shown in the screenshot.

So here the id is 1stjtV19iKK1BdrPasHYqDNOk98-MEdsR

This is the folder id, which we can use to get the ids of the files in this folder and then load those files.
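If you only have the folder's share link, the id can also be pulled out in code. The following is a minimal sketch, assuming the link has the common https://drive.google.com/drive/folders/<id> shape. The helper names extract_folder_id and build_folder_query are hypothetical (they are not part of PyDrive), and the sample URL is assembled from the id above purely for illustration.

```python
import re

# Hypothetical helpers, not part of PyDrive. They assume the folder link
# looks like https://drive.google.com/drive/folders/<id>[?query].
_FOLDER_ID_PATTERN = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(url):
    """Return the folder id embedded in a Drive folder link."""
    match = _FOLDER_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No folder id found in: {url!r}")
    return match.group(1)


def build_folder_query(folder_id):
    """Build the 'files in this folder, not trashed' query string."""
    return f"'{folder_id}' in parents and trashed=false"


# Sample link assembled from the id shown above (illustrative only).
url = (
    "https://drive.google.com/drive/folders/"
    "1stjtV19iKK1BdrPasHYqDNOk98-MEdsR?usp=sharing"
)
folder_id = extract_folder_id(url)
query = build_folder_query(folder_id)
print(folder_id)
print(query)
```

The resulting query string has the same form as the one passed to drive.ListFile in the next step, so the folder id does not need to be typed by hand.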

Step 3: Get Id of data file and load data in Google Colab

Get id

file_list = drive.ListFile({'q': "'1stjtV19iKK1BdrPasHYqDNOk98-MEdsR' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print('title: %s, id: %s' % (file1['title'], file1['id']))

After running the above code, you will see all the files in the folder listed with their ids.

Output

title: Untitled1.ipynb, id: 1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq
title: stack-overflow-data.csv, id: 1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4
title: Untitled0.ipynb, id: 1PwWlRHgIT2iRpaf_cMXAnTFmVmlZ_blr
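Instead of copying the id by eye, you could look it up by title. This is only a sketch: find_file_id is a hypothetical helper, and sample_listing is a stand-in made of plain dictionaries with the same 'title' and 'id' keys, filled with the values from the output above. In the notebook you would pass the real file_list instead.

```python
def find_file_id(file_list, title):
    """Return the id of the first entry whose 'title' equals `title`."""
    for entry in file_list:
        if entry["title"] == title:
            return entry["id"]
    raise KeyError(f"No file titled {title!r} in the listing")


# Stand-in for the result of drive.ListFile(...).GetList(): plain dicts
# with the same 'title' and 'id' keys as the entries printed above.
sample_listing = [
    {"title": "Untitled1.ipynb", "id": "1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq"},
    {"title": "stack-overflow-data.csv", "id": "1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4"},
    {"title": "Untitled0.ipynb", "id": "1PwWlRHgIT2iRpaf_cMXAnTFmVmlZ_blr"},
]

csv_id = find_file_id(sample_listing, "stack-overflow-data.csv")
print(csv_id)
```

Raising an error when the title is missing makes a typo in the file name visible right away, rather than failing later during the download.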

Step 4: Import data as Pandas DataFrame with read_csv

Now, we will get the content of the file by using its id. Note that we have copied the file id from the output above and used it in drive.CreateFile

data_downloaded = drive.CreateFile({'id': '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4'})
data_downloaded.GetContentFile('stack-overflow-data.csv')

Now, we can access the data under the same file name and load it as a pandas DataFrame.

data = pd.read_csv('stack-overflow-data.csv', low_memory=False, lineterminator='\n')
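If read_csv fails or the columns look wrong, it can help to peek at the raw file that was downloaded. The sketch below uses only the standard library. preview_csv is a hypothetical helper, UTF-8 encoding is an assumption about the file, and the demo rows are made up so that the example runs outside Colab as well.

```python
import csv
import os
import tempfile
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the first `n_rows` parsed rows of a CSV file (header included)."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; the download may have failed")
    # UTF-8 is an assumption here; adjust if the file uses another encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), n_rows))


# Demo on a tiny throwaway file with made-up rows.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    tmp.write("post,tags\nhello,python\nworld,java\n")

rows = preview_csv(tmp.name, n_rows=2)
print(rows)
os.remove(tmp.name)
```

In the notebook, the equivalent call would be preview_csv('stack-overflow-data.csv').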

Now, you are good to go and work on the data. Thanks for reading this guide.
