Google Colab: import data from google drive as pandas dataframe

Upasana | December 07, 2019

In this article, we will learn how to read a file from Google Drive in Google Colab:

  • Load data from Google Drive in Jupyter using PyDrive

  • Import data as pandas dataframe using read_csv

Here, we assume that you are familiar with Jupyter Notebook. If you are not, you can follow this article.

Introduction to Google Colab Notebook

Google Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is a YouTube video link for an overview of Google Colaboratory.

Perks of using Google Colab: you get to work on a GPU for free.

Introduction to PyDrive

PyDrive is a wrapper library of google-api-python-client that simplifies many common Google Drive API tasks.

Here, we will use PyDrive to authenticate and then read data directly from Google Drive.

Step 1: Importing libraries & Google authentication

Import necessary libraries to read data

Importing libraries necessary for reading CSV
import pandas as pd  # (1)
from pydrive.auth import GoogleAuth  # (2)
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate the Colab user, then pass the credentials on to PyDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
1 Importing pandas for reading CSV
2 Importing PyDrive and related libraries

After running the cell containing the above code, Google will prompt you to visit a link. After clicking that link:

  1. Choose the Google account whose Google Drive should be accessed.

  2. Copy the verification code and paste it into the notebook. Press Enter after pasting the verification code.

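Logging to avoid discovery_cache error
import logging
# Only show errors from the Google API client's discovery cache logger
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)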

Step 2: Loading data from Google Drive

Now, upload your data to a folder in Google Drive and copy the folder ID from there. The ID appears in the link to the folder, as in the screenshot.

[Screenshot: getting the ID of the folder]

Here, the ID is 1stjtV19iKK1BdrPasHYqDNOk98-MEdsR

This is the folder ID, which we can use to get the IDs of the files in this folder and then load those files.
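If you prefer not to pick the ID out of the link by hand, it can be extracted with a few lines of Python. This is a minimal sketch that assumes the folder link has the shape https://drive.google.com/drive/folders/<id>, optionally followed by a query string; the function name extract_folder_id is a hypothetical helper, not part of PyDrive or Colab.

```python
from urllib.parse import urlparse


def extract_folder_id(folder_url):
    """Return the path segment that follows 'folders' in a Drive link, or None."""
    # Split the URL path into its non-empty segments; the query string is ignored
    parts = [p for p in urlparse(folder_url).path.split('/') if p]
    if 'folders' in parts:
        idx = parts.index('folders')
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None


# Assumed link shape, built around the folder ID used in this article
url = 'https://drive.google.com/drive/folders/1stjtV19iKK1BdrPasHYqDNOk98-MEdsR?usp=sharing'
folder_id = extract_folder_id(url)
print(folder_id)
```

If the link does not contain a folders segment, the helper returns None, so it is worth checking the result before using it in a query.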

Step 3: Get Id of data file and load data in Google Colab

Get id
file_list = drive.ListFile({'q': "'1stjtV19iKK1BdrPasHYqDNOk98-MEdsR' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print('title: %s, id: %s' % (file1['title'], file1['id']))

After running the above code, you will see all the files in the folder listed with their IDs.

title: Untitled1.ipynb, id: 1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq

title: stack-overflow-data.csv, id: 1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4

title: Untitled0.ipynb, id: 1PwWlRHgIT2iRpaf_cMXAnTFmVmlZ_blr
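Instead of copying an ID from the printed listing, you can look it up by file title. The sketch below assumes each entry returned by GetList() can be indexed with 'title' and 'id', as the loop above does; build_folder_query and find_file_id are hypothetical helper names, and the sample list simply repeats the listing above so the code runs outside Colab.

```python
def build_folder_query(folder_id):
    """Build the same Drive search query as above for any folder ID."""
    return "'%s' in parents and trashed=false" % folder_id


def find_file_id(file_list, title):
    """Return the ID of the first entry with a matching title, or None."""
    for entry in file_list:
        if entry['title'] == title:
            return entry['id']
    return None


# Stand-in for drive.ListFile({'q': build_folder_query(folder_id)}).GetList()
sample_files = [
    {'title': 'Untitled1.ipynb', 'id': '1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq'},
    {'title': 'stack-overflow-data.csv', 'id': '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4'},
    {'title': 'Untitled0.ipynb', 'id': '1PwWlRHgIT2iRpaf_cMXAnTFmVmlZ_blr'},
]

csv_id = find_file_id(sample_files, 'stack-overflow-data.csv')
print(csv_id)
```

In a notebook, you would pass the real file_list from Step 3 in place of sample_files.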

Step 4: Import data as Pandas DataFrame with read_csv

Now, we will get the content of the file by using its ID. Note that we copied the file ID from the output above and used it in drive.CreateFile. GetContentFile then saves the file to the Colab working directory.

data_downloaded = drive.CreateFile({'id': '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4'})
data_downloaded.GetContentFile('stack-overflow-data.csv')

Now, we can access the data under the same file name and load it as a pandas DataFrame.

data = pd.read_csv('stack-overflow-data.csv', low_memory=False, lineterminator='\n')
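As an alternative to reading the file from disk by name, the content can be read into memory and parsed directly. This is a minimal sketch, assuming the file is small enough to hold in memory as text; it relies on PyDrive's GetContentString method, while read_drive_csv and FakeDriveFile are hypothetical names, and the sample columns are made up so the example runs outside Colab.

```python
import io

import pandas as pd


def read_drive_csv(drive_file, **read_csv_kwargs):
    """Parse a Drive file's text content as CSV without saving it to disk."""
    text = drive_file.GetContentString()
    return pd.read_csv(io.StringIO(text), **read_csv_kwargs)


class FakeDriveFile:
    """Stand-in for a PyDrive file object, used only for this demonstration."""

    def __init__(self, text):
        self._text = text

    def GetContentString(self):
        return self._text


# Made-up sample data; the real file's columns are not shown in this article
sample = FakeDriveFile("post,tags\nhello,python\nworld,pandas\n")
df = read_drive_csv(sample, low_memory=False)
print(df.shape)
```

In Colab, you would pass the object returned by drive.CreateFile in place of the stand-in.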

Now, you are good to go and work on the data. Thanks for reading this guide.
