Google Colab: import data from google drive as pandas dataframe

Carvia Tech | August 24, 2019 | 3 min read | 73 views


In this article, we will learn how to read a file from Google Drive in Google Colab:

  • Load data from Google Drive in Jupyter using PyDrive

  • Import data as a pandas DataFrame using read_csv

Here, we assume that you are familiar with Jupyter Notebook. If you are not, you can follow this article.

Introduction to Google Colab Notebook

Google Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is a YouTube video link for an overview of Google Colaboratory.

Perk of using Google Colab: you get to work on a GPU for free.

Introduction to PyDrive

PyDrive is a wrapper library of google-api-python-client that simplifies many common Google Drive API tasks.

Here, we will use PyDrive to authenticate and then read data directly from Google Drive.

Step 1: Importing libraries & Google Authentication

Import necessary libraries to read data

Importing necessary libraries for reading csv
# pandas is used later to read the CSV file
import pandas as pd
# PyDrive and related libraries for Google authentication
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate the Colab user, then hand the credentials to PyDrive
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

After running the cell containing the above code, Google will prompt you to visit a link. After clicking that link:

  1. Choose the Google account whose Google Drive should be accessed.

  2. Copy the verification code and paste it into the notebook. Press Enter after pasting the verification code.

Set the logging level to avoid the discovery_cache error
import logging
logging.getLogger('googleapiclient.discovery_cache').setLevel(logging.ERROR)
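The effect of this call can be checked with the standard library alone. The sketch below is only an illustration: it uses the same logger name as above and asks the logger which severities it will still emit after the level is raised to ERROR.

```python
import logging

# Same logger name as in the snippet above.
cache_logger = logging.getLogger('googleapiclient.discovery_cache')
cache_logger.setLevel(logging.ERROR)

# Messages below ERROR severity are now dropped by this logger,
# while ERROR-level messages are still emitted.
warnings_enabled = cache_logger.isEnabledFor(logging.WARNING)
errors_enabled = cache_logger.isEnabledFor(logging.ERROR)
print(warnings_enabled, errors_enabled)
```

Only this one logger is affected; other loggers in the notebook keep their own levels.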

Step 2: Loading data from Google Drive

Now, you need to upload the data to Google Drive and copy the id from there. The id of the folder appears in its link, as in the screenshot.

[Screenshot: get id of folder]

So here the id is 1stjtV19iKK1BdrPasHYqDNOk98-MEdsR

This is the folder id, which we can use to get the ids of the files in this folder and then load those files.
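If you would rather not copy the id by hand, it can be pulled out of the folder link with a few lines of Python. This is a minimal sketch, not part of PyDrive: the helper name folder_id_from_url is hypothetical, and it assumes the link has the shape https://drive.google.com/drive/folders/<id>, which may differ for your account.

```python
from urllib.parse import urlparse

def folder_id_from_url(url):
    """Return the path segment that follows 'folders' in a Drive folder link."""
    path = urlparse(url).path  # ignores any ?query or #fragment part
    segments = [s for s in path.split('/') if s]
    if 'folders' not in segments:
        raise ValueError('not a Drive folder link: %s' % url)
    position = segments.index('folders') + 1
    if position >= len(segments):
        raise ValueError('no id after "folders" in: %s' % url)
    return segments[position]

# Assumed link shape; the id is the one used in this article.
url = 'https://drive.google.com/drive/folders/1stjtV19iKK1BdrPasHYqDNOk98-MEdsR?usp=sharing'
folder_id = folder_id_from_url(url)
print(folder_id)
```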

Step 3: Get Id of data file and load data in Google Colab

Get id
file_list = drive.ListFile({'q': "'1stjtV19iKK1BdrPasHYqDNOk98-MEdsR' in parents and trashed=false"}).GetList()
for file1 in file_list:
  print('title: %s, id: %s' % (file1['title'], file1['id']))

After running the above code, you will see all the files in the folder listed with their ids.

Output
title: Untitled1.ipynb, id: 1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq
title: stack-overflow-data.csv, id: 1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4
title: Untitled0.ipynb, id: 1PwWlRHgIT2iRpaf_cMXAnTFmVmlZ_blr
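Instead of reading the id off the printed list, you can look it up by title. The sketch below is an illustration under assumptions: build_folder_query and find_file_id are hypothetical helper names, and plain dicts stand in for the PyDrive file objects, which are indexed with 'title' and 'id' in the same way in the listing code above.

```python
def build_folder_query(folder_id):
    """Build the same 'q' string used above for a given folder id."""
    return "'%s' in parents and trashed=false" % folder_id

def find_file_id(file_list, title):
    """Return the id of the first entry whose title matches, or None."""
    for entry in file_list:
        if entry['title'] == title:
            return entry['id']
    return None

# Plain dicts stand in for the entries returned by drive.ListFile(...).GetList().
sample_list = [
    {'title': 'Untitled1.ipynb', 'id': '1j6iOyUA0NGSmwI6EuBX9mXKNIUIMqEnq'},
    {'title': 'stack-overflow-data.csv', 'id': '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4'},
]
csv_id = find_file_id(sample_list, 'stack-overflow-data.csv')
query = build_folder_query('1stjtV19iKK1BdrPasHYqDNOk98-MEdsR')
print(csv_id)
print(query)
```

In the notebook, you would pass the real file_list from Step 3 instead of sample_list.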

Step 4: Import data as Pandas DataFrame with read_csv

Now, we will get the content of the file by using its id. Note that we have copied the id from the output above and used it in drive.CreateFile

data_downloaded = drive.CreateFile({'id': '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4'})
data_downloaded.GetContentFile('stack-overflow-data.csv')

Now, we can access the data under the same file name and load it as a pandas DataFrame.

data = pd.read_csv('stack-overflow-data.csv', low_memory=False, lineterminator='\n')
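If you load several files this way, the two download calls from this step can be wrapped in a small function. This is a sketch only: download_drive_file is a hypothetical helper, and it assumes the drive argument is the authenticated GoogleDrive object created in Step 1.

```python
def download_drive_file(drive, file_id, local_name):
    """Download the Drive file with the given id and return the local file name."""
    remote = drive.CreateFile({'id': file_id})  # handle to the remote file
    remote.GetContentFile(local_name)           # writes the content to local_name
    return local_name

# Usage in the notebook (needs the authenticated `drive` from Step 1):
# path = download_drive_file(drive, '1sCIPWY2yVYh3hwREVLwVlloEH1-WbUT4',
#                            'stack-overflow-data.csv')
# data = pd.read_csv(path, low_memory=False, lineterminator='\n')
```

Returning the local file name keeps the read_csv call on one line and avoids repeating the name in two places.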

Now, you are good to go and work on the data. Thanks for reading this guide.

