Google Sheets to Jupyter Notebooks

Jupyter notebooks are incredibly powerful. Unlike Google Sheets or Microsoft Excel, they can handle large amounts of data with ease. You can perform complex statistical operations or otherwise manipulate data in just a few lines of code. You can run complex for-loops to create Monte Carlo simulations without expensive add-ons like Crystal Ball.

But sometimes you need to combine this power with the simple and almost universally understood UI of a spreadsheet.

For instance, you may want to practice your fancy Python code on a small dataset that you can easily and visually manipulate in Google Sheets. Or you may have to output the fruits of your complex analysis to a spreadsheet so your non-coder CEO or client can read and understand it. Or you may find it easier to do some of your simple data work in a spreadsheet and only the most complex parts in Python.

I used to do this by completing my spreadsheet work in Google Sheets, downloading a CSV file, pulling the CSV file data into Jupyter, manipulating my data, exporting another CSV file, and uploading it back to Google Sheets. Rinse and repeat for every debug or new iteration of data. Yuck.

But then I realized that Google provides an API to connect sheets to any third party app you can dream of, including Jupyter notebooks. After tinkering for a bit, I figured out how to easily pull the most up-to-date data from my Google Sheets into Jupyter and output data from Jupyter back to Google sheets.

It takes a bit of one-time setup. But if you find yourself going back and forth between sheets and Jupyter or occasionally miss the ease of use of spreadsheets, or need to update the data in sheets from Jupyter quickly, then this tutorial is for you.

NOTE: I am using Python 2.7.14 and Anaconda version 4.3.30. You can see my full enviroment here and my full example notebook used in part 2 here.

Part One - Create your Google Developer Credentials

Before connecting our Jupyter notebooks to our google sheets, we first must create Google developer credentials with permission to access our Google Drive. This part is a bit long and tricky but you only have to do this once for all notebooks and sheets.

Create a Google Developer Project

Go to the Google Developer Console.

Click Create Project.

enter image description here

Enter a Project Name - I just use one project for all my notebooks instead of creating a new one for every project so I named my project “Jupyter and Google Sheets”. You can also rename the project ID if you like.

Click Create.

Enable the Google Drive API

After creating your project, Google will take you back to the Developer Console. I had to refresh the page to see my new project. Click on your new project.

enter image description here

Go to the API & Services dashboard by clicking the Hamburger menu on the left and select API & Services > Dashboard.

enter image description here

Click the Google Drive API.

enter image description here

Click Enable.
enter image description here

Okay. Now we’ve created our Google Drive project. Let’s get credentials to use it.

Create credentials to use the Google Drive API

After Enabling the Google Drive API, Google should take you to the Google Drive API console for your project.

Click Create credentials.

enter image description here

Select Other UI for “Where will you be calling the API from?”.

enter image description here

Select User Data for “What data will you be accessing?” and click What credentials do I need?.

enter image description here

Enter a name and click Create client ID.

enter image description here

Select your email address, enter a product name, and click Continue.

Download your credentials and save them to the folder where you intend to create your Jupyter Notebook.
enter image description here
enter image description here

Create Service Account credentials

The last step to set up our credentials is to create Service Account credentials so we can connect our client (what we just created) to our actual personal Google Drive.

Go back to the Credentials dashboard and create a Service account key.

enter image description here

Select New service account, set Service account name to Google Sheets, set the role to Service Account User, leave Key type as JSON, click Create.

enter image description here

Save the JSON file to the folder where you intend to create your Jupyter Notebook (make sure the filename doesn’t have spaces like this one).

enter image description here

Okay. Whew. We have our Google developer credentials with permissions to access our Google Drive and thus Google Sheets.

Part Two - Connect Jupyter to Google Sheets

Now let’s connect our Jupyter notebook to our Google Sheet.

Install the required packages using conda install and pip.


        conda install pandas jupyter
        pip install gspread oauth2client df2gspread
        
    

Share a Google sheet to your notebook.

All of the work above you’ll only have to do once. From now on, it’ll be easy to pull data from a Google Sheet into a Jupyter Notebook and vice-versa.

First, you’ll need a Google Sheet. This can be any sheet. We’ll use the candidate scoring data from my CEO of America article.

To allow your Jupyter notebook access to the Google Sheet, you’ll need to share the sheet with the Google developer credentials you created in part one. To do so, open the Service account credentials file in Jupyter or a text editor.

enter image description here

Copy the client_email attribute.

enter image description here

Go back to your Google Sheet and click Share.

Paste the client_email into people box and click Send.

enter image description here

You’ll need to repeat the step above for any sheet that you want to pull into Jupyter notebooks.

Pull Google sheet data into the Jupyter notebook

Open or create a Jupyter notebook. The full version of my notebook is available here.

Import libraries


    import pandas as pd
    import gspread
    from oauth2client.service_account import ServiceAccountCredentials
    

Connect to your service account


    scope = ['https://spreadsheets.google.com/feeds']
    credentials = ServiceAccountCredentials.from_json_keyfile_name('./<YOUR CREDENTIALS FILENAME>.json', scope)
    gc = gspread.authorize(credentials)
    

Import the data from your google spreadsheet into Jupyter

Find the Google Sheet key of the spreadsheet you want to import in the url of the spreadsheet as shown below.

enter image description here

Copy the key and paste it into the following code.


    spreadsheet_key = '1f0OwtmuTk1fTdhnn4tuvVcPCZjdb00D79dWw4RFhYs0'
    book = gc.open_by_key(spreadsheet_key)
    worksheet = book.worksheet("Candidate Data")
    table = worksheet.get_all_values()
    

Enter the name of the worksheet you want to import into the above code as well.

enter image description here

If you run the code, you’ll see the table variable now has the rows from the Google Sheet, like below.

enter image description here

Convert the sheet data into a pandas dataframe

Now that we have our sheet data in our Jupyter notebook, we’ll want to convert the data contained in the table variable into a clean pandas dataframe for easy manipulation.


    ##Convert table data into a dataframe
    df = pd.DataFrame(table[1:], columns=table[0])
    
    ##Only keep columns we need
    df = df[['Order', 'Candidate', 'Position', 'Start Date', 'End Date', 'Years of Experience', 'Points']]
    
    ##Convert number strings to floats and ints
    df = df.apply(pd.to_numeric, errors='ignore')
    
    ##Convert date strings to datetime format
    df['End Date'] = pd.to_datetime(df['End Date'],infer_datetime_format=True)
    df['Start Date'] = pd.to_datetime(df['Start Date'],infer_datetime_format=True)
    
    df.head()
    

And walla! You have converted your Google Sheet data into a nice, clean pandas dataframe. If you change the data in the sheet and want to update the dataframe, simply rerun all the code from:


    book = gc.open_by_key(spreadsheet_key) 
    

onward.

Push from Jupyter notebook to Google Sheet

We can also send data from the Jupyter notebook back to the Google Sheet.

First, let’s manipulate the data. The groupby capabilities of dataframes are particularly hard to replicate in a sheet, so let’s do that.


    candidate_groups = df.groupby('Candidate')
    scores_df = candidate_groups.sum()
    scores_df['Order'] = candidate_groups.first()
    scores_df.head()
    

That will create a new scores dataframe that looks like this…

enter image description here

To ouput this data back to the same Google Sheet under the worksheet named “Jupyter Manipulated Data” simply run the following code.


    from df2gspread import df2gspread as d2g
    wks_name = 'Jupyter Manipulated Data'
    d2g.upload(scores_df, spreadsheet_key, wks_name, credentials=credentials, row_names=True)
    

And boom. The Google sheet is updated with our Jupyter data…

enter image description here

Now pushing and pulling data back and forth between Google Sheets and Jupyter notebook is a snap. I use this every day and am constantly finding new functionality and workflows for it. I hope you’ll find it as useful as I do.

Credits

  1. Simple Google Spreadsheets to Pandas DataFrame in IPython
    Notebook
  2. df2gspread
  3. gspread
  4. Python Quickstart