by Kevin McLaughlin
Jupyter notebooks are incredibly powerful. Unlike Google Sheets or Microsoft Excel, they can handle large amounts of data with ease. You can perform complex statistical operations or otherwise manipulate data in just a few lines of code. You can run complex for-loops to create Monte Carlo simulations without expensive add-ons like Crystal Ball.
But sometimes you need to combine this power with the simple and almost universally understood UI of a spreadsheet.
For instance, you may want to practice your fancy Python code on a small dataset that you can easily and visually manipulate in Google Sheets. Or you may have to output the fruits of your complex analysis to a spreadsheet so your non-coder CEO or client can read and understand it. Or you may find it easier to do some of your simple data work in a spreadsheet and only the most complex parts in Python.
I used to do this by completing my spreadsheet work in Google Sheets, downloading a CSV file, pulling the CSV file data into Jupyter, manipulating my data, exporting another CSV file, and uploading it back to Google Sheets. Rinse and repeat for every debug or new iteration of data. Yuck.
But then I realized that Google provides an API to connect sheets to any third party app you can dream of, including Jupyter notebooks. After tinkering for a bit, I figured out how to easily pull the most up-to-date data from my Google Sheets into Jupyter and output data from Jupyter back to Google sheets.
It takes a bit of one-time setup. But if you find yourself going back and forth between sheets and Jupyter or occasionally miss the ease of use of spreadsheets, or need to update the data in sheets from Jupyter quickly, then this tutorial is for you.
Before connecting our Jupyter notebooks to our google sheets, we first must create Google developer credentials with permission to access our Google Drive. This part is a bit long and tricky but you only have to do this once for all notebooks and sheets.
Go to the Google Developer Console.
Click Create Project.
Enter a Project Name - I just use one project for all my notebooks instead of creating a new one for every project so I named my project “Jupyter and Google Sheets”. You can also rename the project ID if you like.
After creating your project, Google will take you back to the Developer Console. I had to refresh the page to see my new project. Click on your new project.
Go to the API & Services dashboard by clicking the Hamburger menu on the left and select API & Services > Dashboard.
Click the Google Drive API.
Okay. Now we’ve created our Google Drive project. Let’s get credentials to use it.
After Enabling the Google Drive API, Google should take you to the Google Drive API console for your project.
Click Create credentials.
Select Other UI for “Where will you be calling the API from?”.
Select User Data for “What data will you be accessing?” and click What credentials do I need?.
Enter a name and click Create client ID.
Select your email address, enter a product name, and click Continue.
Download your credentials and save them to the folder where you intend to create your Jupyter Notebook.
The last step to set up our credentials is to create Service Account credentials so we can connect our client (what we just created) to our actual personal Google Drive.
Go back to the Credentials dashboard and create a Service account key.
Select New service account, set Service account name to Google Sheets, set the role to Service Account User, leave Key type as JSON, click Create.
Save the JSON file to the folder where you intend to create your Jupyter Notebook (make sure the filename doesn’t have spaces like this one).
Okay. Whew. We have our Google developer credentials with permissions to access our Google Drive and thus Google Sheets.
Now let’s connect our Jupyter notebook to our Google Sheet.
conda install pandas jupyter pip install gspread oauth2client df2gspread
All of the work above you’ll only have to do once. From now on, it’ll be easy to pull data from a Google Sheet into a Jupyter Notebook and vice-versa.
To allow your Jupyter notebook access to the Google Sheet, you’ll need to share the sheet with the Google developer credentials you created in part one. To do so, open the Service account credentials file in Jupyter or a text editor.
Copy the client_email attribute.
Go back to your Google Sheet and click Share.
Paste the client_email into people box and click Send.
You’ll need to repeat the step above for any sheet that you want to pull into Jupyter notebooks.
Open or create a Jupyter notebook. The full version of my notebook is available here.
import pandas as pd import gspread from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds'] credentials = ServiceAccountCredentials.from_json_keyfile_name('./<YOUR CREDENTIALS FILENAME>.json', scope) gc = gspread.authorize(credentials)
Find the Google Sheet key of the spreadsheet you want to import in the url of the spreadsheet as shown below.
Copy the key and paste it into the following code.
spreadsheet_key = '1f0OwtmuTk1fTdhnn4tuvVcPCZjdb00D79dWw4RFhYs0' book = gc.open_by_key(spreadsheet_key) worksheet = book.worksheet("Candidate Data") table = worksheet.get_all_values()
Enter the name of the worksheet you want to import into the above code as well.
If you run the code, you’ll see the table variable now has the rows from the Google Sheet, like below.
Now that we have our sheet data in our Jupyter notebook, we’ll want to convert the data contained in the table variable into a clean pandas dataframe for easy manipulation.
##Convert table data into a dataframe df = pd.DataFrame(table[1:], columns=table) ##Only keep columns we need df = df[['Order', 'Candidate', 'Position', 'Start Date', 'End Date', 'Years of Experience', 'Points']] ##Convert number strings to floats and ints df = df.apply(pd.to_numeric, errors='ignore') ##Convert date strings to datetime format df['End Date'] = pd.to_datetime(df['End Date'],infer_datetime_format=True) df['Start Date'] = pd.to_datetime(df['Start Date'],infer_datetime_format=True) df.head()
And walla! You have converted your Google Sheet data into a nice, clean pandas dataframe. If you change the data in the sheet and want to update the dataframe, simply rerun all the code from:
book = gc.open_by_key(spreadsheet_key)
We can also send data from the Jupyter notebook back to the Google Sheet.
First, let’s manipulate the data. The groupby capabilities of dataframes are particularly hard to replicate in a sheet, so let’s do that.
candidate_groups = df.groupby('Candidate') scores_df = candidate_groups.sum() scores_df['Order'] = candidate_groups.first() scores_df.head()
That will create a new scores dataframe that looks like this…
To ouput this data back to the same Google Sheet under the worksheet named “Jupyter Manipulated Data” simply run the following code.
from df2gspread import df2gspread as d2g wks_name = 'Jupyter Manipulated Data' d2g.upload(scores_df, spreadsheet_key, wks_name, credentials=credentials, row_names=True)
And boom. The Google sheet is updated with our Jupyter data…
Now pushing and pulling data back and forth between Google Sheets and Jupyter notebook is a snap. I use this every day and am constantly finding new functionality and workflows for it. I hope you’ll find it as useful as I do.