Get started exploring Google Analytics data with Python Pandas

  • Benoît Pointet

The latest release of Pandas (v0.17.1) deprecates the Google Analytics data reader submodule (pandas.io.ga). This is actually good news: the submodule depended on packages that are not currently Python 3 compatible and was hard to get up and running, even under Python 2.7.

After updating my system to the newest versions of pandas, I had to find a new connector to fetch Google Analytics data, and found an advantageous replacement in the google2pandas module from Panalysis.

This blogpost walks you through the setup of Pandas and google2pandas, and briefly introduces you to fetching Google Analytics data into Pandas dataframes for further exploration with Pandas.

Installation

The following instructions are aimed at OSX and Linux system users.

  1. Download and install Anaconda by following the official instructions. Although your machine might already have a Python copy pre-installed, I recommend you install Anaconda, a bundle of 300+ data science tools that comes with its own package manager. When it comes to data analysis, one tool never fits all needs and with Anaconda, you get in the good company of so many of them. If you do not want to install the whole Anaconda, follow installation instructions for Pandas.

  2. Install the connector module and its dependencies:

pip install git+https://github.com/panalysis/Google2Pandas
  3. Test both installations in an interactive Python environment; the following statements should run quietly and not throw any ImportError:
import pandas
import google2pandas
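
If you would rather get a short report than a traceback, the check above can be sketched as follows. This is only a minimal illustration: the helper name check_modules is made up for this post, and it merely tests whether each module can be found, without importing it.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if Python can find it."""
    # find_spec returns None when a top-level module is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The two modules this post relies on
    for name, found in check_modules(["pandas", "google2pandas"]).items():
        print(name, "OK" if found else "MISSING")
```

A MISSING line means the corresponding installation step needs another look.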

Setting up the connection

Now the tricky part: setting up the authorized connection to the Google Analytics API.

  1. Follow step 1 in the official instructions for installed apps.
  2. Create a 'ga-creds' directory in your data analysis workspace/directory, and move into it the client_secrets.json you downloaded in the previous step. An alternative would be to put those credentials in the google2pandas installation folder. I however recommend you save those credentials in a private project folder and always specify their path when establishing the connection (see how below). That will allow you to reinstall modules without losing credentials, and to manage multiple credentials.
  3. Set correct permissions:
chmod 644 ga-creds/client_secrets.json
  4. Save the following script in a file (ga-conn-init.py):
from google2pandas import *
conn = GoogleAnalyticsQuery(secrets='./ga-creds/client_secrets.json', token_file_name='./ga-creds/analytics.dat')
  5. Run python ga-conn-init.py.

A browser window should open with the usual Google Account authentication process, plus a request for permission to view your Google Analytics data. Please allow it. The browser window is then redirected to a bare-bones page displaying: The authentication flow has completed. Should you instead face a No data received (ERR_EMPTY_RESPONSE) page, try python ga-conn-init.py --noauth_local_webserver. If that does not work either, rerun the same command in another terminal window while the previous one is still running; this has worked for me.

  6. Check that it created the ga-creds/analytics.dat file.
  7. Adjust its permissions:
chmod 664 ga-creds/analytics.dat

Take a deep breath. With this authentication process done, you shouldn't need to repeat it in the near future. Just remember to always point to both files when establishing the connection.
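
To avoid retyping both paths in every script, one option is a small helper along these lines. It is only a sketch: the name ga_credentials is hypothetical (not part of google2pandas), and it assumes the ga-creds layout and file names used in this post.

```python
import os


def ga_credentials(creds_dir="./ga-creds"):
    """Return the keyword arguments pointing to both credential files."""
    secrets = os.path.join(creds_dir, "client_secrets.json")
    token = os.path.join(creds_dir, "analytics.dat")
    # Both files should exist once the authentication flow has completed
    for path in (secrets, token):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    return {"secrets": secrets, "token_file_name": token}
```

The returned dictionary uses the same argument names as the connection script above, so it could be passed along as GoogleAnalyticsQuery(**ga_credentials()).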

Fetching Google Analytics data

Basic query

  1. Go to Google Analytics, choose the view from which you want to import data, click on the admin tab and note down the view ID.
  2. Replace VIEW-ID with your view ID in the following lines and run them. That should print a tabular listing with 10 rows and data for the following columns: pageviews, date, pagePath.
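from google2pandas import GoogleAnalyticsQuery

# Point to both credential files created while setting up the connection
conn = GoogleAnalyticsQuery(secrets='./ga-creds/client_secrets.json', token_file_name='./ga-creds/analytics.dat')

# Pageviews per date and page path, starting 8 days ago, limited to 10 rows
query = {
  'ids' : 'VIEW-ID',
  'metrics' : 'pageviews',
  'dimensions' : ['date', 'pagePath'],
  'start_date' : '8daysAgo',
  'max_results' : 10
}

# Returns the data as a Pandas dataframe, along with the query metadata
df, metadata = conn.execute_query(**query)
print(df)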
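
Once the data sits in a dataframe, the usual Pandas tools apply. The sketch below gives a taste of that exploration on a hand-made dataframe: the rows are invented for illustration and only mimic the three columns of the query above. The actual values and column types you get back, in particular for date, may differ.

```python
import pandas as pd

# Made-up sample rows, only mimicking the columns of the query above
df = pd.DataFrame({
    "date": ["20160101", "20160101", "20160102", "20160102"],
    "pagePath": ["/", "/blog", "/", "/blog"],
    "pageviews": [120, 45, 98, 60],
})

# Total pageviews per page, most viewed first
by_page = df.groupby("pagePath")["pageviews"].sum().sort_values(ascending=False)
print(by_page)

# One row per date, one column per page
by_date = df.pivot_table(index="date", columns="pagePath",
                         values="pageviews", aggfunc="sum")
print(by_date)
```

With real data, you would skip the hand-made dataframe and apply the same groupby and pivot_table calls to the df returned by the query.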

Further queries

Both the official Google Analytics Core Reporting API documentation and the Google2Pandas module documentation and code can be useful resources for expressing (complex) queries:

Google2pandas module:

Google Analytics Core Reporting API Documentation:

