This post provides downloadable Google Tag Manager (GTM) scripts and a walkthrough of the necessary steps for getting clickstream data out of Google Analytics. This clickstream data already exists behind the scenes; every time a user takes a tracked action on your website, like viewing a page or clicking a button, the data about that action is sent to Google Analytics as a hit. But by default, Google Analytics doesn't expose this granular hit-level data -- instead all reports are aggregated together by location, marketing channel, and so on.
This makes sense within the Google Analytics interface, where a firehose of raw data would be more of a hindrance than a help. But the underlying raw clickstream data has many use cases, and it is valuable enough that access to it in BigQuery costs $150,000 annually via Google Analytics Premium (don't worry, of course we're going to get it for free here 😀).
THE VALUE OF CLICKSTREAM DATA
If you're here, you probably already know why you want clickstream data. It enables you to merge Google Analytics data with a CRM, integrate GA data into a data warehouse, perform cohort analysis, answer questions about lifetime value, and run all kinds of complex queries and advanced analytics that are inaccessible from the standard reports.
Simo Ahava gets a lot of credit, as this system was inspired by his posts on improving data collection, using custom tasks, and leveraging the 'transport : beacon' field. So definitely read his posts for background information. This article slightly modifies, condenses, and simplifies the steps, and compiles the variable scripts into a downloadable container so you can get a quick start.
STEP 1: SET UP CUSTOM DIMENSIONS
In Google Analytics Admin, navigate to Admin > Property > Custom Dimensions > +New Custom Dimension.
In the new custom dimension interface, set up the following 4 custom dimensions:
- Name = Timestamp, Scope = Hit
- Name = GA Client ID, Scope = User
- Name = Session ID, Scope = Session
- [optional] Name = User ID, Scope = Hit. Use this if you're passing an internal ID to the data layer that identifies your logged-in users, as described here
Once these have been added, note the numbers assigned by Google Analytics in the Index column. We'll need these later, in Steps 4 and 5.
STEP 2: DOWNLOAD THE SCRIPTS
Here I've compiled the scripts for these 4 dimensions into a downloadable container you can import into your own GTM account. Right-click the link and click 'Save link as' to save them to your own computer: GTM Clickstream Variables
For reference, this is the content of these scripts:
- JS Timestamp
    function() {
      try {
        var timestamp = new Date();
        var time = timestamp.toISOString();
        return time;
      } catch(e) {
        return "unknown";
      }
    }
- GA Client ID
    function() {
      var customDimensionIndex = 4;
      return function(model) {
        model.set('dimension' + customDimensionIndex, model.get('clientId'));
      };
    }
- Session ID
    function() {
      return new Date().getTime() + '.' + Math.random().toString(36).substring(5);
    }
- User ID
Data Layer Variable Name = userId
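To sanity-check these scripts outside GTM, the sketch below wraps each one in a named function and runs the custom task against a stand-in model object. The names `jsTimestamp`, `gaClientId`, `sessionId`, and `makeMockModel` are hypothetical, and the mock only imitates the `get`/`set` methods that the script relies on; it is not the real analytics.js model.

```javascript
// Hypothetical named wrappers around the three custom JavaScript variables.
function jsTimestamp() {
  try {
    return new Date().toISOString(); // ISO 8601 string in UTC
  } catch (e) {
    return "unknown";
  }
}

function gaClientId() {
  var customDimensionIndex = 4; // replace with your own index from Step 1
  return function (model) {
    model.set("dimension" + customDimensionIndex, model.get("clientId"));
  };
}

function sessionId() {
  return new Date().getTime() + "." + Math.random().toString(36).substring(5);
}

// Stand-in for the tracker model: it only supports get and set.
function makeMockModel(initialFields) {
  var fields = Object.assign({}, initialFields);
  return {
    get: function (name) { return fields[name]; },
    set: function (name, value) { fields[name] = value; }
  };
}

var model = makeMockModel({ clientId: "1234567890.1234567890" }); // made-up ID
gaClientId()(model); // the variable returns a task; the task copies the ID
console.log(model.get("dimension4"));
console.log(jsTimestamp(), sessionId());
```

Note that the variable itself returns a function, which is why Preview Mode shows a function for JS - GA Client ID rather than the ID.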
STEP 3: IMPORT CLICKSTREAM SCRIPTS INTO YOUR OWN GTM CONTAINER
Follow the directions below to import the scripts into your own GTM container without affecting anything you have there already.
In Google Tag Manager, navigate to Admin > Container and click Import Container.
Click Choose Container File and import the JSON file from Step 2.
Choose New workspace. Name it Clickstream and click Save.
Choose the options Merge and then Rename conflicting tags, triggers, and variables.
Click Confirm and complete the import. Following the above steps will ensure that none of your existing tags are changed or overwritten.
You will now have 4 new Variables in a workspace named Clickstream: {{JS - Timestamp}}, {{JS - Session ID}}, {{JS - GA Client ID}}, and {{DL - UserID}}.
STEP 4: ADJUST THE NEW VARIABLES
These variables are almost ready to go. You just need to make one modification:
Open the {{JS - GA Client ID}} Variable and set customDimensionIndex to the index number that Google Analytics assigned to your GA Client ID custom dimension in Step 1. In my case it was "4", but remember to use your own assigned index number.
STEP 5: ADD THE NEW VARIABLES TO YOUR TAGS
The most convenient way to add these new variables to your tag is to update your Google Analytics Settings variable (located at Variables > User-Defined Variables). Make the following changes:
- Add Field Name = customTask, Value = {{JS - GA Client ID}}
- Add Field Name = transport, Value = beacon
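- Under Custom Dimensions, add one row for each of the other 3 new dimensions: enter the index number assigned by GA in Step 1 as the Index, and the matching Variable (for example {{JS - Timestamp}}) as the Dimension Value.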
Tracking ID should of course be the tracking ID of your own GA property. If you aren't using the Google Analytics Settings variable, you can find these same options by clicking into your tags and clicking the "Enable overriding settings in this tag" checkbox.
Note: the {{DL - UserID}} Variable requires that you are pushing your internal user ID to your site's data layer. You can leave it out if you're not using it. If you do include it as a custom dimension without actually pushing a user ID to the data layer, it just won't populate.
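For reference, pushing the ID might look like the sketch below. The `userId` key matches the Data Layer Variable name from Step 2; the `userData` event name, the `pushUserId` helper, and the ID value are made up for illustration. The push needs to run before the tags that read the variable fire.

```javascript
// Use the browser's window when available so the sketch also runs in Node.
var root = typeof window !== "undefined" ? window : globalThis;

// Standard GTM pattern: reuse the existing data layer or create it.
root.dataLayer = root.dataLayer || [];

// Hypothetical helper: push an already-anonymized internal ID.
function pushUserId(anonymizedId) {
  root.dataLayer.push({
    event: "userData",    // hypothetical event name
    userId: anonymizedId  // key read by {{DL - UserID}}
  });
}

pushUserId("u_8f3a2c"); // made-up ID, never an email or other PII
```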
STEP 6: TEST AND PUBLISH
Always test to make sure everything looks good in Preview Mode before publishing to production. Click around your site and manually verify that the Variables are populating correctly (note that JS - GA Client ID will contain a function rather than the actual ID).
Also click into your page view tag and verify that the Variables are being passed inside it, and are populating the correct index slot number.
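One way to double-check the index slots is to copy a `collect` request from the browser's network tab and inspect its parameters. In the Measurement Protocol, custom dimensions travel as `cd<index>` and the client ID as `cid`. Because Step 5 sets `transport` to `beacon`, the hit is typically sent as a POST, so the parameters usually sit in the request payload rather than in the URL. The helper name `extractCustomDimensions` and the sample payload (including its index assignments) are made up for this sketch.

```javascript
// Hypothetical helper: pull cid and cd<index> parameters out of a hit payload.
function extractCustomDimensions(payload) {
  var params = new URLSearchParams(payload);
  var result = { clientId: params.get("cid"), dimensions: {} };
  params.forEach(function (value, key) {
    var match = /^cd(\d+)$/.exec(key);
    if (match) {
      result.dimensions[Number(match[1])] = value;
    }
  });
  return result;
}

// Made-up hit: timestamp in slot 1, session ID in slot 2, client ID in slot 4.
var samplePayload =
  "v=1&t=pageview&tid=UA-XXXXX-Y&cid=111.222" +
  "&cd1=2020-01-31T12%3A34%3A56.789Z&cd2=1580474096789.k3x9&cd4=111.222";

var parsed = extractCustomDimensions(samplePayload);
console.log(parsed);
```

If the client ID slot matches `cid` and the other slots hold a timestamp and a session ID, the indexes are wired up as intended.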
CONCLUSION
Once you've followed the above steps, you'll have access to raw clickstream data from your GA reports. This is an example of what the data will look like (Click to import this custom report into your own GA account: https://analytics.google.com/analytics/web/template?uid=sbXi8C3HQGukKWmO_39GPg)
Of course you can also access this data via the API, or send elements of it directly to your CRM upon form submission (if your form links directly to your CRM), or programmatically send this data to your data warehouse daily so you can run SQL queries on top of it, or push the data into BI software like Tableau... having this data available opens up all kinds of custom possibilities.
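As a rough sketch of the API route, a request body for the Reporting API v4 `reports:batchGet` method might be assembled like this for a page-level clickstream. The function name, the view ID, and the dimension indexes (1 for Timestamp, 2 for Session ID, 4 for GA Client ID) are placeholders; substitute the indexes from Step 1. Actually sending the request also requires OAuth credentials, which are omitted here.

```javascript
// Build a Reporting API v4 batchGet body for one day of page-level hits.
// viewId and the index values are placeholders, not real settings.
function buildClickstreamRequest(viewId, indexes, date) {
  return {
    reportRequests: [{
      viewId: viewId,
      dateRanges: [{ startDate: date, endDate: date }],
      dimensions: [
        { name: "ga:pagePath" },
        { name: "ga:dimension" + indexes.timestamp },
        { name: "ga:dimension" + indexes.clientId },
        { name: "ga:dimension" + indexes.sessionId }
      ],
      // A request needs at least one metric alongside the dimensions.
      metrics: [{ expression: "ga:pageviews" }],
      pageSize: 100000 // the most rows a single request will return
    }]
  };
}

var body = buildClickstreamRequest(
  "XXXXXXXX",                                    // placeholder view ID
  { timestamp: 1, sessionId: 2, clientId: 4 },   // assumed index slots
  "yesterday"
);
console.log(JSON.stringify(body, null, 2));
```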
Have you implemented clickstream reporting or enabled any useful integrations? Let me know below.
Additional Reading:
- Google's documentation on Integrating CRM Data with Google Analytics
- a complete example of how to pull the GA Client ID and pass it in a form
Great post, thank you. I have used GA free in this way to bring clickstream data into a Snowflake DB and report on it in Tableau. I haven't hit limits with volume, but I did have to split the API queries as I could only pull 7 dimensions at a time, and they also had to be phased hourly as there is an API limit of querying <500k sessions. Are you aware of any other limitations we should keep in mind if depending on GA (free) for clickstream?
Thanks for your time!
Those are really good points. Querying via the API does have several limits:
1) as you noted, you can supply a maximum of 7 dimensions in any query. (documentation) If you need more, you need to pull them separately and group them back together using a shared key like the client ID and/or session ID.
2) Not all dimensions and metrics can be queried together. Certain combinations of dimensions and metrics are invalid, so you may not be able to query all dimensions at the same time. The Dimensions and Metrics Explorer tool shows what combinations are valid.
3) Some combinations may be technically valid but won't make sense together, for example pages and events. This combination would just provide the pages that events occurred on, rather than a list of all pages and a list of all events. For a clickstream report you'd usually want both page views and events, so these need to be split into separate queries.
4) You cannot send a query composed only of dimensions: Requests must specify at least one metric (maximum of ten).
5) The Analytics Core Reporting API returns a maximum of 100,000 rows per request, no matter how many you ask for.
6) I'm not sure about an API limit of querying 500k sessions, but there are various limits and quotas on API Requests, as described here: https://developers.google.com/analytics/devguides/reporting/core/v4/limits-quotas. The main one is a 10,000 limit on the number of requests per view per day.
Unless you have massive volume, you should be able to extract all your data daily with the limits of 10,000 requests a day and 100k rows in each request. Further, #2, #3, and #4 apply to the paid version as well. So none of the above should be deal breakers, though ultimately it will depend on how much work you want to put into extracting and manipulating the data vs. paying for Google to push it into BigQuery for you.
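To illustrate point 1, here is a minimal sketch of stitching two separately pulled result sets back together on a shared key. The row shape and field names (`clientId`, `sessionId`, `country`, and so on) are assumptions about how you've parsed the API responses, not the format the API returns, and `joinOnKeys` is a hypothetical helper.

```javascript
// Hypothetical parsed rows from two separate queries over the same sessions.
var pageRows = [
  { clientId: "111.222", sessionId: "s1", timestamp: "2020-01-31T12:00:00.000Z", page: "/home" },
  { clientId: "111.222", sessionId: "s1", timestamp: "2020-01-31T12:01:00.000Z", page: "/pricing" }
];
var geoRows = [
  { clientId: "111.222", sessionId: "s1", country: "Canada", deviceCategory: "mobile" }
];

// Join each left row to the right row that shares every key field.
function joinOnKeys(leftRows, rightRows, keyFields) {
  function keyOf(row) {
    return keyFields.map(function (field) { return row[field]; }).join("|");
  }
  var lookup = new Map();
  rightRows.forEach(function (row) { lookup.set(keyOf(row), row); });
  return leftRows.map(function (row) {
    // Rows without a match pass through unchanged (a left join).
    return Object.assign({}, lookup.get(keyOf(row)), row);
  });
}

var clickstream = joinOnKeys(pageRows, geoRows, ["clientId", "sessionId"]);
console.log(clickstream);
```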
Great post Ana, thank you very much. I was able to follow it successfully and the variables are populating OK. I want to make a custom report that creates a row for each page visit, saving its corresponding Client ID, Session ID and the Timestamp, so I can track the path followed by each user every time they enter the site. Anyways I can't seem to do it correctly, any help?
Hey there, thanks for the comment. Have you tried this custom report? https://analytics.google.com/analytics/web/template?uid=sbXi8C3HQGukKWmO_39GPg
Yes, sorry I should have mentioned that I tried to import it but GA won't let me, I don't know why
Gotcha. In that case, please build a custom report as follows:
1) Navigate to Customization > Custom Reports > +New Custom Report
2) Choose Report Type = Flat Table
3) Choose Dimensions = Page, Timestamp, Client ID, and Session ID
4) Choose Metric = Pageviews
Let me know if that works for you.
That works! Thank you very much
Hi Ana,
Thank you for your post.
Just wondering: if I publish the custom dimensions today, would I be able to pull clickstream data from the past? If not, and I want one month of data, would I need to wait a month?
You're welcome! This collection method isn't retroactive -- if you publish today, you'll need to wait a month to have a month's data.
Thank you!
Hi Ana,
How did you compile the scripts for these 4 dimensions into a downloadable container in Step 2? Can you please provide the steps?
Hi there, I added them into a totally new container. Then I clicked Admin > Export Container. Hope that helps!
Thank you Ana, it helps me a lot.
Ana, thx so much for this! When I try to Preview in GTM as you suggest in STEP 6, I get this for each of the 4 variables. Any idea what I am doing wrong?
Validate Container
The container has the following errors:
Type: Unknown variable name
Location: GA Settings
Description: Unknown variable “JS – GA Client ID” found in another variable. Edit the variable and remove the reference to the unknown variable.
Hey Gamliel, this means you are referencing a variable called JS - GA Client ID, but there is no variable with that name. Most likely you named it something else, which is why it's producing an error. Just make sure the variable containing your GA client ID has the exact name JS - GA Client ID, and the issue should be resolved.
Hi Ana, thanks for this. I am going to test implementing it.
Are there any implications with data protection laws in Europe that you are aware of? All the unique IDs will be provided by Google, so I assume that will be OK?
Thanks in advance
So for example, if I implement only session IDs and not user IDs, I should be able to do it without any issues.
I don't think the GA client ID would cause any issue under data protection laws, as client IDs are just random IDs assigned by Google; they don't tie back to any personally identifiable info at all. I guess a user ID could be tied back, but it's supposed to be hashed/anonymized before sending to GA. So a user ID doesn't seem any different from transaction IDs, which are tracked by virtually every analytics solution and can also theoretically be connected to a user. So I personally wouldn't be concerned. Though please keep in mind I'm not in the slightest bit qualified to talk about law 🙂
Thanks for your response
Really appreciate it.