Quickstart Guide to Google Analytics Clickstream Data

Comments:2

  1. Great post, thank you. I have used GA free in this way to bring clickstream to Snowflake Db and report in Tableau. I havent hit limits with volume, I did have to split the API queries as I could only pull 7 dimensions at a time, and also they had to be phased hourly as there is an API limit of querying <500k Sessions. Are you aware of any other limitations we should be aware of if depending on GA (free) for clickstream?
    Thanks for your time!

    1. Those are really good points. Querying via the API does have several limits:
      1) as you noted, you can supply a maximum of 7 dimensions in any query. (documentation) If you need more, you need to pull them separately and group them back together using a shared key like the client ID and/or session ID.
      2) Not all dimensions and metrics can be queried together. Certain combinations of dimensions and metrics are invalid, so you may not be able to query all dimensions at the same time. The Dimensions and Metrics Explorer tool shows what combinations are valid.
      3) Some combinations may be technically valid but won’t make sense together, for example pages and events. This combination would just provide the pages that events occurred on, rather than a list of all pages and a list of all events. For a clickstream report you’d usually want both page views and events, so these need to be split into separate queries.
      4) You can not send a query composed only of dimensions: Requests must specify at least one metric (maximum of ten).
      5) The Analytics Core Reporting API returns a maximum of 100,000 rows per request, no matter how many you ask for.
      6) I’m not sure about an API limit of querying 500k sessions, but there are various limits and quotas on API Requests, as described here: https://developers.google.com/analytics/devguides/reporting/core/v4/limits-quotas. The main one is a 10,000 limit on the number of requests per view per day.

      Unless you have massive volume, you should be able to extract all your data daily with the limits of 10,000 requests a day and 100k rows in each request. Further, #2, #3, and #4 apply to the paid version as well. So none of the above should be deal breakers, though ultimately it will depend on how much work you want to put into extracting and manipulating the data vs. paying for Google to push it into BigQuery for you.

Leave a Reply

Your email address will not be published.