DOM Scraping into Data Layer & Custom JS Variables

While Google Tag Manager is best used in conjunction with a data layer, there are many cases where it makes sense to deploy tags more quickly by scraping the DOM instead. Scraping is usually faster because it uses elements that already exist on the page (page titles, classes, IDs, URLs, etc.) rather than waiting for developers to collect that information into a structured data layer. (Google Analytics Event Tracking in Google Tag Manager describes the pros and cons of using GTM’s built-in tracking vs. a data layer in more detail.)

This post walks through how to traverse and access all the HTML DOM nodes around any element you’ve clicked on. This is useful when you need a specific value that you can add to GTM via a Custom JavaScript Variable or a Data Layer Variable. The Data Layer Variable method lets you pull values out of the DOM without writing any code.

DOM SCRAPING OVERVIEW

Traversing up and down the DOM path is one of the most useful techniques for extracting the value of a specific element on the page for analytics tracking. The node properties you’ll use most often include:

  • parentNode
  • childNodes[nodenumber]
  • firstChild
  • lastChild
  • nextSibling
  • previousSibling

In most cases you’ll want to access the node containing the information you’re looking for, and then return the inner text or HTML.
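As a rough illustration of those properties, here is a small sketch with three hypothetical helpers (ancestorAt, previousElement and readText are illustrative names, not part of any API). One caveat worth knowing: childNodes, firstChild, lastChild, nextSibling and previousSibling include text nodes, so the whitespace between tags can be returned instead of the element you expect. The element-only counterparts (children, previousElementSibling, and so on) skip them.

```javascript
// Follow parentElement `levels` times; return null instead of throwing
// if the chain runs out before reaching the requested level.
function ancestorAt(node, levels) {
  var current = node;
  for (var i = 0; i < levels && current; i++) {
    current = current.parentElement;
  }
  return current || null;
}

// Step backwards over siblings, skipping text and comment nodes
// (nodeType 1 is ELEMENT_NODE). In a browser this gives the same
// result as node.previousElementSibling.
function previousElement(node) {
  var sibling = node ? node.previousSibling : null;
  while (sibling && sibling.nodeType !== 1) {
    sibling = sibling.previousSibling;
  }
  return sibling || null;
}

// Read an element's text, trimmed, or undefined if the element is missing.
function readText(el) {
  if (!el) return undefined;
  var text = el.innerText !== undefined ? el.innerText : el.textContent;
  return typeof text === "string" ? text.trim() : undefined;
}
```

In the browser console you might combine them as readText(previousElement(ancestorAt($0, 2))), where $0 is the element currently selected in the Elements panel.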

You could do this by simply inspecting the element you clicked on (and you will still need to start there), but in practice it can be hard to keep track of exactly where you are in relation to the click. It’s often useful to log the clicked element with a short script instead, so you can verify that you’re pulling in the right value. The script in the screenshot below shows the entire DOM path for any clicked element.
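The original script appears only as a screenshot. A minimal sketch of a listener that does the same job, assuming all it needs to do is log each click to the Console, might look like this. describePath and logClicks are hypothetical names, and the readable path string is an extra convenience that is not part of the original script.

```javascript
// Build a readable path such as "html > body > div.product__list > a"
// by walking parentElement from the clicked element up to the root.
function describePath(el) {
  var parts = [];
  for (var node = el; node && node.tagName; node = node.parentElement) {
    var label = node.tagName.toLowerCase();
    if (node.id) label += "#" + node.id;
    // SVG elements expose className as an object, so only use real strings.
    if (typeof node.className === "string" && node.className.trim()) {
      label += "." + node.className.trim().split(/\s+/).join(".");
    }
    parts.unshift(label);
  }
  return parts.join(" > ");
}

// Log every click: the full MouseEvent (expand "target" in the Console)
// plus the readable path. The listener uses the capture phase (third
// argument true), so it runs before handlers on the clicked element itself.
function logClicks(root) {
  function onClick(event) {
    console.log(event);
    console.log(describePath(event.target));
  }
  root.addEventListener("click", onClick, true);
  return onClick; // keep this to call root.removeEventListener("click", onClick, true)
}

// Only attach when running in a browser console.
if (typeof document !== "undefined") {
  logClicks(document);
}
```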

Open DevTools (F12 on Windows/Linux or Option + ⌘ + J on macOS), select the Console tab, and paste the script in like this:

dom-traversal-img1

Once you’ve pressed Enter, click on any element. The Console shows the entire node tree, which you can expand and navigate to find your exact target element. When you find it, hover over the item to see the path to your element, then copy it to your clipboard for use in GTM.

DATA LAYER VARIABLE CONSTRUCTION EXAMPLE

For this example I’ll use jcrew.com. Let’s say we want to create a Google Analytics event tag containing the product listing section sub-header name (“Secret Wash”) every time someone clicks on a product tile, to record which sub-category was clicked.

dom-traversal-img2

By inspecting the page, we can see that the listing name sits in an h4 heading, so we need to find the relationship between the clicked element (class = js-product__image product-tile__image--small) and the h4 text.

dom-traversal-img4

Ctrl-click (⌘-click on macOS) the link so it opens in another tab while you stay on the page. As long as we’ve pasted in the JavaScript snippet above, the Console now logs the click as a MouseEvent object. Its target property is the element you just clicked on, so scroll down the list and expand the target object.

dom-traversal-img3

Following the structure shown in the page source above, the path from the clicked product to the subheader can be accessed as follows:

dom-traversal-img5

Right-click and choose “Copy property path” to save this path to your clipboard. In recent versions of DevTools this may copy only the last step (e.g. .target.innerText), so it’s more reliable to highlight the full path with your mouse and press Ctrl-C (⌘-C on macOS). Convert ‘target’ into the GTM equivalent, ‘gtm.element’, and you can use it in a Data Layer Variable, like this:

DATA LAYER VARIABLE:
gtm.element.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.parentElement.previousSibling.innerText
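GTM resolves that variable name roughly one dot-separated property at a time, starting from the clicked element. The sketch below is a simplified model of that lookup (resolvePath is a hypothetical helper, not GTM’s actual implementation), and it shows why the approach is fragile: if any single link in the chain is missing, the result is undefined.

```javascript
// Simplified model of Data Layer Variable dot notation: follow each
// property name in turn, returning undefined as soon as a step is missing.
function resolvePath(root, path) {
  var keys = path.split(".");
  var value = root;
  for (var i = 0; i < keys.length; i++) {
    if (value === null || value === undefined) return undefined;
    value = value[keys[i]];
  }
  return value;
}

// The variable name from above, built with exactly seven ".parentElement"
// steps (joining 8 empty slots inserts the separator 7 times).
var subheaderPath = "gtm.element" +
  new Array(8).join(".parentElement") +
  ".previousSibling.innerText";
```

In this model, gtm.element stands for the clicked element stored with the click event. Note also that if the page has whitespace between the two elements, previousSibling returns a text node and the lookup yields undefined; previousElementSibling avoids that, assuming you edit the variable accordingly.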

dom-traversal-img7


CUSTOM JAVASCRIPT VARIABLE CONSTRUCTION EXAMPLE

It is definitely not advisable to string together seven levels of parent elements to reach a particular place in the DOM. It looks ridiculous, and all it takes is for one part of the path to change for the whole thing to break. As an alternative, you could write a short script that climbs up to a stable ancestor element and then searches down from there.

Once you’ve verified that it works by testing it in the DevTools console, modify it to use GTM’s syntax and add it to a Custom JavaScript Variable:

dom-traversal-img6

The script crawls up the DOM until it finds the first element with the class ‘product__list’, then crawls down to the nested header and h4 elements and returns the inner text.
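That script is shown only as a screenshot. A minimal sketch of the same approach, assuming the class name product__list and a header containing an h4 inside it (both taken from the inspection above, and likely to change whenever the site’s markup does), could look like this. getListingSubheader is a hypothetical name.

```javascript
// Climb to the nearest ancestor (or the element itself) with class
// "product__list" using Element.closest, then search down inside it for
// the h4 within its header and return that heading's text.
// Returns undefined when any step is missing instead of throwing.
function getListingSubheader(clickedElement) {
  if (!clickedElement || typeof clickedElement.closest !== "function") {
    return undefined;
  }
  var list = clickedElement.closest(".product__list");
  if (!list) return undefined;
  var heading = list.querySelector("header h4");
  return heading ? heading.innerText.trim() : undefined;
}

// In a GTM Custom JavaScript Variable, the same logic goes inside an
// anonymous function that returns a value, reading the clicked element
// from the built-in Click Element variable:
//
//   function() {
//     var el = {{Click Element}};
//     var list = el && el.closest ? el.closest('.product__list') : null;
//     var heading = list ? list.querySelector('header h4') : null;
//     return heading ? heading.innerText.trim() : undefined;
//   }
```

Anchoring on a named container and then querying down means a layout change between the tile and the list no longer breaks the variable, as long as the class and heading structure survive.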

CONCLUSION

For a long-term solution, this kind of tracking should be done via a data layer push, or at least through a more robust script using, for example, the Element.closest method above or the DOM crawler function published by Simo Ahava. Stringing together multiple nodes in a Data Layer Variable should usually be a last resort. That said, not all DOM traversals will be this complex, and that method is notable for working without a single line of code. All of the methods presented here belong in the web analyst’s toolkit as quick ways to scrape data off the page for constructing, testing, or patching a tag, whatever your level of technical expertise.
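For comparison, the data layer approach moves the work to the site’s developers, who push the value at the moment of the click. A minimal sketch, assuming the hypothetical event name productTileClick and key productListName (agree on the real names with your developers):

```javascript
// Push one object per tile click. On the site this would be called from the
// tile's click handler with the page's data layer, e.g.
//   window.dataLayer = window.dataLayer || [];
//   pushTileClick(window.dataLayer, listingName);
function pushTileClick(dataLayer, listName) {
  dataLayer.push({
    event: "productTileClick",   // hypothetical name, used by a Custom Event trigger
    productListName: listName    // read in GTM with a Data Layer Variable of the same name
  });
  return dataLayer;
}
```

In GTM, a Custom Event trigger on productTileClick and a Data Layer Variable named productListName would then replace the whole DOM path.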

P.S. The posts “Track Number of Search Results in Google Analytics with GTM” and “DOM Scraping Together a Datalayer for Google Analytics Ecommerce Tracking” provide more examples of DOM scraping for use in Google Tag Manager.

COMMENTS

  1. When I follow this step: Right-click and “Copy property path” to save this path to your clipboard. I am not getting the full path you’re showing above. Instead, I only get .target.innerText. What am I doing wrong?

    1. Hey Sara, good question. I just tested it and it doesn’t work for me anymore either, so maybe something changed in a recent version of Dev Tools.
      Luckily it’s still pretty easy to get it: just use your mouse to highlight the full path in the Console, then click Ctrl-C to copy it and Ctrl-V to paste (just as you would in any other text application). I’ll update the post. Please let me know if you have any problems!
