Using regex to match specific numbers of sub-directories in a URL can be very helpful for Google Analytics. When I configure a new Google Analytics view, I’ll usually set up Content Grouping so we can see traffic by page type rather than just by individual page. Ideally there’s a value in the data layer that we can use for this purpose; failing that, I look for certain keywords in the URL in order to use GA’s “Group using rule definitions” functionality. For example, if all the blog pages are grouped into a sub-directory called /blog/, it’s easy enough to add a rule definition like “Page contains /blog/”. The same approach works for GA’s URL destination goal setup, which also accepts string matches.
Unfortunately, real-life scenarios are often not that clean. There are many cases where there’s neither a data layer value nor specific keywords in the URL. In those cases there’s another potential approach: count up the number of sub-directories and match on those with a regular expression (regex). For example, an e-commerce site may have URLs like www.site.com/clothing/jeans/low-rise-jeans-12345A/. In that case you could use logic like: three sub-directories = product details page, two sub-directories = sub-category page, one sub-directory = main category page.
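To make that logic concrete, here is a minimal Python sketch that assigns a page type by sub-directory count. The rule list, the labels, and the `page_type` function are illustrative names rather than anything from GA, and the patterns assume every path ends in a trailing slash.

```python
import re

# Ordered (pattern, label) rules. The labels mirror the e-commerce example
# above; both the labels and the rule list are illustrative assumptions.
RULES = [
    (re.compile(r"^/[^/]+/$"), "main category page"),                  # 1 sub-directory
    (re.compile(r"^/[^/]+/[^/]+/$"), "sub-category page"),             # 2 sub-directories
    (re.compile(r"^/[^/]+/[^/]+/[^/]+/$"), "product details page"),    # 3 sub-directories
]


def page_type(path: str) -> str:
    """Return the label of the first rule whose regex matches the URL path."""
    for pattern, label in RULES:
        if pattern.search(path):
            return label
    return "other"


if __name__ == "__main__":
    for p in ["/clothing/", "/clothing/jeans/", "/clothing/jeans/low-rise-jeans-12345A/", "/"]:
        print(p, "->", page_type(p))
```

Note that the function takes the URL path only (no hostname), which is what GA reports in the Page dimension.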
This post will provide the regex for matching specific numbers of sub-directories in a URL path, for a few different cases.
VARIATION 1: EXACTLY X SUB-DIRECTORIES, WITH TRAILING SLASH
This variation assumes every URL path ends in a trailing slash.
Regex for exactly one sub-directory: ^/[^/]+/$
example matching URL path: /retail/
Regex for exactly two sub-directories: ^/[^/]+/[^/]+/$
example matching URL path: /retail/clothing/
Regex for exactly three sub-directories: ^/[^/]+/[^/]+/[^/]+/$
example matching URL path: /retail/clothing/jeans/
Regex for exactly four sub-directories: ^/[^/]+/[^/]+/[^/]+/[^/]+/$
example matching URL path: /retail/clothing/jeans/low-rise-jeans-12345A/
(and so on…)
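Since the pattern just repeats one segment per sub-directory, it can be generated for any depth. The sketch below is one way to do that in Python; `exact_depth_pattern` is a hypothetical helper name, and the pattern shape is inferred from the "remove the initial caret" rule in Variation 3.

```python
import re


def exact_depth_pattern(depth: int) -> str:
    """Build a regex matching paths with exactly `depth` sub-directories
    and a trailing slash, e.g. depth=2 -> ^/[^/]+/[^/]+/$"""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # "/[^/]+" is one path segment: a slash followed by one or more non-slash characters.
    return "^" + "/[^/]+" * depth + "/$"


if __name__ == "__main__":
    pattern = exact_depth_pattern(3)
    print(pattern)
    print(bool(re.search(pattern, "/retail/clothing/jeans/")))   # three sub-directories
    print(bool(re.search(pattern, "/retail/clothing/")))         # only two
```

The generated string can be pasted into GA as-is; Python is only used here to build and check it.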
VARIATION 2: NO TRAILING SLASH
The above works in the case that all URLs end in a trailing slash. If they don’t, the regex needs to be altered as follows.
Regex for exactly one sub-directory + text
example matching URL path: /retail/clothing
Regex for exactly two sub-directories + text
example matching URL path: /retail/clothing/jeans
(and so on…)
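As a rough sketch of the no-trailing-slash case, one plausible form simply drops the final slash, so the path ends on a segment of text instead. The helper name `no_trailing_slash_pattern` and the exact pattern shape are assumptions, checked here against the example paths above.

```python
import re


def no_trailing_slash_pattern(subdirs: int) -> str:
    """Build a regex for `subdirs` sub-directories followed by a final
    text segment with no trailing slash, e.g. subdirs=1 -> ^/[^/]+/[^/]+$"""
    if subdirs < 1:
        raise ValueError("subdirs must be at least 1")
    # `subdirs` segments, then one more segment of text, then end of string.
    return "^" + "/[^/]+" * subdirs + "/[^/]+$"


if __name__ == "__main__":
    one = no_trailing_slash_pattern(1)
    print(one)
    print(bool(re.search(one, "/retail/clothing")))    # one sub-directory + text
    print(bool(re.search(one, "/retail/clothing/")))   # trailing slash, so no match
```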
VARIATION 3: AT LEAST X SUB-DIRECTORIES
If you want to match at least some number of sub-directories, just remove the initial caret from the trailing-slash patterns. The regex to match at least one sub-directory is then /[^/]+/$, the regex to match at least two sub-directories is /[^/]+/[^/]+/$, and so on.
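A quick way to see why dropping the caret works: without the start anchor, the pattern only has to match the tail of the path, so any extra leading segments are allowed. The Python sketch below checks this; it uses `re.search` because that looks for a match anywhere in the string, which is my understanding of how GA applies an unanchored regex. The constant names are illustrative.

```python
import re

AT_LEAST_ONE = r"/[^/]+/$"         # at least one sub-directory
AT_LEAST_TWO = r"/[^/]+/[^/]+/$"   # at least two sub-directories

paths = ["/retail/", "/retail/clothing/", "/retail/clothing/jeans/"]

if __name__ == "__main__":
    for path in paths:
        # re.search (not re.match) so the pattern may match the end of a longer path.
        print(path,
              bool(re.search(AT_LEAST_ONE, path)),
              bool(re.search(AT_LEAST_TWO, path)))
```

In this sample, every path passes the "at least one" check, while only the last two pass "at least two".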
VARIATION 4: PATH SEGMENTS STARTING WITH A NUMBER
This is useful for the case where you have URL path segments that always start with a number. I see this a lot with product detail pages.
Regex for 1 subdirectory + path segment starting with a number
example matching URL path: /t-shirt/1234567-a-new-design
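One plausible pattern for this case, offered as an assumption, is ^/[^/]+/[0-9][^/]*$: one sub-directory, then a final segment whose first character is a digit, with no trailing slash. The Python sketch below checks it against the example path; the constant name is illustrative.

```python
import re

# Assumed pattern: one sub-directory, then a segment that starts with a digit.
NUMERIC_SEGMENT = r"^/[^/]+/[0-9][^/]*$"

if __name__ == "__main__":
    for path in ["/t-shirt/1234567-a-new-design",   # starts with a number
                 "/t-shirt/a-new-design",           # starts with a letter
                 "/t-shirt/sale/1234567-a-new-design"]:  # two sub-directories
        print(path, bool(re.search(NUMERIC_SEGMENT, path)))
```

If your URLs end in a trailing slash, the pattern would need a slash before the final $.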
TEST VIA THE ALL PAGES REPORT
Before updating your Content Grouping or Goal Destination settings, test your regex condition. To do this, navigate to Behavior > Site Content > All Pages and click “advanced”.
Once the advanced filter opens up, set the match type for the Page dimension to “Matching RegExp” and enter your regex condition.
Click Apply and manually verify that the results look as expected.