Regex to Match Number of Subdirectories in a URL

Using regex to match a specific number of sub-directories in a URL can be very helpful for Google Analytics. When I configure a new Google Analytics view, I'll usually set up Content Grouping so we can see traffic by page type rather than just by individual page. Ideally there's a value in the data layer that we can use for this purpose; failing that, I look for certain keywords in the URL in order to use GA's "Group using rule definitions" functionality. For example, if all the blog pages are grouped into a sub-directory called /blog/, it's easy enough to add a rule definition like "Page contains /blog/". The same applies to GA's URL destination goal setup, which also accepts string matches.

Unfortunately, real-life scenarios are often not that clean. There are many cases where there's neither a data layer value nor specific keywords in the URL. In those cases there's another potential approach: count the number of sub-directories and match on that count with a regular expression (regex). For example, an e-commerce site may have URLs like www.site.com/clothing/jeans/low-rise-jeans-12345A/. In that case you could use logic like: three sub-directories = product details page, two sub-directories = sub-category page, one sub-directory = main category page.

This post will provide the regex for matching specific numbers of sub-directories in a URL path, for a few different cases.

VARIATION 1: EXACTLY X NUMBERS OF SUB-DIRECTORIES, WITH TRAILING SLASH
VARIATION 2: NO TRAILING SLASH
VARIATION 3: AT LEAST X NUMBER OF SUBDIRECTORIES
VARIATION 4: PATH SEGMENTS STARTING WITH A NUMBER

VARIATION 1: EXACTLY X NUMBERS OF SUB-DIRECTORIES, WITH TRAILING SLASH

This variation assumes each subdirectory ends in a trailing slash.

Regex for exactly one sub-directory

^/[^/]+/$

example matching URL path: /retail/

Regex for exactly two sub-directories

^/[^/]+/[^/]+/$

example matching URL path: /retail/clothing/

Regex for exactly three sub-directories

^/[^/]+/[^/]+/[^/]+/$

example matching URL path: /retail/clothing/jeans/

Regex for exactly four sub-directories

^/[^/]+/[^/]+/[^/]+/[^/]+/$

 example matching URL path: /retail/clothing/jeans/low-rise-jeans-12345A/

(and so on...)
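Since every extra sub-directory just repeats the same [^/]+/ piece, the patterns above can be generated and spot-checked outside GA. The following is a minimal Python sketch, not part of GA: the helper name exact_depth_pattern is made up for illustration, and Python's re module is assumed to behave like GA's regex engine for these basic constructs.

```python
import re


def exact_depth_pattern(n: int) -> str:
    """Build the 'exactly n sub-directories, with trailing slash' pattern."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # One "[^/]+/" per sub-directory, anchored at both ends.
    return "^/" + "[^/]+/" * n + "$"


# Sample paths taken from the examples above.
paths = [
    "/retail/",
    "/retail/clothing/",
    "/retail/clothing/jeans/",
    "/retail/clothing/jeans/low-rise-jeans-12345A/",
]

for depth in range(1, 5):
    pattern = exact_depth_pattern(depth)
    matched = [p for p in paths if re.search(pattern, p)]
    print(depth, pattern, matched)
```

Each pattern should pick out only the sample path with that exact depth.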

VARIATION 2: NO TRAILING SLASH

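The above works when all URLs end in a trailing slash. If they don't, the regex needs to be altered as follows. Here the last path segment (the "text") has no trailing slash, so each pattern counts the sub-directories in front of it plus that final segment. Note that the closing [a-zA-Z0-9] means the final segment must be at least two characters long and end in a letter or digit.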

Regex for exactly one sub-directory + text

^/[^/]+/[^/]+[a-zA-Z0-9]$

 example matching URL path: /retail/clothing

Regex for exactly two sub-directories + text

^/[^/]+/[^/]+/[^/]+[a-zA-Z0-9]$

 example matching URL path: /retail/clothing/jeans

(and so on...)
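If a site mixes both URL styles, one option that is not from the patterns above is to make the trailing slash optional with /?. The sketch below compares that variant with the no-trailing-slash pattern for one sub-directory + text. The helper name exact_segments_pattern is hypothetical, and Python's re module stands in for GA's regex engine.

```python
import re

# Pattern from this section: one sub-directory + text, no trailing slash.
NO_TRAILING_SLASH = r"^/[^/]+/[^/]+[a-zA-Z0-9]$"


def exact_segments_pattern(n: int) -> str:
    """Hypothetical variant: exactly n path segments, trailing slash optional."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "^" + "/[^/]+" * n + "/?$"


EITHER_STYLE = exact_segments_pattern(2)  # ^/[^/]+/[^/]+/?$

for path in ["/retail/clothing", "/retail/clothing/", "/retail/c", "/retail/clothing/jeans"]:
    print(
        path,
        bool(re.search(NO_TRAILING_SLASH, path)),
        bool(re.search(EITHER_STYLE, path)),
    )
```

The /retail/c row shows the two-character minimum that the trailing [a-zA-Z0-9] imposes on the last segment; the optional-slash variant has no such minimum.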

 

VARIATION 3: AT LEAST X NUMBER OF SUBDIRECTORIES

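If you want to match AT LEAST some number of sub-directories, just remove the initial caret. Without the caret the pattern only has to match the end of the path, so any number of additional sub-directories may come before it. The regex to match at least one sub-directory would be /[^/]+/$, the regex to match at least two sub-directories would be /[^/]+/[^/]+/$, etc.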
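To see the effect of dropping the caret, here is a small Python sketch that runs the "exactly two" and "at least two" patterns over the same paths. It assumes partial matching (re.search), which is how these unanchored patterns are meant to be used; the constant and function names are illustrative only.

```python
import re

EXACTLY_TWO = r"^/[^/]+/[^/]+/$"   # anchored at both ends
AT_LEAST_TWO = r"/[^/]+/[^/]+/$"   # caret removed, only the end is anchored


def matches(pattern: str, path: str) -> bool:
    """True if the pattern matches anywhere in the path (partial match)."""
    return re.search(pattern, path) is not None


for path in ["/retail/", "/retail/clothing/", "/retail/clothing/jeans/"]:
    print(path, matches(EXACTLY_TWO, path), matches(AT_LEAST_TWO, path))
```

Only the three-level path should differ between the two columns: it fails the anchored pattern but passes the unanchored one.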

 

VARIATION 4: PATH SEGMENTS STARTING WITH A NUMBER

This is useful for the case where you have URL path segments that always start with a number. I see this a lot with product detail pages.

Regex for 1 subdirectory + path segment starting with a number

^/[^/]+/[0-9]

 example matching URL path: /t-shirt/1234567-a-new-design
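A quick way to check the "starts with a number" idea is to run a few paths through a pattern that places [0-9] immediately after the second slash, so the digit has to be the first character of that segment. This is a sketch under that assumption; the constant name and the non-matching sample paths are invented for illustration.

```python
import re

# One sub-directory, then a segment whose first character is a digit.
DIGIT_LED_SEGMENT = r"^/[^/]+/[0-9]"

samples = [
    "/t-shirt/1234567-a-new-design",  # second segment starts with a digit
    "/t-shirt/a-new-design",          # second segment starts with a letter
    "/t-shirt/",                      # no second segment
    "/1234567-a-new-design",          # digit-led, but no sub-directory before it
]

for path in samples:
    print(path, bool(re.search(DIGIT_LED_SEGMENT, path)))
```

Because there is no $ at the end, the rest of the segment (and anything after it) is left unconstrained.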

TEST VIA THE ALL PAGES REPORT

Before updating your Content Grouping or Goal Destination settings, test your regex condition. To do this, navigate to Behavior > Site Content > All Pages and click "advanced".

[Screenshot: subdirectory regex, image 1]

Once the advanced filter opens up, enter your regex condition like this:

[Screenshot: subdirectory regex, image 2]

Click Apply and manually verify that the results look as expected.
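Alongside the All Pages check, the same rules can be dry-run locally on a handful of page paths. The sketch below mirrors the e-commerce example from the introduction (three sub-directories = product details, two = sub-category, one = main category). Everything here is an assumption for illustration: the group names, the sample paths, the "(not set)" fallback label, and the first-match-wins ordering, which is meant to imitate an ordered list of rule definitions.

```python
import re

# Ordered rules: the first pattern that matches decides the group.
RULES = [
    ("Product details", r"^/[^/]+/[^/]+/[^/]+/$"),
    ("Sub-category", r"^/[^/]+/[^/]+/$"),
    ("Main category", r"^/[^/]+/$"),
]


def content_group(path: str, default: str = "(not set)") -> str:
    """Return the name of the first rule whose regex matches the path."""
    for name, pattern in RULES:
        if re.search(pattern, path):
            return name
    return default


# Hypothetical page paths; in practice these could be copied from a report export.
sample_paths = [
    "/clothing/",
    "/clothing/jeans/",
    "/clothing/jeans/low-rise-jeans-12345A/",
    "/",
]

for path in sample_paths:
    print(f"{path} -> {content_group(path)}")
```

Paths that fall through every rule, such as the home page "/", end up under the fallback label, which makes gaps in the rule set easy to spot before changing any GA settings.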

27 thoughts on “Regex to Match Number of Subdirectories in a URL”

  1. Hi, thanks for writing out this article. It's been super helpful and is almost exactly what I'm looking for.

    However, I'd like to take "Variation 1" one step further.

    With the example, Regex for exactly one sub-directory:
    ^/[^/]+/$

    This will match any top-level directory, for example: /retail/.

    I'd like to do this, plus 2 and 3 directories deeper, but ideally I want to specify exactly what that directory is. In this example, matching only the /retail/ directory, and all subsequent subdirectories.

    Would love if you'd be able to explain that!

    • That's great, glad to hear it was useful!

      If you want to specify a specific directory, you should be able to type it in like this:

      ^/(retail)+/[^/]+/$ (2 subdirectories, including /retail/)
      ^/(retail)+/[^/]+/[^/]+/$ (3 subdirectories, including /retail/)
      ^/(retail)+/[^/]+/[^/]+/[^/]+/$ (4 subdirectories, including /retail/)

      Please try that and let me know if there's any issue.

      • Hi there! Thanks so much for this article. It's incredibly helpful. I'm trying the above without luck. In my example there is an underscore in the first subdirectory I'd like to group the content by.
        Example:
        Goal: group all content containing two sub directories after /retail_store/, beginning with /retail_store/
        ^/(retail_store)+/[^/]+/$

        Any idea what I might be doing wrong?

      • Thank you for your comment, much appreciated!

        If you have 2 subdirectories after /retail_store/, you'll have a total of 3 subdirectories. So for that case you should use the following:
        ^/(retail_store)+/[^/]+/[^/]+/$

        It shouldn't matter if there's an underscore or not. Please try it out and let me know how it goes!

  2. Thanks so much, Ana! I'm still having some issues with this.

    For context, I'm trying to group content in GA using this regex. I want to group views to two different sub-directories:
    1. https://website.com/retail_store/123
    2. http://website.com/retail_store/123/456
    Pages with exactly two subdirectories starting with retail_store is one category, and pages with exactly three subdirectories starting with retail_store is the second category. Any other advice you might have would be awesome - I so appreciate your help!

  3. Super helpful post! Do you also have something in mind for filtering a Landing Page URL based on the number of certain symbols? E.g. filter out URLs that contain 3 or more "_"?

    • I think you can use the following regex to capture URLs with 3 or more underscores:
      \_.*\_.*\_
      Please check it out and let me know if that works for you!

    • Hi Roberto, if you just want to group by those specific pages in GA I don't think you need regex at all, you can just make conditions like Page contains /landingpage and Page contains /usermanagement. Let me know if I've misunderstood what you're looking for.

  4. Hi Ana - great article! Question.

    Is there a way to grab the beginning and end by required URL string, but disregard the amount of subdirectory paths?

    Example: website.com/abc/d-e/123 and website.com/abc/123

    I want to grab the URL string if both /abc/ and /123 are True, regardless of how many subdirectory paths there are. Does this make sense? Thanks in advance for any direction you can provide!

    • Hey AF, sure, you can use .* as a wildcard.

      So if you want to match on /abc/ and /123, you'd use the following expression: /abc/.*123

      (I left the slash off the 123 so the regex will work even if it's immediately following /abc/. You can test it in your All Pages report to make sure it pulls in the URLs you're looking for.)

    • Thank you for sharing! Though, your regex will only work for the specific case where the URL ends with some known string (like your '.aspx' example). If you want something a little more generalizable I think you could use this to match no directories: ^/[^/]+[a-zA-Z0-9]$

  5. I am trying to set up goals in GA and want to track in the following way.
    Suppose I have http://www.abc/Category/Subcategory/Products1
    http://www.abc/Category/Subcategory/Products2
    http://www.abc/Category/Subcategory-2/Product-1
    http://www.abc/Category/Subcategory-2/Product-2
    I want to track only Category, Subcategory, and Products.
    I want to track Home Page --> Category Pages --> Subcategory Pages --> Products, configured as goals in GA. I need to know how to set up my destination URL so that I can include multiple categories while excluding subcategories, then multiple subcategories while excluding products or categories, and likewise.

    • I'm not sure I totally understand your question, but GA doesn't support negative lookahead regex, so in general you need to choose simple conditions that include the text you want, and exclude the text you don't. So in your example, a regex condition of "/category/" would naturally exclude "/subcategory-2/" since the string doesn't include "-2" in it.

  6. Thank you so much for this article! I had a few headaches getting my head around how to do this properly, but thanks to your examples I've managed to get a clean grouping in Google Analytics.

    However... the headaches are starting again now I want to use it in Datastudio. I've changed the regex to match datastudio's regex version (Google RE2). However, the regex doesn't exclude url strings with more than 3 directories. It now works as a minimum of 3 directories. I'd like to exclude url strings with 4 or more directories, so I only have the url strings with 3 directories.

    Do you happen to have an idea how to do this in Google RE2 for Datastudio?

    Thanks for your help! 🙂
