Use of regular expressions in Google Analytics

Written by Jose Vicente

Regular expressions may be unfamiliar to non-technical readers. They are sequences of characters that form search patterns. These patterns are useful in many sections of Google Analytics because they let a single rule cover a large number of cases without our having to list, or even know, every one of them.

Say, for example, that we want an Analytics table to list the product URLs that start with directories such as:

  • /product/shoes/women/boots/
  • /product/shoes/men/boots/
  • /product/shoes/children/boots/

A search pattern such as “/boots/” would match all three cases, since all three URLs contain the literal “/boots/”. But suppose the site also has URLs such as “/blog/boots/”, which match the same pattern and which we do not want in our data table. We must therefore make the regular expression more specific so that it returns only the URLs we are interested in, for example “^/product/shoes/(.+)/boots/”. This regular expression reads as follows:

  • The circumflex accent ^ requires the string to begin with the literal that follows it, in this case “/product/shoes/”.
  • The (.+) part allows a directory with any name: the dot means “any character” and the plus sign means that it must be repeated one or more times. Because the dot also matches the slash, this part can span several nested directories.
  • The string must then continue with the literal “/boots/”, which can be followed by any string because we have not set an end-of-string anchor, which in regular expressions is the dollar sign $.

In short, this regular expression returns data for pages in a boots directory located inside any directory under /product/shoes/.
If you want to learn more about regular expressions, there are countless tutorials that explain them and, in many cases, let you test them. Some Analytics features, such as table filters, also let you test them in real time.
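The pattern can also be tried outside Analytics. The following is a minimal sketch in Python, assuming that Python's `re` module behaves like the Analytics regex engine for this simple pattern (the two flavors are not identical in general). The URL list is hypothetical and only serves as illustration.

```python
import re

# Pattern from the article. re.search looks for the pattern anywhere in the
# string, so the ^ anchor is what ties it to the start of the URL.
PATTERN = re.compile(r"^/product/shoes/(.+)/boots/")

# Hypothetical URLs, for illustration only.
urls = [
    "/product/shoes/women/boots/",
    "/product/shoes/men/boots/",
    "/product/shoes/children/boots/",
    "/product/shoes/women/winter/boots/",  # (.+) may span nested directories
    "/product/shoes/boots/",               # no directory between shoes and boots
    "/blog/boots/",                        # contains /boots/ but wrong prefix
]

matched = [u for u in urls if PATTERN.search(u)]

for u in urls:
    status = "match" if u in matched else "no match"
    print(f"{u:40} {status}")
```

If nested directories should not be accepted, replacing `(.+)` with `([^/]+)` would restrict that part to a single directory name.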

Regular expressions in traffic view filters

Filters allow us to perform certain actions on the data recorded by a traffic view. The predefined filters offer some simple types of comparison, but there are cases in which we have to resort to regular expressions.
Take, for example, a website with its Spanish content in the /es/ directory, a Spanish blog in the /blog/ directory, the English translation in the /en/ directory and the French translation in the /fr/ directory.
Example of the use of regular expressions in Google Analytics
We want one view to collect all the traffic to the Spanish content (home page + /es/ directory + /blog/ directory). Suppose we create a filter that includes the content of the home page,
Example of a filter in Analytics
another one that includes traffic to directories starting with /es/,
Example of a filter in Analytics
and another equivalent one for the /blog/ directory.
Example of a filter in Analytics
With the three filters in this order we will not get the Spanish traffic, because filters are applied in sequence and each one works on the output of the previous one. Only the home page passes the first filter and all other pages are discarded, so the result is at most the traffic to the home page.
Another possibility would be to exclude the /en/ and /fr/ directories, which would leave only the traffic to the Spanish parts. But we would run the risk of recording traffic from parts of the site we do not control, or of new directories being created without anyone realizing that the analytics configuration needs to be reviewed.
So the most elegant solution is a regular expression that allows traffic to the home page, or to the /blog/ directory, or to the Spanish content in /es/.
Filtering in Google Analytics with regular expressions
As you can see in the screenshot, we would use a custom filter that includes visits to the pages whose URI matches the regular expression (^/$)|(^/es/)|(^/blog/). This regular expression allows three options separated by the vertical bar “|”, which acts as an “or” between the alternatives:

  • (^/$): the home page; the URI must begin (symbol ^) and end (symbol $) with the same single slash /.
  • (^/es/): the /es/ directory; the URI must start (symbol ^) with the /es/ directory.
  • (^/blog/): the /blog/ directory; the URI must start (symbol ^) with the /blog/ directory.

With this solution, even if the content of the rest of the site changes, we will only record traffic from these 3 URI types.
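As a rough illustration of what such an include filter keeps, here is a minimal Python sketch. It assumes the Spanish content lives under /es/, as in the site layout described earlier, and that Python's `re` module is close enough to the Analytics engine for this pattern. The function name and the URIs are hypothetical.

```python
import re

# Home page, OR anything under /es/, OR anything under /blog/.
# Assumption: /es/ holds the Spanish content (see the site layout above).
SPANISH_CONTENT = re.compile(r"(^/$)|(^/es/)|(^/blog/)")


def is_spanish_content(uri: str) -> bool:
    """Return True if the request URI would pass the include filter."""
    return SPANISH_CONTENT.search(uri) is not None


# Hypothetical request URIs, including a directory created later on.
uris = [
    "/",
    "/es/contacto/",
    "/blog/post-1/",
    "/en/contact/",
    "/fr/contact/",
    "/en/blog/",       # /blog/ is not at the start, so it is not included
    "/new-section/",   # unknown directories are left out automatically
]

kept = [u for u in uris if is_spanish_content(u)]
print(kept)
```

Because the filter lists what to include rather than what to exclude, a new directory such as the hypothetical /new-section/ stays out of the view without any change to the configuration.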

Regular expressions in table filters of reports

Report table filters pose a problem similar to that of view filters: conditions can only be combined with the logical operator “and”, so a row must fulfill all of them at once.
Example of filter in Google Analytics with regular expressions
Following the previous example, if we wanted to define filters that display data for pages whose URL is exactly /, starts with /es/ or starts with /blog/, we could not do so without resorting to a regular expression. If we try to combine the three options with the only available logical operator, “and”, the result is an empty data table, because no URL is exactly /, starts with /es/ and starts with /blog/ simultaneously.
Example of a filter in Google Analytics
In this case the solution would be the same regular expression already seen.
Regular expression in Google Analytics
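The difference between the two approaches can be sketched in a few lines of Python. This is only a simulation of the logic, not of the Analytics interface. It assumes the Spanish content lives under /es/ as in the site layout described earlier, and the URIs are hypothetical.

```python
import re

# Hypothetical page URIs in a report table.
uris = ["/", "/es/servicios/", "/blog/regex/", "/en/services/", "/fr/services/"]

# The three conditions we would like to combine.
conditions = [
    lambda u: u == "/",                 # matches exactly /
    lambda u: u.startswith("/es/"),     # starts with /es/
    lambda u: u.startswith("/blog/"),   # starts with /blog/
]

# Combined with "and": a URI must satisfy all three at once, which is impossible.
and_result = [u for u in uris if all(cond(u) for cond in conditions)]

# A single regular expression expresses the "or" that the interface lacks.
regex = re.compile(r"(^/$)|(^/es/)|(^/blog/)")
regex_result = [u for u in uris if regex.search(u)]

print("and:  ", and_result)
print("regex:", regex_result)
```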
The filters of custom reports pose the same problem: they only let us select the logical operator “and”. In addition, the only comparators available are “Exact” and “Regular expression”, so options such as “Contains” or “Starts with” disappear, and we are forced to know how to use regular expressions if we want to configure a moderately advanced filter.
Regular expressions in Google Analytics

Advanced segments

Advanced segments do not pose the same problem for our example query, because they offer the “or” operator. We could therefore solve it with the configuration shown in the following screenshot.
Example of advanced segment without regular expressions
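The “or” configuration and the regular expression describe the same set of pages. A minimal Python sketch of that equivalence follows, assuming /es/ holds the Spanish content as in the site layout described earlier. The function name and URIs are hypothetical.

```python
import re


def in_segment(uri: str) -> bool:
    """Three simple conditions joined with "or", as in the segment builder."""
    return uri == "/" or uri.startswith("/es/") or uri.startswith("/blog/")


regex = re.compile(r"(^/$)|(^/es/)|(^/blog/)")

# Hypothetical URIs used to compare both approaches.
uris = ["/", "/es/", "/es/equipo/", "/blog/", "/blog/seo/", "/en/", "/fr/faq/", "/blogging/"]

segment_result = [u for u in uris if in_segment(u)]
regex_result = [u for u in uris if regex.search(u)]

# Both approaches select the same pages.
print(segment_result == regex_result)
```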
Regular expressions can also be used in other parts of the GA account configuration, such as the definition of goals or content groupings.

As the examples in this article show, knowing how these search patterns work makes some queries easier to configure and, in other cases, is the only way to configure them correctly. Regular expressions are therefore a basic tool that any analyst should know.

Jose Vicente
Head of the SEO department and consultant at Human Level. Graduated in Computer Engineering. Expert in SEO and web analytics with certification in Google Analytics. Professor of the Master of SEO-SEM Professional Kschool.
