Written by Jose Vicente
If you don’t have a technical background, you are probably not familiar with regular expressions, often shortened to regex. In short, regular expressions are sequences of characters that make up search patterns. These search patterns are useful in many sections of Google Analytics, because a single pattern can cover many different cases without our having to list each one, or even know them all in advance.
For example, if we want to list product URLs belonging to specific directories in an Analytics table:
A regular expression, or search pattern, such as “/boots/” would match all three URLs, because all three contain “/boots/”. But suppose we have other URLs as well, for example “/blog/boots/”, which also match the pattern but which we don’t want in our table. We must then narrow the search, that is, make our regex more precise, so that it returns only the URLs we are actually interested in. We could do this with the following Google Analytics regular expression: “^/product/footwear/(.+)/boots/”. We read this regex as follows:
- The caret ^ (circumflex accent) indicates that the string must start with the sequence that follows it, in this case “/product/footwear/”.
- The (.+) combination allows any subdirectory to come next: the dot “.” means “any character”, and the plus symbol “+” means it must be repeated one or more times. Because the dot also matches “/”, this part can span more than one directory level.
- The string must then continue with the sequence “/boots/”. Anything can come after that, because we haven’t marked the end of the string. In Google Analytics regex we would do so with the $ symbol.
In summary, this regular expression would return all the information of pages within the boots subdirectory, which, in turn, is within any subdirectory that starts with /product/footwear/.
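To check how this pattern behaves, here is a minimal Python sketch. The paths are hypothetical, invented for illustration, and Python's re.search is used as a stand-in for the partial matching described above. Google Analytics uses its own regex engine, so treat this as an approximation rather than a reproduction of what the tool does.

```python
import re

# Pattern from the text. search() looks for a match anywhere in the string,
# so only the ^ anchor forces the match to start at the beginning.
BOOTS = re.compile(r"^/product/footwear/(.+)/boots/")

# Same pattern with an end-of-string limit ($), as mentioned in the text.
BOOTS_END = re.compile(r"^/product/footwear/(.+)/boots/$")

# Hypothetical page paths (assumptions, not taken from the article).
paths = [
    "/product/footwear/men/boots/",
    "/product/footwear/women/boots/leather-ankle-boot",
    "/product/footwear/kids/winter/boots/",
    "/product/footwear/boots/",   # no subdirectory for (.+) to consume
    "/blog/boots/",               # does not start with /product/footwear/
]


def matching_paths(pattern, candidates):
    """Return the candidates in which the pattern finds a match."""
    return [p for p in candidates if pattern.search(p)]


matched = matching_paths(BOOTS, paths)
matched_end = matching_paths(BOOTS_END, paths)

for p in matched:
    print("open-ended:", p)
for p in matched_end:
    print("with $:    ", p)
```

Without the $ symbol, the product page under /women/boots/ is included. With it, only paths that end exactly at “/boots/” remain. In both cases “/blog/boots/” is left out.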
If you would like to know more about Google Analytics regular expressions, there are plenty of tutorials that will help you understand them, and many of them let you try them out. Some Google Analytics features, such as table filters, also let you test them in real time.
Regular expressions in traffic view filters
Filters allow us to execute certain actions on data registered by a traffic view. Even though pre-defined filters allow us to easily select certain comparison types, there are cases in which we simply must use regular expressions. Let’s see it with an example of a website, in which content in Spanish is located under an /es/ subdirectory, a blog in Spanish is under a /blog/ subdirectory, content translated to English is under an /en/ subdirectory, and content translated to French is under an /fr/ subdirectory.
And what we want to do is to see all traffic that our content in Spanish has received (home page + /es/ subdirectory + /blog/).
If we create a filter that includes home page content:
Another one that includes traffic to subdirectories starting with /es/:
And another one for the /blog/ subdirectory.
If we apply all three filters in this order, we will not get the combined traffic we are after. Why? This is due to the way filters are applied: each filter works on the data left by the previous one. Only the home page fulfils the condition of the first filter, so all remaining pages are discarded before the /es/ and /blog/ filters are even evaluated, and the home page itself does not satisfy those two filters. Another possibility would be to exclude the /en/ and /fr/ subdirectories, which would leave us with traffic to the Spanish version only. However, this way we risk registering traffic to parts of our website that we aren’t monitoring, or to new directories we create later and forget to exclude.
The most elegant solution to this problem is to use a regular expression. This regular expression will allow traffic to the home page, to the blog subdirectory, or to the Spanish version /es/.
As you can see in the screen capture, we would have to use a custom filter that includes only visits to pages whose request URI matches the regular expression (^/$)|(^/es/)|(^/blog/). This regex accepts any of the three options, because the pipe “|” separates alternative conditions: a URI only needs to match one of them.
- (^/$): this is the home page. The URI must begin (^ symbol) with a “/” slash and end ($ symbol) right after that same “/”, so only the URI “/” matches.
- (^/es/): this is the /es/ subdirectory, and its URI must begin (^ symbol) with the /es/ subdirectory.
- (^/blog/): this is the blog directory, and its URI must begin (^ symbol) with the /blog/ subdirectory.
Thanks to this solution we will only register traffic for these three types of URI, even if other content on our website changes.
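The effect of this include filter can be sketched in a few lines of Python. The URIs below are hypothetical, and the function name is an assumption made for this illustration. It only imitates what an include filter would keep and does not call any Google Analytics API.

```python
import re

# Regular expression from the text: home page OR /es/ OR /blog/.
SPANISH_CONTENT = re.compile(r"(^/$)|(^/es/)|(^/blog/)")


def is_spanish_content(uri: str) -> bool:
    """True if the URI matches at least one of the three alternatives."""
    return SPANISH_CONTENT.search(uri) is not None


# Hypothetical request URIs (assumptions for illustration).
hits = [
    "/",
    "/es/contacto",
    "/blog/regex-tips",
    "/en/contact",
    "/fr/contact",
    "/es",          # no trailing slash, so ^/es/ does not match
    "/shop/blog/",  # /blog/ is not at the start of the URI
]

kept = [uri for uri in hits if is_spanish_content(uri)]
print(kept)
```

Any URI added to the site later is excluded unless it matches one of the three alternatives, which is what makes this approach robust to changes elsewhere on the website.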
Regular expressions in report table filters
Report table filters present a similar issue to that of traffic view filters. Here, conditions can only be combined with the “and” logical operator, so a row must satisfy every rule to be shown.
Continuing with the previous example, if we wanted a filter that displays data for pages whose URL is exactly /, or starts with /es/ or /blog/, we couldn’t build it without a regular expression. If we try to combine all three options with the only available operator, “and”, the result is an empty table, because no URL can match all three conditions at the same time:
The solution is to use the same regular expression as before, (^/$)|(^/es/)|(^/blog/).
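The difference between chaining conditions with “and” and using a single regular expression can be illustrated with a short Python sketch. The page list and the three condition functions are assumptions made for this example. They model the logic only, not the actual report interface.

```python
import re

# Hypothetical rows of a report table (assumptions for illustration).
pages = ["/", "/es/precios", "/blog/regex", "/en/pricing"]

# The three conditions we would like to combine.
conditions = [
    lambda p: p == "/",              # exact match with the home page
    lambda p: p.startswith("/es/"),  # begins with /es/
    lambda p: p.startswith("/blog/"),  # begins with /blog/
]

# "and": a page must satisfy every condition at once, which is impossible here.
and_result = [p for p in pages if all(cond(p) for cond in conditions)]

# A single regex expresses the alternatives inside one condition.
pattern = re.compile(r"(^/$)|(^/es/)|(^/blog/)")
regex_result = [p for p in pages if pattern.search(p)]

print(and_result)    # empty list: no page meets all three conditions
print(regex_result)  # the three Spanish-content pages
```

Because the alternation lives inside the pattern, one regex condition is enough, and the “and” restriction of the filter never comes into play.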
In custom report filters the issue persists, as it only allows us to select the “and” logical operator. Besides, we only have “Exact” and “Regex” options, whilst “Containing” or “Begins with” disappear, and we’re forced to learn to use regular expressions if we want to configure a more or less advanced filter.
Advanced segments do not present the same problem, so we wouldn’t require regular expressions to solve it. In this case we have the “or” logical operator, and this means we will be able to solve it with the configuration pictured in the screen capture below:
There are, however, other settings in our GA account where we can make use of regular expressions, for example to define goals or content groups.
As you’ve seen throughout the examples in this post, knowing these search patterns sometimes makes certain queries easier to configure, and in other cases it is the only correct way to go. This shows that regular expressions are a basic tool any analyst must know how to use.