November 25, 2011

How Regular Expressions Make Segmentation and Web Analytics Diagnostics Easier

The next time you look at your keywords in Google Analytics, take notice of repeated phrases and characters. Maybe they’re words about direction - North Dakota, South Carolina, or the word “new” for New Jersey, New York, New Brunswick, etc. Sometimes it’s a brand name and product combination - Google Chrome, Google Android, Google AdWords, and the list goes on for many brands in many industries.

These patterns sometimes need to be sifted to detect trends, but repeatedly parsing data by eyeball and keystroke can impede speedy analysis. To reduce the effort, try inserting a regular expression in your report filter the next time you review a Google Analytics report or plan a goal in Piwik.

Regular expressions are patterns of characters that describe the text you want a query to match. They appear in JavaScript and a number of other programming languages. Within a web analytics solution, regular expressions are used to identify keywords with a similar spelling or related meaning. They are also handy in other filters, such as matching a range of IP addresses or URL segments that appear repeatedly, be it characters from a tagged campaign or a subdirectory.
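As a rough illustration of the IP address and URL uses, the sketch below uses Python's re module as a stand-in for an analytics filter box. The IP addresses, page paths, and variable names are made up for the example, and the regex flavor inside a given analytics tool may differ in small ways. The individual symbols are explained further on in this post.

```python
import re

# Hypothetical visitor IPs; keep the ones that start with 192.168.1.
ips = ["192.168.1.15", "192.168.1.200", "192.168.2.15", "10.0.0.7"]
office_ip = re.compile(r"^192\.168\.1\.[0-9]+$")
office_visits = [ip for ip in ips if office_ip.search(ip)]

# Hypothetical page paths; keep the pages under the /blog/ subdirectory.
paths = ["/blog/regex-tips", "/blog/2011/11/segments", "/products/blog-stand", "/about"]
blog_section = re.compile(r"^/blog/")
blog_pages = [p for p in paths if blog_section.search(p)]

print(office_visits)  # ['192.168.1.15', '192.168.1.200']
print(blog_pages)     # ['/blog/regex-tips', '/blog/2011/11/segments']
```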

Because of their versatility, regular expressions can be added to a number of analytics solution settings. In Google Analytics, for example, a regular expression can be saved in the advanced segmentation wizard for repeated use, which saves you from recreating the group in every analytics session.

Parentheses ( ) identify a group of characters, usually applied when there are several versions of a word or phrase. The group is matched as a whole sequence, so (az) matches the letters az together, not a or z individually. So say you have products called jeansa and jeansz - you can create the regular expression jeans(a|z) to capture both in a query, using the pipe described further down to separate the two versions. Sequential ranges such as a-z do not work inside parentheses; they belong in brackets.

Brackets [ ] define a set of characters. The difference between brackets and parentheses is that brackets match a single character from the set. So while (xyz) with parentheses would return only the exact sequence xyz, a bracket [xyz] captures any one appearance of x, y, or z in a string. A hyphen inside brackets denotes a sequential range of characters, such as [a-z] to cover the lowercase alphabet.
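The difference between a parenthesized group and a bracketed set is easier to see side by side. This is a minimal sketch in Python's re module, using the hypothetical jeans product names from above plus two extra made-up names. The ^ and $ symbols, covered later in this post, pin each pattern to the whole product name.

```python
import re

products = ["jeansa", "jeansz", "jeansaz", "jeansb"]

# Parentheses group a sequence: (az) means "a followed by z".
group = re.compile(r"^jeans(az)$")
# Brackets match one character from a set: [az] means "a or z".
char_set = re.compile(r"^jeans[az]$")
# A hyphen inside brackets gives a range: [a-z] is any one lowercase letter.
char_range = re.compile(r"^jeans[a-z]$")

group_hits = [p for p in products if group.search(p)]
set_hits = [p for p in products if char_set.search(p)]
range_hits = [p for p in products if char_range.search(p)]

print(group_hits)  # ['jeansaz']
print(set_hits)    # ['jeansa', 'jeansz']
print(range_hits)  # ['jeansa', 'jeansz', 'jeansb']
```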

A period . is a wild card that matches any single character, used for unknown characters in a text.

What if your desired text string contains a period or a question mark? You can use a backslash \ - it makes the character that follows it match literally. In coding language this means the backslash “escapes” the character so that it is treated exactly as it appears.
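To see the wild card and the backslash at work, here is a small sketch in Python's re module. The keyword lists are invented for the example, and a raw string (the r prefix) is a Python convention so the backslash reaches the regex engine unchanged.

```python
import re

keywords = ["version 2.0", "version 200", "version 2x0"]

# Unescaped, the period is a wild card and matches any single character.
wildcard = re.compile(r"2.0")
# Escaped with a backslash, the period matches only a literal period.
literal = re.compile(r"2\.0")

wildcard_hits = [k for k in keywords if wildcard.search(k)]
literal_hits = [k for k in keywords if literal.search(k)]

# The same escape works for a question mark in a search phrase.
questions = ["what is seo?", "seo tips"]
question_mark = re.compile(r"seo\?")
question_hits = [q for q in questions if question_mark.search(q)]

print(wildcard_hits)  # ['version 2.0', 'version 200', 'version 2x0']
print(literal_hits)   # ['version 2.0']
print(question_hits)  # ['what is seo?']
```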

A caret ^ returns queries in which the characters appear at the beginning of the string. So ^New would return “New York Knicks” but not “The New York Knicks”.

A dollar sign $ requires the characters to appear at the end of the string, and it is written after those characters. So from the Knicks example, Knicks$ would select both “New York Knicks” and “The New York Knicks”, but not “Knicks tickets”.
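The Knicks example can be checked with a short sketch in Python's re module. Note that the caret goes before the characters (^New) and the dollar sign goes after them (Knicks$). The third keyword is a made-up addition to show a non-match.

```python
import re

keywords = ["New York Knicks", "The New York Knicks", "Knicks tickets"]

starts_with_new = re.compile(r"^New")      # "New" at the start of the string
ends_with_knicks = re.compile(r"Knicks$")  # "Knicks" at the end of the string

start_hits = [k for k in keywords if starts_with_new.search(k)]
end_hits = [k for k in keywords if ends_with_knicks.search(k)]

print(start_hits)  # ['New York Knicks']
print(end_hits)    # ['New York Knicks', 'The New York Knicks']
```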

The pipe | dictates an either/or. So if you are interested in keywords about cars, for example, “Toyota|Ford” can return Toyota or Ford, while “Toyota|Chrysler|Ford” returns a choice from a series of makes.
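A quick sketch of the car example, again in Python's re module. The keyword list is invented, and re.IGNORECASE is added here as an assumption so that lowercase search terms still match; whether an analytics filter ignores case depends on the tool and the setting.

```python
import re

keywords = [
    "toyota camry review",
    "ford focus mpg",
    "chrysler 300 price",
    "honda civic lease",
]

# Either/or across three makes.
makes = re.compile(r"Toyota|Chrysler|Ford", re.IGNORECASE)
make_hits = [k for k in keywords if makes.search(k)]

print(make_hits)  # ['toyota camry review', 'ford focus mpg', 'chrysler 300 price']
```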

A great way to use regular expressions is to imagine the pattern as a strainer. Ultimately you want the query to show what you want sifted out from a “flow” of characters. Keeping the concept simple will help you develop regular expressions.
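The strainer idea can be sketched by combining a few of the symbols into one pattern and pouring a keyword report through it. The report rows and visit counts below are made up, and Python's re module stands in for the analytics filter.

```python
import re

# Hypothetical keyword report: (keyword, visits).
report = [
    ("new york hotels", 120),
    ("new jersey diners", 45),
    ("hotels in new york", 30),
    ("news today", 80),
    ("north dakota jobs", 15),
]

# Strainer: keywords that begin with "new " followed by york or jersey.
strainer = re.compile(r"^new (york|jersey)")

kept = [(keyword, visits) for keyword, visits in report if strainer.search(keyword)]
total_visits = sum(visits for _, visits in kept)

print(kept)          # [('new york hotels', 120), ('new jersey diners', 45)]
print(total_visits)  # 165
```

Starting with a simple pattern like this one and tightening it only when unwanted rows slip through tends to be easier than writing a complex expression up front.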

What if you’re not comfortable with coding? You can think about the desired query pairings, then seek help from someone experienced with JavaScript to translate the idea into a regular expression.

Keep in mind that a regular expression filter may need to be adjusted over time, particularly as the keywords used in blog posts change or as the website itself changes.

Learning sources for regular expressions:

Luna Metrics, a web analytics consultancy based in Pittsburgh, has a wonderful user-friendly guide on the key regular expressions used in Google Analytics.  You can download it at the Luna Metrics blog.

Google offers a page covering the basics of regular expressions. You can connect to the page here.

Because regular expressions are supported in JavaScript, W3Schools offers a few pages on the subject, which can give an overview of how expressions are used with respect to a website.

What is the benefit of regular expressions to your online presence?

Regular expressions help your analysis in a few time-saving ways:

  • Reduced analysis time -- By saving a regular expression in the advanced segmentation wizard, you remove the need to recreate a segment with each report.
  • Refined keyword focus -- When used on keywords, RegEx can provide a refined focus on keyword trends within a site, leading to better decisions on PPC, content, and other marketing sources.

 
