Using Regular Expressions in Zookeeper

A regular expression is a pattern written in a token-based language. A pattern can match parts of a string and, optionally, drive a transformation of that string into another one. Zookeeper lets power users write regular expressions as text search queries, so objects can be selected by name with far more precision than a plain text search allows.

There is a lot of material online covering regular expressions, including many introductory tutorials. Here we cover only the very basics of regular expressions (regex) and how they apply to searching entities in Zookeeper.

Searching

A regular expression string consists of one or more tokens. Each token is designed to match some part of an object's name. This is very similar to wildcard-based search, where a star (*) character matches any part of a string. For example, in wildcard syntax "Sph*" will match "Sphere" and "Sphynx" because the * character matches both "ere" and "ynx" (or any other string). The only requirement is the presence of "Sph".

Much like wildcards, regular expressions contain tokens that match certain parts of a string. There is an elaborate syntax that is designed to match any type of characters. The word "characters" is important here, because that is essentially what regex matches: groups of characters. First a character group is specified, and then the number of such characters that you expect. A character group is enclosed in square brackets [], and whatever is inside is included in the group. The number of expected characters follows, enclosed in curly braces {} as a range.

Consider the following example: "[a-c]{1,2}"
This is saying: Match a run of one or two characters in the range 'a' through 'c'
This will match: ant, acrobat, balcony, colony, because each of them contains at least one 'a', 'b', or 'c' character
This will NOT match: goose, Denver, house, because none of them contains a character from the specified group
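
The example above can be checked outside Zookeeper. The sketch below uses Python's re module as a stand-in; it is an assumption that Zookeeper's engine treats this pattern the same way, and the helper name matching_names is made up for illustration. Note that the range is written with a comma, {1,2}.

```python
import re

def matching_names(pattern, names):
    """Return the names that contain at least one match for pattern."""
    compiled = re.compile(pattern)
    # search() looks for the pattern anywhere in the name, not only at the start.
    return [name for name in names if compiled.search(name)]

names = ["ant", "acrobat", "balcony", "colony", "goose", "Denver", "house"]
print(matching_names(r"[a-c]{1,2}", names))
# → ['ant', 'acrobat', 'balcony', 'colony']
```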

The square brackets can contain case-sensitive character ranges (ex. [a-z], [A-Z], [a-zA-Z]), number ranges (ex. [0-9]), special characters preceded by '\' (ex. [\[\.\(]) or any combination of these (ex. [a-z0-9\.]). You can also have groups that contain all characters except those specified if you start a group with ^ (ex. [^g] specifies all characters except 'g').
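
To see what each kind of character group picks up, the following sketch lists every character that a group matches in a sample string. It uses Python's re module as an assumption about engine behaviour, and the sample strings are invented for illustration.

```python
import re

# Each pair is (character group, sample text); the samples are made up.
examples = [
    (r"[a-z]", "Box01"),          # lowercase letters only
    (r"[A-Z]", "Box01"),          # uppercase letters only
    (r"[0-9]", "Box01"),          # digits
    (r"[\[\.\(]", "Box.(copy)"),  # special characters escaped with \
    (r"[^g]", "goose"),           # everything except 'g'
]

for pattern, text in examples:
    # findall() returns each single character the group matches in the text.
    print(pattern, text, re.findall(pattern, text))
```

The last line of output shows that [^g] keeps 'o', 'o', 's', 'e' and skips the leading 'g'.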

To make things simpler, special symbols are defined to represent certain character groups and can be used instead of them. The most common of these are:

\d - matches all digits (same as [0-9])
\w - matches all 'word' characters, including digits and the underscore (same as [a-zA-Z0-9_])
. - matches any single character (in most engines, any character except a line break)
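
The shorthand symbols can be compared side by side. This is a small sketch using Python's re module; the object name is hypothetical, and Zookeeper's engine may differ in details. In Python, as in most engines, \w also covers the underscore.

```python
import re

sample = "Box_01 (copy)"  # hypothetical object name

digits = re.findall(r"\d", sample)      # digit characters only
word_chars = re.findall(r"\w", sample)  # letters, digits and underscore
anything = re.findall(r".", sample)     # every character (no line breaks here)

print(digits)               # → ['0', '1']
print("".join(word_chars))  # → Box_01copy
print(len(anything))        # → 13
```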

If you just want to match a single character (instead of a set), you do not need to use square brackets. For example, "Sph" will match the characters 'S', 'p', 'h' appearing consecutively, thus matching both "Sphere" and "Sphynx" since they both contain that string.
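
A literal pattern behaves like a "contains" test. The sketch below assumes search semantics as in Python's re.search; the names are invented for illustration, and whether Zookeeper ignores case is not covered here.

```python
import re

names = ["Sphere", "Sphynx", "GeoSphere", "Box01"]

# "Sph" has no brackets or ranges: it matches S, p, h in that exact order.
hits = [name for name in names if re.search(r"Sph", name)]
print(hits)  # → ['Sphere', 'Sphynx', 'GeoSphere']
```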

The range is surrounded by curly braces {} and consists of two comma-separated integers: the minimum and the maximum number of instances. However, there are special symbols defined to make this simpler:

? - Zero or one instances (same as {0,1})
* - Zero or more instances (same as {0,})
+ - One or more instances (same as {1,})

Now we can see that a regular expression like "\d+" will match any string that contains one or more digits. For example, it will match "Sphere01" but not "SphereOne". Likewise, "Sphere\d?" matches "Sphere" followed by zero or one digit. Because the digit is optional, this pattern on its own also finds a match inside "SphereOne". To accept "Sphere" and "Sphere1" but reject "SphereOne", anchor the pattern to the start (^) and end ($) of the name: "^Sphere\d?$".
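
Whether a pattern such as "Sphere\d?" rejects "SphereOne" depends on whether the whole name has to match. The sketch below illustrates the difference with Python's re module; it is an assumption that Zookeeper searches within the name the way re.search does, and the names are hypothetical.

```python
import re

names = ["Sphere", "Sphere1", "Sphere01", "SphereOne"]

# \d+ needs at least one digit somewhere in the name.
has_digits = [n for n in names if re.search(r"\d+", n)]

# Unanchored, \d? may match zero digits, so any name containing "Sphere" passes.
loose = [n for n in names if re.search(r"Sphere\d?", n)]

# fullmatch() requires the whole name to fit, like writing ^Sphere\d?$.
strict = [n for n in names if re.fullmatch(r"Sphere\d?", n)]

print(has_digits)  # → ['Sphere1', 'Sphere01']
print(loose)       # → ['Sphere', 'Sphere1', 'Sphere01', 'SphereOne']
print(strict)      # → ['Sphere', 'Sphere1']
```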

Combining these tokens yields a surprisingly expressive syntax for custom selections, one that covers most naming patterns you are likely to need.

Replacing/Renaming

Zookeeper provides a rename dialog for renaming objects that can use regular expression syntax. Just as in searching, regex provides a powerful way of modifying and renaming strings. Here we will only skim the surface and cover the basics. If you like what you see, please consider reading a more in-depth regular expressions manual.

The basic idea is that you split your search regex pattern into groups. Then you can manipulate and use these groups in the rename pattern. A group in regex is defined using parentheses (). Consider the following example:

"(Sph)(ere)"

This takes a string "Sphere" and splits it into two groups "Sph" and "ere". In the rename pattern we then type any string that contains these groups somewhere in it. Groups are always marked by a dollar sign character $ followed by the group's index. In this case "Sph" is a group with index 1 and "ere" has index 2.
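
The group numbering can be inspected directly. This is a minimal sketch with Python's re module, used here only as a stand-in for whatever engine Zookeeper runs.

```python
import re

match = re.search(r"(Sph)(ere)", "Sphere")

# group(0) is the whole match; numbered groups start at 1.
print(match.group(0))  # → Sphere
print(match.group(1))  # → Sph
print(match.group(2))  # → ere
```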

Using "$1_$2" will insert an underscore between our groups and produce "Sph_ere". Using "$2$1" will reverse the order and produce "ereSph".

As we learned earlier, a search pattern can match any character set in a string instead of just specific characters. So, for example, the search pattern "([a-zA-Z]+)(\d+)" will match a run of letters immediately followed by a run of digits, such as a name that starts with letters and ends with digits. Moreover, it will place the letters into the first group and the digits into the second. Therefore, replacing with a pattern like "$1_$2" will separate the letters and digits by an underscore. For example, "Torus01" becomes "Torus_01", "Sphere5" becomes "Sphere_5" and so on.
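
The rename examples above can be reproduced with a short script. This is a sketch under stated assumptions: Python's re module writes group references as \g<1> rather than $1, so the hypothetical helper dollar_to_python converts the $-style used in this text, and rename is likewise an illustrative name, not part of Zookeeper.

```python
import re

def dollar_to_python(template):
    """Convert $1-style group references to Python's \\g<1> form."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", template)

def rename(name, search, replacement):
    """Apply one search/replace pattern pair to a single name."""
    return re.sub(search, dollar_to_python(replacement), name)

print(rename("Sphere", r"(Sph)(ere)", "$1_$2"))  # → Sph_ere
print(rename("Sphere", r"(Sph)(ere)", "$2$1"))   # → ereSph

for name in ["Torus01", "Sphere5"]:
    # Letters land in group 1, digits in group 2.
    print(rename(name, r"([a-zA-Z]+)(\d+)", "$1_$2"))
# → Torus_01
# → Sphere_5
```

Names without a letters-then-digits run, such as "Torus", are returned unchanged because the search pattern finds nothing to replace.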

We have covered the basics. However, a deeper investigation into regular expressions will reveal many more powerful gems, such as lookahead and lookbehind assertions and named groups. Please consult a dedicated regex reference if you want to go further.