Regular Expressions


A regular expression is a sequence of characters that defines a search pattern. Regular expressions are written in a specific syntax and are usually applied to a larger string of text to see whether the string meets the conditions the expression defines. In JavaScript, a regular expression has the general syntax of a pattern followed by optional modifiers, like so:

	
    /pattern/modifiers
	

The pattern is the sequence of characters to search for, and each modifier is a single letter that changes how the whole regular expression behaves. Modifiers are optional, and more than one can be combined.
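
As a quick illustration of the two parts, a regular expression object in JavaScript exposes its pattern and its modifiers separately through the standard source and flags properties. The sample pattern below is arbitrary and only there for demonstration.

```javascript
// An arbitrary sample pattern with two modifiers (g and i).
var regex = /pledge/gi;

console.log(regex.source); // "pledge" (the pattern part)
console.log(regex.flags);  // "gi" (the modifier part)
```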

Matching a Single Character

Let's start with matching a single character (searching a string of text to see if it contains it) as it is the most basic of search patterns. We'll use the United States Pledge of Allegiance as our sample text. Let's say we wanted to see if the character a existed:

	
    var text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
    var regex = /a/;

    console.log(text.match(regex));
	
	
    [
        "a",
        index: 9,
        input: "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all."
    ]
	

The resulting log shows the text that was matched, the index where it was first found, and the entire input text. In other words, we found our pattern, a, at index 9 of our text.
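
One detail worth keeping in mind is what happens when nothing matches. The sketch below, which uses a shortened version of the sample text, shows that match returns null in that case, so it is safest to check the result before reading its properties.

```javascript
var text = "I pledge allegiance to the Flag";

var found = text.match(/a/);
var missing = text.match(/z/);

console.log(found[0]);    // "a"
console.log(found.index); // 9
console.log(missing);     // null

// Guard against null before reading properties such as index.
if (missing === null) {
    console.log("no z in the text");
}
```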

Using Meta Characters

With regular expressions we aren't limited to just straight text for our search patterns. We can utilize something called meta characters which are characters that have a special meaning. For example, a dot . will match any character:

	
    var text = "Staten Island isn't a real borough.";
    var regex = /rea./;

    console.log(text.match(regex));
	
	
    [
        "real",
        index: 22,
        input: "Staten Island isn't a real borough."
    ]
	

Notice how even though the search pattern didn't include the full word real, the . filled in for the last letter and found real at index 22.
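
Because the dot matches almost anything, it can match more than intended. The sketch below uses made-up sample strings to show the difference between a plain dot and a dot escaped with a backslash, which matches only a literal period. It also shows that, by default, the dot does not match a line break.

```javascript
var loose = /3.14/;   // the dot matches any single character
var strict = /3\.14/; // the escaped dot matches only a literal "."

console.log(loose.test("3.14"));  // true
console.log(loose.test("3x14"));  // true
console.log(strict.test("3.14")); // true
console.log(strict.test("3x14")); // false

// By default, the dot does not match a line break.
var dotOnNewline = /rea./.test("rea\n");
console.log(dotOnNewline); // false
```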

Here is a list of meta characters you can use in your regular expressions:

  • .: Any character except a line break
  • ?: Makes the previous character optional
  • \w: A word character
  • \W: A non-word character
  • \d: A digit
  • \D: A non-digit character
  • \s: A whitespace character
  • \S: A non-whitespace character
  • \b: A match at the beginning/end of a word
  • \B: A match not at the beginning/end of a word
  • \0: A NUL character
  • \n: A new line character
  • \f: A form feed character
  • \r: A carriage return character
  • \t: A tab character
  • \v: A vertical tab character
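
Here is a short sketch of a few of the meta characters from the list in action. The sample text is made up for illustration.

```javascript
var text = "Order 66 shipped to the cat cafe";

var twoDigits = text.match(/\d\d/)[0];    // two digits in a row
var firstSpace = text.match(/\s/).index;  // position of the first whitespace
var firstWord = text.match(/\w+/)[0];     // first run of word characters

console.log(twoDigits);  // "66"
console.log(firstSpace); // 5
console.log(firstWord);  // "Order"

// \b only matches at a word boundary, so "cat" inside a longer word is skipped.
console.log(/\bcat\b/.test(text));          // true
console.log(/\bcat\b/.test("concatenate")); // false
```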

Using Pattern Modifiers

So far, our regular expressions have just been matching the first occurrence of our pattern. Sometimes that is just fine, but sometimes you want to find all occurrences of it. This is where modifiers come into play. You can use the global modifier to make your pattern search all of the text.

	
    var text = "Fifteen minutes could save you fifteen percent or more on car insurance.";
    var regex = /fifteen/gi;

    console.log(text.match(regex));
	
	
    (2) ["Fifteen", "fifteen"]
	

In this case, we searched for the word fifteen using the global g modifier. However, we also used the case-insensitive modifier so that we could also match Fifteen despite having a capital F. As expected, thanks to both modifiers, we found the word twice.

Here is the list of modifiers you can use:

  • i: This makes the searching case-insensitive
  • g: This makes the searching global which prevents it from stopping after the first match
  • m: This makes the searching multiline, so that ^ and $ match at the start and end of each line instead of only at the start and end of the whole text
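
To compare the modifiers side by side, here is a small sketch that runs similar patterns with and without each one. The two-line sample text is made up, and the ^ used in the last two lines marks the start of the text (or of each line, when m is set).

```javascript
var text = "one fish\nTwo fish";

// No modifiers: only the first match, with its index.
var firstIndex = text.match(/fish/).index;
console.log(firstIndex); // 4

// g: every match in the text.
var allFish = text.match(/fish/g);
console.log(allFish.length); // 2

// i: ignore the difference between upper and lower case.
console.log(/two/.test(text));  // false
console.log(/two/i.test(text)); // true

// m: ^ matches at the start of every line, not just the start of the text.
var firstWords = text.match(/^\w+/g);
var lineWords = text.match(/^\w+/gm);
console.log(firstWords); // ["one"]
console.log(lineWords);  // ["one", "Two"]
```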

Matching using Sets

In our previous examples, we saw how to match against text and meta characters. We can, however, also match against sets, which are characters enclosed inside brackets. For example, the set [123] will match either 1, 2, or 3:

	
    var text = "I think that 15% is a lot.";
    var regex = /[123]/;

    console.log(text.match(regex));
	
	
    [
        "1",
        index: 13,
        input: "I think that 15% is a lot."
    ]
	

This regular expression matched the 1 at index 13. We can also negate this set with the ^ character:

	
    var text = "I think that 15% is a lot.";
    var regex = /[^123]/;

    console.log(text.match(regex));
	
	
    [
        "I",
        index: 0,
        input: "I think that 15% is a lot."
    ]
	

Because the first character, I, is not 1, 2, or 3, it was matched right away. Another way to represent [123] is to create a range, like [1-3]. This is useful for trying to match any number or any letter:

	
    var number = /[0-9]/;
    var letter = /[a-z]/;
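
As a rough sketch of how ranges behave, the example below applies a few of them to a made-up string. It also shows that several ranges can share one set; the hexDigit name is just an illustrative choice.

```javascript
var text = "Room 7B";

var digitIndex = text.match(/[0-9]/).index; // first digit
var firstLower = text.match(/[a-z]/)[0];    // first lowercase letter
var capitals = text.match(/[A-Z]/g);        // every uppercase letter

console.log(digitIndex); // 5
console.log(firstLower); // "o"
console.log(capitals);   // ["R", "B"]

// Several ranges can be combined inside a single set.
var hexDigit = /[0-9a-fA-F]/;
console.log(hexDigit.test("B")); // true
console.log(hexDigit.test("G")); // false
```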
	

Matching Words and Sentences

In addition to matching sets and characters, you can also match words and sentences by taking advantage of new symbols that denote repetition. After all, words and sentences are just repetitions of letters, with some punctuation.

Let's break down what it would mean to match sentences. A sentence starts off with a capital letter, so our regular expression can look like this:

	
    var regex = /[A-Z]/g;
	

We don't really care what comes after the capital letter, so we can use a . to match any character that follows.

	
    var regex = /[A-Z]./g;
	

Now, we need to use repetition to repeat that . as often as needed. We do that using the + sign, which repeats the previous set or character one or more times.

	
    var regex = /[A-Z].+/g;
	

Now we must define what ends a sentence. For the most part, the end of a sentence is the presence of either a period, question mark, or exclamation point. Because . and ? are meta characters, we escape them with a backslash so they are treated as literal punctuation. To represent "period, or question mark, or exclamation point", we can add this to our regular expression:

	
    (\.|\?|!)
	

Our regular expression is almost done and now looks like this:

	
    var regex = /[A-Z].+(\.|\?|!)/g;
	

There is just one final problem left unresolved. The way we have currently defined the regular expression, it will match all our text as one giant sentence, because the whole text still begins with a capital letter and ends with a period, question mark, or exclamation point. By default, the + is greedy and matches as much text as it can. We need to signal that as soon as there is a match, the regular expression should take it and move on. We do this by placing a question mark meta character after the +, which makes it match as little as possible.
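
To see what the question mark changes, the sketch below runs the same pattern with and without it on a short made-up text. The greedy and lazy variable names are only labels for this comparison.

```javascript
var sample = "Stop! Go now.";

var greedy = /[A-Z].+(\.|\?|!)/g; // + grabs as much as possible
var lazy = /[A-Z].+?(\.|\?|!)/g;  // +? stops at the first possible ending

var greedyMatches = sample.match(greedy);
var lazyMatches = sample.match(lazy);

console.log(greedyMatches); // ["Stop! Go now."]
console.log(lazyMatches);   // ["Stop!", "Go now."]
```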

Our final regular expression looks like this:

	
    var text = "Hello. I am sample text! How are you?";
    var regex = /[A-Z].+?(\.|\?|!)/g;

    console.log(text.match(regex));
	
	
    0: "Hello."
    1: "I am sample text!"
    2: "How are you?"
	

Here are all the repetition-related symbols, called quantifiers:

  • +: This repeats the previous character or set one or more times
  • *: This repeats the previous character or set zero or more times
  • ?: This repeats the previous character or set zero or one time
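  • {a}: This repeats the previous character or set exactly a times
  • {a,}: This repeats the previous character or set at least a times
  • {a,b}: This repeats the previous character or set between a and b times (written without a space after the comma)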
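
The following sketch tries each kind of quantifier on a small made-up string. The variable names are only there to label the results.

```javascript
// ?: the "u" is optional, so both spellings match.
var spellings = "color colour".match(/colou?r/g);
console.log(spellings); // ["color", "colour"]

// *: zero or more "b" characters, so "ac" matches too.
var zeroOrMore = "ac abc abbc".match(/ab*c/g);
console.log(zeroOrMore); // ["ac", "abc", "abbc"]

// +: one or more "b" characters, so "ac" is skipped.
var oneOrMore = "ac abc abbc".match(/ab+c/g);
console.log(oneOrMore); // ["abc", "abbc"]

// {3} and {4}: exact counts.
var phone = "call 555-1234".match(/\d{3}-\d{4}/)[0];
console.log(phone); // "555-1234"

// {2,3}: between two and three, taking as many as it can.
var run = "aaaa".match(/a{2,3}/)[0];
console.log(run); // "aaa"
```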

Testing and Validation

Another popular use case for regular expressions comes from the test method, which, when called on a regular expression with a string, returns a boolean indicating whether or not a match was found. This is great for validating whether or not a string fits a specific format like a URL, email address, username, password, and much more. To test an entire line using a regular expression, you must indicate that with a ^ character at the start and a $ character at the end of the regular expression.
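
As a small sketch of why the ^ and $ matter, compare a pattern that only needs to find digits somewhere in the string with one that requires the whole string to be digits. The variable names and inputs are made up for illustration.

```javascript
var hasDigits = /\d+/;    // matches if digits appear anywhere
var onlyDigits = /^\d+$/; // matches only if the entire string is digits

console.log(hasDigits.test("abc123"));  // true
console.log(onlyDigits.test("abc123")); // false
console.log(onlyDigits.test("123"));    // true
```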

Let's practice by validating a username. For example, let's say your usernames must follow these two rules:

  1. Alphanumeric characters plus dots and dashes only
  2. 6 to 12 characters in length

You can fulfill the first case using this:

	
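    var regex = /^[a-zA-Z0-9.-]+$/;
	

What we did here was create a set that only includes lowercase a-z, uppercase A-Z, the numbers 0-9, and then the two extra characters . and -. Inside a set, the . is a literal dot rather than a meta character, and the - is a literal dash because it comes last. The + lets the set repeat for every character in the string. Then, to indicate that we want this to apply to the entire string, we wrapped it with a ^ and $.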

Now we enforce the size using one of the quantifiers we learned about above:

	
    var regex = /^[a-zA-Z0-9.-]{6,12}$/;
	

We added {6,12} so that the regex only matches when the length is within those constraints. Now let's actually test this out:

	
    var regex = /^[a-zA-Z0-9.-]{6,12}$/;
    var usernames = ['Username',
                    'U.sername',
                    'U53RN4M3',
                    'user',
                    '[email protected]',
                    'usernamezzzzz'];
    // Loop over the array values directly rather than over its keys.
    usernames.forEach(function (name) {
        console.log(name + ": " + regex.test(name));
    });
	
	
    Username: true
    U.sername: true
    U53RN4M3: true
    user: false
    [email protected]: false
    usernamezzzzz: false
	

When the username contained only the characters we defined and was between 6 and 12 characters in length, the test method returned true and otherwise returned false.

Search and Replace

One last popular use for regular expressions is powerful search and replace operations. Instead of simply replacing one string with another, you can use a regular expression to precisely match the text you want to replace and then, using a callback function, define exactly what should happen when text is matched.

To do this, we must learn one last new concept with regular expressions. If you wrap part of a pattern inside parentheses, that defines a new group. In a replacement string, the first group can then be referenced using $1, the second group, if there is one, with $2, and so on.

Here's a basic example of using groups to reverse the order of 3 words.

	
    var text = "i love you";
    var regex = /(\w+) (\w+) (\w+)/;

    console.log(text.replace(regex, "$3 $2 $1"));
	
	
    you love i
	

We created three groups, each matching a single word, which were mapped to $1, $2, and $3. Then in our replace method, we referenced them in reverse, effectively flipping all the words around.
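
Groups are handy for reshuffling structured text as well. As a sketch, here is the same idea applied to a made-up date string, moving the year to the end. The groups can also be read directly from the result of match.

```javascript
var isoDate = "2024-01-15";
var dateRegex = /(\d{4})-(\d{2})-(\d{2})/;

// $1 is the year, $2 the month, $3 the day.
var usDate = isoDate.replace(dateRegex, "$2/$3/$1");
console.log(usDate); // "01/15/2024"

// The same groups are available on the match result, starting at index 1.
var parts = isoDate.match(dateRegex);
console.log(parts[1]); // "2024"
console.log(parts[2]); // "01"
console.log(parts[3]); // "15"
```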

Finally, let's try using the replace method by passing in a callback function which will perform whatever operation we want on the matched text. Let's capitalize the word love.

	
    var text = "i love you";
    var regex = /(\w+) (\w+) (\w+)/;

    var result = text.replace(regex, function(string, group1, group2, group3) {
        return group1 + " " + group2.toUpperCase() + " " + group3;
    });

    console.log(result);
	
	
    i LOVE you
	

The regular expression itself remained the same. We're still dividing up i love you into individual groups. However, this time we passed in a callback function. This function takes four parameters: the matched text (which here is the entire string) and then the three defined groups. From there we just construct our desired output by applying the uppercase method on the word love and then combining all three words together to form a new string.
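
A callback becomes even more useful together with the global modifier, because it then runs once for every match. The sketch below is one possible way to capitalize the first letter of each word; the pattern has no groups, so the callback only uses its first parameter, the matched text.

```javascript
var phrase = "i love you";

// \b\w matches the first word character of each word.
var titled = phrase.replace(/\b\w/g, function (match) {
    return match.toUpperCase();
});

console.log(titled); // "I Love You"
```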

Conclusion

Regular expressions are powerful, and there is quite a lot more you can still learn about them. As a closer for this lesson, here are a few more examples of usages that might be useful to you:

Find all words between five and seven characters

	
    var text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
    var regex = /\b\w{5,7}\b/g;

    console.log(text.match(regex));
	
	
    (10) ["pledge", "United", "States", "America", "which", "stands", "Nation", "under", "liberty", "justice"]
	

Find all words longer than eight characters

	
    var text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
    var regex = /\b\w{9,}\b/g;

    console.log(text.match(regex));
	
	
    (2) ["allegiance", "indivisible"]
	

Find all words exactly five characters long

	
    var text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
    var regex = /\b\w{5}\b/g;

    console.log(text.match(regex));
	
	
    (2) ["which", "under"]
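
The three examples above differ only in the numbers inside the braces, so they could be folded into one function. The sketch below is one way to do that; wordsOfLength is a hypothetical helper, not something built into JavaScript, and it builds the pattern with the standard RegExp constructor, where backslashes must be doubled inside the string.

```javascript
// Hypothetical helper: returns all words whose length is between min and max.
// Leaving max out produces an open-ended quantifier such as {9,}.
function wordsOfLength(text, min, max) {
    var upper = max === undefined ? "" : String(max);
    var regex = new RegExp("\\b\\w{" + min + "," + upper + "}\\b", "g");

    // match returns null when nothing is found, so fall back to an empty array.
    return text.match(regex) || [];
}

var sample = "one Nation under God, indivisible";

console.log(wordsOfLength(sample, 5, 6)); // ["Nation", "under"]
console.log(wordsOfLength(sample, 9));    // ["indivisible"]
console.log(wordsOfLength(sample, 20));   // []
```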
	

Resources