Regular Expressions
Regular expressions are a technique for defining a sequence of characters to be used as a search pattern. They are written in a specific syntax and then usually applied to a larger string of text to see whether the string meets the conditions defined in the pattern. A regular expression has the general syntax of a pattern followed by optional modifiers, like so:
/pattern/modifiers
The pattern is the sequence of characters to search for, and each modifier is a single letter that changes how the whole pattern is applied.
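As a rough sketch of that syntax in JavaScript, the same pattern can be written as a literal between slashes or built from strings with the `RegExp` constructor. The variable names below are only illustrative.

```javascript
// A regex literal: the pattern sits between the slashes, modifiers follow.
const literal = /hello/i;

// The same regular expression built from strings.
const constructed = new RegExp("hello", "i");

console.log(literal.source); // "hello" (the pattern)
console.log(literal.flags);  // "i" (the modifiers)
console.log(constructed.test("Hello there")); // true
```

The literal form is usually easier to read; the constructor form is handy when the pattern is only known at runtime.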
Matching a Single Character
Let's start with matching a single character (searching a string of text to see if it contains it), as it is the most basic of search patterns. We'll use the United States Pledge of Allegiance as our sample text. Let's say we wanted to see if the character `a` existed:
const text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
const regex = /a/;
console.log(text.match(regex));
[
"a",
index: 9,
input: "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all."
]
The resulting log shows the text that was matched, the index where it was first found, and the entire input text. In other words, we found our pattern, `a`, at index 9 of our text (indexes start at 0).
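The pieces of that result can also be read individually. The following is a minimal sketch, using a shortened copy of the sample text and illustrative variable names; it also shows that `match` returns `null` when nothing is found, which is worth checking before reading any properties.

```javascript
const text = "I pledge allegiance to the Flag";
const found = text.match(/a/);

// Without the g modifier, the result carries the matched text and its position.
const matchedText = found[0];  // "a"
const position = found.index;  // 9

// When nothing matches, match returns null instead of an array.
const missing = text.match(/z/);
const missingIndex = missing === null ? -1 : missing.index;

console.log(matchedText, position, missingIndex);
```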
Using Meta Characters
With regular expressions we aren't limited to literal text for our search patterns. We can also use meta characters, which are characters that have a special meaning. For example, a dot `.` will match any character:
const text = "Staten Island isn't a real borough.";
const regex = /rea./;
console.log(text.match(regex));
[
"real",
index: 22,
input: "Staten Island isn't a real borough."
]
Notice how, even though the search pattern didn't include the full word `real`, the `.` filled in for the last letter and found `real` at index `22`.
Here is a list of meta characters you can use in your regular expressions:
- `.`: Any character except a line break
- `?`: Makes the previous character optional
- `\w`: A word character (a letter, digit, or underscore)
- `\W`: A non-word character
- `\d`: A digit
- `\D`: A non-digit character
- `\s`: A whitespace character
- `\S`: A non-whitespace character
- `\b`: A match at the beginning/end of a word
- `\B`: A match not at the beginning/end of a word
- `\0`: A NUL character
- `\n`: A new line character
- `\f`: A form feed character
- `\r`: A carriage return character
- `\t`: A tab character
- `\v`: A vertical tab character
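A few of these meta characters in action may help. This is a small sketch with made-up sample text and illustrative variable names, not an exhaustive tour of the list above.

```javascript
const text = "The cat sat at 10:45";

// \d matches the first digit it finds.
const firstDigit = text.match(/\d/)[0];           // "1"

// \s matches the first whitespace character.
const firstSpace = text.match(/\s/).index;        // 3

// \b keeps "at" from matching inside "cat" or "sat".
const standaloneAt = text.match(/\bat\b/).index;  // 12

// ? makes the "u" optional, so both spellings match.
const bothSpellings = /colou?r/.test("color") && /colou?r/.test("colour"); // true

console.log(firstDigit, firstSpace, standaloneAt, bothSpellings);
```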
Using Pattern Modifiers
So far, our regular expressions have just been matching the first occurrence of our pattern. Sometimes that is just fine, but sometimes you want to find all occurrences of it. This is where modifiers come into play. You can use the global modifier to make your pattern search all of the text. For example:
const text = "Fifteen minutes could save you fifteen percent or more on car insurance.";
const regex = /fifteen/gi;
console.log(text.match(regex));
(2) ["Fifteen", "fifteen"]
In this case, we searched for the word `fifteen` using the global `g` modifier. However, we also used the case-insensitive `i` modifier so that we could also match `Fifteen` despite its capital `F`. As expected, thanks to both modifiers, we found the word twice.
Here are the most common modifiers you can use:
- `i`: Makes the search case-insensitive
- `g`: Makes the search global, which prevents it from stopping after the first match
- `m`: Makes the search multiline, so that `^` and `$` match at the start and end of each line instead of only at the start and end of the whole string
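The multiline modifier is the least obvious of the three, so here is a short sketch of its effect. The sample text and variable names are assumptions made for illustration; `^` anchors a match to the start of the string (or, with `m`, the start of each line).

```javascript
const text = "first line\nsecond line\nthird line";

// Without m, ^ only matches at the very start of the string.
const singleLine = text.match(/^\w+/g);  // ["first"]

// With m, ^ also matches right after each line break.
const multiLine = text.match(/^\w+/gm);  // ["first", "second", "third"]

console.log(singleLine, multiLine);
```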
Matching using Sets
In our previous examples, we saw how to match against text and meta characters. We can, however, also match against sets, which are characters enclosed inside square brackets. For example, the set `[123]` will match either `1`, `2`, or `3`:
const text = "I think that 15% is a lot.";
const regex = /[123]/;
console.log(text.match(regex));
[
"1",
index: 13,
input: "I think that 15% is a lot."
]
This regular expression matched the `1` at index 13. We can also negate this set with the `^` character:
const text = "I think that 15% is a lot.";
const regex = /[^123]/;
console.log(text.match(regex));
[
"I",
index: 0,
input: "I think that 15% is a lot."
]
Because the first character, `I`, is NOT `1`, `2`, or `3`, it was matched right away. Another way to represent `[123]` is to create a range, like `[1-3]`. This is useful for matching any digit or any lowercase letter:
const number = /[0-9]/;
const letter = /[a-z]/;
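Ranges can also be combined inside a single set. The sketch below uses a made-up sample string and illustrative names to show how each range picks out a different first match.

```javascript
const text = "Room #7G";

const firstDigit = text.match(/[0-9]/)[0];      // "7"
const firstLower = text.match(/[a-z]/)[0];      // "o" (the capital R is skipped)
const firstLetter = text.match(/[a-zA-Z]/)[0];  // "R" (both cases allowed)

// Several ranges in one set: any hexadecimal digit.
const hexDigit = /[0-9a-fA-F]/;
console.log(hexDigit.test("f"), hexDigit.test("G")); // true false
```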
Matching Words and Sentences
In addition to matching sets and characters, you can also match words and sentences by taking advantage of new symbols that denote repetition. After all, words and sentences are just repetitions of letters, with some punctuation.
Let's break down what it would mean to match sentences. A sentence starts off with a capital letter, so our regular expression can look like this:
const regex = /[A-Z]/g;
After the capital letter, we don't really care what comes next, so we can use a `.` to match any character after it.
const regex = /[A-Z]./g;
Now, we need to use repetition to repeat that `.` as often as needed. We do that using the `+` sign, which repeats the previous set or character one or more times.
const regex = /[A-Z].+/g;
Now we must define what ends a sentence. For the most part, the end of a sentence is the presence of either a period, question mark, or exclamation point. To represent "period, or question mark, or exclamation point", we can add this to our regular expression:
(\.|\?|!)
The period and question mark are escaped with a backslash because both are otherwise meta characters. Our regular expression is almost done and now looks like this:
const regex = /[A-Z].+(\.|\?|!)/g;
There is just one final problem left unresolved. The way we have currently defined the regular expression, it will match all our text as one giant sentence, because technically it still begins with a capital letter and ends with either a period, question mark, or exclamation point. We need to signal that, as soon as there is a match, the regular expression should take it and move on. We do this by placing a question mark `?` after the `+`, which makes the repetition stop at the first possible ending instead of the last.
Our final regular expression looks like this:
const text = "Hello. I am sample text! How are you?";
const regex = /[A-Z].+?(\.|\?|!)/g;
console.log(text.match(regex));
(3) ["Hello.", "I am sample text!", "How are you?"]
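To see what the extra question mark changes, the two versions of the pattern can be run side by side. This is a small comparison sketch; the names `greedy` and `lazy` are just labels chosen here for the default and question-mark behaviors.

```javascript
const text = "Hello. I am sample text! How are you?";

// Without the ?, the + grabs as much as possible: one giant "sentence".
const greedy = text.match(/[A-Z].+(\.|\?|!)/g);

// With the ?, the + stops at the first possible ending each time.
const lazy = text.match(/[A-Z].+?(\.|\?|!)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 3
```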
Here are the repetition-related symbols, called quantifiers:
- `+`: Repeats the previous character or set one or more times
- `*`: Repeats the previous character or set zero or more times
- `?`: Repeats the previous character or set zero or one time
- `{a}`: Repeats the previous character or set exactly `a` times
- `{a,b}`: Repeats the previous character or set between `a` and `b` times, inclusive (written without a space after the comma)
- `{a,}`: Repeats the previous character or set at least `a` times
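Here is a brief sketch of the quantifiers above applied to small strings. The patterns use `^` and `$` so that each one has to describe the whole string; all variable names are illustrative.

```javascript
// {4}: exactly four digits.
const exactlyFour = /^\d{4}$/;
console.log(exactlyFour.test("2024"), exactlyFour.test("202")); // true false

// {2,4}: between two and four digits.
const twoToFour = /^\d{2,4}$/;
console.log(twoToFour.test("123"), twoToFour.test("12345"));    // true false

// *: zero or more, so it can also match nothing at all.
const someAs = "aaa".match(/a*/)[0]; // "aaa"
const noAs = "bbb".match(/a*/)[0];   // ""

// ?: zero or one, so the "s" is optional.
const scheme = /^https?$/;
console.log(scheme.test("http"), scheme.test("https"));         // true true
```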
Testing and Validation
Another popular use case for regular expressions comes from the `test` method, which is called on a regular expression with a string and returns a boolean indicating whether or not a match was found. This is great for validating whether a string fits a specific format such as a URL, email address, username, or password. To require that the entire string matches, rather than just part of it, put a `^` character at the start and a `$` character at the end of the regular expression.
Let's practice by validating a username. For example, let's say your usernames must follow the following two rules:
- Alphanumeric characters plus dots and dashes only
- 6 to 12 characters in length
You can fulfill the first rule using this, which for now matches a single allowed character:
const regex = /^[a-zA-Z0-9.-]$/;
What we did here was create a set that only includes lowercase `a-z`, uppercase `A-Z`, the numbers `0-9`, and the two extra characters `.` and `-`. Inside a set, the `.` is a literal dot, and the `-` is literal because it comes last. Then, to indicate that we want this to apply to the entire string, we wrapped it with `^` and `$`.
Now we enforce the size using one of the quantifiers we learned about above:
const regex = /^[a-zA-Z0-9.-]{6,12}$/;
We added `{6,12}` so that the regex only matches when the length is within those constraints. Now let's actually test this out:
const regex = /^[a-zA-Z0-9.-]{6,12}$/;
const usernames = ['Username',
'U.sername',
'U53RN4M3',
'user',
'usern@me',
'usernamezzzzz'];
// Loop over the array values directly with for...of
for (const name of usernames) {
  console.log(name + ": " + regex.test(name));
}
Username: true
U.sername: true
U53RN4M3: true
user: false
usern@me: false
usernamezzzzz: false
When the username contained only the characters we defined and was between 6 and 12 characters in length, the `test` method returned `true`; otherwise it returned `false`.
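To see why the `^` and `$` matter for validation, compare the same pattern with and without them. This is a minimal sketch; the `isValidUsername` helper and the sample names are hypothetical and only wrap the regular expression from above.

```javascript
const unanchored = /[a-zA-Z0-9.-]{6,12}/;
const anchored = /^[a-zA-Z0-9.-]{6,12}$/;

// Hypothetical helper that wraps the anchored pattern.
function isValidUsername(name) {
  return anchored.test(name);
}

// Without anchors, a valid run of 6 to 12 characters anywhere is enough.
console.log(unanchored.test("usernamezzzzz")); // true (13 characters, but part of it matches)
console.log(unanchored.test("user_name-ok!")); // true ("name-ok" matches)

// With anchors, the whole string has to fit the rules.
console.log(isValidUsername("usernamezzzzz")); // false
console.log(isValidUsername("user_name-ok!")); // false
console.log(isValidUsername("U.sername"));     // true
```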
Search and Replace
One last popular use for regular expressions is defining powerful search and replace operations. Instead of the standard replacing of one string with another, you can use a regular expression to precisely match the text you want to replace and then, using a callback function, precisely define what should happen when text is matched.
To do this, we must learn one last concept. If you wrap part of a pattern inside parentheses, that defines a new group. In the replacement string, the first group can then be referenced using `$1`, the second group, if there is one, with `$2`, and so on.
Here's a basic example of using groups to reverse the order of 3 words.
const text = "i love you";
const regex = /(\w+) (\w+) (\w+)/;
console.log(text.replace(regex, "$3 $2 $1"));
you love i
We created three groups, each one matching a single word, which were mapped to `$1`, `$2`, and `$3`. Then in our `replace` method, we referenced them in reverse, effectively flipping all the words around.
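The same group-reordering idea works for more structured text. As an illustrative sketch (the date value and variable names are made up here), groups can rearrange a year-month-day date into month/day/year:

```javascript
const isoDate = "2024-03-15";

// Three groups: year, month, and day, separated by dashes.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Reference the groups in a new order and with new separators.
const usDate = isoDate.replace(datePattern, "$2/$3/$1");

console.log(usDate); // "03/15/2024"
```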
Finally, let's try using the `replace` method by passing in a callback function which will perform whatever operation we want on the matched text. Let's capitalize the word `love`.
const text = "i love you";
const regex = /(\w+) (\w+) (\w+)/;
const result = text.replace(regex, function(string, group1, group2, group3) {
return group1 + " " + group2.toUpperCase() + " " + group3;
});
console.log(result);
i LOVE you
The regular expression itself remained the same. We're still dividing up `i love you` into individual groups. However, this time we passed in a callback function. This function received four parameters: the matched text (here, the entire string) and then the three defined groups. From there we just construct our desired output by applying the `toUpperCase` method to the word `love` and then combining all three words together to form a new string.
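A callback becomes more useful when combined with the global modifier, because it then runs once per match. The following sketch, with illustrative parameter names, capitalizes the first letter of every word rather than handling a fixed three-word string:

```javascript
const text = "i love you";

// Group 1 is the first letter of a word, group 2 is the rest of it.
const eachWord = /\b(\w)(\w*)/g;

// The callback runs once for every word that matches.
const titleCased = text.replace(eachWord, function (match, first, rest) {
  return first.toUpperCase() + rest;
});

console.log(titleCased); // "I Love You"
```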
Conclusion
Regular expressions are powerful, and there is quite a lot more you can still learn about them. To close this lesson, here are a few more examples that might be useful to you:
Find all words between five and seven characters
const text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
const regex = /\b\w{5,7}\b/g;
console.log(text.match(regex));
(10) ["pledge", "United", "States", "America", "which", "stands", "Nation", "under", "liberty", "justice"]
Find all words longer than eight characters
const text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
const regex = /\b\w{9,}\b/g;
console.log(text.match(regex));
(2) ["allegiance", "indivisible"]
Find all words exactly five characters long
const text = "I pledge allegiance to the Flag of the United States of America, and to the Republic for which it stands, one Nation under God, indivisible, with liberty and justice for all.";
const regex = /\b\w{5}\b/g;
console.log(text.match(regex));
(2) ["which", "under"]