What is a Regular Expression (Regex)?
A regular expression (regex) is a pattern that specifies a set of strings. In Java, regular expressions are used for pattern matching within strings. They provide a powerful and flexible way to perform tasks like searching, extracting, and replacing text.
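Java supports regular expressions through the java.util.regex package. Its two core classes are:
Pattern: A compiled representation of a regular expression, created with Pattern.compile(). If the regex is invalid, compile() throws a PatternSyntaxException.
Matcher: An engine that performs match operations, such as find() and matches(), on an input string using a Pattern.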
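In practice the two classes are used together: compile the regex into a Pattern once, then create a Matcher for each input. The sketch below is illustrative only (PatternMatcherBasics, isNumber, and firstNumber are hypothetical names, not part of the JDK). It contrasts Matcher.matches(), which succeeds only if the entire input matches, with Matcher.find(), which looks for a match anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class PatternMatcherBasics {
    // Compile once; the same Pattern can create Matchers for many inputs
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: true only if the ENTIRE input is one or more digits
    static boolean isNumber(String input) {
        return NUMBER.matcher(input).matches();
    }

    // Hypothetical helper: first run of digits found anywhere in the input, or null
    static String firstNumber(String input) {
        Matcher matcher = NUMBER.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isNumber("2024"));         // true
        System.out.println(isNumber("year 2024"));    // false: matches() needs the whole input
        System.out.println(firstNumber("year 2024")); // 2024: find() searches inside the input
    }
}
```

Keeping the compiled Pattern in a static final field avoids recompiling it on every call. Pattern objects are immutable and safe to share between threads; Matcher objects are not.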
Basic Syntax of Java Regular Expressions
A regex pattern in Java can consist of literals, metacharacters, and quantifiers that define the string matching behavior. Below is a breakdown of the basic syntax and examples:
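1. Literals: A literal character matches itself. Characters with a special meaning (the metacharacters described next) must be escaped with a backslash, for example \. or \+, to be matched literally.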
Example:
//LiteralExample.java file
import java.util.regex.*;
public class LiteralExample {
    public static void main(String[] args) {
        String input = "hello world!";
        // Regex: Match the literal "hello"
        String regex = "hello";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found literal: " + matcher.group()); // Output: "hello"
        } else {
            System.out.println("No match found.");
        }
    }
}
Output:
Found literal: hello
Explanation: The regex “hello” matches the literal string “hello” in the input “hello world!”.
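A literal only matches itself when it contains no metacharacters. To search for text such as 1+1 literally, one option is to escape the whole string with Pattern.quote(String). A minimal sketch (QuoteExample and countLiteral are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class QuoteExample {
    // Hypothetical helper: counts how many times 'text' occurs literally in 'input'
    static int countLiteral(String text, String input) {
        // Pattern.quote wraps the text so '+' and '.' lose their special meaning
        Matcher matcher = Pattern.compile(Pattern.quote(text)).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "1+1=2 and 1+1 again";
        System.out.println(countLiteral("1+1", input)); // 2
        // Unquoted, "1+1" means "one or more 1s, then 1", i.e. at least "11", so nothing matches here
        System.out.println(Pattern.compile("1+1").matcher(input).find()); // false
    }
}
```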
2. Metacharacters: Characters that have a special meaning in a regex. The most common ones are:
. (dot): Matches any single character except a line terminator such as newline (unless the DOTALL flag is set).
Example: a.c will match abc, axc, etc., but not ac.
//DotExample.java file
import java.util.regex.*;
public class DotExample {
    public static void main(String[] args) {
        String input = "abc axc ac";
        // Regex: "a", then any single character, then "c"
        String regex = "a.c";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found match: " + matcher.group()); // Output: "abc", "axc"
        }
    }
}
Output:
Found match: abc
Found match: axc
Explanation: The . matches exactly one character, so “a.c” matches abc and axc but not ac, which has no character between a and c.
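By default the dot does not match line terminators. The following sketch (DotAllExample and dotMatches are hypothetical names) shows how compiling with the Pattern.DOTALL flag changes that:

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class DotAllExample {
    // Hypothetical helper: does "a.c" find a match, with or without DOTALL?
    static boolean dotMatches(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("a.c", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a\nc"; // 'a', a newline, then 'c'
        System.out.println(dotMatches(input, false)); // false: '.' skips line terminators by default
        System.out.println(dotMatches(input, true));  // true: DOTALL lets '.' match '\n'
    }
}
```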
^ (caret): Anchors the match to the beginning of the input (or of any line when the MULTILINE flag is set).
Example:
//CaretExample.java file
import java.util.regex.*;
public class CaretExample {
    public static void main(String[] args) {
        String input = "hello world";
        // Regex: Match "hello" at the start of the string
        String regex = "^hello";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found match: " + matcher.group()); // Output: "hello"
        }
    }
}
Output:
Found match: hello
Explanation: The ^ ensures that “hello” must appear at the start of the string.
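When the input spans several lines, ^ still anchors only to the start of the whole input unless the pattern is compiled with Pattern.MULTILINE. A minimal sketch that reports where each match starts (MultilineCaretExample and lineStartMatches are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class MultilineCaretExample {
    // Hypothetical helper: start index of each "hello" at the beginning of the input,
    // or at the beginning of any line when MULTILINE is set
    static List<Integer> lineStartMatches(String input, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher matcher = Pattern.compile("^hello", flags).matcher(input);
        List<Integer> positions = new ArrayList<>();
        while (matcher.find()) {
            positions.add(matcher.start()); // index where the match begins
        }
        return positions;
    }

    public static void main(String[] args) {
        String input = "hello world\nhello again";
        System.out.println(lineStartMatches(input, false)); // [0]
        System.out.println(lineStartMatches(input, true));  // [0, 12]
    }
}
```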
$ (dollar): Anchors the match to the end of the input (or of any line when the MULTILINE flag is set).
Example: abc$ will match “abc” only if it is at the end of the string.
//DollarExample.java file
import java.util.regex.*;
public class DollarExample {
    public static void main(String[] args) {
        String input = "hello world";
        // Regex: Match "world" at the end of the string
        String regex = "world$";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found match: " + matcher.group()); // Output: "world"
        }
    }
}
Output:
Found match: world
Explanation: The $ ensures that “world” must appear at the end of the string.
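A common use of $ is checking how a string ends. This sketch (FileExtensionExample and endsWithJava are hypothetical names) also escapes the dot, since an unescaped . would match any character:

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class FileExtensionExample {
    // "\\.java$": a literal dot (escaped) followed by "java" at the end of the input
    private static final Pattern JAVA_SUFFIX = Pattern.compile("\\.java$");

    // Hypothetical helper
    static boolean endsWithJava(String fileName) {
        return JAVA_SUFFIX.matcher(fileName).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithJava("Main.java"));     // true
        System.out.println(endsWithJava("Main.java.bak")); // false: ".java" is not at the end
        System.out.println(endsWithJava("Mainxjava"));     // false: the escaped dot needs a real '.'
    }
}
```

Strictly speaking, in Java's default mode $ also matches just before a final line terminator, so \z is the stricter anchor when a trailing newline must be rejected.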
* (asterisk): Matches zero or more occurrences of the preceding element.
Example: a*b matches b, ab, aab, aaab, etc.
+ (plus): Matches one or more occurrences of the preceding element.
Example: a+b matches ab, aab, aaab, but not b.
? (question mark): Matches zero or one occurrence of the preceding element.
Example: a?b matches b and ab.
{} (curly braces): Specifies how many times the preceding element must occur: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times.
Example: a{2} matches exactly two a's, i.e., aa.
[] (square brackets): Matches any one of the characters inside the brackets.
Example: [abc] matches a, b, or c; a range such as [a-z] matches any lowercase ASCII letter.
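To see the quantifiers and brackets above side by side, the following sketch checks each pattern against a few inputs using Pattern.matches, which requires the whole input to match (QuantifierExample and fullMatch are hypothetical names):

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class QuantifierExample {
    // Hypothetical helper: Pattern.matches succeeds only if the WHOLE input matches
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"a*b", "a+b", "a?b", "a{2}b", "a{1,2}b", "[abc]b"};
        String[] inputs = {"b", "ab", "aab"};
        for (String regex : regexes) {
            StringBuilder line = new StringBuilder(regex + " ->");
            for (String input : inputs) {
                line.append(' ').append(input).append('=').append(fullMatch(regex, input));
            }
            System.out.println(line);
        }
        // Expected output:
        // a*b -> b=true ab=true aab=true
        // a+b -> b=false ab=true aab=true
        // a?b -> b=true ab=true aab=false
        // a{2}b -> b=false ab=false aab=true
        // a{1,2}b -> b=false ab=true aab=true
        // [abc]b -> b=false ab=true aab=false
    }
}
```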
3. Character Classes:
\d: Matches any digit (0-9).
Example: \d matches a single digit
//DigitExample.java file
import java.util.regex.*;
public class DigitExample {
    public static void main(String[] args) {
        String input = "There are 123 apples";
        // Regex: Match digits
        String regex = "\\d";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found digit: " + matcher.group()); // Output: "1", "2", "3"
        }
    }
}
Output:
Found digit: 1
Found digit: 2
Found digit: 3
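Explanation: The \d matches a single digit from 0 to 9, so each digit of 123 is reported separately. In a Java string literal the backslash must itself be escaped, which is why the regex \d is written as “\\d”.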
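Because \d matches one character at a time, 123 comes back as three separate matches. Adding the + quantifier makes each match cover a whole run of consecutive digits. A minimal sketch (NumberExample and findNumbers are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class NumberExample {
    // Hypothetical helper: every maximal run of digits; "\\d+" is greedy, so "123" is one match
    static List<String> findNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        List<String> numbers = new ArrayList<>();
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("There are 123 apples and 45 pears")); // [123, 45]
    }
}
```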
\D: Matches any non-digit character.
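\D is handy for stripping everything except digits, for example when normalizing a phone number. A minimal sketch using String.replaceAll (NonDigitExample and digitsOnly are hypothetical names):

```java
// Hypothetical class name, for illustration only
public class NonDigitExample {
    // Hypothetical helper: removes every character that is not a digit
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(555) 123-4567")); // 5551234567
    }
}
```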
\w: Matches any word character (letters, digits, or underscore).
Example: \w matches a single word character (letter, digit, or underscore)
//WordCharacterExample.java file
import java.util.regex.*;
public class WordCharacterExample {
    public static void main(String[] args) {
        String input = "user_123";
        // Regex: Match word characters
        String regex = "\\w";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found word character: " + matcher.group()); // Output: "u", "s", "e", "r", "_", "1", "2", "3"
        }
    }
}
Output:
Found word character: u
Found word character: s
Found word character: e
Found word character: r
Found word character: _
Found word character: 1
Found word character: 2
Found word character: 3
Explanation: The \w matches any letter, digit, or underscore.
\W: Matches any non-word character.
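\W is often used as a separator. This sketch splits a string into words on runs of non-word characters, using the regex \W+ (written "\\W+" in Java source). NonWordSplitExample and words are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical class name, for illustration only
public class NonWordSplitExample {
    // Hypothetical helper: splits on runs of spaces and punctuation, keeping the words
    static String[] words(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Hello, world! user_123"))); // [Hello, world, user_123]
    }
}
```

If the input starts with a non-word character, split returns a leading empty string, so real code may need to filter it out.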
\s: Matches any whitespace character (space, tab, newline).
Example: \s matches a single whitespace character
//WhitespaceExample.java file
import java.util.regex.*;
public class WhitespaceExample {
    public static void main(String[] args) {
        String input = "Hello world! How are you?";
        // Regex: Match whitespace characters
        String regex = "\\s";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found whitespace: " + matcher.group()); // Output: " " (space)
        }
    }
}
Output:
Found whitespace: 
Found whitespace: 
Found whitespace: 
Found whitespace: 
Explanation: The \s matches any whitespace character like space, tab, or newline.
\S: Matches any non-whitespace character.
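\S+ and \s+ are useful for tokenizing text and normalizing spacing. A minimal sketch (TokenExample, tokens, and normalizeSpaces are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class TokenExample {
    // Hypothetical helper: each match of "\\S+" is a maximal run of non-whitespace characters
    static List<String> tokens(String input) {
        Matcher matcher = Pattern.compile("\\S+").matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // Hypothetical helper: "\\s+" collapses any run of spaces, tabs, or newlines into one space
    static String normalizeSpaces(String input) {
        return input.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "  Hello \t world!\nHow are   you?  ";
        System.out.println(tokens(input));          // [Hello, world!, How, are, you?]
        System.out.println(normalizeSpaces(input)); // Hello world! How are you?
    }
}
```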
4. Groups and Alternation
() (parentheses): Groups part of a pattern into a single unit and captures the text it matches.
Example: (cat|bat|mat) groups three alternatives into one unit
//GroupExample.java file
import java.util.regex.*;
public class GroupExample {
    public static void main(String[] args) {
        String input = "cat bat mat";
        // Regex: Match "cat", "bat", or "mat" as one group
        String regex = "(cat|bat|mat)";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found group match: " + matcher.group()); // Output: "cat", "bat", "mat"
        }
    }
}
Output:
Found group match: cat
Found group match: bat
Found group match: mat
Explanation: The (cat|bat|mat) group matches any of the three words cat, bat, or mat.
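Besides grouping, parentheses capture the text they match, and Matcher.group(int) retrieves it by position (group 0 is the whole match). The sketch below uses a deliberately simplified, hypothetical email pattern; it is not a complete email validator, and CaptureGroupExample and userAndDomain are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class CaptureGroupExample {
    // Simplified pattern: group 1 captures the user name, group 2 the domain before ".com"
    private static final Pattern EMAIL = Pattern.compile("(\\w+)@(\\w+)\\.com");

    // Hypothetical helper: returns {user, domain} for the first match, or null if none
    static String[] userAndDomain(String input) {
        Matcher matcher = EMAIL.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String[] parts = userAndDomain("Contact: alice@example.com");
        System.out.println(parts[0]); // alice
        System.out.println(parts[1]); // example
    }
}
```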
| (pipe): Alternation; matches either the pattern before the pipe or the pattern after it.
Example: John|Tom matches either John or Tom
//AlternationExample.java file
import java.util.regex.*;
public class AlternationExample {
    public static void main(String[] args) {
        String input = "John Mark Tom";
        // Regex: Match either "John" or "Tom"
        String regex = "John|Tom";
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found alternation match: " + matcher.group()); // Output: "John", "Tom"
        }
    }
}
Output:
Found alternation match: John
Found alternation match: Tom
Explanation: The John|Tom alternation matches either John or Tom in the string.
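Alternation has the lowest precedence of the regex operators, so | splits the entire pattern unless a group limits its scope. A minimal sketch contrasting the two (AlternationScopeExample and fullMatch are hypothetical names):

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class AlternationScopeExample {
    // Hypothetical helper: Pattern.matches requires the whole input to match
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Two complete alternatives: "gray" or "grey"
        System.out.println(fullMatch("gray|grey", "grey")); // true
        // The group limits alternation to the middle letter
        System.out.println(fullMatch("gr(a|e)y", "grey"));  // true
        // Without a group, "gra|ey" means "gra" OR "ey", so "grey" fails
        System.out.println(fullMatch("gra|ey", "grey"));    // false
    }
}
```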