Java Regex

What is a Regular Expression (Regex)?

A regular expression (regex) is a pattern that specifies a set of strings. In Java, regular expressions are used for pattern matching within strings. They provide a powerful and flexible way to perform tasks like searching, extracting, and replacing text.

Java supports regular expressions through the java.util.regex package, which includes the following important classes:

Pattern: A compiled representation of a regular expression.

Matcher: Used to perform the matching operations on an input string.

Basic Syntax of Java Regular Expressions

A regex pattern in Java can consist of literals, metacharacters, and quantifiers that define the string matching behavior. Below is a breakdown of the basic syntax and examples:

1. Literals: A literal character matches itself.

Example:


//LiteralExample.java file
import java.util.regex.*;

public class LiteralExample {
    public static void main(String[] args) {
        String input = "hello world!";
        
        // Regex: Match the literal "hello"
        String regex = "hello";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found literal: " + matcher.group());  // Output: "hello"
        } else {
            System.out.println("No match found.");
        }
    }
}

Output:

Found literal: hello

Explanation: The regex “hello” matches the literal string “hello” in the input “hello world!”.

2. Metacharacters: Special characters that have a specific meaning in regex. These are:

. (dot) — Matches any single character except a line terminator such as \n (unless the Pattern.DOTALL flag is set).

Example: a.c will match abc, axc, etc., but not ac.


//DotExample.java file
import java.util.regex.*;

public class DotExample {
    public static void main(String[] args) {
        String input = "abc adc axc";
        
        // Regex: Match "a", then any single character, then "c"
        String regex = "a.c";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found match: " + matcher.group());  // Output: "abc", "adc", "axc"
        }
    }
}

Output:

Found match: abc
Found match: adc
Found match: axc

Explanation: The . matches any single character, so “a.c” matches abc, adc, and axc. A string such as acd would not match, because its third character is not c.

^ (caret) — Anchors the match to the beginning of a string.

Example:


//CaretExample.java file
import java.util.regex.*;

public class CaretExample {
    public static void main(String[] args) {
        String input = "hello world";
        
        // Regex: Match "hello" at the start of the string
        String regex = "^hello";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found match: " + matcher.group());  // Output: "hello"
        }
    }
}

Output:

Found match: hello

Explanation: The ^ ensures that “hello” must appear at the start of the string.

$ (dollar) — Anchors the match to the end of a string.

Example: abc$ will match “abc” only if it is at the end of the string.


//DollarExample.java file
import java.util.regex.*;

public class DollarExample {
    public static void main(String[] args) {
        String input = "hello world";
        
        // Regex: Match "world" at the end of the string
        String regex = "world$";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        if (matcher.find()) {
            System.out.println("Found match: " + matcher.group());  // Output: "world"
        }
    }
}

Output:

Found match: world

Explanation: The $ ensures that “world” must appear at the end of the string.
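The two anchor examples above only show inputs that match. As a minimal sketch of the failing cases, the class below checks the same patterns against inputs where the word sits in the wrong position. The class name AnchorCheckExample and the helper found are hypothetical names chosen for illustration.

```java
// AnchorCheckExample.java file (hypothetical file name)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorCheckExample {
    // Hypothetical helper: true if the regex matches anywhere in the input
    static boolean found(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find();
    }

    public static void main(String[] args) {
        // "^hello" needs "hello" at the very start of the input
        System.out.println(found("^hello", "hello world"));  // true
        System.out.println(found("^hello", "say hello"));    // false

        // "world$" needs "world" at the very end of the input
        System.out.println(found("world$", "hello world"));  // true
        System.out.println(found("world$", "world hello"));  // false
    }
}
```

Without the anchors, find() would report a match in all four cases, because both inputs contain the word somewhere.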

* (asterisk) — Matches zero or more occurrences of the preceding element.

Example: a*b matches b, ab, aab, aaab, etc.
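A short sketch of the asterisk in the same style as the examples above. The class name StarQuantifierExample, the helper findAll, and the sample input are assumptions made for illustration.

```java
// StarQuantifierExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarQuantifierExample {
    // Hypothetical helper: collects every match of the regex in the input
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Regex: zero or more "a" characters followed by "b"
        for (String match : findAll("a*b", "b ab aab aaab")) {
            System.out.println("Found match: " + match);
        }
    }
}
```

Output:

Found match: b
Found match: ab
Found match: aab
Found match: aaab

The lone b is matched too, because a* is allowed to match zero a characters.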

+ (plus) — Matches one or more occurrences of the preceding element.

Example: a+b matches ab, aab, aaab, but not b.
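The plus quantifier can be sketched the same way. The class name PlusQuantifierExample, the helper findAll, and the sample input are hypothetical.

```java
// PlusQuantifierExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusQuantifierExample {
    // Hypothetical helper: collects every match of the regex in the input
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Regex: one or more "a" characters followed by "b"
        for (String match : findAll("a+b", "b ab aab aaab")) {
            System.out.println("Found match: " + match);
        }
    }
}
```

Output:

Found match: ab
Found match: aab
Found match: aaab

The lone b at the start of the input is skipped, because a+ requires at least one a.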

? (question mark) — Matches zero or one occurrence of the preceding element.

Example: a?b matches b and ab.
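A minimal sketch of the question mark, assuming a sample input that also contains aab so the "at most one" limit is visible. The class name OptionalQuantifierExample and the helper findAll are hypothetical.

```java
// OptionalQuantifierExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuantifierExample {
    // Hypothetical helper: collects every match of the regex in the input
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Regex: an optional "a" followed by "b"
        for (String match : findAll("a?b", "b ab aab")) {
            System.out.println("Found match: " + match);
        }
    }
}
```

Output:

Found match: b
Found match: ab
Found match: ab

In aab only the last two characters are matched, since a? can consume at most one a.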

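{} (curly braces) — Specifies how many times the preceding element must occur: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times.

Example: a{2} matches exactly two a characters, i.e., aa.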
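The sketch below compares a{2} with the two range forms that Java also accepts, {n,} (at least n) and {n,m} (between n and m). It uses Pattern.matches, which tests the whole input rather than searching inside it. The class name BracesExample and the helper matchesWhole are hypothetical.

```java
// BracesExample.java file (hypothetical file name)
import java.util.regex.Pattern;

class BracesExample {
    // Hypothetical helper: true only if the entire input matches the regex
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"a", "aa", "aaa"};
        for (String input : inputs) {
            System.out.println(input
                    + " -> a{2}: " + matchesWhole("a{2}", input)
                    + ", a{2,}: " + matchesWhole("a{2,}", input)
                    + ", a{1,2}: " + matchesWhole("a{1,2}", input));
        }
    }
}
```

Output:

a -> a{2}: false, a{2,}: false, a{1,2}: true
aa -> a{2}: true, a{2,}: true, a{1,2}: true
aaa -> a{2}: false, a{2,}: true, a{1,2}: false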

[] (square brackets) — Matches any one of the characters inside the brackets.

Example: [abc] matches either a, b, or c.
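A brief sketch of square brackets in the style of the earlier examples. The class name BracketExample, the helper findAll, and the sample input are assumptions for illustration.

```java
// BracketExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketExample {
    // Hypothetical helper: collects every match of the regex in the input
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Regex: any single character that is "a", "b", or "c"
        for (String match : findAll("[abc]", "a big cab")) {
            System.out.println("Found match: " + match);
        }
    }
}
```

Output:

Found match: a
Found match: b
Found match: c
Found match: a
Found match: b

Each match is a single character. The letters i and g and the spaces are skipped because they are not listed inside the brackets.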

3. Character Classes:

\d: Matches any digit (0-9).

Example 1: \d – Matches a digit


//DigitExample.java file
import java.util.regex.*;

public class DigitExample {
    public static void main(String[] args) {
        String input = "There are 123 apples";
        
        // Regex: Match digits
        String regex = "\\d";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found digit: " + matcher.group());  // Output: "1", "2", "3"
        }
    }
}

Output:

Found digit: 1
Found digit: 2
Found digit: 3

Explanation: The \d matches any digit from 0-9.

\D: Matches any non-digit character.
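One common use of \D is stripping everything that is not a digit. The sketch below does this with String.replaceAll, which takes a regex and a replacement string. The class name NonDigitExample, the helper digitsOnly, and the sample input are hypothetical.

```java
// NonDigitExample.java file (hypothetical file name)
class NonDigitExample {
    // Hypothetical helper: removes every non-digit character from the input
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        String input = "Phone: 555-0199";
        System.out.println("Digits only: " + digitsOnly(input));
    }
}
```

Output:

Digits only: 5550199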

\w: Matches any word character (letters, digits, or underscore).

Example 2: \w – Matches a word character (letters, digits, underscores)


//WordCharacterExample.java file
import java.util.regex.*;

public class WordCharacterExample {
    public static void main(String[] args) {
        String input = "user_123";
        
        // Regex: Match word characters
        String regex = "\\w";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found word character: " + matcher.group());  // Output: "u", "s", "e", "r", "_", "1", "2", "3"
        }
    }
}

Output:

Found word character: u
Found word character: s
Found word character: e
Found word character: r
Found word character: _
Found word character: 1
Found word character: 2
Found word character: 3

Explanation: The \w matches any letter, digit, or underscore.

\W: Matches any non-word character.
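Because \W matches punctuation and whitespace, it can serve as a delimiter when splitting text into words. This is a minimal sketch using String.split with \W+ (one or more non-word characters). The class name NonWordExample, the helper splitWords, and the sample input are hypothetical.

```java
// NonWordExample.java file (hypothetical file name)
import java.util.Arrays;
import java.util.List;

class NonWordExample {
    // Hypothetical helper: splits the input on runs of non-word characters
    static List<String> splitWords(String input) {
        return Arrays.asList(input.split("\\W+"));
    }

    public static void main(String[] args) {
        for (String word : splitWords("alpha, beta;gamma_1")) {
            System.out.println("Found word: " + word);
        }
    }
}
```

Output:

Found word: alpha
Found word: beta
Found word: gamma_1

The underscore stays inside gamma_1 because it counts as a word character.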

\s: Matches any whitespace character (space, tab, newline).

Example 3: \s – Matches whitespace characters


//WhitespaceExample.java file
import java.util.regex.*;

public class WhitespaceExample {
    public static void main(String[] args) {
        String input = "Hello world! How are you?";
        
        // Regex: Match whitespace characters
        String regex = "\\s";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found whitespace: " + matcher.group());  // Output: " " (space)
        }
    }
}

Output:

Found whitespace:
Found whitespace:
Found whitespace:
Found whitespace:

Explanation: The \s matches any whitespace character like space, tab, or newline.

\S: Matches any non-whitespace character.
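Combined with the + quantifier, \S picks out whitespace-separated tokens. The sketch below reuses the input from the whitespace example. The class name NonWhitespaceExample and the helper findAll are hypothetical.

```java
// NonWhitespaceExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonWhitespaceExample {
    // Hypothetical helper: collects every match of the regex in the input
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Regex: one or more consecutive non-whitespace characters
        for (String token : findAll("\\S+", "Hello world! How are you?")) {
            System.out.println("Found token: " + token);
        }
    }
}
```

Output:

Found token: Hello
Found token: world!
Found token: How
Found token: are
Found token: you?

Punctuation stays attached to the tokens, since ! and ? are non-whitespace characters.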

4. Groups and Alternation

() (parentheses) — Groups part of a pattern so that it can be treated as a single unit, and captures the text that the group matched.

Example: () (Group) – Groups multiple characters


//GroupExample.java file
import java.util.regex.*;

public class GroupExample {
    public static void main(String[] args) {
        String input = "cat bat mat";
        
        // Regex: Group the alternatives "cat", "bat", and "mat"
        String regex = "(cat|bat|mat)";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found group match: " + matcher.group());  // Output: "cat", "bat", "mat"
        }
    }
}

Output:

Found group match: cat
Found group match: bat
Found group match: mat

Explanation: The (cat|bat|mat) group matches any of the three words cat, bat, or mat.
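Parentheses also capture the text they match, which can then be read with matcher.group(1). As a sketch, the pattern below is a hypothetical variant of the one above that groups only the first letter, so the shared "at" is written once. The class name CaptureGroupExample and the helper firstLetters are assumptions for illustration.

```java
// CaptureGroupExample.java file (hypothetical file name)
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupExample {
    // Hypothetical helper: returns the text captured by group 1 for each match
    static List<String> firstLetters(String input) {
        List<String> letters = new ArrayList<>();
        // Regex: "c", "b", or "m" (captured as group 1), followed by "at"
        Matcher matcher = Pattern.compile("(c|b|m)at").matcher(input);
        while (matcher.find()) {
            letters.add(matcher.group(1));
        }
        return letters;
    }

    public static void main(String[] args) {
        for (String letter : firstLetters("cat bat mat hat")) {
            System.out.println("Captured first letter: " + letter);
        }
    }
}
```

Output:

Captured first letter: c
Captured first letter: b
Captured first letter: m

The word hat is not matched, because h is not one of the alternatives inside the group.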

| (pipe) — Alternation, matches either the pattern before or after the pipe.

Example: | (Alternation) – Matches either one pattern or another


//AlternationExample.java file
import java.util.regex.*;

public class AlternationExample {
    public static void main(String[] args) {
        String input = "John Mark Tom";
        
        // Regex: Match either "John" or "Tom"
        String regex = "John|Tom";
        
        // Create a Pattern object
        Pattern pattern = Pattern.compile(regex);
        
        // Create a Matcher object
        Matcher matcher = pattern.matcher(input);
        
        // Find and print matches
        while (matcher.find()) {
            System.out.println("Found alternation match: " + matcher.group());  // Output: "John", "Tom"
        }
    }
}

Output:

Found alternation match: John
Found alternation match: Tom

Explanation: The John|Tom alternation matches either John or Tom in the string.