What Is a Regular Expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. Use it to find, match, validate, or replace text. Regex is built into virtually every programming language and text editor — once you know it, you'll find uses for it constantly.
The learning curve is real. Regex looks like someone's cat walked across a keyboard: ^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$. But there's logic to it, and the most useful patterns are simpler than they look.
Regex Cheat Sheet: Core Syntax
| Symbol | Meaning | Example | Matches |
|--------|---------|---------|---------|
| . | Any character except newline | a.c | abc, a1c, a c |
| * | 0 or more of previous | ab*c | ac, abc, abbc |
| + | 1 or more of previous | ab+c | abc, abbc (not ac) |
| ? | 0 or 1 of previous | colou?r | color, colour |
| ^ | Start of string | ^Hello | Only if string starts with Hello |
| $ | End of string | world$ | Only if string ends with world |
| \d | Any digit (0–9) | \d{3} | 123, 456 |
| \w | Word character (a–z, A–Z, 0–9, _) | \w+ | hello, user_123 |
| \s | Whitespace | \s+ | Spaces, tabs, newlines |
| [abc] | Any of these characters | [aeiou] | Any vowel |
| [^abc] | None of these characters | [^0-9] | Anything that's not a digit |
| {n,m} | Between n and m repetitions | \d{2,4} | 12, 123, 1234 |
| (...) | Capture group | (foo)+ | foo, foofoo |
| \| | Or | cat\|dog | cat or dog |
These 14 symbols cover the bulk of what you'll actually use day to day.
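To see the table in action, here is a small sketch using Python's re module. The EXAMPLES list and the check_examples helper are illustrative names, not part of any library; the "should not match" strings are extra cases added for contrast. re.fullmatch is used so each string has to match the pattern from start to end.

```python
import re

# (pattern, strings that should match in full, strings that should not).
# The matching strings come from the cheat sheet; the rejects are added here.
EXAMPLES = [
    (r"a.c", ["abc", "a1c", "a c"], ["ac", "a\nc"]),   # . does not match a newline
    (r"ab*c", ["ac", "abc", "abbc"], ["adc"]),
    (r"ab+c", ["abc", "abbc"], ["ac"]),
    (r"colou?r", ["color", "colour"], ["colouur"]),
    (r"\d{2,4}", ["12", "123", "1234"], ["1", "12345"]),
    (r"(foo)+", ["foo", "foofoo"], ["fo"]),
    (r"cat|dog", ["cat", "dog"], ["cow"]),
]


def check_examples(examples=EXAMPLES):
    """Return (pattern, text) pairs that behaved unexpectedly; empty means all good."""
    failures = []
    for pattern, should_match, should_reject in examples:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_reject:
            if re.fullmatch(pattern, text) is not None:
                failures.append((pattern, text))
    return failures


if __name__ == "__main__":
    print(check_examples())
```

An empty list from check_examples() means every row behaved as the table describes.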
Real-World Regex Patterns
Here's where regex earns its place. Patterns for common validation tasks:
Email address (basic):
^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$
Matches: user@example.com, first.last@company.org
Doesn't match: notanemail, missing@domain
US phone number (flexible):
^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
Matches: 555-867-5309, (555) 867-5309, 5558675309
URL (simple):
https?://[\w.-]+\.[a-zA-Z]{2,}(/[\w./?=&%-]*)?
Matches: https://example.com, http://sub.domain.org/path?q=1
IP address (IPv4):
^(\d{1,3}\.){3}\d{1,3}$
Matches: 192.168.1.1, 10.0.0.1 (note: this only checks the shape, so it also accepts octets above 255, such as 999.999.999.999; full validation needs a range check on each octet)
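One way to close the 255 gap, sketched in Python, is to let the regex check the shape and do the range check in ordinary code. The is_ipv4 function name is hypothetical, and this sketch still accepts leading zeros such as 010.0.0.1, which some applications may want to reject.

```python
import re

# Shape only: four groups of 1-3 ASCII digits separated by dots.
IPV4_SHAPE = re.compile(r"(\d{1,3}\.){3}\d{1,3}", re.ASCII)


def is_ipv4(text):
    """Hypothetical helper: regex for the shape, plain arithmetic for the range."""
    if IPV4_SHAPE.fullmatch(text) is None:
        return False
    # Every octet must fit in one byte.
    return all(int(octet) <= 255 for octet in text.split("."))


if __name__ == "__main__":
    for candidate in ["192.168.1.1", "10.0.0.1", "999.1.1.1", "1.2.3"]:
        print(candidate, is_ipv4(candidate))
```

Splitting the job this way keeps the pattern short and puts the numeric rule where it is easiest to read.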
Hex color code:
^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Matches: #FF5733, #fff, #a3b2c1
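The patterns in this section can be collected into a small lookup table, as in this Python sketch. VALIDATORS and is_valid are illustrative names assumed for the example; the patterns themselves are copied from above unchanged.

```python
import re

# Patterns copied from this section; the dictionary keys are made up for the example.
VALIDATORS = {
    "email": re.compile(r"^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
    "url": re.compile(r"https?://[\w.-]+\.[a-zA-Z]{2,}(/[\w./?=&%-]*)?"),
    "hex_color": re.compile(r"^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$"),
}


def is_valid(kind, text):
    """Hypothetical helper: True if the whole string matches the named pattern."""
    return VALIDATORS[kind].fullmatch(text) is not None


if __name__ == "__main__":
    print(is_valid("email", "user@example.com"))
    print(is_valid("us_phone", "(555) 867-5309"))
    print(is_valid("hex_color", "#ffff"))
```

Using fullmatch means even the unanchored URL pattern has to cover the entire input, which is usually what a validation check wants.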
The Greedy vs. Lazy Problem
By default, regex quantifiers are greedy — they match as much as possible. This bites people constantly.
Given the string: <b>first</b> and <b>second</b>
The pattern <b>.*</b> matches the entire string from the first <b> to the last </b>. Not what you wanted.
The lazy version <b>.*?</b> (note the ? after *) matches each <b>...</b> pair separately.
Whenever you're matching something between delimiters, add ? after your quantifier to make it lazy. Saves a lot of debugging.
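The difference is easy to see in Python with re.findall on the same string. The variable names are only for illustration.

```python
import re

text = "<b>first</b> and <b>second</b>"

# Greedy: .* runs to the end of the string, then backs up to the last </b>.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the first </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", text)

# With a capture group, findall returns only the text between the tags.
inner = re.findall(r"<b>(.*?)</b>", text)

print(greedy)  # ['<b>first</b> and <b>second</b>']
print(lazy)    # ['<b>first</b>', '<b>second</b>']
print(inner)   # ['first', 'second']
```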
Three Ways to Write Better Regex
1. Test Before You Ship
Never write a regex pattern and assume it works. Edge cases are where regex fails. Use a regex tester to validate against both matching strings and strings that should not match. A pattern that catches user@example.com but also accepts not@valid@email.com is broken. Testing takes 60 seconds and saves debugging sessions.
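If you would rather keep those checks in code than in a browser tab, a minimal Python sketch might look like this. run_cases and the two case lists are hypothetical names; the pattern is the basic email pattern from earlier.

```python
import re

EMAIL = re.compile(r"^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$")

SHOULD_MATCH = ["user@example.com", "first.last@company.org"]
SHOULD_REJECT = [
    "notanemail",
    "missing@domain",
    "not@valid@email.com",
    "user@@example.com",
    "@example.com",
    "user@.com",
]


def run_cases(pattern, should_match, should_reject):
    """Return readable failure messages; an empty list means every case passed."""
    failures = []
    for text in should_match:
        if pattern.fullmatch(text) is None:
            failures.append(f"expected a match: {text!r}")
    for text in should_reject:
        if pattern.fullmatch(text) is not None:
            failures.append(f"expected a rejection: {text!r}")
    return failures


if __name__ == "__main__":
    print(run_cases(EMAIL, SHOULD_MATCH, SHOULD_REJECT) or "all cases passed")
```

Keeping the reject list next to the pattern makes it cheap to add a new edge case the moment a bug report shows one.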
2. Use Named Capture Groups for Readability
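Instead of (\d{4})-(\d{2})-(\d{2}) for a date pattern, use named groups. Python writes them as (?P<name>...), while JavaScript and .NET use (?<name>...). In Python's syntax: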
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
Now when you access the match in code, match.group('year') is self-documenting. Six months later, you won't have to figure out what group 2 was.
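A short Python sketch of the named-group version, using an arbitrary sample date:

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.fullmatch("2024-03-15")  # sample input chosen for the example

year = match.group("year")    # access a single group by name
parts = match.groupdict()     # or get all named groups as a dictionary

print(year)   # 2024
print(parts)  # {'year': '2024', 'month': '03', 'day': '15'}
```

Numbered access still works (match.group(1) is the year), so named groups can be added to an existing pattern without breaking code that uses the numbers.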
3. Comment Complex Patterns
Many regex engines support a verbose or extended mode where whitespace and comments are allowed inside the pattern. In Python's re module, pass re.VERBOSE as a flag. The same monstrous email pattern becomes readable when broken into annotated lines. If you can't explain a complex pattern in a comment, you'll probably break it when you edit it.
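Here is roughly what the basic email pattern looks like in verbose mode in Python. The comments are one reading of each piece, and EMAIL_VERBOSE is just a name picked for the example.

```python
import re

EMAIL_VERBOSE = re.compile(
    r"""
    ^               # start of input
    [\w.-]+         # local part: word characters, dots, hyphens
    @               # the separator
    [\w.-]+         # domain, possibly with subdomains
    \.              # the dot before the top-level domain
    [a-zA-Z]{2,}    # top-level domain: at least two letters
    $               # end of input
    """,
    re.VERBOSE,
)

print(bool(EMAIL_VERBOSE.fullmatch("user@example.com")))  # True
print(bool(EMAIL_VERBOSE.fullmatch("missing@domain")))    # False
```

One thing to watch in verbose mode: a literal space or # outside a character class has to be escaped, because the engine otherwise treats it as layout or the start of a comment.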
When Not to Use Regex
Regex is powerful but not always the right tool. Parsing HTML with regex is notoriously unreliable — use a proper HTML parser instead. Validating complex data structures (JSON, dates with calendar logic, nested formats) is better handled with dedicated validators. And anything requiring stateful parsing — like balanced parentheses — can't be done with standard regex at all.
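For the balanced-parentheses case, a few lines of ordinary code do what the pattern can't. This is a minimal Python sketch that only tracks round brackets; is_balanced is a hypothetical helper name.

```python
def is_balanced(text):
    """Return True if every ( has a matching ) in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket appeared before its opening one.
                return False
    # Anything still open at the end is unbalanced.
    return depth == 0


if __name__ == "__main__":
    for sample in ["(a(b)c)", "(()", ")("]:
        print(sample, is_balanced(sample))
```

The counter is the "state" that a standard regex has no way to carry from one part of the string to another.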
The rule of thumb: if your regex has more than 3 levels of nesting or needs to handle context across different parts of the string, you're probably past what regex should be doing.
Common Regex Mistakes That Waste Hours
Forgetting to escape special characters. The dot . matches any character, so if you write 192.168.1.1 as a pattern, it'll also match 192x168y1z1. You need 192\.168\.1\.1 to match a literal IP address. Same goes for parentheses, brackets, plus signs, and question marks — if you mean the literal character, escape it with \.
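In Python you don't have to escape by hand: re.escape does it for any literal string. A small sketch, with variable names chosen for illustration:

```python
import re

literal_ip = "192.168.1.1"

unescaped = re.compile(literal_ip)             # each . matches any character
escaped = re.compile(re.escape(literal_ip))    # each . matches only a dot

print(re.escape(literal_ip))                         # 192\.168\.1\.1
print(bool(unescaped.fullmatch("192x168y1z1")))      # True, probably not intended
print(bool(escaped.fullmatch("192x168y1z1")))        # False
print(bool(escaped.fullmatch("192.168.1.1")))        # True
```

This is especially useful when the literal text comes from user input or configuration and you can't know in advance which special characters it contains.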
Anchors missing from validation patterns. Without ^ and $, your "email validation" pattern will happily accept input like !!user@example.com!! because a valid email exists somewhere in the string. Always anchor validation patterns to the start and end of the input.
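A quick Python sketch of the difference, using a sample string with junk on both sides of a valid address. It also shows one Python-specific detail: $ is allowed to match just before a trailing newline, so re.fullmatch is the stricter check.

```python
import re

UNANCHORED = re.compile(r"[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}")
ANCHORED = re.compile(r"^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$")

text = "!!user@example.com!!"  # sample input chosen for the example

# search() finds the address buried inside the junk.
print(UNANCHORED.search(text).group())   # user@example.com

# With anchors, the same input is rejected.
print(ANCHORED.search(text))             # None

# In Python, $ also matches just before a final newline...
print(bool(ANCHORED.search("user@example.com\n")))     # True
# ...while fullmatch requires the entire string to match.
print(bool(ANCHORED.fullmatch("user@example.com\n")))  # False
```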
Catastrophic backtracking. Some patterns cause the regex engine to try an exponential number of paths. The classic example is (a+)+b tested against aaaaaaaaaaaaaaac: the engine tries every possible way to split those a's between the inner and outer groups before giving up, and the work roughly doubles with each extra a. At this length that means tens of thousands of attempts; with a few dozen a's a backtracking engine can stall for minutes or longer. In a web application, this can hang your server. If a pattern takes noticeably longer on non-matching inputs, you've probably got a backtracking problem. Rewrite the pattern to be more specific or use atomic groups if your engine supports them.
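The effect can be measured with a small Python sketch. NESTED is the pattern from the example; FLAT is one possible rewrite that accepts the same strings without the nested quantifier (the captured group differs, which matters only if you use it). time_failed_match is a hypothetical helper, and the input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

NESTED = re.compile(r"(a+)+b")  # nested quantifiers: prone to exponential backtracking
FLAT = re.compile(r"a+b")       # same accepted strings, no nesting


def time_failed_match(pattern, n):
    """Seconds spent failing to match n 'a' characters followed by 'c'."""
    subject = "a" * n + "c"
    start = time.perf_counter()
    result = pattern.fullmatch(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # the subject never ends in b, so no match is expected
    return elapsed


if __name__ == "__main__":
    # Small sizes only: each extra 'a' roughly doubles the work for NESTED.
    for n in (8, 12, 16):
        print(n, time_failed_match(NESTED, n), time_failed_match(FLAT, n))
```

Exact timings depend on the machine and Python version, so the thing to look for is the trend: the nested pattern's time climbs steeply with n while the flat one barely moves.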
Testing only happy paths. Your email regex matches user@example.com? Great. But does it correctly reject user@@example.com, @example.com, or user@.com? A regex that accepts valid input is half the job. The other half is rejecting invalid input. Always test both sides.
Try It Yourself
The fastest way to learn regex is to test patterns against real strings and see exactly what matches. Our Regex Tester lets you write a pattern, paste your test string, and see all matches highlighted in real time — no code required.
Regex is a foundational developer skill — check out our Developer Daily Toolkit for the full set of tools and techniques that make up daily development work. If you're working with structured data alongside text processing, our JSON Formatter pairs well for when you need to inspect or clean the data you're running patterns against.