This is a lab exercise on developing secure software. For more information, see the introduction to the labs.
Learn how to create simple regular expressions for input validation.
Regular expressions (regexes) are a widely used notation for expressing text patterns. Regexes can be used to validate input; when used correctly, they can counter many attacks.
Different regex languages have slightly different notations, but they have much in common. Here are some basic rules for regex notations:
We want to use regexes to validate input. That is, the entire input should match the regex pattern. To do this, use the regex engine's default mode (not a "multiline" mode), prepend one anchor symbol to the pattern, and append another. Unfortunately, different platforms use different symbols to require a complete match of the input. The following table summarizes what you should prepend and append on many different platforms (for their default regex system).
Platform | Prepend | Append |
---|---|---|
POSIX BRE, POSIX ERE, and ECMAScript (JavaScript) | “^” | “$” |
Java, .NET, PHP, Perl, and PCRE | “^” or “\A” | “\z” |
Golang, Rust crate regex, and RE2 | “^” or “\A” | “$” or “\z” |
Python | “^” or “\A” | “\Z” (not “\z”) |
Ruby | “\A” | “\z” |
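
The Python row of the table can be checked with a short experiment. The following is a minimal sketch, assuming Python's standard `re` module in its default mode; the variable names and the sample input are hypothetical and chosen only for illustration. It shows that “$” also matches just before a trailing newline, while “\Z” matches only at the very end of the string.

```python
import re

# The same pattern body, anchored two ways. In Python's default mode,
# "$" also matches just before a newline at the end of the string,
# while "\Z" matches only at the very end of the string.
dollar_pattern = re.compile(r"^(ab|de)$")
end_pattern = re.compile(r"^(ab|de)\Z")

# Hypothetical input: a valid value followed by a trailing newline.
candidate = "ab\n"

dollar_accepts = dollar_pattern.search(candidate) is not None
end_accepts = end_pattern.search(candidate) is not None

print(dollar_accepts)  # True: "$" tolerates the trailing newline
print(end_accepts)     # False: "\Z" requires the true end of input
```

If the input is later written to a log, a file, or a line-based protocol, that extra newline could matter, which is why the table lists “\Z” for Python.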
For example, to validate in ECMAScript (JavaScript) that an input is either “ab” or “de”, use the regex “^(ab|de)$”. To validate the same thing in Python, use “^(ab|de)\Z” or “\A(ab|de)\Z” (note that the regex pattern is slightly different).
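The Python form of this example could be wrapped in a small validation helper. This is only a sketch: the names `AB_OR_DE` and `is_valid` are hypothetical, and the code assumes the standard `re` module in its default (non-multiline) mode.

```python
import re

# Hypothetical allowlist pattern: the whole input must be "ab" or "de".
# "\A" anchors at the start of the string and "\Z" at the very end.
AB_OR_DE = re.compile(r"\A(ab|de)\Z")


def is_valid(text: str) -> bool:
    """Return True only if the entire input is exactly "ab" or "de"."""
    return AB_OR_DE.search(text) is not None


print(is_valid("ab"))    # True
print(is_valid("de"))    # True
print(is_valid("abc"))   # False: extra trailing character
print(is_valid("ab\n"))  # False: trailing newline is rejected
```

Python's `re.fullmatch` is another way to require that the whole string match. The explicit anchors are used here because they keep the pattern close to the forms shown in the table.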
More information is available in the OpenSSF guide Correctly Using Regular Expressions for Secure Input Validation.
Use the “hint” and “give up” buttons if necessary.
Create a regular expression, for use in ECMAScript (JavaScript), that only matches the letters "Y" or "N".
Create a regular expression, for use in ECMAScript (JavaScript), that only matches one or more uppercase Latin letters (A through Z).
Create a regular expression, for use in ECMAScript (JavaScript), that only matches the words "true" or "false".
Create a regular expression that only matches one or more uppercase Latin letters (A through Z). However, this time, do it for Python (not JavaScript).
Create a regular expression that only matches one Latin letter (A through Z), followed by a dash ("-"), followed by one or more digits. This time, do it for Ruby (not JavaScript or Python).
This lab was developed by David A. Wheeler at The Linux Foundation.