Avoid Incomplete ‘deny lists’ that can lead to security vulnerabilities such as cross-site scripting (XSS) by using ‘allow lists’ instead.
The noncompliant01.py
code demonstrates the difficult handling of exclusion lists in a multi language support use case. UTF-8
has 1,112,064 mappings between 8-32
bit values and printable characters such as 生
known as “code points”.
The noncompliant01.py
filterString()
method attempts to search for disallowed inputs and fails to find the script
tag due to the non-English character 生
in <script生>
.
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""
import re
import sys
if sys.stdout.encoding.lower() != "utf-8":
sys.stdout.reconfigure(encoding="UTF-8")
def filter_string(input_string: str):
"""Normalize and validate untrusted string
Parameters:
input_string(string): String to validate
"""
# TODO Canonicalize (normalize) before Validating
# validate, exclude dangerous tags:
for tag in re.findall("<[^>]*>", input_string):
if tag in ["<script>", "<img", "<a href"]:
raise ValueError("Invalid input tag")
#####################
# attempting to exploit above code example
#####################
names = [
"YES 毛泽东先生",
"YES dash-",
"NOK <script" + "\ufdef" + ">",
"NOK <script生>",
]
for name in names:
print(name)
filter_string(name)
The compliant01.py
uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in compliant01.py
according to CWE-180: Incorrect Behavior Order: Validate Before Canonicalize must be added in order to make logging or displaying them safe!
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""
import re
import sys
if sys.stdout.encoding.lower() != "utf-8":
sys.stdout.reconfigure(encoding="UTF-8")
def filter_string(input_string: str):
"""Normalize and validate untrusted string
Parameters:
input_string(string): String to validate
"""
# TODO Canonicalize (normalize) before Validating
# validate, only allow harmless tags
for tag in re.findall("<[^>]*>", input_string):
if tag not in ["<b>", "<p>", "</p>"]:
raise ValueError("Invalid input tag")
# TODO handle exception
#####################
# attempting to exploit above code example
#####################
names = [
"YES 毛泽东先生",
"YES dash-",
"NOK <script" + "\ufdef" + ">",
"NOK <script生>",
]
for name in names:
print(name)
filter_string(name)
The compliant01.py
detects the unallowed character correctly and throws a ValueError
exception. An actual production solution would also need to canonicalize and handle the exception correctly.
Example compliant01.py output:
/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py
$ python3 compliant01.py
YES 毛泽东先生
YES dash-
NOK <script>
Traceback (most recent call last):
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 38, in <module>
filter_string(name)
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 23, in filter_string
raise ValueError("Invalid input tag")
ValueError: Invalid input tag
According to Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b], \uFFFD
is usually unproblematic, as a replacement for unwanted or dangerous characters. That is, \uFFFD
will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available.
Tool | Version | Checker | Description |
---|---|---|---|
Bandit | 1.7.4 on Python 3.10.4 | Not Available | |
Flake8 | 8-4.0.1 on Python 3.10.4 | Not Available |
[Unicode 2024] | Unicode 16.0.0 [online]. Available from: https://www.unicode.org/versions/Unicode16.0.0/ [accessed 20 March 2025] |
[Davis 2008b] | Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 “Deletion of Code Points” [online]. Available from: https://www.unicode.org/reports/tr36/ [accessed 20 March 2025] |
[Davis 2008b] | Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 “Deletion of Code Points” [online]. Available from: https://www.unicode.org/reports/tr36/ [accessed 20 March 2025] |