The pickle module is known to be vulnerable [docs.python.org 2023] against unwanted code execution during deserialization and should only be used if there is no architectural text-based alternative.
Even if data has been created from a trusted source we need to verify that it has not been tampered with during transport.
Security-related concerns during object serialization and deserialization include:
JSON or YAML if possible.Base64 encoding for binary dataxmlrpc.client for network operations that are already XML based.hmac.The noncompliant01.py  code demonstrates arbitrary code execution [Checkoway Oct 2013] using os.system to launch a program during unpickling when pickle.loads().
""" Non-Compliant Code Example """
import platform
import pickle
class Message(object):
    """Sample Message Object"""
    sender_id = 42
    text = "Some text"
    def printout(self):
        """prints content to stdout to demonstrate active content"""
        print(f"Message:sender_id={self.sender_id} text={self.text}")
class Preserver(object):
    """Demonstrating deserialisation"""
    def can(self, _message: Message) -> bytes:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _jar (bytes): pickled jar as string
        """
        return pickle.dumps(_message)
    def uncan(self, _jar) -> Message:
        """De-serializes a Message object.
            Parameters:
                _jar (String): Pickled jar
            Returns:
                (Message): Message object
        """
        return pickle.loads(_jar)
# serialization of a normal package
p1 = Preserver()
message = Message()
message.printout()
jar = p1.can(message)
# sending or storing would happen here
p2 = Preserver()
message = None
message = p2.uncan(jar)
message.printout()
#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
message = pickle.loads(jar)
message.printout()
print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver()
message = None
message = p3.uncan(PAYLOAD)
The deserializating Preserver.uncan() method has no solution to verify the content prior to unpickling it and runs the PAYLOAD even before turning it into an object. On Windows you have calc.exe  and on Unix a bunch of commands such as uname -a and ls -la /etc/shadow.
[!CAUTION] The
example01.pycode only demonstrates integrity protection with hmac. The pickled object is not encrypted and key-handling is inappropriate! Consider using proper key management withx509and encryption [pyca/cryptography 2023].
""" Compliant Code Example """
import hashlib
import hmac
import platform
import pickle
import secrets
class Message(object):
    """Sample Message Object"""
    sender_id = 42
    text = "Some text"
    def printout(self):
        """prints content to stdout to demonstrate active content"""
        print(f"Message:sender_id={self.sender_id} text={self.text}")
class Preserver(object):
    """Demonstrating deserialisation"""
    def __init__(self, _key):
        self._key = _key
    def can(self, _message: Message) -> tuple:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _digest (String): HMAC digest string
                _jar (bytes): pickled jar as string
        """
        _jar = pickle.dumps(_message)
        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
        return _digest, _jar
    def uncan(self, _expected_digest, _jar) -> Message:
        """Verifies and de-serializes a Message object.
            Parameters:
                _expected_digest (String): Message HMAC digest
                _jar (bytes): Pickled jar
            Returns:
                (Message): Message object
        """
        _digest = hmac.new(self._key, _jar, hashlib.sha256).hexdigest()
        if _expected_digest != _digest:
            raise ValueError("Integrity of jar compromised")
        return pickle.loads(_jar)
# serialization of a normal package
key = secrets.token_bytes()
print(f"key={key}")
p1 = Preserver(key)
message = Message()
message.printout()
digest, jar = p1.can(message)
# sending or storing would happen here
p2 = Preserver(key)
message = None
message = p2.uncan(digest, jar)
message.printout()
#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
message = pickle.loads(jar)
message.printout()
print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver(b"dont know")
message = None
message = p3.uncan(digest, PAYLOAD)
The integrity verification in compliant01.py throws an exception ValueError: Integrity of jar compromised prior to deserializationunpickling to prevent the PAYLOAD executed.
Text-based formats, such as JSON and YAML, should always be preferred. They have a lower set of capabilities and reduce the attack surface [python.org comparison-with-json 2023] when compared to pickle.
The compliant01.py  code only allows serializing and deserialization of object data and not object methods as in noncompliant01.py or compliant01.py.
Consider converting binary data into text using Base64 encoding for performance and size irrelevant operations.
""" Compliant Code Example """
import platform
import json
 
 
class Message(object):
    """Sample Message Object"""
    sender_id = int()
    text = str()
 
    def __init__(self):
        self.sender_id = 42
        self.text = "Some text"
 
    def printout(self):
        print(f"sender_id: {self.sender_id}\ntext: {self.text}")
 
 
class Preserver(object):
    """Demonstrating deserialisation"""
 
    def can(self, _message: Message) -> str:
        """Serializes a Message object.
            Parameters:
                _message (Message): Message object
            Returns:
                _jar (bytes): jar as string
        """
        return json.dumps(vars(_message))
 
    def uncan(self, _jar) -> Message:
        """Verifies and de-serializes a Message object.
            Parameters:
                _jar (String): Pickled jar
            Returns:
                (Message): Message object
        """
        j = json.loads(_jar)
        _message = Message()
        _message.sender_id = int(j["sender_id"])
        _message.text = str(j["text"])
        return _message
 
 
# serialization of a normal package
p1 = Preserver()
message = Message()
jar = p1.can(message)
print(jar)
print(type(json.loads(jar)))
 
# sending or storing would happen here
p2 = Preserver()
message = None
message = p2.uncan(jar)
message.printout()
print(message.sender_id)
 
#####################
# exploiting above code example
#####################
print("-" * 10)
print("Attacker trying to read the message")
print(jar)
message.printout()
 
print("-" * 10)
if platform.system() == "Windows":
    PAYLOAD = b"""cos
system
(S'calc.exe'
tR."""
else:
    PAYLOAD = b"""cos
system
(S'whoami;uptime;uname -a;ls -la /etc/shadow'
tR."""
print("Attacker trying to inject PAYLOAD")
p3 = Preserver()
message = None
message = p3.uncan(PAYLOAD)
The compliant02.py stops with the unpacking with a json.decoder.JSONDecodeError.
Serialized data from a trusted input source does not require sanitization, provided that the code clearly documents that it relies on the input source being trustworthy. For example, if a library is being audited, a routine of that library may have a documented precondition that its callers pre-sanitize any passed-in serialized data or confirm the input source as trustworthy.
| Tool | Version | Checker | Description | 
|---|---|---|---|
| Bandit | 1.7.4 | B301 | Pickle and modules that wrap it can be unsafe when used to de-serialize untrusted data, possible security issue.Bandit can only detect a pickle module in use and is unable to detect an acceptable implementation code that combines pickle with hmacand proper key management. | 
| Product | CVE | Description | CVSS Rating | Comment | 
|---|---|---|---|---|
| TensorFlow using the pickle module | CVE-2021-37678 | TensorFlow machine learning platform allows code execution when de-serializing a Keras model from YAMLformat. | v3.1: 8.8 High | |
| NVFLARE < 2.1.4 | CVE-2022-34668 | Deserialization of Untrusted Data with Pickle may allow an unprivileged network attacker to cause Remote Code Execution (RCE). | v3.1: 9.8 Critical | Exploit available on exploit-db.com | 
| Graphite 0.9.5 through 0.9.10 | CVE-2013-5093 | The renderLocalView function in render/views.py uses the pickle Python module unsafely, which allows remote attackers to execute arbitrary code via a crafted serialized object | n/a | Exploit available on exploit-db.com | 
| Superset prior to 0.23 | CVE-2018-8021 | TUnsafe load method from the pickle library to deserialize data leading to possible RCE | v3.1: 9.8 Critical | Exploit available on exploit-db.com | 
| rpc.py through 0.6.0 | CVE-2022-35411 | HTTP HEADERS set to "serializer: pickle"triggersrcp.pyto de-serialize withpickleinstead of the defaultJSONallowing Allows Remote Code Execution | v3.1:9.8 Critical | Exploit available on https://github.com/ | 
| [docs.python.org 2023] | pickle — Python object serialization. Available from: https://docs.python.org/3.9/library/pickle.html [Accessed 07 May 2024] | 
| [python.org comparison-with-json 2023] | pickle - Comparison with JSON. Available from: https://docs.python.org/3.9/library/pickle.html#comparison-with-json [Accessed 07 May 2024] | 
| [pyca/cryptography 2023] | Welcome to pyca/cryptography. Available from: https://cryptography.io/en/latest/ [Accessed 07 May 2024] |