Context
Serialization converts complex data structures, such as objects and their fields, into a flatter format that can be sent and received as a sequential stream of bytes. This makes it much simpler to write complex data to storage such as files or databases, and to send it over a network or between different parts of an application.
When an object is serialized, its state is preserved, meaning that its attributes and their values are also stored. To restore this object to its original state, deserialization is used to recreate the initial object from the stream of bytes.
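The round trip described above can be sketched in Python with the standard pickle module. This is a minimal illustration, not a recommended pattern: `SimpleNamespace` is only a stand-in for an application object, the field names are hypothetical, and `pickle.loads` is called here on data the program created itself.

```python
import pickle
from types import SimpleNamespace

# SimpleNamespace stands in for an application object (hypothetical fields).
session = SimpleNamespace(user="alice", roles=["editor"], login_count=3)

stream = pickle.dumps(session)    # serialize: object -> bytes
restored = pickle.loads(stream)   # deserialize: bytes -> new object

print(type(stream).__name__)      # bytes
print(restored == session)        # True: same attribute values
print(restored is session)        # False: a distinct object
```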
Many programming languages support serialization natively, and the exact mechanism depends on the language used. Objects can be serialized in binary or string formats, with varying degrees of human readability. Note that by default all attributes of the original object are stored in the serialized data stream, including private fields. In Java, for example, a field is excluded only if it is explicitly marked as "transient" in the class declaration.
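To make the difference between string and binary formats concrete, the sketch below serializes the same record twice in Python, once with json and once with pickle. The record and its keys are made up for illustration.

```python
import json
import pickle

record = {"user": "alice", "roles": ["editor"], "active": True}

as_text = json.dumps(record)      # string format, readable as-is
as_bytes = pickle.dumps(record)   # binary format, opaque to a casual reader

print(as_text)                    # {"user": "alice", "roles": ["editor"], "active": true}
print(as_bytes[:16])              # raw bytes, not meant to be read by humans

# Both formats restore the same data structure.
assert json.loads(as_text) == record
assert pickle.loads(as_bytes) == record
```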
When working with different programming languages, you may encounter terms such as "marshalling" (for Ruby) or "pickling" (for Python) which are synonymous with serialization in this context.
Insecure Deserialization
Insecure deserialization is a security vulnerability that can have serious consequences for websites. It occurs when user-controllable data is deserialized, that is, when a binary stream or string supplied by the user is transformed back into objects or data structures.
When deserialization is performed without sufficient security checks, an attacker can manipulate serialized objects to pass harmful data into the application. This can result in unauthorized code execution, privilege escalation, unauthorized access to sensitive data, or a denial-of-service attack.
The potential impact of insecure deserialization is high for several reasons. First, an attacker who controls serialized data can modify the state of the resulting object, including its private fields, and thereby influence how the application code behaves. Second, the attacker is not limited to the class the application expects: many deserialization mechanisms will instantiate an object of any class available to the website, whatever type the application intended to receive.
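The following sketch shows why this matters, using Python's pickle as one example of a generic deserializer. The `Malicious` class is hypothetical, and `os.getcwd` is a deliberately harmless stand-in for a dangerous callable such as `os.system`; the point is only that the stream, not the application, decides what gets called.

```python
import os
import pickle

class Malicious:
    # pickle calls __reduce__ when serializing; the (callable, args) pair it
    # returns is invoked when the stream is later loaded.
    def __reduce__(self):
        # os.getcwd is a harmless stand-in for something like os.system.
        return (os.getcwd, ())

payload = pickle.dumps(Malicious())   # what an attacker would send

# The application expects some ordinary object, but loading the payload
# calls os.getcwd() and returns its result instead.
result = pickle.loads(payload)
print(type(result).__name__)          # str, not Malicious
```

The call happens inside `pickle.loads`, so any type check on `result` runs only after the attacker-chosen function has already executed.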
Another important factor is the difficulty of validating deserialized data. Input can be validated and sanitized before deserialization, but it is difficult to anticipate every malicious variant. Checks performed after deserialization may come too late, because many attacks complete during the deserialization process itself, before the application has a chance to inspect the resulting object.
Furthermore, insecure deserialization often stems from developers underestimating the importance of deserialization security. Website owners may also believe they are protected because they apply additional checks to deserialized data. These checks are often ineffective, as it is almost impossible to anticipate every malicious input and sanitize it.
To prevent insecure deserialization, it is important to implement appropriate security controls, such as validating and sanitizing data before deserialization, limiting the classes that can be deserialized, and enforcing access controls on deserialized objects. It is also important to raise developers' awareness of deserialization security and to provide adequate training.
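As one way to limit the classes that can be deserialized, Python's pickle lets a subclass of `pickle.Unpickler` override `find_class`. The sketch below is a minimal illustration under stated assumptions: the allow-list, the `restricted_loads` helper, and the `Malicious` class are all hypothetical, and `os.getcwd` again stands in for a dangerous callable.

```python
import io
import os
import pickle
from collections import OrderedDict

# Hypothetical allow-list: the only (module, name) pairs this application expects.
ALLOWED = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream asks pickle to resolve.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not allowed")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Malicious:
    def __reduce__(self):
        return (os.getcwd, ())   # harmless stand-in for a dangerous callable

# An expected class loads normally.
ok = restricted_loads(pickle.dumps(OrderedDict(a=1)))

# A stream that references anything else is rejected before the call happens.
try:
    restricted_loads(pickle.dumps(Malicious()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(ok, blocked)
```

An allow-list narrows the attack surface but does not make deserializing untrusted input safe in general; it is best treated as one layer among several.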
How to prevent vulnerabilities
Deserialization of user input should be avoided unless it is absolutely necessary. The risks of this operation are high and the benefits are often minimal.
If you must deserialize data from untrusted sources, it is essential to implement strong measures to verify that the data has not been tampered with. For example, you can implement a digital signature to verify data integrity. However, it is crucial to remember that these checks must take place before the deserialization process begins. Otherwise, they will be of no use.
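The integrity check described above can be sketched as follows. This example uses an HMAC (a keyed hash from Python's standard library) as a simple stand-in for a digital signature, and JSON as the serialization format; `SECRET_KEY`, `sign`, and `verify_and_load` are hypothetical names chosen for illustration. The important detail is the order: the tag is checked before any deserialization takes place.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical; keep server-side only

def sign(data: bytes) -> bytes:
    tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return tag + data            # 32-byte tag followed by the serialized payload

def verify_and_load(blob: bytes):
    tag, data = blob[:32], blob[32:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    # Constant-time comparison; reject before any deserialization happens.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to deserialize")
    return json.loads(data)

blob = sign(json.dumps({"user": "alice", "is_admin": False}).encode())
print(verify_and_load(blob))

# Changing the payload without knowing the key invalidates the tag.
tampered = blob[:32] + blob[32:].replace(b"false", b"true")
try:
    verify_and_load(tampered)
except ValueError as err:
    print(err)
```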
Whenever possible, avoid using generic deserialization methods. Data serialized using these methods contains all attributes of the original object, including private fields that may hold sensitive information. Instead, you can create class-specific serialization methods so that you at least control which fields are exposed.
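A class-specific serialization method might look like the sketch below. The `Account` class, its fields, and the `to_json`/`from_json` method names are assumptions made for illustration; the idea is simply that the class itself lists which fields leave the server and checks what comes back.

```python
import json
from dataclasses import dataclass

@dataclass
class Account:
    username: str
    email: str
    password_hash: str   # sensitive: must never leave the server

    def to_json(self) -> str:
        # Explicit allow-list of exposed fields.
        return json.dumps({"username": self.username, "email": self.email})

    @classmethod
    def from_json(cls, text: str, password_hash: str) -> "Account":
        raw = json.loads(text)
        # Only the expected keys are read, and their types are checked.
        username, email = raw["username"], raw["email"]
        if not isinstance(username, str) or not isinstance(email, str):
            raise ValueError("unexpected field types")
        # The sensitive field comes from a trusted source, not from the input.
        return cls(username, email, password_hash)

account = Account("alice", "alice@example.com", "s3cr3t-hash")
text = account.to_json()
print(text)
```

Because the input is parsed as plain JSON and mapped onto known fields, the data can never select which class is instantiated.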
Finally, remember that the vulnerability lies in the deserialization of user input, not in the presence of gadget chains that subsequently process the data. Trying to eliminate every gadget chain identified during testing is not practical, because of the web of dependencies between the libraries your website uses. In addition, publicly documented memory corruption exploits are always a factor, so your application may still be vulnerable even if you have taken all of these precautions.
In summary, deserialization of user input should be used with caution and, if possible, avoided. If absolutely necessary, you should implement strong security measures to protect against potential vulnerabilities.