Observations
We have noticed a number of parsing differences between the Masterminds/html5-php parser and the HTML5 specification. We think the root cause of these issues comes down to the use of PHP's default parser, loadHTML, DOMImplementation, etc. PHP's lack of HTML5 support is known, and we contacted the PHP maintainers asking them to make this clearer in the documentation in order to raise awareness of these security issues.
This behavior becomes security-relevant when HTML sanitizers use the Masterminds/html5-php parser. We have come across multiple PHP sanitizers that are vulnerable to bypasses due to using Masterminds/html5-php.
Exploitation
Here are examples of the differentials, and how attackers can leverage these in order to bypass sanitizers.
Comments:
According to the XML specification (XHTML), comments must end with the characters -->.
On the other hand, the HTML specification states that a comment's text 'must not start with the string >, nor start with the string ->'.
When parsing the following string in a browser, the comment ends before the p tag. When parsing with Masterminds/html5-php, the p tag is considered part of the comment:
Input: <!---><p>
Browser: <!----><p></p>
Masterminds/html5-php: <!---><p>-->
An attacker can input the payload <!---><xss>-->. The parser treats the xss tag as part of a comment, while the browser ends the comment right before it and renders the xss tag as a regular element.
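To make the comment differential concrete, here is a minimal Python sketch of the two termination rules. It is not code from Masterminds/html5-php or from any browser; the function names are made up for illustration, and the HTML5 side is a simplification that only models the abrupt closing of empty comments (<!--> and <!--->) and the --> / --!> terminators.

```python
def xml_comment_end(s: str) -> int:
    """Index just past the comment when only '-->' may close it (XML-style rule)."""
    assert s.startswith("<!--")
    end = s.find("-->", 4)
    return len(s) if end == -1 else end + 3


def html5_comment_end(s: str) -> int:
    """Index just past the comment under simplified HTML5 tokenizer rules."""
    assert s.startswith("<!--")
    # Abrupt closing: '<!-->' and '<!--->' are parsed as empty comments.
    for abrupt in ("<!-->", "<!--->"):
        if s.startswith(abrupt):
            return len(abrupt)
    # Otherwise the comment closes at the first '-->' or '--!>'.
    hits = [(s.find(t, 4), len(t)) for t in ("-->", "--!>")]
    hits = [(i, n) for i, n in hits if i != -1]
    if not hits:
        return len(s)  # unterminated: the comment runs to end of input
    i, n = min(hits)
    return i + n


payload = "<!---><xss>-->"
# Markup left over after the comment under each rule.
print(repr(payload[xml_comment_end(payload):]))    # ''
print(repr(payload[html5_comment_end(payload):]))  # '<xss>-->'
```

Under the XML-style rule nothing is left after the comment, so a sanitizer sees no element to filter. Under the HTML5 rule the xss tag is live markup.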
Processing instruction (PI) elements (known, but we still encounter sanitizer bypasses due to this)
Processing instructions exist in the XML specification, but in HTML5 the characters <? open a comment that ends at the first occurrence of the greater-than sign (>).
Attackers can craft the processing instruction <?xml >s<img src=x onerror=alert(1)> ?>. Masterminds/html5-php produces no img tag, but the browser creates a comment, ends it at the first > character, and then renders the img tag.
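The same idea can be sketched for processing instructions. The two helpers below are hypothetical and only model where each rule stops consuming input: an XML-style PI ends at ?>, while the HTML5 tokenizer treats <? as the start of a bogus comment that ends at the first >.

```python
def xml_pi_end(s: str) -> int:
    """Index just past the processing instruction when it must end with '?>'."""
    assert s.startswith("<?")
    end = s.find("?>", 2)
    return len(s) if end == -1 else end + 2


def html5_bogus_comment_end(s: str) -> int:
    """Index just past the bogus comment that HTML5 opens on '<?'."""
    assert s.startswith("<?")
    end = s.find(">", 2)
    return len(s) if end == -1 else end + 1


payload = "<?xml >s<img src=x onerror=alert(1)> ?>"
# Markup left over after the PI / bogus comment under each rule.
print(repr(payload[xml_pi_end(payload):]))               # ''
print(repr(payload[html5_bogus_comment_end(payload):]))  # 's<img src=x onerror=alert(1)> ?>'
```

With the XML-style rule the whole payload is swallowed by the PI. With the HTML5 rule the img tag falls outside the comment and becomes an element.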
Foreign content elements
HTML5 introduced two foreign elements (math and svg) whose content follows different parsing rules than HTML. Masterminds/html5-php does not take this into account, which causes further parsing differentials and sanitizer bypasses such as:
<svg><p><style><!--</style><xss>--></style>
noscript element
Depending on whether scripting is enabled (it is enabled by default in browsers), the noscript element parses its content differently:
If scripting is enabled, the content is parsed as raw text
If scripting is disabled, the content is parsed as HTML
Masterminds/html5-php parses as if scripting were disabled, which differs from the browsers' default parsing.
This is not wrong per se, but it can still cause mXSS, for example: <noscript><p alt="</noscript><img src=x onerror=alert(1)>">
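As a rough illustration of the scripting-enabled case, the sketch below splits a string the way a raw-text element would: everything up to the first </noscript end tag is text, no matter what quotes or tags appear inside it. The helper name is invented for this example, and it assumes the input starts with a bare <noscript> tag.

```python
import re
from typing import Tuple


def split_noscript_raw_text(s: str) -> Tuple[str, str]:
    """Return (raw text inside noscript, markup after the end tag).

    Models scripting-enabled parsing only: the content is never tokenized
    as HTML, so the first '</noscript' end tag closes the element.
    """
    open_tag = "<noscript>"
    assert s.lower().startswith(open_tag)
    body = s[len(open_tag):]
    # An end tag name must be followed by whitespace, '/' or '>'.
    m = re.search(r"</noscript[\t\n\f />]", body, re.IGNORECASE)
    if m is None:
        return body, ""
    close = body.find(">", m.start())
    rest = body[close + 1:] if close != -1 else ""
    return body[:m.start()], rest


payload = '<noscript><p alt="</noscript><img src=x onerror=alert(1)>">'
raw, rest = split_noscript_raw_text(payload)
print(repr(raw))   # '<p alt="'
print(repr(rest))  # '<img src=x onerror=alert(1)>">'
```

A parser that assumes scripting is disabled instead sees a p element whose alt attribute contains the rest of the string, so the img never shows up in the tree the sanitizer inspects.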
I'm not familiar with the code base, but I think there are some simple mitigations, such as:
For the noscript element: it might be as simple as changing the type here to RAW_TEXT
Disabling the processing instructions feature
The other two issues are more complicated, especially foreign content. For example, you can change the elements here to not include RAW_TEXT, but I haven't seen any consideration of HTML integration points or other parsing rules (feel free to take a look at the mxss-cheatsheet for some parsing quirks that might be relevant).
AFAIK there is an initiative from PHP to support HTML5. Unfortunately, I don't have the capacity to provide a comprehensive fix :/