More and more, the data passed around the Internet is "rich", meaning that it contains markup and the data is intended to be parsed, rendered, and sometimes executed. Ensuring that this rich data does not contain any malicious instructions is extremely difficult. Nowhere is this problem more significant than in HTML, the worst scramble of code and data of all time.
On this page, you can enter whatever rich markup you like in the textarea. When you submit it, it is returned for rendering on the right. This page has a very weak filter that disallows the string <script>. Try entering some markup to see if you can bypass this filtering.
EXAMPLE: <script >alert(document.cookie)</script> - note the space before the >
EXAMPLE: </textarea><script>alert(document.cookie)</script>
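To see why this kind of pattern matching is fragile, here is a minimal sketch of a blacklist filter. It assumes the filter is a literal, case-sensitive substring check for <script>. The class NaiveFilterDemo and the method naiveFilterAllows are hypothetical names used only for illustration, and the filter on this page may be implemented differently.

```java
public class NaiveFilterDemo {

    // Hypothetical blacklist filter (an assumption, not this page's actual code):
    // input is rejected only if it contains the exact, case-sensitive string "<script>".
    static boolean naiveFilterAllows(String input) {
        return !input.contains("<script>");
    }

    public static void main(String[] args) {
        String[] payloads = {
            "<script>alert(document.cookie)</script>",    // exact match for the blacklisted string
            "<script >alert(document.cookie)</script>",   // space before the closing >
            "<SCRIPT>alert(document.cookie)</SCRIPT>",    // different letter case
            "<img src=x onerror=alert(document.cookie)>"  // no script tag at all
        };
        for (String payload : payloads) {
            String verdict = naiveFilterAllows(payload) ? "ALLOWED" : "BLOCKED";
            System.out.println(verdict + "  " + payload);
        }
    }
}
```

Under these assumptions, only the first payload is blocked. The other three pass the check even though a browser may still run the script they carry, which is why matching on known-bad strings does not scale.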
ESAPI protects against scripts embedded in rich data in a different way. Rather than searching the input for dangerous characters and patterns, ESAPI uses another OWASP project called AntiSamy, which fully parses the rich content and applies an extensive set of rules for which tags and attributes are allowed.
The difficulty of verifying whether rich content contains attacks is increasing rapidly as we use more and more complex formats. Using a robust parser and a whitelist set of rules is the right approach for detecting and preventing attacks. In addition to HTML, AntiSamy supports CSS, which is particularly challenging to parse and validate.
To use ESAPI to validate rich HTML data, use the following approach:
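// Arguments: a context name used in logs and error messages, the untrusted input,
// the maximum allowed length, and whether a null input is acceptable.
// getValidSafeHTML throws ValidationException if the input is rejected.
String safeMarkup = ESAPI.validator().getValidSafeHTML( "input",
        request.getParameter( "input" ), 2500, true );
// store, use, or render the safeMarkup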