Is URLEncoder.encode(string, "UTF-8") a poor validation?

Posted 2019-06-02 23:47

Question:

In a portion of my J2EE/Java code, I URL-encode the output of getRequestURI() to sanitize it and prevent XSS attacks, but Fortify SCA considers that poor validation.

Why?

Answer 1:

The key point is that you need to convert HTML special characters to HTML entities. This is also called "HTML escaping" or "XML escaping". Basically, the characters <, >, ", & and ' need to be replaced by &lt;, &gt;, &quot;, &amp; and &#39; respectively.

URL encoding does not do that. URL encoding converts URL special characters to percent-encoded values. This is not HTML escaping.
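To make the difference concrete, here is a minimal sketch that runs a string containing HTML special characters through URLEncoder.encode() with the "UTF-8" charset name, as in the question. The class name UrlEncodingDemo and the sample input are made up for illustration. The result is application/x-www-form-urlencoded data: the space becomes +, and <, >, & and ' become %3C, %3E, %26 and %27. Written into an HTML page, that percent-encoded text would be displayed literally, whereas the HTML-escaped form a &lt;b&gt;&amp;&#39;c is rendered by the browser as the original text a <b>&'c.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodingDemo {

    // Percent-encodes text the way the question does, using the "UTF-8" charset name.
    static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String userInput = "a <b>&'c";
        // Prints a+%3Cb%3E%26%27c : form-encoded URL data, not HTML-escaped text.
        System.out.println(urlEncode(userInput));
    }
}
```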

In web applications, HTML escaping is normally done in the view layer, exactly where you redisplay user-controlled input. In a Java EE web application, how you do that depends on the view technology you're using.

  1. If the webapp is using the modern Facelets view technology, then you don't need to escape it yourself. Facelets already does that implicitly.

  2. If the webapp is using the legacy JSP view technology, then you need to ensure that you're using the JSTL <c:out> tag or the fn:escapeXml() function to redisplay user-controlled input.

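    <%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
    <%@ taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions" %>
    <c:out value="${bean.foo}" />
    <input type="text" name="foo" value="${fn:escapeXml(param.foo)}" />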
    
  3. If the webapp is very legacy or badly designed and uses servlets or scriptlets to print HTML, then you have a bigger problem. There are no built-in tags or functions, let alone Java methods, that can escape HTML entities. You should either write an escape() method yourself or use Apache Commons Lang's StringEscapeUtils#escapeHtml() for this (in Commons Lang 3 and Commons Text the equivalent method is escapeHtml4()). Then you need to ensure that you use it everywhere you print user-controlled input.

    // Commons Lang 2.x: import org.apache.commons.lang.StringEscapeUtils;
    out.print("<p>" + StringEscapeUtils.escapeHtml(request.getParameter("foo")) + "</p>");
    

    It would be much better to redesign that legacy webapp to use JSP with JSTL.
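As a rough sketch of option 3, assuming you write the escape() method yourself rather than pull in Commons Lang, the method below replaces the five characters listed earlier with their entities, and printParagraph() mirrors the out.print() line above. The names HtmlEscaper, escapeHtml and printParagraph are hypothetical, and a StringWriter stands in for the servlet response writer so the example runs without a servlet container.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class HtmlEscaper {

    private HtmlEscaper() {}

    // Replaces <, >, ", & and ' with entities; every other character passes through.
    // A null input yields an empty string (a choice made for this sketch).
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes at the point of output, just like the out.print(...) line in option 3.
    static void printParagraph(PrintWriter out, String userInput) {
        out.print("<p>" + escapeHtml(userInput) + "</p>");
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        printParagraph(out, "<script>alert('x')</script>");
        out.flush();
        // prints <p>&lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;</p>
        System.out.println(buffer);
    }
}
```

Returning an empty string for null is a deliberate choice here, because request.getParameter() returns null for a missing parameter; Commons Lang's escapeHtml() returns null instead, and string concatenation would then print <p>null</p>.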



Answer 2:

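Depending on the encoder, URL encoding can leave certain significant characters unchanged, including the single quote (') and parentheses (JavaScript's encodeURIComponent(), for example, does not encode them), so such an encoder will pass certain payloads through unchanged. Java's URLEncoder.encode() does percent-encode those particular characters, but it is still encoding for URLs, not for HTML.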

For example,

onload'alert(String.fromCharCode(120))'

will be treated by some browsers as a valid attribute that can result in code execution when injected inside a tag.

The best way to avoid XSS is to treat all untrusted input as plain text and then, when composing your output, encode that plain text properly for the context it is written into (HTML text, HTML attribute, URL, JavaScript and so on).
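As an illustration of encoding for the appropriate context, here is a sketch that builds a search link from untrusted input. The /search path, the lang and q parameters, and the class and method names are all hypothetical. The query value is URL-encoded because it goes into a URL; the finished URL is then HTML-escaped because it sits inside a double-quoted HTML attribute; and the link text is HTML-escaped because it is HTML text. URL encoding is the right tool only for the first of those three steps.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ContextEncoding {

    // HTML context: escape the five special characters. '&' is replaced first so the
    // entities produced by the later replacements are not escaped a second time.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // URL context: form-encode a single query-parameter value.
    static String encodeQueryValue(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Hypothetical link: URL-encode the value, then HTML-escape the whole href for the
    // attribute, and HTML-escape the visible link text separately.
    static String searchLink(String query) {
        String href = "/search?lang=en&q=" + encodeQueryValue(query);
        return "<a href=\"" + escapeHtml(href) + "\">" + escapeHtml(query) + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(searchLink("Tom & Jerry <3"));
    }
}
```

For the input Tom & Jerry <3, main() prints <a href="/search?lang=en&amp;q=Tom+%26+Jerry+%3C3">Tom &amp; Jerry &lt;3</a>. Contexts such as inline JavaScript or CSS need their own encoders and are not covered by this sketch.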

If you want to filter inputs as an additional layer of security, make sure your filter treats all quotes (including the backtick) and parentheses as possible code, and disallow them unless they make sense for that input.
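A minimal sketch of such a filter, assuming a simple per-field switch: the hypothetical InputFilter.accept() rejects a value containing a single quote, double quote, backtick or parenthesis unless the caller says that field legitimately allows them (a surname such as O'Brien is the classic case). It is meant as the additional layer described above, not as a substitute for encoding on output.

```java
import java.util.regex.Pattern;

final class InputFilter {

    // Single quote, double quote, backtick and both parentheses: characters
    // treated as possible code by this filter.
    private static final Pattern CODE_LIKE = Pattern.compile("['\"`()]");

    // Returns true if the value may be accepted. Fields where quotes or parentheses
    // make sense opt out of the character check via allowQuotesAndParens.
    static boolean accept(String value, boolean allowQuotesAndParens) {
        if (value == null) {
            return false;
        }
        return allowQuotesAndParens || !CODE_LIKE.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(accept("alice", false));   // true
        System.out.println(accept("O'Brien", false)); // false
        System.out.println(accept("f(x)", true));     // true
    }
}
```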