
Question:

I use OWASP AntiSamy with the eBay policy file to prevent XSS attacks on my website.

I also use Hibernate Search to index my objects.

When I use this code:

import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

String html = "special word: été";

// load the eBay policy file
Policy policy = Policy.getInstance(xssPolicyFile.getInputStream());

AntiSamy as = new AntiSamy();
CleanResults cr = as.scan(html, policy);

// result is now: "special word: &eacute;t&eacute;"
String result = cr.getCleanHTML();

As you can see, every "é" character has been transformed into its HTML entity equivalent, "&eacute;".

My page is served as UTF-8, so I don't need this transformation. Moreover, when I index this text with Hibernate Search, it indexes the word with the HTML entities in it, so I can't find the word "été" in my index.

How can I make AntiSamy leave special characters alone instead of converting them to their HTML entity equivalents?

thanks

PS: an issue has been opened: http://code.google.com/p/owaspantisamy/issues/detail?id=99

Answer 1:

I ran into the same problem this morning.

I have wrapped AntiSamy in a class, and I use StringEscapeUtils from Apache Commons Lang to restore the special characters.

 CleanResults cleanResults = antiSamy.scan(taintedHtml);
 String cleanedHtml = cleanResults.getCleanHTML();
 return StringEscapeUtils.unescapeHtml(cleanedHtml);

The result is cleaned-up HTML without the HTML escaping of special characters.

Hope this helps.
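One thing to watch with a blanket unescape: it also turns entities such as &lt; and &gt; back into literal markup characters, which may undo part of what the sanitizer did. A minimal sketch of a narrower alternative is below, assuming Java 9 or later. The class name EntityRestorer, the method restoreNonAscii, and the small NAMED table are all hypothetical and not part of AntiSamy or Commons Lang. It decodes only entities that resolve to non-ASCII characters and leaves everything else untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AntiSamy or Commons Lang.
final class EntityRestorer {
    // Illustrative subset of named entities only; a real table would be much larger.
    private static final Map<String, Integer> NAMED = Map.of(
            "eacute", 0xE9, "egrave", 0xE8, "agrave", 0xE0,
            "ccedil", 0xE7, "ouml", 0xF6, "uuml", 0xFC);

    // Matches decimal (&#233;), hexadecimal (&#xE9;) and named (&eacute;) entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Decodes entities above U+007F; ASCII entities such as &lt; and &amp; stay encoded.
    static String restoreNonAscii(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer cp = codePoint(m.group(1));
            boolean decode = cp != null && cp > 0x7F && Character.isValidCodePoint(cp);
            String replacement = decode ? new String(Character.toChars(cp)) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the code point for an entity body, or null if it is unknown or malformed.
    private static Integer codePoint(String body) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return Integer.parseInt(body.substring(2), 16);
            }
            if (body.startsWith("#")) {
                return Integer.parseInt(body.substring(1));
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return NAMED.get(body);
    }
}
```

With this sketch, EntityRestorer.restoreNonAscii(cleanedHtml) would take the place of the unescapeHtml call, so the text handed to Hibernate Search contains "été" while escaped angle brackets stay escaped.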

Answer 2:

As Mohamad said in a comment, AntiSamy has just released a new directive named entityEncodeIntlChars.

Here are the details: http://code.google.com/p/owaspantisamy/source/detail?r=240

It seems that this directive solves the problem.
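As a rough sketch, the directive would go in the directives section of the policy XML file. The value "false" is an assumption about how to turn the encoding off, and it only applies to an AntiSamy version that includes the revision linked above:

```xml
<directives>
    <!-- existing directives stay as they are -->

    <!-- assumed setting: do not entity-encode international characters -->
    <directive name="entityEncodeIntlChars" value="false"/>
</directives>
```

If this works as expected, "été" should come back from getCleanHTML() unchanged, with no post-processing needed.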

Answer 3:

After scouring the AntiSamy source code, I found no way of changing this behavior apart from modifying AntiSamy.

Answer 4:

Check out this one: http://code.google.com/p/owaspantisamy/source/browse/#svn/trunk/dotNet/current/source/owaspantisamy/html/scan

Grab the source and notice that the key classes (AntiSamyDOMScanner, CleanResults) use standard framework objects (like XmlDocument). Compile it and run against the binary you built, so that you can step through everything in a debugger and see which of the major classes actually corrupts your data. With that in hand, you'll be able either to change a few properties on the major objects to make it stop, or to inject your own post-processing to revert the wrongdoing (say, with a regexp). Later you can expose that as an additional top-level property, say one named NoMess :-)

Chances are that the behavior in this respect differs between languages (there are three in that trunk), but the same tactics will work no matter which one you have to deal with.