I'm using JTextPane as a simple HTML editor:
jtp=new JTextPane();
jtp.setContentType("text/html;charset=UTF-8");
jtp.setEditorKit(new HTMLEditorKit());
When I call jtp.getText() I get nice HTML code with all special characters escaped. However, I don't want the national (Polish) characters to be escaped, only the special HTML characters such as &, < and >.
When I enter this in the editor:
<foo>ą ś &
I get
&lt;foo&gt;&#261; &#347; &amp;
but I would like to get
&lt;foo&gt;ą ś &amp;
How is this possible?
That's not possible, unfortunately.
There's a flaw in javax.swing.text.html.HTMLWriter: it is hardcoded to convert any character below ' ' or above 127 to its numeric representation:
    default:
        if (chars[counter] < ' ' || chars[counter] > 127) {
            if (counter > last) {
                super.output(chars, last, counter - last);
            }
            last = counter + 1;
            // If the character is outside of ascii, write the
            // numeric value.
            output("&#");
            output(String.valueOf((int)chars[counter]));
            output(";");
        }
        break;
    }
This logic cannot be controlled in any way.
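To see what this rule does to the text from the question, here is a simplified, standalone re-creation of it. It is only a sketch, not the JDK source: the class EscapeSketch, the method escape and the keepNonAscii flag are hypothetical names, and HTMLWriter itself has no such flag. It assumes the named entities for <, >, & and " and the pass-through of newline, tab and carriage return that HTMLWriter applies before reaching the default branch.

```java
final class EscapeSketch {

    /**
     * Escapes text roughly the way HTMLWriter does.
     * keepNonAscii is a hypothetical switch; the real class has no such option.
     */
    static String escape(String text, boolean keepNonAscii) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\n':
                case '\t':
                case '\r':
                    // Common whitespace is written unchanged.
                    out.append(c);
                    break;
                default:
                    if ((c < ' ' || c > 127) && !keepNonAscii) {
                        // The hardcoded branch: decimal numeric reference.
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<foo>\u0105 \u015b &"; // the text from the question
        System.out.println(escape(input, false)); // &lt;foo&gt;&#261; &#347; &amp;
        System.out.println(escape(input, true));  // non-ASCII characters left as they are
    }
}
```

With keepNonAscii set to true, the sketch produces the output the question asks for, which is the behavior the real class gives no way to select.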
But if you really, really need that functionality, you could do something crazy:
- Copy the HTMLWriter sources into HTMLWriterHack (in the same package, javax.swing.text.html, renaming all occurrences of the class name inside).
- Replace the three output lines listed above with something like output(String.valueOf(chars[counter]));
- Copy the HTMLDocument sources into HTMLDocumentHack (in the same package, javax.swing.text.html, renaming all occurrences of the class name inside, making it extend HTMLDocument and removing the clashing methods).
- Use the CustomEditorKit listed below instead of HTMLEditorKit:
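import java.io.IOException;
import java.io.Writer;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.StyleSheet;
// HTMLWriterHack and HTMLDocumentHack are the copies made in the steps above;
// import them too if this class lives outside javax.swing.text.html.

class CustomEditorKit extends HTMLEditorKit {
    // Writes the whole document with the patched writer; pos and len are ignored.
    @Override
    public void write(Writer out, Document doc, int pos, int len) throws IOException, BadLocationException {
        HTMLWriterHack writer = new HTMLWriterHack(out, (HTMLDocumentHack) doc);
        writer.write();
    }

    // Creates an HTMLDocumentHack so that the cast in write() succeeds.
    @Override
    public Document createDefaultDocument() {
        StyleSheet styles = getStyleSheet();
        StyleSheet ss = new StyleSheet();
        ss.addStyleSheet(styles);
        HTMLDocumentHack doc = new HTMLDocumentHack(ss);
        doc.setParser(getParser());
        doc.setAsynchronousLoadPriority(4);
        doc.setTokenThreshold(100);
        return doc;
    }
}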
Although the steps above work (I tested them), I certainly wouldn't recommend doing that.
It is not possible: all characters above code 127 are translated to a numeric entity of the form &#number;. The special HTML characters are translated into named entities (&lt; and so on). So you can easily substitute the numeric entities back afterwards. (This is done in HTMLWriter.output, and there seems to be no provision for character sets whatsoever.)
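The substitution could look roughly like the sketch below, which post-processes the string returned by jtp.getText(). The class NumericEntityDecoder and the method decodeNonAscii are hypothetical names, not part of Swing. The sketch assumes that only decimal references of the form &#number; need handling, and it leaves references at or below 127 (control characters) as they are. Named entities such as &lt; and &amp; do not match the pattern and stay escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {

    // Decimal numeric references such as &#261;. Five digits are enough for
    // any UTF-16 code unit and keep Integer.parseInt from overflowing.
    private static final Pattern NUMERIC_ENTITY = Pattern.compile("&#(\\d{1,5});");

    /** Replaces numeric references above 127 with the character they denote. */
    static String decodeNonAscii(String html) {
        Matcher m = NUMERIC_ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int code = Integer.parseInt(m.group(1));
            // Decode only non-ASCII code units; keep anything else verbatim.
            String replacement = (code > 127 && code <= Character.MAX_VALUE)
                    ? String.valueOf((char) code)
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Expected to print the escaped markup with the two Polish letters restored.
        System.out.println(decodeNonAscii("&lt;foo&gt;&#261; &#347; &amp;"));
    }
}
```

Usage would then be something like String html = NumericEntityDecoder.decodeNonAscii(jtp.getText()); which avoids touching the Swing classes at all. The writer emits one reference per UTF-16 char, so decoding each reference to a single char should also reassemble surrogate pairs.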