I am using this code to download the XML file:
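import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;

String url = "https://www.sec.gov/Archives/edgar/data/16160/000001616016000061/calm-20160528.xml";
// Take everything after the last '/' as the file name.
String fileName = url.substring(url.lastIndexOf("/") + 1);
String completeFileLocationWithName = "/home/user/Downloads/XBRLCODE/" + fileName;
URL surl = new URL(url);
URLConnection con = surl.openConnection();
// A timeout of 0 means wait indefinitely.
con.setConnectTimeout(0);
con.setReadTimeout(0);
try (InputStream in = con.getInputStream()) {
    // Throws FileAlreadyExistsException if the target file already exists.
    Files.copy(in, Paths.get(completeFileLocationWithName));
}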
I also tried String escapedInput = StringEscapeUtils.escapeXml(appNameInput);
Input: the URL above.
Expected output: the downloaded XML should not contain escaped characters such as &lt;, &gt;, &amp; etc. Plain <, >, & would be fine for me.
Could anyone share their knowledge on this?
I think you're misunderstanding the problem slightly. Your XML here contains embedded HTML (itself with embedded CSS, as it happens).
To be included in that node, those characters have to be escaped, otherwise the overall XML would be invalid (<, >, & etc. are all reserved characters in XML).
If you mean you want the contents of that XML node (us-gaap:FiscalPeriod) unescaped, then you should extract its string value and then use something like StringEscapeUtils.unescapeHtml, as already suggested.
Depending on what you're trying to do, you might want to go further and strip all HTML tags from the output anyway.
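As a rough illustration of "extract its string value", here is a minimal sketch using only the JDK's DOM parser. The class name NodeTextExtractor, its helper methods, the namespace URI, and the sample XML string are all made up for this example; only the element name us-gaap:FiscalPeriod comes from the discussion above. An XML parser decodes &lt;, &gt; and &amp; when it returns an element's text, so the string value already contains the literal HTML. The tag stripping is a crude regex, assumed good enough for simple markup, not a real HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTextExtractor {

    // Returns the text of the first element with the given qualified name,
    // or null if no such element exists. The parser decodes XML escapes.
    public static String firstElementText(String xml, String qualifiedName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(qualifiedName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Crude tag removal (assumption: simple markup, no '>' inside attribute values).
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample; the real filing is much larger.
        String xml = "<root xmlns:us-gaap=\"http://example.com/us-gaap\">"
                + "<us-gaap:FiscalPeriod>&lt;div&gt;52 &amp; 53 weeks&lt;/div&gt;</us-gaap:FiscalPeriod>"
                + "</root>";
        String html = firstElementText(xml, "us-gaap:FiscalPeriod");
        System.out.println(html);            // unescaped HTML fragment
        System.out.println(stripTags(html)); // text with the tags removed
    }
}
```

For a file that has already been downloaded, builder.parse(new java.io.File(path)) could replace the StringReader input. HTML-only entities such as &nbsp; inside the extracted text would still need something like StringEscapeUtils.unescapeHtml.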
The following seems to work.
Use StringEscapeUtils from the commons-lang.jar library.
Here is working code:
The output has no escaped characters; here is a sample from the console:
Keep in mind that you need: