How do I include &, <, > etc in XML attribute v

2019-01-25 01:31发布

I want to create an XML file which will be used to store the structure of a Java program. I am able to successfully parse the Java program and create the tags as required. The problem arises when I try to include the source code inside my tags, since Java source code may use a vast number of entity reference and reserved characters like &, < ,> , &. I am not able to create a valid XML.

My XML should go like this:

<?xml version="1.0"?>
<prg name="prg_name">
  <class name= "class_name>
    <parent>parent class</parent>
      <interface>Interface name</interface>
.
.
.
      <method name= "method_name">
        <statement>the ordinary java statement</statement>
        <if condition="Conditional Expression">
          <statement> true statements </statement>
        </if>
        <else>
          <statement> false statements </statement>
        </else>
        <statement> usual control statements </statement>
 .
 .
 .
      </method>
    </class>
 .
 .
 .
 </prg>

Like this, but the problem is conditional expressions of if or other statements have a lot of & or other reserved symbols in them which prevents XML from getting validated. Since all this data (source code) is given by the user I have little control over it. Escaping the characters will be very costly in terms of time.

I can use CDATA to escape the element text but it can not be used for attribute values containing conditional expressions. I am using Antlr Java grammar to parse the Java program and getting the attributes and content for the tags. So is there any other workaround for it?

2条回答
够拽才男人
2楼-- · 2019-01-25 02:28

You will have to escape

" to  &quot;
' to  &apos;
< to  &lt;
> to  &gt;
& to  &amp;

for xml.

查看更多
看我几分像从前
3楼-- · 2019-01-25 02:32

In XML attributes you must escape

" with &quot;
< with &lt;
& with &amp;

if you wrap attribute values in double quotes ("), e.g.

<MyTag attr="If a&lt;b &amp; b&lt;c then a&lt;c, it's obvious"/>

meaning tag MyTag with attribute attr with text If a<b & b<c then a<c, it's obvious - note: no need to use &apos; to escape ' character.

If you wrap attribute values in single quotes (') then you should escape these characters:

' with &apos;
< with &lt;
& with &amp;

and you can write " as is. Escaping of > with &gt; in attribute text is not required, e.g. <a b=">"/> is well-formed XML.

查看更多
登录 后发表回答