Is Oracle's EXTRACT function breaking the NOEN

2019-02-17 20:30发布

问题:

Oracle 11g. I figured out that if I add NOENTITYESCAPING to the XMLELEMENT function, it nicely turns off entity escaping. However, when I then pass the result to EXTRACT the escaping seems to come back again.

select xmlelement(NOENTITYESCAPING e,id,'->') 
  from (select level as id 
          from dual 
       connect by level < 6)      

XMLELEMENT(NOENTITYESCAPINGE,ID,'->')
---------------------------------------
<E>1-></E>
<E>2-></E>
<E>3-></E>
<E>4-></E>
<E>5-></E>

Now, adding EXTRACT:

select xmlelement(NOENTITYESCAPING e,id,'->').extract('//text()')
  from (select level as id 
          from dual 
       connect by level < 6)

XMLELEMENT(NOENTITYESCAPINGE,ID,'->').EXTRACT('//TEXT()')
----------------------------------------------------------
1-&gt;
2-&gt;
3-&gt;
4-&gt;
5-&gt;

Any fixes/workarounds to keep the escaping switched off? The manual gives no help.

回答1:

Try to use extractvalue() function, which unescapes encoded entities, instead of extract(). Here is an example:

clear screen;
column res format a20;

-- depending on a situation, NOENTITYESCAPING might be dropped

select extractvalue(
                     xmlelement(NOENTITYESCAPING e,id,'->')
                    , '//text()'
                    ) as res
  from (select level as id 
          from dual 
       connect by level < 6)

Result:

RES                
--------------------
1->                  
2->                  
3->                  
4->                  
5->    

But the use of extractvalue() function may be limited by the fact that it can return value of only one node. In a case of returning values of multiple nodes the utl_i18n package, and unescape_reference() function of that package can be used to unescape encoded entities:

clear screen;
column res format a20;

select utl_i18n.unescape_reference(xmlelement(root
                                             , xmlelement(node1, '>')
                                             , xmlelement(node2, '<')
                                             ).extract('//text()').getstringval()
                                   ) as res
 from dual
connect by level <= 3;

Result:

RES                
--------------------
><                   
><                   
>< 

Yes, as utl_i18n.unescape_reference() function accepts only values of varchar2 data type and types that can be implicitly converted to the varchar2 data type, your hands are tied when it comes to processing large "strings". In this situation you may turn to dbms_xmlgen package and convert() function in particular, which has an overloaded version capable of accepting CLOBs. Here is an example:

select dbms_xmlgen.convert(
                           xmlagg(xmlelement(root
                                             , xmlelement(node1, '>')
                                             , xmlelement(node2, '<')
                                             )
                                  ).extract('//text()').getclobval()
                          , 1) as res
 from dual
connect by level <= 3000;   -- 1 (second parameter of the convert() function)
                            -- instructs function to decode entities  

Result:

RES
------------------------------------------------------
><><><><><><><><><><><><><><><><><><><><><><><><><>
-- ... the rest of the CLOB