I have a table with ID,TEXT,etc columns Here TEXT is clob column which contains data in HTML FORMAT
SELECT ID,TEXT FROM TABLE WHERE ID=1000
I'm getting output for text column as below
<p>NAME: XXX<br />Company Name: YYYYY<br />Location: ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND<br />Type: PrePA<br />
Team: Team1, Dues tamble <br />Date: January 25 – 26, 2016<br />Rating: Tr 2<br />Number: 8554342</p>
<p><u>Observ: <br /></u>There were (6) major and (2) minor .<br /> <br />MAJOR</p><ul> <li>Sample Text_1.</li> <li>Sample Text_2.</li>
<li>Sample Text_33.</li> <li>Sample Text_4.</li> <li>Sample Text_5.</li></ul><p>MINOR</p><ul> <li>Sample Text_7</li>
<li>Sample Text_8<br /> </li></ul><p><b> Background</b><br /> </p>
I need output like below
NAME: XXX
Company Name: YYYYY
Location: ZZZ, 22 Z1Z1Z1, Z2Z2Z2,Z3Z3Z3, 0000024, IND
Type: PrePA
Team: Team1, Dues tamble
Date: January 25 – 26, 2016
Rating: Tr 2
Number: 8554342
Observ:
There were (6) major and (2) minor .
MAJOR
Sample Text_1.
Sample Text_2.
Sample Text_33.
Sample Text_4.
Sample Text_5.
MINOR
Sample Text_7
Sample Text_8
Background
Which means the query should handle HTML data and should provide out like exact data like above.
I have a query beolw which handling somwhat okey but it's not handling the
exactly the HTML tags.
select ID,
trim(regexp_replace (trim
(regexp_replace( TEXT ,'<[^>]+/>|</[^>]+>|<p>',CHR(13)||CHR(10)))
,'<[^>]+>',''))
from TABLE where ID='1000'
Acually i need to handle all the html tags in the data not only the which i showed is sample.
Create a (simple) HTML parser.
Oracle Setup:
Query:
Output:
Searching for a 'perfect' query might be a lost cause; browser (including text browsers, as recommended elsewhere) have spent years working out all the kinks and edge cases.
If you really can't use an external resource you could walk the DOM via the
dbms_xmldom
package. Here's a demo using an anonymous block and your sample value, which prints to the screen (if you have that enabled); but you could easily adapt it in to a function that returns a CLOB, or whatever you need as an end result.Which produces:
Consider running the HTML through
lynx -stdin
to render it as plain text. Lynx is a web browser for text consoles.