I have the following sample SGML data from my .sgm file, and I want to convert it into XML:
<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>
<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>
Output should be like this:
<index1>
<num viewed="xyz"/>
<heading>xyz</heading>
<index-refs>
<link caseno="2010 abc 1188"></link>
</index-refs>
</index1>
<index1>
<num viewed="abc"/>
<heading>abc</heading>
<index-refs>
<link caseno="2010 xyz 5133"></link>
</index-refs>
</index1>
Can this be done in C#, or can we use XSLT 2.0 for this kind of conversion?
Can the SGML-Reader, originally developed by Chris Lovett, help in solving this problem?
Please take a look at some suggestions for SGML -> XML conversion I posted on this question:
Strategy for parsing LOTS and LOTS of not-so-well formed SGML / XML documents
Others have already given some good advice. Here's one way of putting it all together by first converting the input SGML to well-formed XML and then using XSLT to transform that to the exact format you need.
Converting your SGML to well-formed XML
The osx tool from the OpenSP package suggested by mzjn is a good tool for this. Since your SGML markup omits end tags, you need to have a DTD from which the correct nesting of elements can be determined. If you don't have a DTD, you need to create one. For your example input, it can be very simple.
You also need to add a proper doctype declaration to the beginning of your SGML file, pointing at your DTD (assumed here to be in the file viewed.dtd).
With this addition, you should now be able to use osx to convert the SGML to XML. (It won't be able to convert the processing instructions that start with a /, such as <?/XMLDOC>, as those are not allowed in XML, and it will emit a warning about them.)
Transforming the resulting XML to your desired format
For the above case, you could use something like the following XSLT stylesheet:
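The same mapping can also be sketched outside XSLT. The Python snippet below is only an illustration under stated assumptions, not the stylesheet mentioned above. It assumes the converted XML has a wrapper element named toc (a hypothetical name) and lower-case element names, that the trailing period in the viewed text should be dropped as in the sample output, and that each link element carries only a caseno attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML: roughly what a converter might emit once a DTD
# has supplied the missing end tags. The root name "toc" is an assumption.
WELL_FORMED = """<toc>
<viewed>xyz
<cite><yr>2010</yr><pno cite="2010 abc 1188">10</pno></cite></viewed>
<viewed>abc.
<cite><yr>2010</yr><pno cite="2010 xyz 5133">9</pno></cite></viewed>
</toc>"""


def to_index1(viewed):
    """Build one <index1> element from one <viewed> element."""
    # Text before the first child element; the sample output drops the
    # trailing period, so it is stripped here (an inferred rule).
    title = (viewed.text or "").strip().rstrip(".")
    index1 = ET.Element("index1")
    ET.SubElement(index1, "num", {"viewed": title})
    ET.SubElement(index1, "heading").text = title
    refs = ET.SubElement(index1, "index-refs")
    # One <link> per <pno>, copying its cite attribute into caseno.
    for pno in viewed.iter("pno"):
        ET.SubElement(refs, "link", {"caseno": pno.get("cite", "")})
    return index1


def convert(xml_text):
    """Return one serialized <index1> string per <viewed> element."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(to_index1(v), encoding="unicode")
            for v in root.iter("viewed")]


if __name__ == "__main__":
    print("\n".join(convert(WELL_FORMED)))
```

The result is a list of index1 fragments rather than a single document, because the desired output has no common root element; wrap them in a root of your choice if the consumer needs well-formed XML.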
Why XSLT? I doubt you can map SGML to XML Infoset or XDM...
I think you would be better off using the language made for this task: DSSSL (Document Style Semantics and Specification Language).
This is the predecessor of XSLT. The author is James Clark, and this is his site.
Maybe you can use the osx SGML to XML converter. It is part of the OpenSP package (based on SP, originally written by James Clark).
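As a rough sketch of how the osx step could be scripted, the Python helpers below put a doctype declaration in front of the SGML text and then call osx. The root element name toc, the helper names, and the default viewed.dtd file name are assumptions for illustration. The DTD itself still has to be written, and OpenSP must be installed for run_osx to work.

```python
import subprocess


def with_doctype(sgml_text, root="toc", dtd_file="viewed.dtd"):
    """Return the SGML text with a doctype declaration placed in front of it.

    "toc" is a hypothetical root element name; use whatever your DTD declares.
    """
    if sgml_text.lstrip().upper().startswith("<!DOCTYPE"):
        return sgml_text  # already declared, leave untouched
    return f'<!DOCTYPE {root} SYSTEM "{dtd_file}">\n{sgml_text}'


def run_osx(sgml_path):
    """Run osx on a file and return the XML it writes to standard output.

    -xlower asks osx to prefer lower-case names. Warnings (for example about
    unconvertible processing instructions) go to stderr, so the exit status
    is not treated as fatal here.
    """
    result = subprocess.run(
        ["osx", "-xlower", sgml_path],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A typical flow would be to read the .sgm file, write with_doctype(text) to a new file next to viewed.dtd, and pass that new file's path to run_osx.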