I need to create XML documents but want to automate the process using Excel because I can't copy and paste hundreds of values into the XML document. I am new to this feature of Excel though and am having trouble with the layout of the elements. This is an example of the final XML document:
<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
<ANALYSIS alias="Test Analysis 1" center_name="NeurOmics" broker_name="EGA" xmlns:com="SRA.common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TITLE>Bed data hsp 1</TITLE>
<DESCRIPTION>Test of NeurOmics Tuebingen HSP Data 1</DESCRIPTION>
<STUDY_REF accession="EGAS00001000697"></STUDY_REF>
<SAMPLE_REF accession="EGAN00001409012"></SAMPLE_REF>
<RUN_REF accession="EGAR00001446342"></RUN_REF>
<SAMPLE_REF accession="EGAN00001409040"></SAMPLE_REF>
<RUN_REF accession="EGAR00001446343"></RUN_REF>
<SAMPLE_REF accession="EGAN00001404184"></SAMPLE_REF>
<RUN_REF accession="EGAR000016818343"></RUN_REF>
<SAMPLE_REF accession="EGAN0000148740"></SAMPLE_REF>
<RUN_REF accession="EGAR00001431643"></RUN_REF>
<ANALYSIS_TYPE>
<SEQUENCE_ANNOTATION/>
</ANALYSIS_TYPE>
<FILES>
<FILE filename="TUE-hpHSPv2_TargetRegions.bed" filetype="bed"/>
</FILES>
</ANALYSIS>
<ANALYSIS alias="Test Analysis 2" center_name="NeurOmics" broker_name="EGA" xmlns:com="SRA.common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<TITLE>BED data ataxia 1</TITLE>
<DESCRIPTION>Test of NeurOmics Tuebingen Ataxia Data 1</DESCRIPTION>
<STUDY_REF accession="EGAS00001000694"></STUDY_REF>
<SAMPLE_REF accession="EGAN00001408934"></SAMPLE_REF>
<RUN_REF accession="EGAR00001446272"></RUN_REF>
<SAMPLE_REF accession="EGAN00001408935"></SAMPLE_REF>
<RUN_REF accession="EGAR00001446271"></RUN_REF>
<ANALYSIS_TYPE>
<SEQUENCE_ANNOTATION/>
</ANALYSIS_TYPE>
<FILES>
<FILE filename="TUE-hpSCAv1_TargetRegions.bed" filetype="bed"/>
</FILES>
</ANALYSIS>
The part I'm having trouble with are the elements SAMPLE_REF
and RUN_REF
as within each ANALYSIS
section there can be any number of these paired values. The rest I think I can figure out, for now I'm just trying to get that part in a format I can copy into the XML document as that's the part that will take all the time. Anyway, I set up this XSD
file for Excel:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="ANALYSIS_SET">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="ANALYSIS" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<!-- single sample information -->
<xsd:element name="refname" type="xsd:string"/>
<xsd:element name="accession" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
I then link this to an Excel spreadsheet which has a column for refname
and a column for accession
, and then export to XML. This is what I get:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ANALYSIS_SET>
<ANALYSIS>
<refname>GS130178</refname>
<accession>EGAR00001446342</accession>
</ANALYSIS>
<ANALYSIS>
<refname>GS130627</refname>
<accession>EGAR00001446343</accession>
</ANALYSIS>
<ANALYSIS>
<refname>GS130781</refname>
<accession>EGAR00001446344</accession>
</ANALYSIS>
<ANALYSIS>
<refname>GS130782</refname>
<accession>EGAR00001446345</accession>
</ANALYSIS>
<ANALYSIS>
<refname>GS130783</refname>
<accession>EGAR00001446346</accession>
</ANALYSIS>
<ANALYSIS>
<refname>GS130843</refname>
<accession>EGAR00001446348</accession>
</ANALYSIS>
<ANALYSIS_SET>
The problem I have is that I don't want each pair to be nested within an ANALYSIS
tag, I want all the refname
-accession
pairs to be within a single ANALYSIS
section.