What's the best way to import Infopath .XML forms in R, and transform into a dataframe? If I open the Infopath .XML file in Excel, the rows and columns of the data frame appear correctly.
Here's what I tried in R using the XML package:
- Iused xmlParse() to parse the XML file
- I used xmlToDataFrame() to attempt to transform the parsed XML file to a data frame
In step 2, however, I receive the following error:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("touch your head13011000", :
duplicate subscripts for columns
There do not appear to be duplicate columns, though, when I open the XML file in Excel. How can I transform this XML file from Infopath into a data frame in R? The expected columns should be (as they appear in Excel):
TCID, DateCoded, tcAge, T1_B3, T1_B2, T1_B1, T1_B0, T1_A3, T1_A2, T1_A1, T1_A0, T1_DelayTotal, T2_A3, T2_A2, T2_A1, T2_A, T2_B3, T2_B2, T2_B1, T2_B0, T2_DelayTotal, Coder, notes_t1, note_t2, bachildpres30, baparpres30, bapassptgo, bapassptnogo, bamissgame, P1_B3, P1_B2, P1_B1, P1_B0, P1_A3, P1_A2, P1_A1, P1_A0, P1_DelayTotal, P1_action, P1_go-nogo, P1_score, P1_delay, P1_trial, P1_Ecommand, P1_imitation, P1_restraint, P1_ruleswitch, P1_trials, P1_gotrials, P1_nogotrials, T1_gotrials, T1_nogotrials, T1_trials, T2_gotrials, T2_nogotrials, T2_trials, P1_notplay, T1_trial, T1_go-nogo, T1_score, T1_delay, T1_action, T2_trial, T2_go-nogo, T2_score, T2_delay, T2_action
For the variables that appear multiple times in the XML file, I'd like them to be in the data frame in long form (i.e., multiple rows for the same variable). I don't have much experience with XML files, so your help is greatly appreciated.
Below is what the parsed XML file looks like in R when I use xmlParse:
<my:myFields lang="en-us" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:st="urn:schemas-microsoft-com:office:smarttags" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-07-01T18:12:59" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:SPSS>
<my:TCID>10</my:TCID>
<my:DateCoded>2013-04-01</my:DateCoded>
<my:tcAge>30</my:tcAge>
<my:T1_B3>6</my:T1_B3>
<my:T1_B2>0</my:T1_B2>
<my:T1_B1>0</my:T1_B1>
<my:T1_B0>0</my:T1_B0>
<my:T1_A3>0</my:T1_A3>
<my:T1_A2>0</my:T1_A2>
<my:T1_A1>1</my:T1_A1>
<my:T1_A0>5</my:T1_A0>
<my:T1_DelayTotal>1</my:T1_DelayTotal>
<my:T2_A3 nil="true"/>
<my:T2_A2 nil="true"/>
<my:T2_A1 nil="true"/>
<my:T2_A0 nil="true"/>
<my:T2_B3 nil="true"/>
<my:T2_B2 nil="true"/>
<my:T2_B1 nil="true"/>
<my:T2_B0 nil="true"/>
<my:T2_DelayTotal nil="true"/>
<my:Coder>Name</my:Coder>
</my:SPSS>
<my:notes_t1/>
<my:note_t2/>
<my:bachildpres30>0</my:bachildpres30>
<my:baparpres30>0</my:baparpres30>
<my:bapassptgo>1</my:bapassptgo>
<my:bapassptnogo>0</my:bapassptnogo>
<my:bamissgame>0</my:bamissgame>
<my:P1_B3>4</my:P1_B3>
<my:P1_B2>0</my:P1_B2>
<my:P1_B1>0</my:P1_B1>
<my:P1_B0>1</my:P1_B0>
<my:P1_A3>0</my:P1_A3>
<my:P1_A2>0</my:P1_A2>
<my:P1_A1>1</my:P1_A1>
<my:P1_A0>3</my:P1_A0>
<my:P1_DelayTotal>0</my:P1_DelayTotal>
<my:group2>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>1</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>2</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your tummy</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>3</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>4</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>5</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>6</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>clap your hands</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>7</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>8</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your ears</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>9</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your tummy</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>10</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your eyes</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>1</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>11</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>1</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your eyes</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>12</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
</my:group2>
<my:P1_trials>9</my:P1_trials>
<my:P1_gotrials>5</my:P1_gotrials>
<my:P1_nogotrials>4</my:P1_nogotrials>
<my:T1_gotrials>6</my:T1_gotrials>
<my:T1_nogotrials>6</my:T1_nogotrials>
<my:T1_trials>12</my:T1_trials>
<my:T2_gotrials>0</my:T2_gotrials>
<my:T2_nogotrials>0</my:T2_nogotrials>
<my:T2_trials>0</my:T2_trials>
<my:P1_notplay/>
<my:group4>
<my:group5>
<my:T1_trial>1</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>1</my:T1_delay>
<my:T1_action>Touch your tongue</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>2</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your teeth</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>3</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your ear</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>4</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Clap your hands</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>5</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Clap your hands</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>6</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your eyes</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>7</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your feet</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>8</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your nose</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>9</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>1</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your nose</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>10</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your tummy</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>11</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Wave your hand</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>12</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your head</my:T1_action>
</my:group5>
</my:group4>
<my:group6>
<my:group7>
<my:T2_trial>1</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your tongue</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>2</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your teeth</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>3</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your ear</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>4</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Clap your hands</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>5</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Clap your hands</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>6</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your eyes</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>7</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your feet</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>8</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your nose</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>9</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your nose</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>10</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your tummy</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>11</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Wave your hand</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>12</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your head</my:T2_action>
</my:group7>
</my:group6>
</my:myFields>
In my experience,
xmlToDataFrame
only works when the XML is already structured in a very consistent way. The data you are working with is structured in many different ways:So most of your nodes contain a single value. A few are empty. The "SPSS" node contains 22 values, all with different names, and "group2", "group4", and "group6" all contain 12 nodes, each of which contains multiple values, but whose values are similar across nodes. When I looked at what Excel did when I imported the file, it stacked the 12-node components on top of each other, then strung all 22 of the "SPSS" components together with all of the single-value nodes and repeated that string for as many rows as had been created from stacking the 12-node components, and then bound the two pieces together in columns.
To do that, separate out the long string from the 12-node chunks:
Now take the 12-node chunks and turn each chunk into a data frame:
Now use the
plyr
package'srbind.fill
function to put all the new data frames together:Now
cbind
your long string of values to your bound dataframes:This approach, like Excel's, introduces a whole lot of NAs, because the column names for your different 12-node chunks all differ slightly. If you did this before calling
rbind.fill
:You would generate fewer NA's because you would generate fewer redundant columns, but then you would have to rely on the values in the "group" column to keep track of which rows had originally appeared in which node.