I have a complex HTML file that I need to parse in Objective-C. The HTML looks like this:
<HTML>
<TABLE width="100%" border="0" cellpadding="0" cellspacing="0">
<tr>
<td width="10" align="left" valign="top"><img src="http://www.indianrail.gov.in/main_text_left_top2.gif" alt="" width="8" height="8"></td>
<td width="100%" align="left" valign="top" class="text_rail_top"><img src="http://www.indianrail.gov.in/blank.gif" alt="" width="1" height="8"></td>
<td width="10" align="right" valign="top"><img src="http://www.indianrail.gov.in/main_text_rgt_top2.gif"alt="" width="8" height="8" ></td>
</tr>
<tr>
<td height="400" align="right" valign="top" class="text_rail_left"></td>
<td width="100%" align="left" valign="top" class="text_back_color"><table border="0" cellPadding="0" cellSpacing="0" width="100%"><tr>
<td align="left" valign="top"><table width="100%" border="0" cellspacing="2" cellpadding="0"><tr> <td align="middle"> <FONT SIZE = "1"> Indian Railways Online Website: <b><a TITLE = "Passenger Reservation System - CONCERT" href="http://www.indianrail.gov.in/index.html" target="_blank">http://www.indianrail.gov.in</b></a> designed and hosted by CRIS.</FONT> </td></tr></table></td>
</tr><tr>
<td align="left" valign="top"><table width="100%" border="0" cellspacing="2" cellpadding="0">
<tr>
<td><table border="0" width="100%" /></td>
</tr>
<tr>
<td align="center" valign="top" class="inside_heading_text" colspan="4"><br />Trains Between A Pair of Stations </td>
</tr>
<td colspan="4"> </td>
</tr>
<tr>
<td colspan="4" align="center" valign="top" class="Enq_heading"> You Queried For <SCRIPT LANGUAGE="JavaScript" SRC= "/js/inet_srcdest.js">
function getCookie(http://www.indianrail.gov.in/tbisip_400x400.htm)</SCRIPT>
<link href="http://www.indianrail.gov.in/cris_google.css" media="all" rel="Stylesheet" type="text/css" />
<script language ="JavaScript">
var searchQuery ='MUMBAI CENTRAL DELHI '
</script><FORM NAME="Accavl" METHOD="POST" ACTION="http://www.indianrail.gov.in/cgi_bin/inet_accavl_cgi1.cgi">
<TR>
<TD valign="top"><table width="98%" border="0" align="center" cellpadding="3" cellspacing="1" class="table_border">
<TR class="heading_table_top">
<TH>Origin</TH>
<TH>Destination</TH>
</TR>
<TR>
<TD class="table_border_both">MUMBAI CENTRAL -[BCT ]</TD>
<TD class="table_border_both">DELHI -[DLI ]</TD>
</TR>
</TABLE>
</TD></TR>
<TR><td> </td></TR>
<TR>
<td class="main_text">Enter Quota:</td>
<td><SELECT NAME="lccp_quota" SIZE="1" >
<OPTION VALUE="CK">Tatkal Quota
<OPTION VALUE="LD">Ladies Quota
<OPTION VALUE="DF">Defence Quota
<OPTION VALUE="FT">Foreign Tourist Quota
<OPTION VALUE="SS">Lower Berth Quota$
<OPTION VALUE="YU">Yuva Quota
<OPTION VALUE="DP">Duty Pass Quota
<OPTION VALUE="HP">Handicaped Quota
<OPTION VALUE="PH">Parliament House
<OPTION selected VALUE="GN">General Quota
</SELECT></TD></tr>
<tr>
<td class="main_text">Journey Date:</td><td><INPUT NAME="lccp_day" SIZE="2" VALUE="11" onchange="return changedate()"><SELECT NAME="lccp_month" SIZE="1" onClick="return changedate()"><OPTION selected VALUE="5">May<OPTION VALUE="6">Jun<OPTION VALUE="7">Jul</SELECT></TD></tr><INPUT TYPE="HIDDEN" NAME="lccp_classopt" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class1" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class2" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class3" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class4" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class5" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class6" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class7" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class8" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_class9" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_cls10" SIZE="2" VALUE="ZZ"><INPUT TYPE="HIDDEN" NAME="lccp_age" SIZE="2" VALUE="ADULT_AGE"><tr>
<td> </td><td><INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitacc" ONCLICK="return submitavailability(0)" VALUE="Get Availability"> <INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitfare" ONCLICK="return submitfare(0)" VALUE="Get Full Fare"> <INPUT TYPE="Button" CLASS="btn_style" NAME="lccp_submitpath" ONCLICK="return submitroute(0)" VALUE="Get Schedule"> <INPUT TYPE="BUTTON" CLASS="btn_style" NAME="lccp_submitrun" ONCLICK="return submitrun(0)" VALUE="Get Running Status"></td></tr></table><br>
<TABLE BORDER ALIGN=center><TABLE width="98%" border="1" bordercolor="#993300" align="center" cellpadding="3" cellspacing="1" class="table_border_both_left"><tr class="heading_table_top">
<TH ROWSPAN = 2 width="9%" >Train No.</TH>
<TH ROWSPAN = 2 width="20%" >Train Name</TH>
<TH ROWSPAN = 2 width="15%" >Origin</TH>
<TH ROWSPAN = 2 width="8%" >Dep.Time</TH>
<TH ROWSPAN = 2 width="14%" >Destination</TH>
<TH ROWSPAN = 2 width="7%" >Arr.Time</TH>
<TH COLSPAN = 7 width="7%" >Days Of Run</TH>
<TH COLSPAN = 10 width="7%">Classes</TH>
</TR>
<TR class="heading_table_top">
<TH width="3%">M</TH>
<TH width="3%">T</TH>
<TH width="3%">W</TH>
<TH width="3%">T</TH>
<TH width="3%">F</TH>
<TH width="3%">S</TH>
<TH width="3%">S</TH>
<TH width="3%">1A</TH>
<TH width="3%">2A</TH>
<TH width="3%">FC</TH>
<TH width="3%">3A</TH>
<TH width="3%">CC</TH>
<TH width="3%">SL</TH>
<TH width="3%">2S</TH>
<TH width="3%">3E</TH>
</TR>
<TR><TD><INPUT TYPE="RADIO" NAME="lccp_trndtl" VALUE="19019BDTSNZM YYYYYYYY "ONCLICK="return farefill('19019BDTSNZM YYYYYYYY ','19019','BDTS',0,0,1,0,1,0,1,0,0,0,0)" CHECKED>19019</TD>
<TD ALIGN =Center TITLE = " Please look the following same trains list also "><A HREF="#SAMETRN">+DEHRADUN EXP </A><A NAME="BACKSAMETRN"></A>
<TD ALIGN =Center TITLE="Station CodeBDTS">BANDRA TERMINUS</TD>
<TD ALIGN = Center>00:05</TD>
<TD ALIGN = Center TITLE="Station Code NZM ">H NIZAMUDDIN </TD>
<TD ALIGN = Center>05:25</TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD>-</TD>
<TD><INPUT TYPE="RADIO" Name="lccp_class2" VALUE="2A" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','Y','N','N','N','N','N','N','N','N')" CHECKED>
<TD>-</TD>
<TD><INPUT TYPE="RADIO" Name="lccp_class4" VALUE="3A" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','N','N','Y','N','N','N','N','N','N')">
<TD>-</TD>
<TD><INPUT TYPE="RADIO" Name="lccp_class6" VALUE="SL" ONCLICK="return deselectclass(1,0,1,0,1,0,1,0,0,0,0,'N','N','N','N','N','Y','N','N','N','N')">
<TD>-</TD>
<TD>-</TD>
</TR></FONT>
<TR><TD><INPUT TYPE="RADIO" NAME="lccp_trndtl" VALUE="19023BCT NDLSYYYYYYYY "ONCLICK="return farefill('19023BCT NDLSYYYYYYYY ','19023','BCT ',0,0,0,0,0,0,2,1,0,0,0)">19023</TD>
<TD ALIGN =Center TITLE = " Please look the following same trains list also "><A HREF="#SAMETRN">+FZR JANATA EXP </A><A NAME="BACKSAMETRN"></A>
<TD ALIGN =Center TITLE="Station CodeBCT ">MUMBAI CENTRAL </TD>
<TD ALIGN = Center>07:25</TD>
<TD ALIGN = Center TITLE="Station Code NDLS">NEW DELHI </TD>
<TD ALIGN = Center>12:45</TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD><FONT COLOR = green><B>Y</B></TD>
<TD>-</TD>
<TD>-</TD>
<TD>-</TD>
<TD>-</TD>
<TD>-</TD>
<TD><INPUT TYPE="RADIO" Name="lccp_class6" VALUE="SL" ONCLICK="return deselectclass(2,0,0,0,0,0,2,1,0,0,0,'N','N','N','N','N','Y','N','N','N','N')">
<TD><INPUT TYPE="RADIO" Name="lccp_class7" VALUE="2S" ONCLICK="return deselectclass(2,0,0,0,0,0,2,1,0,0,0,'N','N','N','N','N','N','Y','N','N','N')">
<TD>-</TD>
</TR></FONT>
</TABLE>
</BODY>
</HTML>
I want to parse the HTML with hpple to get the following output:
19019
BANDRA TERMINUS
00:05
H NIZAMUDDIN
05:25
2A
3A
SL
19023
MUMBAI CENTRAL
07:25
NEW DELHI
12:45
SL
2S
I started with the following XPath query:
NSString *tutorialsXpathQueryString = @"//table[@class='table_border_both_left']//td";
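But it returns far too many results (every cell of every train row, including the seven day-of-run cells and the "-" placeholders), which makes further parsing difficult. Can someone help me with an XPath query that extracts just these values more efficiently?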
Thanks!
You can locate table rows with this:
Within a row, find the expected data:
Sorry that my code is Java (I'm using Selenium WebDriver), but I hope the XPath expressions are still useful.
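The same row-then-cells idea can be sketched with the JDK's built-in `javax.xml.xpath` instead of Selenium; the expressions below are an illustration, not necessarily the ones this answer used. Assumptions: the input is a hand-cleaned, well-formed excerpt of the two train rows (the real page is not valid XML, so `DocumentBuilder` cannot parse it as-is), element and attribute names are lowercase as libxml2's HTML parser (used by hpple) reports them, and the `row()` helper is hypothetical scaffolding for building that excerpt.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TrainRowsSketch {

    // Data rows are the <tr>s holding the train-selection radio button;
    // this skips the header rows and any layout rows.
    static final String ROWS =
        "//table[@class='table_border_both_left']//tr[td/input[@name='lccp_trndtl']]";

    // Cell positions inside a data row: train no., origin, dep. time,
    // destination, arr. time (cell 2, the train name, is skipped).
    static final int[] CELLS = {1, 3, 4, 5, 6};

    // Available classes are radio buttons; unavailable ones are plain "-" cells.
    static final String CLASSES = "td/input[starts-with(@name,'lccp_class')]/@value";

    static List<String> extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        List<String> out = new ArrayList<>();
        NodeList rows = (NodeList) xp.evaluate(ROWS, doc, XPathConstants.NODESET);
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            for (int c : CELLS) {
                // Evaluated relative to the row; normalize-space() trims values like "H NIZAMUDDIN ".
                out.add(xp.evaluate("normalize-space(td[" + c + "])", row));
            }
            NodeList classes = (NodeList) xp.evaluate(CLASSES, row, XPathConstants.NODESET);
            for (int j = 0; j < classes.getLength(); j++) {
                out.add(classes.item(j).getNodeValue());
            }
        }
        return out;
    }

    // Hypothetical helper: builds one well-formed row modelled on the question's HTML,
    // with lowercase names as libxml2 would report them. The lccp_trndtl value is
    // shortened to the train number for brevity.
    static String row(String no, String name, String from, String dep, String to,
                      String arr, String... classes) {
        StringBuilder sb = new StringBuilder("<tr><td><input type=\"RADIO\" name=\"lccp_trndtl\" value=\"")
            .append(no).append("\"/>").append(no).append("</td>")
            .append("<td><a href=\"#SAMETRN\">+").append(name).append(" </a></td>")
            .append("<td>").append(from).append("</td><td>").append(dep).append("</td>")
            .append("<td>").append(to).append("</td><td>").append(arr).append("</td>")
            .append("<td><font color=\"green\"><b>Y</b></font></td>".repeat(7));
        for (int k = 0; k < classes.length; k++) {
            sb.append(classes[k].equals("-")
                ? "<td>-</td>"
                : "<td><input type=\"RADIO\" name=\"lccp_class" + (k + 1)
                    + "\" value=\"" + classes[k] + "\"/></td>");
        }
        return sb.append("</tr>").toString();
    }

    // Hand-cleaned excerpt: one header row plus the two train rows from the question.
    static final String SAMPLE = "<table class=\"table_border_both_left\">"
        + "<tr class=\"heading_table_top\"><th>Train No.</th><th>Train Name</th></tr>"
        + row("19019", "DEHRADUN EXP", "BANDRA TERMINUS", "00:05", "H NIZAMUDDIN ", "05:25",
              "-", "2A", "-", "3A", "-", "SL", "-", "-")
        + row("19023", "FZR JANATA EXP", "MUMBAI CENTRAL ", "07:25", "NEW DELHI ", "12:45",
              "-", "-", "-", "-", "-", "SL", "2S", "-")
        + "</table>";

    public static void main(String[] args) throws Exception {
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

Run against this excerpt, `main` should print the fifteen values listed in the question, one per line. The class values come from the radio buttons' `value` attributes, so the `-` cells drop out without extra filtering. The row expression should also be usable with hpple's `searchWithXPathQuery:`; because that method returns nodes rather than strings, the `normalize-space()` calls would probably become a trim of each cell's text in Objective-C.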
You can use an XPath union expression (i.e. `|`) to return the direct `text()` children of your `TD` elements and also the `@VALUE` attribute of your `INPUT` elements:
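A minimal sketch of that union, using the JDK's `javax.xml.xpath` on a hand-cleaned, well-formed copy of the first train row with lowercase names (an assumption standing in for hpple's libxml2 parse of the real page). As written, the union also returns the `-` placeholders and the `lccp_trndtl` radio button's value, so the sketch adds a narrowed variant; its extra predicates are an addition, not part of the answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UnionXPathSketch {

    // Data rows: the <tr>s that hold the train-selection radio button.
    static final String ROW =
        "//table[@class='table_border_both_left']//tr[td/input[@name='lccp_trndtl']]";

    // Union as described: direct text() children of each td, plus @value of each input.
    static final String PLAIN = ROW + "/td/text() | " + ROW + "/td/input/@value";

    // Narrowed union: skip whitespace-only and "-" text, and keep only the class
    // radio buttons' values (not the lccp_trndtl value).
    static final String NARROWED =
        ROW + "/td/text()[normalize-space() != '' and normalize-space() != '-']"
        + " | " + ROW + "/td/input[starts-with(@name,'lccp_class')]/@value";

    // Hand-cleaned, well-formed copy of the first train row (assumed lowercase names).
    static final String SAMPLE = "<table class=\"table_border_both_left\"><tr>"
        + "<td><input type=\"RADIO\" name=\"lccp_trndtl\" value=\"19019BDTSNZM YYYYYYYY \"/>19019</td>"
        + "<td><a href=\"#SAMETRN\">+DEHRADUN EXP </a></td>"
        + "<td>BANDRA TERMINUS</td><td>00:05</td><td>H NIZAMUDDIN </td><td>05:25</td>"
        + "<td><font color=\"green\"><b>Y</b></font></td>".repeat(7)
        + "<td>-</td><td><input type=\"RADIO\" name=\"lccp_class2\" value=\"2A\"/></td>"
        + "<td>-</td><td><input type=\"RADIO\" name=\"lccp_class4\" value=\"3A\"/></td>"
        + "<td>-</td><td><input type=\"RADIO\" name=\"lccp_class6\" value=\"SL\"/></td>"
        + "<td>-</td><td>-</td></tr></table>";

    // Evaluates a node-set expression and returns each node's trimmed value
    // (the text for text nodes, the attribute value for attribute nodes).
    static List<String> select(String xhtml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getNodeValue().trim());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("plain:    " + select(SAMPLE, PLAIN));
        System.out.println("narrowed: " + select(SAMPLE, NARROWED));
    }
}
```

`PLAIN` also yields the unwanted `-` entries and `19019BDTSNZM YYYYYYYY`, while `NARROWED` yields only the eight values the question asks for on this train. On the real page, cells whose `<TD>` is never closed may pick up newline-only text nodes, which the `normalize-space() != ''` test discards. The JDK's evaluator returns the union in document order, so each train's values stay grouped.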