Xpath Table Within Table

Posted 2019-09-12 05:28

Question:

I am having a bit of a problem scraping a table-heavy page with DOMXPath.

The layout is really ugly: I am trying to get content out of a table within a table within a table. Using FirePath in Firebug, I get the following path for the table element:

 html/body/table/tbody/tr[3]/td/table[1]/tbody/tr[2]/td[1]/table[1]/tbody/tr[3]/td[4]

Now, after endless experimenting, I found out that for a standalone table I need to remove the "tbody" tag from the path to make it work. But this doesn't seem to be enough for tables within tables. So my question is: how do I best get content out of tables within tables within tables?

I uploaded the file which I am trying to scrape here:1

Answer 1:

I ran into the same problem as you: scraping complicated, poorly formatted HTML where I wanted to get the values in a table nested inside other tables.

I came up with the approach of targeting the part I want with a series of functions like this:

function parse_html(DOMXPath $xpath) { // locates the specific part of the table I chose to extract
    $query = $xpath->query('//tr[@data-eventid]/@data-eventid'); // gets the rows I want
    parse_table($xpath);
}
function parse_table(DOMXPath $xpath) { // extracts the content of the table
    $query = $xpath->query('//tr[@data-eventid="405412"]/td[@class="impact"]/span[@title]/@title'); // ...etc.
    parseEvaluate($query);
}
function parseEvaluate(DOMNodeList $query) {
    // ...verify that the values are correct
}

This is just to give you the idea.
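
To make the staged idea a bit more concrete, here is a minimal, self-contained sketch. The sample markup is made up for illustration (only the `data-eventid` attribute and the `impact` class come from the queries above), and the second stage assumes each row has at most one matching `span`. It first selects the rows, then runs a relative query against each row, so the outer tables never appear in the expression.

```php
<?php
// Hypothetical sample markup: a table nested in a table, no <tbody> in the source.
$html = <<<HTML
<html><body>
<table><tr><td>
  <table>
    <tr data-eventid="405412"><td class="impact"><span title="High"></span></td></tr>
    <tr data-eventid="405413"><td class="impact"><span title="Low"></span></td></tr>
  </table>
</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$impacts = [];
// Stage 1: find every row that carries a data-eventid attribute, at any depth.
foreach ($xpath->query('//tr[@data-eventid]') as $row) {
    // Stage 2: query relative to that row (note the leading "./" and the context node).
    $titles = $xpath->query('./td[@class="impact"]/span[@title]/@title', $row);
    // Stage 3: only keep rows where the expected value was actually found.
    if ($titles->length > 0) {
        $impacts[$row->getAttribute('data-eventid')] = $titles->item(0)->value;
    }
}

print_r($impacts);
```

Passing the row as the second argument to `query()` keeps each stage short and makes it easier to see which stage fails when the page layout changes.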



Answer 2:

How about:

//*[contains(text(),"GRABME")]

I know that's probably not what you want, but you get the idea. Identify a pattern and use that pattern to construct the XPath.
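
As a rough sketch of how that plays out with DOMXPath, the snippet below runs the pattern-based expression against made-up markup with three levels of nested tables. It also tries a positional path with and without `tbody`, on the assumption that the page source itself contains no `tbody` elements: browsers add them to their DOM (which is what FirePath reports), while PHP's HTML parser does not. The markup and the "GRABME 42" cell are hypothetical.

```php
<?php
// Hypothetical markup: a table inside a table inside a table, no <tbody> in the source.
$html = '<html><body><table><tr><td>'
      . '<table><tr><td>'
      . '<table><tr><td>skip</td><td>GRABME 42</td></tr></table>'
      . '</td></tr></table>'
      . '</td></tr></table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Pattern-based: match on content, independent of how deeply the cell is nested.
$found = [];
foreach ($xpath->query('//*[contains(text(),"GRABME")]') as $node) {
    $found[] = trim($node->textContent);
}

// Positional path with every tbody step removed, at every nesting level.
$withoutTbody = $xpath->query('/html/body/table/tr/td/table/tr/td/table/tr/td[2]');

// The same path with browser-style tbody steps finds nothing in this document.
$withTbody = $xpath->query(
    '/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td[2]'
);

print_r($found);
echo $withoutTbody->length, ' ', $withTbody->length, PHP_EOL;
```

If the real page does contain some literal `tbody` tags, the positional path would need them kept at exactly those levels, which is one more reason to prefer a pattern over a long absolute path.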