I've been trying to concatenate some nested text with XPath in Scrapy (which, as far as I know, uses XPath 1.0). I've looked at a bunch of other posts, but none of them quite gets me what I want.
Here is the relevant part of the HTML (the actual page is http://adventuretime.wikia.com/wiki/List_of_episodes):
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">
Finn and Princess Bubblegum must protect the <a href="/wiki/Candy_Kingdom" title="Candy Kingdom">Candy Kingdom</a> from a horde of candy zombies they accidentally created.
</td>
</tr>
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">
Finn must travel to <a href="/wiki/Lumpy_Space" title="Lumpy Space">Lumpy Space</a> to find a cure that will save Jake, who was accidentally bitten by <a href="/wiki/Lumpy_Space_Princess" title="Lumpy Space Princess">Lumpy Space Princess</a> at Princess Bubblegum's annual 'Mallow Tea Ceremony.'
</td>
</tr>
(much more stuff here)
Here is the result I want back:
[u'Finn and Princess Bubblegum must protect the Candy Kingdom from a horde of candy zombies they accidentally created.\n', u'Finn must travel to Lumpy Space to find a cure that will save Jake, who was accidentally bitten', (more stuff here)]
I've tried using the answer from the question "HTML XPath: Extracting text mixed in with multiple tags?":
description = sel.xpath("//table[@class='wikitable']/tr[position()>1]/td[@colspan='5']/parent::tr/td[descendant-or-self::text()]").extract()
but this only gets me back whole td elements, markup included:
[u'<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn and Princess Bubblegum must protect the <a href="/wiki/Candy_Kingdom" title="Candy Kingdom">Candy Kingdom</a> from a horde of candy zombies they accidentally created.\n</td>',
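That selector matches the td elements themselves, and calling extract() on element nodes serializes them back to markup; what I'm after is each element's text content instead. A minimal sketch of that difference, using Python's standard xml.etree.ElementTree purely as a stand-in for Scrapy's lxml-based selectors (an assumption: the first sample row is wrapped in a table and parsed as XML, and the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# First sample row from the question, wrapped in <table> so it parses as XML.
# ElementTree is only a stand-in here; Scrapy evaluates XPath through lxml.
SAMPLE = """<table class="wikitable">
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn and Princess Bubblegum must protect the <a href="/wiki/Candy_Kingdom" title="Candy Kingdom">Candy Kingdom</a> from a horde of candy zombies they accidentally created.
</td>
</tr>
</table>"""

table = ET.fromstring(SAMPLE)
td = table.find("./tr/td[@colspan='5']")

# Serializing the element node returns markup, much like extract() on a td selector.
as_markup = ET.tostring(td, encoding="unicode")

# Concatenating the element's descendant text nodes returns plain text instead.
as_text = "".join(td.itertext())

print(as_markup)
print(repr(as_text))
```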
The string() answer doesn't seem to work for me either: I get back a list with only one entry, when there should be many more.
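In XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, which is why a single string(...) over all the cells yields one entry. The usual workaround is to select the cells first and then take each cell's string value separately; in Scrapy that would mean looping over the td selectors and calling something like td.xpath('string()').extract() on each one. A minimal sketch of the same idea, with the standard library's xml.etree.ElementTree standing in for Scrapy's selectors (an assumption: the two sample rows are parsed as XML; string_value, cell_descriptions and first_cell_only are hypothetical helper names):

```python
import xml.etree.ElementTree as ET

# The two sample rows from the question, wrapped in <table> so they parse as XML.
SAMPLE = """<table class="wikitable">
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn and Princess Bubblegum must protect the <a href="/wiki/Candy_Kingdom" title="Candy Kingdom">Candy Kingdom</a> from a horde of candy zombies they accidentally created.
</td>
</tr>
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn must travel to <a href="/wiki/Lumpy_Space" title="Lumpy Space">Lumpy Space</a> to find a cure that will save Jake, who was accidentally bitten by <a href="/wiki/Lumpy_Space_Princess" title="Lumpy Space Princess">Lumpy Space Princess</a> at Princess Bubblegum's annual 'Mallow Tea Ceremony.'
</td>
</tr>
</table>"""


def string_value(elem):
    """Concatenate all descendant text nodes of elem in document order
    (what XPath 1.0 calls the element's string value)."""
    return "".join(elem.itertext())


def cell_descriptions(markup):
    """Select the cells first, then take each cell's string value on its own."""
    cells = ET.fromstring(markup).findall("./tr/td[@colspan='5']")
    return [string_value(td) for td in cells]


def first_cell_only(markup):
    """Mimics string() over the whole node-set: only the first cell counts."""
    cells = ET.fromstring(markup).findall("./tr/td[@colspan='5']")
    return string_value(cells[0]) if cells else ""


print(cell_descriptions(SAMPLE))
print(first_cell_only(SAMPLE))
```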
The closest I've gotten is with:
description = sel.xpath("//table[@class='wikitable']/tr[position()>1]/td[@colspan='5']//text()").extract()
and this gets me back
[u'Finn and Princess Bubblegum must protect the ', u'Candy Kingdom', u' from a horde of candy zombies they accidentally created.\n', u'Finn must travel to ', u'Lumpy Space', u' to find a cure that will save Jake, who was accidentally bitten', (more stuff here)]
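Those fragments are exactly the text nodes under the matched cells, but once they sit in one flat list nothing records which cell each came from. Running the text query relative to each cell and joining the pieces keeps one string per cell; in Scrapy that would look roughly like ''.join(td.xpath('.//text()').extract()) inside a loop over the td selectors. The sketch below puts the flat list and the per-cell list side by side, again with xml.etree.ElementTree standing in for Scrapy's selectors and the sample rows parsed as XML (both assumptions), plus an optional whitespace clean-up in the spirit of XPath's normalize-space():

```python
import xml.etree.ElementTree as ET

# The two sample rows from the question, wrapped in <table> so they parse as XML.
SAMPLE = """<table class="wikitable">
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn and Princess Bubblegum must protect the <a href="/wiki/Candy_Kingdom" title="Candy Kingdom">Candy Kingdom</a> from a horde of candy zombies they accidentally created.
</td>
</tr>
<tr>
<td colspan="5" style="border-bottom: #BCD9E3 3px solid">Finn must travel to <a href="/wiki/Lumpy_Space" title="Lumpy Space">Lumpy Space</a> to find a cure that will save Jake, who was accidentally bitten by <a href="/wiki/Lumpy_Space_Princess" title="Lumpy Space Princess">Lumpy Space Princess</a> at Princess Bubblegum's annual 'Mallow Tea Ceremony.'
</td>
</tr>
</table>"""

cells = ET.fromstring(SAMPLE).findall("./tr/td[@colspan='5']")

# Roughly what td[@colspan='5']//text() returns: one flat list of fragments,
# with nothing marking where one cell ends and the next begins.
fragments = [text for td in cells for text in td.itertext()]

# Querying the text per cell and joining keeps one string per cell.
per_cell = ["".join(td.itertext()) for td in cells]

# Optional clean-up similar to normalize-space(): collapse whitespace runs
# and strip leading/trailing whitespace.
normalized = [" ".join(text.split()) for text in per_cell]

print(fragments)
print(per_cell)
print(normalized)
```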
Does anyone have XPath tips for concatenating the text nodes of each cell into a single string?
Thanks!!
Edit: Spider Code
from scrapy import Spider
from scrapy.selector import Selector


class AT_Episode_Detail_Spider_2(Spider):
    name = "ep_detail_2"
    allowed_domains = ["adventuretime.wikia.com"]
    start_urls = [
        "http://adventuretime.wikia.com/wiki/List_of_episodes"
    ]

    def parse(self, response):
        sel = Selector(response)
        # Returns every text node as a separate string, so cell boundaries are lost
        description = sel.xpath("//table[@class='wikitable']/tr[position()>1]/td[@colspan='5']//text()").extract()
        print(description)