The HTML containing my data:
<div id="content"><!-- InstanceBeginEditable name="EditRegion3" -->
<div id="content_div">
<div class="title" id="content_title_div"><img src="img/banner_outlets.jpg" width="920" height="157" alt="Outlets" /></div>
<div id="menu_list">
<table border="0" cellpadding="5" cellspacing="5" width="100%">
<tbody>
<tr>
<td valign="top">
<p>
<span class="foodTitle">Century Square</span><br />
2 Tampines Central 5<br />
#01-44-47 Century Square<br />
Singapore 529509</p>
<p>
<br />
<strong>Opening Hours:</strong><br />
7am to 12am (Sun-Thu & PH)<br />
24 Hours (Fri & Sat &</p>
<p>
Eve of PH)<br />
Telephone: 6789 0457</p>
</td>
<td valign="top">
<img alt="Century Square" src="/assets/images/outlets/century_sq.jpg" style="width: 260px; height: 140px" /></td>
<td valign="top">
<span class="foodTitle">Liat Towers</span><br />
541 Liat towers #01-01<br />
Orchard Road<br />
Singapore 238888<br />
<br />
<strong>Opening Hours: </strong><br />
24 hours (Daily)<br />
<br />
Telephone: 6737 8036</td>
<td valign="top">
<img alt="Liat Towers" src="/assets/images/outlets/century_liat.jpg" style="width: 260px; height: 140px" /></td>
</tr>
**I want to get:
place name: Century Square, Liat Towers
address: 2 Tampines Central 5, 541 Liat towers #01-01
postal code: Singapore 529509, Singapore 238888
opening hours: 7am to 12am, 24 hours (Daily)**
For example, the first `<p>` inside `<td valign="top">` holds three pieces of data that I want (name, address, postal code). How do I split them?
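One possible way to do the split, as a minimal sketch: it assumes the text nodes of that first `<p>` have already been pulled out as a list of strings (roughly what a relative `text()` query followed by `.extract()` would give), including the whitespace-only entries that sit between tags. The function name `split_address_block` and the sample list are hypothetical, and the rule "last non-empty line is the postal code" is an assumption based on the two outlets shown above.

```python
def split_address_block(text_nodes):
    """Split the text nodes of one address <p> into address lines and postal code.

    Assumes the last non-empty line is the postal code and everything
    before it belongs to the street address.
    """
    # Strip surrounding whitespace and drop nodes that are only whitespace.
    lines = [t.strip() for t in text_nodes if t.strip()]
    if not lines:
        return {"address": [], "postal": ""}
    return {"address": lines[:-1], "postal": lines[-1]}


# Hypothetical sample: text nodes of the Century Square <p>, newlines included.
sample = [
    "\n",
    "\n2 Tampines Central 5",
    "\n#01-44-47 Century Square",
    "\nSingapore 529509",
]
result = split_address_block(sample)
print(result)
```

The place name is not in this list because it lives inside the `<span class="foodTitle">` child, so it would still be read separately.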
Here is my spider code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import re
from todo.items import wendyItem


class wendySpider(BaseSpider):
    name = "wendyspider"
    allowed_domains = ["wendys.com.sg"]
    start_urls = ["http://www.wendys.com.sg/outlets.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        values = hxs.select('//td')
        items = []
        for value in values:
            item = wendyItem()
            # Leading "." keeps the query relative to the current <td>;
            # a bare "//span" would match every span on the page.
            item['name'] = value.select('.//span[@class="foodTitle"]/text()').extract()
            # The XPaths below are the part I have not worked out yet.
            item['address'] = value.select().extract()
            item['postal'] = value.select().extract()
            item['hours'] = value.select().extract()
            item['contact'] = value.select().extract()
            items.append(item)
        return items
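Since the spider imports `re` without using it yet, here is a hedged sketch of how regular expressions might fill the postal, hours and contact fields once all the text of one cell has been joined into a single string. This is not Scrapy API: `parse_cell_text` is a hypothetical helper, the sample string is typed in from the Liat Towers cell above, and the patterns assume a six-digit postal code, an eight-digit phone number, and that "Opening Hours:" always comes before "Telephone:".

```python
import re


def parse_cell_text(cell_text):
    """Pull postal code, opening hours and telephone out of one cell's text."""
    # Collapse newlines and repeated spaces so the patterns can span lines.
    text = " ".join(cell_text.split())

    def first_group(pattern):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    return {
        "postal": first_group(r"(Singapore\s+\d{6})"),
        "hours": first_group(r"Opening Hours:\s*(.*?)\s*Telephone:"),
        "contact": first_group(r"Telephone:\s*(\d{4}\s?\d{4})"),
    }


# Hypothetical sample: the flattened text of the Liat Towers cell.
liat = (
    "Liat Towers\n541 Liat towers #01-01\nOrchard Road\nSingapore 238888\n\n"
    "Opening Hours: \n24 hours (Daily)\n\nTelephone: 6737 8036"
)
parsed = parse_cell_text(liat)
print(parsed)
```

Joining the text first also sidesteps the Century Square cell, where the opening hours are split across two `<p>` elements.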