I am writing a web scraping script in Python (2.7.11) that extracts a stock price from its symbol. I am not sure why it is not working; it gives me this output:
Enter Financial Symbol
appl YPE h
Do you want to run again?
My code is below:
import urllib

go = True
while go:
    print "Enter Financial Symbol"
    symbol = raw_input()
    page = urllib.urlopen("http://finance.yahoo.com/q?s=" + symbol)
    text = page.read()
    where = text.find("yfs_l84")
    start = where + 7
    end = start + 5
    result = text[start:end]
    print (symbol + " " + result)
    print "Do you want to run again?"
    choice = raw_input()
    if choice == "no":
        go = False
How do I make it work?
The string "yfs_l84" you are searching for is not contained in the HTML returned by Yahoo, so where=text.find("yfs_l84") leaves where set to -1. With start=where+7 and end=start+5, your slice text[start:end] is therefore always text[6:11], which cuts "YPE h" out of the start of the page source:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en-US">
<head>
...
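To see the failure in isolation, here is a minimal sketch that runs the same arithmetic on the first line of the page source quoted above, with no network access. The doctype string is copied from that excerpt; everything else mirrors the variable names in the question. It is only an illustration of why the slice lands where it does, and the final check is one possible guard, not the only way to handle a missing marker.

```python
# First line of the page source, copied from the excerpt above.
text = ('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">')

where = text.find("yfs_l84")  # marker is absent, so find() returns -1
start = where + 7             # -1 + 7 == 6
end = start + 5               # 6 + 5 == 11
result = text[start:end]      # characters 6..10 of the doctype line

print(where)   # -1
print(result)  # YPE h

# One possible guard: test for -1 before slicing instead of trusting the index.
if where == -1:
    print("marker not found")
```

The sliced text matches the output shown in the question, which confirms that the slice is being taken from the doctype line rather than from anywhere near a price.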
You should not use str.find to parse a web page; use an HTML parser such as bs4 instead. In this case, though, there is an API that you can request the data from in JSON format, and combining it with requests makes getting the data pretty trivial.
In [25]: import requests
In [26]: sym = "DVN"
In [27]: r = requests.get("http://finance.yahoo.com/webservice/v1/symbols/{sym}/quote?format=json".format(sym=sym))
In [28]: r.json()
Out[28]:
{'list': {'meta': {'count': 1, 'start': 0, 'type': 'resource-list'},
'resources': [{'resource': {'classname': 'Quote',
'fields': {'name': 'Devon Energy Corporation Common',
'price': '18.650000',
'symbol': 'DVN',
'ts': '1455915714',
'type': 'equity',
'utctime': '2016-02-19T21:01:54+0000',
'volume': '33916489'}}}]}}
In [29]: sym = "YHOO"
In [30]: r = requests.get("http://finance.yahoo.com/webservice/v1/symbols/{sym}/quote?format=json".format(sym=sym))
In [31]: r.json()
Out[31]:
{'list': {'meta': {'count': 1, 'start': 0, 'type': 'resource-list'},
'resources': [{'resource': {'classname': 'Quote',
'fields': {'name': 'Yahoo! Inc.',
'price': '30.040001',
'symbol': 'YHOO',
'ts': '1455915600',
'type': 'equity',
'utctime': '2016-02-19T21:00:00+0000',
'volume': '20734985'}}}]}}
In [32]: sym = "AAPL"
In [33]: r = requests.get("http://finance.yahoo.com/webservice/v1/symbols/{sym}/quote?format=json".format(sym=sym))
In [34]: r.json()
Out[34]:
{'list': {'meta': {'count': 1, 'start': 0, 'type': 'resource-list'},
'resources': [{'resource': {'classname': 'Quote',
'fields': {'name': 'Apple Inc.',
'price': '96.040001',
'symbol': 'AAPL',
'ts': '1455915600',
'type': 'equity',
'utctime': '2016-02-19T21:00:00+0000',
'volume': '35374173'}}}]}}
You can pull out whatever data you want by accessing it by key, which is a lot more robust than str.find.
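As a rough sketch of what access by key looks like, the snippet below uses the AAPL response shown above as a literal dict, so it runs without a network call. The helper name get_fields is hypothetical (it is not part of requests or of the Yahoo API), and the sketch assumes the response keeps the list/resources/resource/fields nesting seen in the output above.

```python
# Sample response copied from the AAPL output above (no request is made here).
data = {'list': {'meta': {'count': 1, 'start': 0, 'type': 'resource-list'},
                 'resources': [{'resource': {'classname': 'Quote',
                     'fields': {'name': 'Apple Inc.',
                                'price': '96.040001',
                                'symbol': 'AAPL',
                                'ts': '1455915600',
                                'type': 'equity',
                                'utctime': '2016-02-19T21:00:00+0000',
                                'volume': '35374173'}}}]}}


def get_fields(data):
    """Hypothetical helper: return the 'fields' dict of the first quote, or None."""
    resources = data['list']['resources']
    if not resources:
        return None  # nothing came back for this symbol
    return resources[0]['resource']['fields']


fields = get_fields(data)
price = float(fields['price'])  # values arrive as strings, so convert
print("%s %.2f" % (fields['symbol'], price))  # AAPL 96.04
```

In the session above, data would instead come from r.json(); the empty-list check is there because an unknown symbol might return no resources, which is an assumption about the API rather than something shown in the output.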