I am trying to automate the download of historical stock data using Python. The URL I am trying to open responds with a CSV file, but I am unable to open it using urllib2. I have tried changing the user agent as suggested in a few earlier questions, and I even tried accepting response cookies, with no luck. Can you please help?
Note: The same method works for Yahoo Finance.
Code:
import urllib2, cookielib
site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
Error:
  File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Thanks for your assistance
By adding a few more headers I was able to get the data:
import urllib2

site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # Print the server's error body to see why the request was rejected
    print e.fp.read()
else:
    # Only read the page if the request succeeded
    content = page.read()
    print content
Actually, it works with just this one additional header:
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
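For anyone on Python 3, here is a minimal sketch of the same idea, assuming the server only checks for a browser-like User-Agent plus an Accept header. The helper name `build_request` and the example.com URL are hypothetical; the request is only constructed here, not sent.

```python
import urllib.request

# Header values taken from the question and the answer above; whether a
# given server accepts them is an assumption and may change over time.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def build_request(url, extra_headers=None):
    """Return a urllib Request carrying browser-like headers (hypothetical helper)."""
    headers = dict(BROWSER_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("http://example.com/data.csv")
    # urllib stores header names capitalized, e.g. "User-agent"
    print(req.header_items())
```

Pass the returned object to `urllib.request.urlopen` to actually send it. Keeping the headers in one dictionary makes it easy to add or drop a header while testing which ones the server really requires.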
This works in Python 3:
import urllib.request
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers = {'User-Agent': user_agent}
request = urllib.request.Request(url, None, headers)  # The assembled request
response = urllib.request.urlopen(request)
data = response.read()  # The data you need (as bytes)
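If the server still answers with a 403, it can help to look at the error body, as the urllib2 answer does. Below is a rough Python 3 sketch of that; the function name `fetch` and its injectable `opener` parameter are assumptions of this example, added so it can be tried without network access.

```python
import urllib.error
import urllib.request


def fetch(url, headers=None, opener=urllib.request.urlopen):
    """Return (status, body) for url.

    On an HTTP error the error's status code and body are returned instead
    of raising. `opener` is a hypothetical hook for testing offline.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: its body often explains the refusal
        return err.code, err.read()


if __name__ == "__main__":
    status, body = fetch("http://example.com/", {"User-Agent": "Mozilla/5.0"})
    print(status, len(body))
```

Returning the status alongside the body lets the caller decide whether a 403 should be retried with different headers or reported.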
The NSE website has changed, and the older scripts no longer work well with the current site. This snippet gathers the daily details of a security: symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantity, and the ratio of delivered to traded quantity as a percentage. Each row is conveniently presented as a dictionary.
Python 3.X version with requests and BeautifulSoup
from csv import DictReader
from datetime import date
from io import StringIO

from bs4 import BeautifulSoup as Soup
from requests import get

SECURITY_NAME = "3MINDIA"      # Change this to get quotes for another stock
START_DATE = date(2017, 1, 1)  # Start date of the stock quote data
END_DATE = date(2017, 9, 14)   # End date of the stock quote data

BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"

def format_date(d):
    # Day-month-year without zero padding; avoids "%-d", which is not portable to Windows
    return "{}-{}-{}".format(d.day, d.month, d.year)

def getquote(symbol, start, end):
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Referer': 'https://cssspritegenerator.com',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}
    url = BASE_URL.format(security=symbol, start_date=format_date(start), end_date=format_date(end))
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    # Rows inside csvContentDiv are separated by ':', so turn them into lines
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    reader = DictReader(StringIO(payload))
    for row in reader:
        print({k: v.strip() for k, v in row.items()})

if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)
Besides, this is a relatively modular, ready-to-use snippet.
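The parsing step can be tried on its own, without hitting the site. This is only a sketch: the sample payload and the helper name `parse_payload` are made up for illustration, on the assumption that the div holds comma-separated fields with ':' between records, which is what the snippet above relies on.

```python
from csv import DictReader
from io import StringIO

# Hypothetical sample in the shape the snippet above expects from
# csvContentDiv: records separated by ':' and fields separated by ','.
SAMPLE_PAYLOAD = (
    '"Symbol","Series","Close Price":'
    '"3MINDIA","EQ","  13000.00":'
    '"3MINDIA","EQ","  13100.50":'
)


def parse_payload(payload):
    """Turn a ':'-delimited CSV payload into a list of dicts with stripped values."""
    text = payload.replace(":", "\n")
    return [
        {key: value.strip() for key, value in row.items()}
        for row in DictReader(StringIO(text))
    ]


if __name__ == "__main__":
    for row in parse_payload(SAMPLE_PAYLOAD):
        print(row)
```

Returning a list instead of printing makes the rows easier to reuse. Note that this approach would break if any field value itself contained a ':' character.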