
Question:

I am trying to automate the download of historical stock data using Python. The URL I am trying to open responds with a CSV file, but I am unable to open it using urllib2. I have tried changing the user agent, as suggested in a few earlier questions, and I even tried accepting the response cookies, with no luck. Can you please help?

Note: The same method works for Yahoo Finance.

Code:

import urllib2,cookielib

site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)

Error:

File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for your assistance.

Answer 1:

By adding a few more headers I was able to get the data:

import urllib2

site = "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # Print the error page the server sent back, to see why it refused the request
    print e.fp.read()
else:
    # Only read the response if the request succeeded (page is undefined otherwise)
    content = page.read()
    print content

Actually, it works with just this one additional header:

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
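The claim that the Accept header alone makes the difference can be illustrated offline. Below is a minimal Python 3 sketch, assuming a server that rejects requests lacking an Accept header: the local `PickyHandler` server is a hypothetical stand-in that only mimics the behaviour reported here, not the real NSE server logic, and `fetch` is an illustrative helper name. It also shows the Python 3 way to read the status and error body from `urllib.error.HTTPError`.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BROWSER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers 403 unless an Accept header is sent."""

    def do_GET(self):
        if self.headers.get("Accept"):
            status, body = 200, b"OK"
        else:
            status, body = 403, b"Forbidden"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any proxy environment variables so requests reach the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, headers):
    """Return (status, body); on an HTTP error, return the error status and body."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(req) as page:
            return page.status, page.read()
    except urllib.error.HTTPError as e:
        return e.code, e.read()


server = HTTPServer(("127.0.0.1", 0), PickyHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

# urllib does not send an Accept header unless you add one yourself.
without_accept = fetch(url, {"User-Agent": "Mozilla/5.0"})
with_accept = fetch(url, {"User-Agent": "Mozilla/5.0", "Accept": BROWSER_ACCEPT})

server.shutdown()
server.server_close()

print(without_accept)  # (403, b'Forbidden')
print(with_accept)     # (200, b'OK')
```

Whether a particular real site filters on Accept, User-Agent, or something else cannot be inferred from this demo; it only shows the mechanism.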

Answer 2:

This works in Python 3:

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers = {'User-Agent': user_agent}

request = urllib.request.Request(url, None, headers)  # the assembled request
response = urllib.request.urlopen(request)
data = response.read()  # the data you need (as bytes)
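Why the User-Agent matters at all: unless told otherwise, urllib's default opener identifies itself as `Python-urllib/<version>`, which some servers refuse. A short Python 3 sketch (no network access needed) shows that default and confirms that a header set on the `Request` takes precedence over it. Whether a given site actually filters on this string is an assumption this snippet does not test.

```python
import urllib.request

# The default opener carries its own User-Agent, of the form "Python-urllib/3.x".
opener = urllib.request.build_opener()
default_agent = dict(opener.addheaders)["User-agent"]
print(default_agent)

# A header set explicitly on the Request wins over the opener's default.
request = urllib.request.Request(
    "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers",
    headers={"User-Agent": "Mozilla/5.0"},
)
# Request.add_header() normalises names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))  # Mozilla/5.0
print(request.header_items())            # [('User-agent', 'Mozilla/5.0')]
```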

Answer 3:

The NSE website has changed, and the older scripts no longer work well with it. This snippet gathers the daily details of a security: symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantity, and the percentage of deliverable to traded quantity. Each day's row is conveniently presented as a dictionary.

Python 3.x version using requests and BeautifulSoup:

from requests import get
from csv import DictReader
from bs4 import BeautifulSoup as Soup
from datetime import date
from io import StringIO

SECURITY_NAME = "3MINDIA"  # Change this to get a quote for another stock
START_DATE = date(2017, 1, 1)  # Start date of stock quote data: date(year, month, day)
END_DATE = date(2017, 9, 14)   # End date of stock quote data: date(year, month, day)

BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"


def getquote(symbol, start, end):
    # Format dates as D-M-YYYY without zero padding, as "%-d-%-m-%Y" did;
    # "%-d" is a glibc extension and is not supported on Windows.
    start = "{}-{}-{}".format(start.day, start.month, start.year)
    end = "{}-{}-{}".format(end.day, end.month, end.year)

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Referer': 'https://cssspritegenerator.com',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    # The CSV text sits in <div id="csvContentDiv"> with ':' between rows
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    csv = DictReader(StringIO(payload))
    for row in csv:
        print({k: v.strip() for k, v in row.items()})


if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)

Besides, this is a relatively modular, ready-to-use snippet.
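Once the page has been fetched, the rest of the snippet is plain text handling: formatting the dates for the URL and turning the `csvContentDiv` text into dictionaries. A sketch of those two steps without any network or BeautifulSoup dependency follows. `format_nse_date`, `parse_payload` and `SAMPLE` are hypothetical illustrations (the sample values are made up), and the assumption that rows are ':'-separated comes from the snippet's own `.replace(':', '\n')`, not from checking the live site. Unlike the snippet, this version also strips the keys and tolerates short rows.

```python
from csv import DictReader
from datetime import date
from io import StringIO


def format_nse_date(d):
    """Format a date as D-M-YYYY without zero padding, portably (no "%-d")."""
    return "{}-{}-{}".format(d.day, d.month, d.year)


def parse_payload(payload):
    """Turn ':'-separated CSV text into a list of dicts with stripped keys/values."""
    # Assumption: ':' separates rows, so turn it into newlines for the csv module.
    # Caveat: this would also split any value that itself contains a colon.
    rows = DictReader(StringIO(payload.replace(":", "\n")))
    return [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in rows
    ]


# Made-up sample shaped like the div text the snippet expects.
SAMPLE = '"Symbol","Close Price":"ABC"," 10.50 ":"XYZ"," 20.00 "'

print(format_nse_date(date(2017, 1, 1)))  # 1-1-2017
print(parse_payload(SAMPLE))
```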

