I have no idea why this piece of code does not work with this particular site. In other cases it works fine.
url = "http://www.i-apteka.pl/search.php?node=443&counter=all"
content = requests.get(url).text
soup = BeautifulSoup(content)
links = soup.find_all("a", class_="n63009_prod_link")
print links
In this case it prints "[]", but there are obviously some links on the page. Any idea? :)
I had the same problem: locally Beautiful Soup was working, but on my Ubuntu server it kept returning an empty list. I tried many parsers following the link below [1] and tried many dependencies. What finally worked for me was installing a different parser and naming it explicitly in my code (see the sketch below).

[1]: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
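A minimal sketch of that kind of fix; the choice of html5lib and the pip command here are assumptions for illustration, not necessarily what the answerer actually ran:

# Assumption: install an alternative parser (could just as well be lxml)
pip install html5lib

import requests
from bs4 import BeautifulSoup

url = "http://www.i-apteka.pl/search.php?node=443&counter=all"
content = requests.get(url).text
# Name the parser explicitly so BS doesn't silently fall back to a broken one
soup = BeautifulSoup(content, "html5lib")
links = soup.find_all("a", class_="n63009_prod_link")
print links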
You've found a bug in whichever parser you're using.
I don't know which parser you're using, but I do know this:
Python 2.7.2 (from Apple), BS 4.1.3 (from pip), libxml2 2.9.0 (from Homebrew), and lxml 3.1.0 (from pip) get the exact same error as you. Everything else I try, including the same setup as above except with libxml2 2.7.8 (from Apple), works. And lxml is the default parser (at least as of 4.1.3) that BS will try first if you don't specify anything else. I've also seen other unexpected bugs with libxml2 2.9.0 (most of which have been fixed on trunk, but no 2.9.1 has been released yet). So, if this is your problem, you may want to downgrade to 2.8.0 and/or build it from the top of the tree.
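If you suspect this, you can check which libxml2 your lxml is actually linked against; lxml exposes this directly:

import lxml.etree

# The libxml2 version lxml is running against, as a tuple like (2, 9, 0)
print lxml.etree.LIBXML_VERSION
# The libxml2 version lxml was compiled against
print lxml.etree.LIBXML_COMPILED_VERSION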
But if not… it definitely works for me with 2.7.2 with the stdlib html.parser, and in chat you tested the same thing with 2.7.1. While html.parser (especially before 2.7.3) is slow and brittle, it seems to be good enough for you. So, the simplest solution is to specify the parser explicitly instead of just letting BS pick its favorite.
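Using the question's variables, that's a one-line change:

soup = BeautifulSoup(content, "html.parser")  # force the stdlib parser
links = soup.find_all("a", class_="n63009_prod_link")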
For more info, see "Specifying the parser to use" in the BS docs (and the sections right above and below it).