Web crawler for Facebook in Python

I am trying to write a web crawler in Python that prints the number of Facebook recommends for a page. For example, this Sky News article (http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine) has about 60 Facebook recommends. I want to print this number from a Python program using a web crawler. I tried the following, but it doesn't print anything:

import requests
from bs4 import BeautifulSoup

def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, 'html.parser')  # name the parser explicitly
    # if you want to gather information from that page
    for item_name in soup.find_all('span', {'class': 'pluginCountTextDisconnected'}):
        try:
            print(item_name.string)
        except Exception:
            print("error")

get_single_item_data("http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine")

Tags: python web-crawler python-webbrowser

2 answers

啃猪蹄的小仙女

Reply #2 · 2019-07-17 22:31

Facebook recommends are loaded dynamically by JavaScript, so they are not present in the HTML that your parser receives. You will need to use the Graph API and FQL to get the count directly from Facebook.

Here is a web console where you can explore queries once you have generated an access token.
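
As a rough sketch of the Graph API route, the snippet below builds a request URL and pulls a count out of the JSON reply. The unversioned endpoint, the engagement field, the share_count key, and the helper names build_engagement_url and extract_share_count are all assumptions made for illustration, not something this answer spells out. FQL itself has since been retired, so check the current Graph API documentation before relying on any of this.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; a versioned path may be required in practice.
GRAPH_ENDPOINT = "https://graph.facebook.com/"


def build_engagement_url(page_url, access_token):
    """Build a Graph API request URL asking for engagement data on page_url."""
    query = urlencode({
        "id": page_url,              # the article whose counts we want
        "fields": "engagement",      # assumed field name
        "access_token": access_token,
    })
    return GRAPH_ENDPOINT + "?" + query


def extract_share_count(response_text):
    """Return engagement.share_count from a JSON body, or None if it is missing."""
    data = json.loads(response_text)
    return data.get("engagement", {}).get("share_count")
```

To use it, pass the result of build_engagement_url(article_url, token) to requests.get and feed the response text to extract_share_count. A missing key yields None rather than an exception, which makes an expired token or a changed response shape easier to spot.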


欢心

Reply #3 · 2019-07-17 22:35

The Facebook recommend count loads in an iframe. You can follow the iframe's src attribute to that page and then read the text of span.pluginCountTextDisconnected:

import requests
from bs4 import BeautifulSoup

url = 'http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine'
r = requests.get(url)  # fetch the article page
soup = BeautifulSoup(r.text, 'html.parser')  # parse the article's HTML

# Assumes the first iframe on the page is the Facebook plugin
iframe_src = soup('iframe')[0]['src']
r = requests.get('http://' + iframe_src[2:])  # drop the leading // and add a scheme
soup = BeautifulSoup(r.text, 'html.parser')  # parse the plugin page's HTML

print(soup.find('span', class_='pluginCountTextDisconnected').string)  # the recommend count

The second requests.get is written this way because the src attribute is a protocol-relative URL: //www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046&send=false&layout=button_count&width=120&show_faces=false&action=recommend&colorscheme=light&font=arial&height=21. I dropped the leading // and prepended http://.
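
If you would rather not slice the string by hand, the standard library can resolve a protocol-relative src against the page URL. This is only a sketch of that alternative. The helper names resolve_iframe_url and plugin_target are made up for illustration, and the src value is the one quoted above.

```python
from urllib.parse import urljoin, urlsplit, parse_qs

page_url = "http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine"
iframe_src = (
    "//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046"
    "&send=false&layout=button_count&width=120&show_faces=false"
    "&action=recommend&colorscheme=light&font=arial&height=21"
)


def resolve_iframe_url(page_url, iframe_src):
    """Resolve an iframe src (absolute, relative, or starting with //) against the page URL."""
    # urljoin copies the scheme from page_url when the src starts with //
    return urljoin(page_url, iframe_src)


def plugin_target(plugin_url):
    """Return the decoded href parameter, i.e. the page the plugin is counting."""
    query = parse_qs(urlsplit(plugin_url).query)
    return query["href"][0]


plugin_url = resolve_iframe_url(page_url, iframe_src)
print(plugin_url)
print(plugin_target(plugin_url))
```

The resolved plugin_url can be passed straight to requests.get, and the same call still works if the site later switches to an absolute or https src.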

BeautifulSoup documentation
Requests documentation

