How to extract all the texts from <a> tag using Se

I want to extract data from a website. Specifically, I'm trying to get the text of every anchor tag (the link text, not the href value). Here is the sample HTML:

<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=30AD">A/D TC-55 SEALER</a>
  </h5>

<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=P380">Carbocrylic 3356-1</a>
  </h5>

I want to extract all text values like ['A/D TC-55 SEALER','Carbocrylic 3356-1'].
I tried with:

target = driver.find_element_by_class_name('border')
anchorElement = target.find_element_by_tag_name('a')
anchorElement.text

but it returns an empty string ('').

Any suggestions on how this can be achieved?

PS: Select the first radio button under PRODUCT TYPE.

Tags: python selenium xpath css-selectors webdriverwait

3 answers

Rolldiameter

#2 · 2019-08-17 03:33

If you need the values of all links, use the find_elements_... functions rather than the find_element_... functions, because the latter return only the first match.

Recommended update for your code:

driver.get("http://www.carboline.com/products/")
for link in driver.find_elements_by_xpath("//ul[@id='productList']/descendant::*/a"):
    if link.is_displayed():
        print(link.text)
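
One likely reason the original attempt printed an empty string: Selenium's `text` property returns only rendered (visible) text, so an anchor that is currently hidden yields ''. The sketch below is a small helper, with the hypothetical name `link_texts`, that takes already-located elements and can fall back to the `textContent` attribute for hidden ones. The `FakeLink` class is only a stand-in so the example runs without a browser; with a real driver you would pass in the list returned by the `find_elements_...` call above.

```python
def link_texts(elements, include_hidden=False):
    """Return the stripped text of each element, skipping blanks.

    `elements` is any iterable of objects exposing the WebElement-style
    members `.text`, `.is_displayed()` and `.get_attribute(name)`.
    """
    texts = []
    for el in elements:
        if el.is_displayed():
            value = el.text
        elif include_hidden:
            # .text is empty for hidden elements; textContent is not.
            value = el.get_attribute("textContent") or ""
        else:
            continue
        value = value.strip()
        if value:
            texts.append(value)
    return texts


class FakeLink:
    """Stand-in for a WebElement (hypothetical, for illustration only)."""

    def __init__(self, content, displayed):
        self._content = content
        self._displayed = displayed

    @property
    def text(self):
        # Mimics Selenium: hidden elements report no text.
        return self._content if self._displayed else ""

    def is_displayed(self):
        return self._displayed

    def get_attribute(self, name):
        return self._content if name == "textContent" else None


links = [FakeLink("A/D TC-55 SEALER", True), FakeLink("Carbocrylic 3356-1", False)]
print(link_texts(links))
print(link_texts(links, include_hidden=True))
```

Keeping the filtering separate from the element lookup means the same helper works regardless of which locator strategy found the elements.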



够拽才男人

#3 · 2019-08-17 03:34

To extract all the text values within the <a> tags, e.g. ['A/D TC-55 SEALER','Carbocrylic 3356-1'], use WebDriverWait with the visibility_of_all_elements_located() expected condition. Either of the following solutions works:

Using CSS_SELECTOR:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "li.topLevel[data-types='Acrylics'] h5>a[href^='/products/product-details/?prod=']")))])

Using XPATH:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
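
For readers unfamiliar with explicit waits: `WebDriverWait(driver, 5).until(condition)` repeatedly calls the condition until it returns a truthy value or the timeout expires. The following is a simplified stand-alone sketch of that polling idea, not Selenium's actual implementation. `wait_until` and `make_page` are hypothetical names, and the fake page simply "renders" its links on the third poll.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call `condition()` until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def make_page(ready_after):
    """Build a fake 'page' whose links appear after `ready_after` polls."""
    calls = {"n": 0}

    def visible_links():
        calls["n"] += 1
        if calls["n"] < ready_after:
            return []  # nothing visible yet, so the caller keeps waiting
        return ["A/D TC-55 SEALER", "Carbocrylic 3356-1"]

    return visible_links


links = wait_until(make_page(ready_after=3), timeout=2.0, poll=0.01)
print(links)
```

An empty list is falsy, which is why the loop keeps polling until at least one link is reported.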


【Aperson】

#4 · 2019-08-17 03:34

It looks like all products are loaded when the website is first loaded; the pagination at the bottom does not actually navigate to different pages. You can therefore extract every product from the very first request to http://www.carboline.com/products/. I used Python's requests library to fetch the website's HTML and lxml.html to parse it.

I would stay away from Selenium where possible (sometimes you have no choice). If the website is as simple as the one in your question, I would recommend just making a plain HTTP request. This avoids the overhead of driving a browser, because you request only what you need.

Update: I updated my answer to also show how to extract the href and the text at the same time.

import requests

from lxml import html

BASE_URL = 'http://www.carboline.com'

def extract_data(tree):
    elements = [
        e
        for e in tree.cssselect('div.border h5 a')
        if e.text is not None
    ]
    return elements

def build_data(data):
    dataset = []

    for d in data:
        link = BASE_URL + d.get('href')
        title = d.text

        dataset.append(
            {
                'link':link,
                'title':title
            }
        )

    return dataset

def request_website(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    return r.text

response = request_website('http://www.carboline.com/products/')
tree = html.fromstring(response)
data = extract_data(tree)
dataset = build_data(data)
print(dataset)
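
If installing lxml (and the cssselect package that `tree.cssselect` relies on) is not an option, the same idea can be sketched with the standard library alone. The example below parses the sample HTML from the question rather than the live site, so it makes no claim about the page's current markup. `AnchorCollector` and `SAMPLE_HTML` are names introduced here for illustration, and `urljoin` replaces the string concatenation used to build the link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.carboline.com"

# Sample markup copied from the question (closing </div> tags added).
SAMPLE_HTML = """
<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=30AD">A/D TC-55 SEALER</a>
  </h5>
</div>
<div id="borderForGrid" class="border">
  <h5 class="">
    <a href="/products/product-details/?prod=P380">Carbocrylic 3356-1</a>
  </h5>
</div>
"""


class AnchorCollector(HTMLParser):
    """Collect the href and text of every <a> nested inside an <h5>."""

    def __init__(self):
        super().__init__()
        self.in_h5 = False
        self.current = None  # dict for the <a> being read, else None
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "h5":
            self.in_h5 = True
        elif tag == "a" and self.in_h5:
            href = dict(attrs).get("href") or ""
            self.current = {"link": urljoin(BASE_URL, href), "title": ""}

    def handle_data(self, data):
        if self.current is not None:
            self.current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.current["title"] = self.current["title"].strip()
            if self.current["title"]:  # skip anchors with no text
                self.items.append(self.current)
            self.current = None
        elif tag == "h5":
            self.in_h5 = False


parser = AnchorCollector()
parser.feed(SAMPLE_HTML)
parser.close()
print([item["title"] for item in parser.items])
```

To run it against the live page, feed it the string returned by `request_website(...)` instead of `SAMPLE_HTML`.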

