Search on YouTube and return all links in Python

2020-02-12 21:25发布

On YouTube, I want to search for certain videos (i.e. videos on Python) and after this, I want to return all videos this search returns. Right now if, I try this Python returns all the videos on the start page not on the page after the search.

Current code:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("http://youtube.com")
driver.find_element_by_name("search_query").send_keys("Python")
driver.find_element_by_id("search-icon-legacy").click()
links = driver.find_elements_by_id("video-title")
for x in links:
    print(x.get_attribute("href"))

What goes wrong here?

3条回答
beautiful°
2楼-- · 2020-02-12 21:46

To return all videos from the search with the keyword as Python you need to:

  • Maximize the screen so all the resultant video links get rendered within the HTML DOM.
  • Induce WebDriverWait for the desired elements to be visible before extracting the href attributes.
  • You can use the following solution

    • Code Block:

      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
      
      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      options.add_argument("--disable-extensions")
      driver=webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get("https://www.youtube.com/")
      WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input#search"))).send_keys("Python")
      driver.find_element_by_css_selector("button.style-scope.ytd-searchbox#search-icon-legacy").click()
      print([my_href.get_attribute("href") for my_href in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.yt-simple-endpoint.style-scope.ytd-video-renderer#video-title")))])
      
    • Console Output:

      ['https://www.youtube.com/watch?v=rfscVS0vtbw', 'https://www.youtube.com/watch?v=7UeRnuGo-pg', 'https://www.youtube.com/watch?v=3cZsjOclmoM', 'https://www.youtube.com/watch?v=f79MRyMsjrQ', 'https://www.youtube.com/watch?v=CtbckFw0pJs', 'https://www.youtube.com/watch?v=Z1Yd7upQsXY', 'https://www.youtube.com/watch?v=kLZuut1fYzQ', 'https://www.youtube.com/watch?v=IZ0IM_T4aio', 'https://www.youtube.com/watch?v=qiSCMNBIP2g', 'https://www.youtube.com/watch?v=N0lxfilGfak', 'https://www.youtube.com/watch?v=N4mEzFDjqtA', 'https://www.youtube.com/watch?v=s3Ejdx6cIho', 'https://www.youtube.com/watch?v=Y8Tko2YC5hA', 'https://www.youtube.com/watch?v=c3FXQU3TyCU', 'https://www.youtube.com/watch?v=yE9v9rt6ziw', 'https://www.youtube.com/watch?v=yvHrNlAF0Y0', 'https://www.youtube.com/watch?v=ZDa-Z5JzLYM']
      
查看更多
疯言疯语
3楼-- · 2020-02-12 21:49

But is better to use an explicit wait for this:

links = ui.WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.ID, "video-title")))

Reference.

Hope it helps you!

查看更多
三岁会撩人
4楼-- · 2020-02-12 22:05

As per the discussion with @Mark:

It seems that the elements of the first page of Youtube are still in the DOM...

The only fix I see is to go to the search URL:

driver.get("http://youtube.com/results?search_query=Python")
# driver.find_element_by_name("search_query").send_keys("Python")
# driver.find_element_by_id("search-icon-legacy").click()

You should use WebDriverWait not sleep:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

from selenium.webdriver.chrome.options import Options

opt = Options()
opt.add_argument("--incognito")

driver = webdriver.Chrome(executable_path=r'C:\path\to\chromedriver.exe', chrome_options=opt)
driver.get("http://youtube.com")
driver.find_element_by_name("search_query").send_keys("Python")
driver.find_element_by_id("search-icon-legacy").click()
WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.ID, "video-title")))
links = driver.find_elements_by_id("video-title")

for x in links:

    print(x.get_attribute("href"))

The output:

https://www.youtube.com/watch?v=rfscVS0vtbw
https://www.youtube.com/watch?v=f79MRyMsjrQ
https://www.youtube.com/watch?v=kLZuut1fYzQ
https://www.youtube.com/watch?v=N4mEzFDjqtA
https://www.youtube.com/watch?v=Z1Yd7upQsXY
https://www.youtube.com/watch?v=hnDU1G9hWqU
https://www.youtube.com/watch?v=3cZsjOclmoM
https://www.youtube.com/watch?v=f3EbDbm8XqY
https://www.youtube.com/watch?v=2uCXIbkbDSE
https://www.youtube.com/watch?v=HXV3zeQKqGY
https://www.youtube.com/watch?v=JJmcL1N2KQs
https://www.youtube.com/watch?v=qiSCMNBIP2g
https://www.youtube.com/watch?v=7lmCu8wz8ro
https://www.youtube.com/watch?v=25ovCm9jKfA
https://www.youtube.com/watch?v=q6Mc_sAPZ2Y
https://www.youtube.com/watch?v=yE9v9rt6ziw
https://www.youtube.com/watch?v=Y8Tko2YC5hA
https://www.youtube.com/watch?v=G0rQ7AEl5LA
https://www.youtube.com/watch?v=CtbckFw0pJs
https://www.youtube.com/watch?v=sugvnHA7ElY
查看更多
登录 后发表回答