Download file via hyperlink in PhantomJS using Selenium

Posted 2019-01-23 22:26

Question:

I am using Selenium to click a hyperlink on a page. The script works in Google Chrome but not in PhantomJS. Why is this not working?

from selenium import webdriver

driver = webdriver.Chrome()   
#driver = webdriver.PhantomJS(executable_path = "/Users/jameslemieux/PythonProjects/phantomjs-1.9.8-macosx/bin/phantomjs")

driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")

elem = driver.find_element_by_link_text('Download')
elem.click()


driver.save_screenshot('/Users/jameslemieux/Desktop/Misc./test_image.png')

driver.quit()

This works in Chrome, but it always opens a new Chrome window to complete the task. I read that I should use PhantomJS to run it in the background, but when I switch the driver to PhantomJS, the download does not seem to go through. The screenshot is taken and shows the right page, and the 'Download' link is definitely there. So the

elem.click()

is either not doing what it should, or it IS clicking but PhantomJS doesn't know how to handle a direct download link. Please help; I've been at this for hours.

Answer 1:

Since PhantomJS does not handle download requests, clicking the link will not save anything to disk; we need to download the file manually.

The idea is to click the "Convert" button, wait for the "Download" link to appear, read its href attribute (the URL of the generated mp3 file), and download that URL with urllib's urlretrieve():

# Python 3 standard library
from urllib.parse import urljoin
from urllib.request import urlretrieve

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

base_url = 'http://www.youtube-mp3.org/'

# webdriver.PhantomJS only exists in Selenium 3.x and earlier; it was removed in Selenium 4
driver = webdriver.PhantomJS()
try:
    driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")

    # convert the video to mp3 (the "Convert" button has id="submit")
    driver.find_element(By.ID, 'submit').click()

    # wait up to 10 seconds for the download link to appear
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Download")))
    link = element.get_attribute('href')
    url = urljoin(base_url, link)  # make sure the URL is absolute

    # download the song ourselves, outside the browser
    urlretrieve(url, 'song.mp3')
finally:
    # always shut down the PhantomJS process, even if a step above fails
    driver.quit()

# enjoy the great song
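urlretrieve() fetches the file in a fresh request that knows nothing about the browser session. If a site only serves the generated file to the session that asked for the conversion (an assumption; the answer above suggests this site did not), the cookies can be copied over from the driver. Below is a minimal sketch using only the standard library; `cookie_header` and `download` are hypothetical helper names, not part of Selenium or urllib.

```python
import shutil
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value.

    Each dict needs 'name' and 'value' keys, the shape returned by
    driver.get_cookies().
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download(href, base_url, dest, cookies=()):
    """Resolve href against base_url and stream the response body into dest.

    Returns the absolute URL that was fetched.
    """
    url = urljoin(base_url, href)  # unchanged if href is already absolute
    request = Request(url)
    if cookies:
        # Assumption: the server may tie the generated file to the browser session.
        request.add_header("Cookie", cookie_header(cookies))
    with urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return url
```

In the answer's script this would take the place of the urlretrieve call, for example `download(element.get_attribute('href'), base_url, 'song.mp3', driver.get_cookies())`. Streaming with shutil.copyfileobj keeps memory use flat even for large files.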
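The `WebDriverWait(driver, 10).until(...)` line is what makes the answer wait for the "Download" link instead of failing immediately: it re-evaluates the condition at a fixed interval (0.5 seconds by default) until the condition returns something truthy, treats NoSuchElementException as "not yet", and raises TimeoutException once 10 seconds have passed. The sketch below reproduces that polling idea without Selenium; `wait_until` and `WaitTimeout` are hypothetical names, and LookupError merely stands in for NoSuchElementException. It is an illustration of the technique, not Selenium's actual implementation.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (stand-in for TimeoutException)."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Exceptions listed in `ignored` count as "not ready yet", much like
    WebDriverWait ignores NoSuchElementException by default.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # e.g. the located element
        except ignored:
            pass  # element not present yet; keep polling
        if time.monotonic() > deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Returning the condition's value (rather than just True) is why the answer can assign the result of `until(...)` directly to `element`.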