Mimicking HTML5 Video support on PhantomJS used th

Posted 2020-06-23 06:42

Question:

I am trying to extract the source link of an HTML5 video from its video tag. Using the Firefox webdriver, I get the desired result, i.e.:

[<video class="video-stream html5-main-video" src='myvideoURL..'></video>]

But if I use PhantomJS, I get:

 <video class="video-stream html5-main-video" style="width: 854px; height: 480px; left: 0px; top: 0px; -webkit-transform: none;" tabindex="-1"></video>

I suspect this is because of PhantomJS's lack of HTML5 video support. Is there any way I can trick the webpage into thinking that HTML5 video is supported, so that it generates the URL? Or can I do something else?

I tried this:

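from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `browser` is the PhantomJS webdriver, already pointed at the video page.
try:
    # Wait up to 10 seconds for a <video> element to appear in the DOM.
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.XPATH, "//video")))
finally:
    # Grab the page source and close the browser even if the wait times out.
    k = browser.page_source
    browser.quit()

soup = BeautifulSoup(k, 'html.parser')
print(soup.find_all('video'))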

Answer 1:

The Firefox and PhantomJS webdrivers communicate with Selenium in quite different ways.

With Firefox, the driver signals that the page has finished loading only after it has loaded some of the JavaScript.

PhantomJS, by contrast, signals Selenium that the page has finished loading as soon as it is able to get the page source, meaning it will not have loaded any JavaScript yet.

What you need to do is wait for the element to be present before extracting it. In this case that would be:

video = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//video")))
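
In the question's PhantomJS output the video element is already present but has no src, so a presence check alone may return too early. A minimal sketch of a stricter wait, assuming the page eventually sets src on the element: the helper names first_video_src and wait_for_video_src are made up for illustration, and the sketch relies on WebDriverWait.until accepting any callable that takes the driver and returns a truthy value once the condition holds.

```python
def first_video_src(driver):
    """Wait condition: return the first non-empty <video> src, else False."""
    # "xpath" is the string value behind selenium's By.XPATH.
    for element in driver.find_elements("xpath", "//video"):
        src = element.get_attribute("src")
        if src:
            return src
    return False


def wait_for_video_src(driver, timeout=10):
    """Poll until a <video> carries a src; raises TimeoutException otherwise."""
    # Imported lazily so first_video_src can be exercised without selenium.
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(first_video_src)
```

Calling wait_for_video_src(driver) would then return the URL directly instead of the element. If the site never sets src for this browser, the call simply times out, so it is a way to tell the two cases apart and not a fix in itself.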

EDIT:

YouTube first checks whether the browser supports the video content before deciding whether to provide the source. There is a workaround, though, described here