Scraping JavaScript data within a grid of a webpage

Posted 2019-07-11 18:54

I need the data in the grid on https://applipedia.paloaltonetworks.com for apps that contain subdomains (the NAME, CATEGORY, SUBCATEGORY, RISK, and TECHNOLOGY columns). For example, on line 5 of the grid, 2ch has two subdomains, 2ch-base and 2ch-posting. I only want the list of all apps that have subdomains.

Right now, whenever I try adding anything to this line:

table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

I am getting a timeout error.

Below is the script I have so far. It fetches all the data from the grid, but I need only the apps and their subdomains (for example 2ch, 2ch-base, 2ch-posting). Using inspect element I found a pattern: all apps that don't have subdomains have ( ), or we can go by the () field, which is common to all apps that have subdomains. Any help solving this problem would be much appreciated.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Start Chrome using a local chromedriver binary
driver = webdriver.Chrome(executable_path=r'/Users/am/Downloads/chromedriver')
driver.maximize_window()

driver.get("https://applipedia.paloaltonetworks.com/")

# Wait up to 30 seconds for the grid rows to be present in the DOM
wait = WebDriverWait(driver, 30)
table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

# Print the text of every row in the grid
for tab in table:
    print(tab.text)

2 answers
姐就是有狂的资本
#2 · 2019-07-11 19:18

With the code below you can get the list of domains that have subdomains quickly and cleanly:

WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[ottawagroup='1'] a")))
domains = driver.execute_script("return [...document.querySelectorAll(\"[ottawagroup='1'] a\")].map(e => e.textContent.trim())")
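
The question also asks for each app together with its sub-entries (2ch, 2ch-base, 2ch-posting). A minimal sketch of one way to group them follows. It assumes that ottawagroup='1' marks an app that has sub-entries, that ottawagroup='2' marks a sub-entry row placed directly after its parent, and that the name sits in the first cell of each row. None of these is confirmed by the page itself, and group_apps and collect_rows are hypothetical helper names.

```python
from collections import OrderedDict


def group_apps(rows):
    """Group (ottawagroup, name) pairs into {parent_app: [sub_entries]}.

    Assumption (not confirmed by the page): '1' marks an app that has
    sub-entries, '2' marks a sub-entry that directly follows its parent,
    and any other value marks a standalone app.
    """
    groups = OrderedDict()
    current = None
    for flag, name in rows:
        name = name.strip()
        if flag == "1":
            current = name
            groups[current] = []
        elif flag == "2" and current is not None:
            groups[current].append(name)
        else:
            # Standalone row: it ends the current parent's run of sub-entries
            current = None
    return groups


def collect_rows(driver):
    """Read (ottawagroup, first-cell text) from every grid row.

    Assumption: the app name is in the first <td> of each row.
    """
    from selenium.webdriver.common.by import By  # imported lazily; needs selenium

    pairs = []
    for row in driver.find_elements(By.CSS_SELECTOR, "tbody#bodyScrollingTable tr"):
        cell = row.find_element(By.CSS_SELECTOR, "td")
        pairs.append((row.get_attribute("ottawagroup"), cell.get_attribute("textContent")))
    return pairs


# Made-up sample rows that mirror the 2ch example from the question
sample = [
    ("0", "standalone-app"),
    ("1", " 2ch "),
    ("2", "2ch-base"),
    ("2", "2ch-posting"),
]
result = group_apps(sample)
print(dict(result))
```

With a live driver, once the wait above has succeeded, group_apps(collect_rows(driver)) would return the same kind of mapping, provided the assumptions about the markup hold.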
狗以群分
#3 · 2019-07-11 19:36

As per the URL https://applipedia.paloaltonetworks.com/, to get the list of all apps that have subdomains you need to use WebDriverWait to wait until the desired elements are visible. You can use the following solution:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-gpu")
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver.get('https://applipedia.paloaltonetworks.com/')
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
    for element in elements:
        print(element.get_attribute("innerHTML"))
    
  • Console Output:

    DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
    
                                    2ch
    
    
                                    51.com
    
    
                                    adobe-connect
    
    
                                    adobe-connectnow
    
    
                                    adobe-creative-cloud
    
    
                                    aim
    
    
                                    aim-express
    
    
                                    ali-wangwang
    
    
                                    amazon-cloud-drive
    
    
                                    amazon-music
    
    
                                    ameba-now
    
    
                                    assembla
    
    
                                    autodesk360
    
    
                                    avaya-webalive
    
    
                                    bacnet
    
    
                                    baidu-hi
    
    
                                    bebo
    
    
                                    bitbucket
    
    
                                    boxnet
    
    
                                    buddybuddy
    
    
                                    chinaren
    
    
                                    cisco-spark
    
    
                                    cloudapp
    
    
                                    cloudforge
    
    
                                    cloudinary
    
    
                                    concur
    
    
                                    confluence
    
    
                                    convo
    
    
                                    cyph
    
    
                                    daum
    
    
                                    dcinside
    
    
                                    diameter
    
    
                                    dnp3
    
    
                                    dochub
    
    
                                    docstoc
    
    
                                    docusign
    
    
                                    draw.io
    
    
                                    dropbox
    
    
                                    egnyte
    
    
                                    evernote
    
    
                                    facebook
    
    
                                    fetion
    
    
                                    filestack
    
    
                                    flickr
    
    
                                    flixwagon
    
    
                                    fuze-meeting
    
    
                                    gatherplace
    
    
                                    genesys
    
    
                                    git
    
    
                                    github
    
    
                                    gitlab
    
    
                                    glassdoor
    
    
                                    globalmeet
    
    
                                    gmail
    
    
                                    google-calendar
    
    
                                    google-cloud-storage
    
    
                                    google-docs
    
    
                                    google-hangouts
    
    
                                    google-plus
    
    
                                    google-spaces
    
    
                                    google-talk
    
    
                                    google-translate
    
    
                                    google-video
    
    
                                    gotomypc
    
    
                                    gotowebinar
    
    
                                    gtp
    
    
                                    hadoop
    
    
                                    hightail
    
    
                                    hipchat
    
    
                                    hootsuite
    
    
                                    huddle
    
    
                                    hulu
    
    
                                    hyves
    
    
                                    iccp
    
    
                                    icloud
    
    
                                    iec-60870-5-104
    
    
                                    imeet
    
    
                                    imgur
    
    
                                    instagram
    
    
                                    instan-t
    
    
                                    ip-messenger
    
    
                                    ipsec
    
    
                                    irc
    
    
                                    issuu
    
    
                                    itunes
    
    
                                    jira
    
    
                                    join-me
    
    
                                    jumpshare
    
    
                                    kaixin
    
    
                                    kaixin001
    
    
                                    kakaotalk
    
    
                                    laiwang
    
    
                                    landesk
    
    
                                    linkedin
    
    
                                    live-mesh
    
    
                                    lotus-notes
    
    
                                    lotuslive
    
    
                                    lucidpress
    
    
                                    mail.ru
    
    
                                    mail.ru-agent
    
    
                                    maytech
    
    
                                    meebo
    
    
                                    meetup
    
    
                                    mega
    
    
                                    mendeley
    
    
                                    mercurial
    
    
                                    mixi
    
    
                                    modbus
    
    
                                    ms-ds-smb
    
    
                                    ms-lync
    
    
                                    ms-office365
    
    
                                    ms-onedrive
    
    
                                    msn
    
    
                                    myspace
    
    
                                    nateon-im
    
    
                                    netease-webdisk
    
    
                                    netflix
    
    
                                    ning
    
    
                                    noteworthy
    
    
                                    now-tv
    
    
                                    odnoklassniki
    
    
                                    onehub
    
    
                                    owncloud
    
    
                                    paltalk
    
    
                                    pastebin
    
    
                                    pcanywhere
    
    
                                    pinterest
    
    
                                    pivotaltracker
    
    
                                    powow
    
    
                                    prezi
    
    
                                    proofhub
    
    
                                    qik
    
    
                                    qliksense-cloud
    
    
                                    qq
    
    
                                    quip
    
    
                                    quora
    
    
                                    rally-software
    
    
                                    readytalk
    
    
                                    reddit
    
    
                                    rediffbol
    
    
                                    renren
    
    
                                    rtp
    
    
                                    salesforce
    
    
                                    sap-jam
    
    
                                    screencast
    
    
                                    scribd
    
    
                                    second-life
    
    
                                    secure-data-space
    
    
                                    sendthisfile
    
    
                                    service-now
    
    
                                    sharefile
    
    
                                    sharepoint
    
    
                                    sharevault
    
    
                                    showmax
    
    
                                    siemens-s7
    
    
                                    signiant
    
    
                                    sina-uc
    
    
                                    sina-weibo
    
    
                                    skydrive
    
    
                                    slack
    
    
                                    slideshare
    
    
                                    smartsheet
    
    
                                    snmp
    
    
                                    softros-messenger
    
    
                                    solarwinds
    
    
                                    soundcloud
    
    
                                    sourceforge
    
    
                                    spark-im
    
    
                                    ss7-map
    
    
                                    stocktwits
    
    
                                    storify
    
    
                                    subversion
    
    
                                    surveymonkey
    
    
                                    syncplicity
    
    
                                    tableau
    
    
                                    teamdrive
    
    
                                    teamup-calendar
    
    
                                    teamviewer
    
    
                                    thwapr
    
    
                                    torch-browser
    
    
                                    trello
    
    
                                    tumblr
    
    
                                    twitter
    
    
                                    uc-yun
    
    
                                    viber
    
    
                                    vimeo
    
    
                                    vine
    
    
                                    virustotal
    
    
                                    vkontakte
    
    
                                    vnc
    
    
                                    watchdox
    
    
                                    webex
    
    
                                    wechat
    
    
                                    weiyun
    
    
                                    whatsapp
    
    
                                    windows-azure
    
    
                                    windows-defender-atp
    
    
                                    workday
    
    
                                    yahoo-im
    
    
                                    yammer
    
    
                                    youku
    
    
                                    yousendit
    
    
                                    youtube
    
    
                                    yunpan360
    
    
                                    yy-voice
    
    
                                    zalo
    
    
                                    zendesk
    
    
                                    zenefits
    
    
                                    zettahost
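
The blank lines and leading spaces in the output above come from printing each element's raw innerHTML. A minimal sketch of tidying those values into a flat list follows. The helper name clean_names is hypothetical, and the sample strings only imitate the padding seen in the console output.

```python
def clean_names(raw_values):
    """Collapse surrounding and internal whitespace, dropping empty entries."""
    names = []
    for raw in raw_values:
        name = " ".join(raw.split())
        if name:
            names.append(name)
    return names


# Strings padded like the innerHTML values printed above
raw = ["\n                                2ch\n    ", "\n    51.com\n", "   "]
cleaned = clean_names(raw)
print(cleaned)
```

Applied to the answer's code, this could be called as clean_names(e.get_attribute("innerHTML") for e in elements).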
    