My issue is that I need all the data in the grid for apps that have subdomains on https://applipedia.paloaltonetworks.com (the NAME, CATEGORY, SUBCATEGORY, RISK and TECHNOLOGY columns). For example, in row 5, 2ch has 2 subdomains: |_2ch-base and 2ch-posting. I only want the list of apps that have subdomains, like this one.
Right now, whenever I try to add anything to this line:
table =wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))
I am getting a timeout error.
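A likely reason for the timeout: `WebDriverWait.until` keeps calling the expected condition until it returns something truthy, and raises `TimeoutException` when the wait runs out. `presence_of_all_elements_located` returns the (possibly empty) result of `find_elements`, so a selector that matches nothing never succeeds and the wait ends in a timeout. The sketch below is a simplified, pure-Python stand-in for that polling loop, not Selenium's actual code; `wait_until` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float, poll: float = 0.5) -> T:
    """Call `condition` every `poll` seconds until it returns a truthy value.

    Hypothetical stand-in for WebDriverWait.until: raises TimeoutError once
    `timeout` seconds have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:  # an empty list (no matching elements) counts as "not yet"
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the rows only "appear" on the third poll.
attempts = {"n": 0}


def rows_appear():
    attempts["n"] += 1
    return ["row"] if attempts["n"] >= 3 else []


print(wait_until(rows_appear, timeout=1.0, poll=0.01))  # ['row']
```

A condition that always returns `[]` (like a CSS selector that matches no rows) would instead end in `TimeoutError`, which mirrors the error described above.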
Below is the script I have so far. It fetches all the data from the grid, but I need only the apps and their subdomains [for example 2ch, 2ch-base, 2ch-posting]. Using Inspect Element I found a pattern: all apps that don't have subdomains have ( ), or we could go by the () field, which is common to all apps that have subdomains. Any help solving this problem would be much appreciated.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path = r'/Users/am/Downloads/chromedriver')
driver.maximize_window()
driver.get("https://applipedia.paloaltonetworks.com/")
wait = WebDriverWait(driver,30)
table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))
for tab in table:
    print(tab.text)
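The grouping the question asks for ([2ch, 2ch-base, 2ch-posting]) can be built once each row's name and `ottawagroup` attribute have been read (for example with `tab.get_attribute("ottawagroup")`). A minimal sketch, assuming, based on the selectors in the answers, that `ottawagroup='1'` marks a parent app with subdomains, `'2'` marks a subdomain row that directly follows its parent, and `'0'` marks an app without subdomains. These meanings are inferred, not documented; `group_subdomains` is a hypothetical helper and the sample rows are made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def group_subdomains(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each parent app to the subdomain rows listed under it.

    Assumed (inferred) ottawagroup values:
      '1' -> parent app that has subdomains
      '2' -> subdomain of the most recent parent
      '0' -> standalone app without subdomains
    """
    groups: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for name, group in rows:
        if group == "1":
            current = name
            groups[current] = []
        elif group == "2" and current is not None:
            groups[current].append(name)
        else:
            current = None  # a standalone row ends the previous group
    return groups


# Made-up sample of (name, ottawagroup) pairs as they might be read from the grid.
rows = [("app-a", "0"), ("2ch", "1"), ("2ch-base", "2"), ("2ch-posting", "2"), ("app-b", "0")]
print(group_subdomains(rows))  # {'2ch': ['2ch-base', '2ch-posting']}
```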
For the url https://applipedia.paloaltonetworks.com/, to get the list of all apps that have subdomains you need to use WebDriverWait to wait until the desired elements are visible. You can use the following solution:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-gpu")
# Selenium 4 passes the driver path through a Service object;
# chrome_options= and executable_path= are deprecated and removed in newer releases
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe'), options=options)
driver.get('https://applipedia.paloaltonetworks.com/')
# Keep only rows whose ottawagroup is neither '0' nor '2', then take the app link in each row
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
for element in elements:
    print(element.get_attribute("innerHTML"))
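To see what the XPath filter keeps without launching a browser, the same rule can be applied to static HTML with Python's standard `html.parser`. This is a simplified sketch: the sample markup is made up, the table/tbody scoping of the XPath is ignored, and any `<a>` inside a `<td>` is accepted rather than only a direct child. Like `not(@ottawagroup='0')` in XPath, a row with no `ottawagroup` attribute is kept.

```python
from html.parser import HTMLParser
from typing import List


class ParentAppParser(HTMLParser):
    """Collect link text from <tr> rows whose ottawagroup is neither '0' nor '2'."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []
        self._row_ok = False
        self._in_cell = False
        self._in_link = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "tr":
            # Same test as the XPath: a missing attribute also passes
            self._row_ok = attributes.get("ottawagroup") not in ("0", "2")
        elif tag == "td":
            self._in_cell = True
        elif tag == "a" and self._row_ok and self._in_cell:
            self._in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.names.append("".join(self._buf).strip())
            self._in_link = False
        elif tag == "td":
            self._in_cell = False
        elif tag == "tr":
            self._row_ok = False

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)


# Made-up sample markup shaped like the grid rows.
SAMPLE = """
<table><tbody id="bodyScrollingTable">
<tr ottawagroup="0"><td><a>app-a</a></td></tr>
<tr ottawagroup="1"><td><a>2ch</a></td></tr>
<tr ottawagroup="2"><td><a>2ch-base</a></td></tr>
<tr ottawagroup="2"><td><a>2ch-posting</a></td></tr>
</tbody></table>
"""

parser = ParentAppParser()
parser.feed(SAMPLE)
parser.close()
print(parser.names)  # ['2ch']
```

This matches the console output below, which lists parent apps such as 2ch but not their subdomain rows.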
Console Output:
DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
2ch
51.com
adobe-connect
adobe-connectnow
adobe-creative-cloud
aim
aim-express
ali-wangwang
amazon-cloud-drive
amazon-music
ameba-now
assembla
autodesk360
avaya-webalive
bacnet
baidu-hi
bebo
bitbucket
boxnet
buddybuddy
chinaren
cisco-spark
cloudapp
cloudforge
cloudinary
concur
confluence
convo
cyph
daum
dcinside
diameter
dnp3
dochub
docstoc
docusign
draw.io
dropbox
egnyte
evernote
facebook
fetion
filestack
flickr
flixwagon
fuze-meeting
gatherplace
genesys
git
github
gitlab
glassdoor
globalmeet
gmail
google-calendar
google-cloud-storage
google-docs
google-hangouts
google-plus
google-spaces
google-talk
google-translate
google-video
gotomypc
gotowebinar
gtp
hadoop
hightail
hipchat
hootsuite
huddle
hulu
hyves
iccp
icloud
iec-60870-5-104
imeet
imgur
instagram
instan-t
ip-messenger
ipsec
irc
issuu
itunes
jira
join-me
jumpshare
kaixin
kaixin001
kakaotalk
laiwang
landesk
linkedin
live-mesh
lotus-notes
lotuslive
lucidpress
mail.ru
mail.ru-agent
maytech
meebo
meetup
mega
mendeley
mercurial
mixi
modbus
ms-ds-smb
ms-lync
ms-office365
ms-onedrive
msn
myspace
nateon-im
netease-webdisk
netflix
ning
noteworthy
now-tv
odnoklassniki
onehub
owncloud
paltalk
pastebin
pcanywhere
pinterest
pivotaltracker
powow
prezi
proofhub
qik
qliksense-cloud
qq
quip
quora
rally-software
readytalk
reddit
rediffbol
renren
rtp
salesforce
sap-jam
screencast
scribd
second-life
secure-data-space
sendthisfile
service-now
sharefile
sharepoint
sharevault
showmax
siemens-s7
signiant
sina-uc
sina-weibo
skydrive
slack
slideshare
smartsheet
snmp
softros-messenger
solarwinds
soundcloud
sourceforge
spark-im
ss7-map
stocktwits
storify
subversion
surveymonkey
syncplicity
tableau
teamdrive
teamup-calendar
teamviewer
thwapr
torch-browser
trello
tumblr
twitter
uc-yun
viber
vimeo
vine
virustotal
vkontakte
vnc
watchdox
webex
wechat
weiyun
whatsapp
windows-azure
windows-defender-atp
workday
yahoo-im
yammer
youku
yousendit
youtube
yunpan360
yy-voice
zalo
zendesk
zenefits
zettahost
With the code below you can get the list of apps that have subdomains quickly and cleanly:
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[ottawagroup='1'] a")))
domains = driver.execute_script("return [...document.querySelectorAll(\"[ottawagroup='1'] a\")].map(e=>e.textContent.trim())")
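Both answers return only the NAME column, while the question also asks for CATEGORY, SUBCATEGORY, RISK and TECHNOLOGY. A minimal sketch, assuming each row's `<td>` texts come back in that column order (check in the inspector; the grid may have extra cells) and that rows with `ottawagroup` '1' or '2' are the parent apps and their subdomains. `ROWS_JS` is a hypothetical script for `driver.execute_script` and is not run here; `to_records` is a hypothetical helper and the sample rows are made-up placeholders.

```python
from typing import Dict, List, Optional, Sequence, Tuple

COLUMNS = ("NAME", "CATEGORY", "SUBCATEGORY", "RISK", "TECHNOLOGY")

# Hypothetical script for driver.execute_script(ROWS_JS); not executed in this sketch.
# Each returned row is [ottawagroup, cell1, cell2, ...]; a missing attribute comes back as None.
ROWS_JS = """
return [...document.querySelectorAll("tbody#bodyScrollingTable tr")].map(tr =>
  [tr.getAttribute("ottawagroup"),
   ...[...tr.querySelectorAll("td")].map(td => td.textContent.trim())]);
"""


def to_records(rows: Sequence[Sequence[Optional[str]]],
               keep: Tuple[str, ...] = ("1", "2")) -> List[Dict[str, Optional[str]]]:
    """Keep rows whose ottawagroup is in `keep` and label their cells with COLUMNS."""
    records = []
    for group, *cells in rows:
        if group in keep:
            record: Dict[str, Optional[str]] = dict(zip(COLUMNS, cells))
            record["ottawagroup"] = group
            records.append(record)
    return records


# Made-up placeholder rows in the assumed [ottawagroup, NAME, CATEGORY, ...] shape.
sample = [
    ["0", "app-a", "cat-x", "sub-x", "1", "tech-x"],
    ["1", "2ch", "cat-y", "sub-y", "3", "tech-y"],
    ["2", "2ch-base", "cat-y", "sub-y", "2", "tech-y"],
    [None, "no-group-row"],
]
for rec in to_records(sample):
    print(rec["NAME"], rec["ottawagroup"])
```

With a live driver, `to_records(driver.execute_script(ROWS_JS))` would apply the same filtering to the real grid, provided the column-order assumption holds.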