I am trying to scrape some data from this link.
After sequentially selecting options in the three drop-down menus ("Crop group", "Crop" and "Variety name") and clicking the "Show features" button, there is an option to export to CSV. I am trying to download all such CSV files for a crop.
I am able to extract all the options in the first drop-down as follows.
library(rvest)
library(httr)
library(tidyverse)
pg <- read_html("http://seednet.gov.in/SeedVarieties/Varietydetail.aspx")
cropgp_nodes <- html_nodes(pg, "select[id='_ctl0_ContentPlaceHolder1_ddlgroup'] option")
# tibble() replaces the deprecated data_frame()
crpgps <- tibble(crpgp = html_text(cropgp_nodes),
                 value = html_attr(cropgp_nodes, "value"))
crpgps
# A tibble: 24 x 2
   crpgp                  value
   <chr>                  <chr>
 1 --Select Crop Group--  --Select Crop Group--
 2 CEREALS                A01
 3 MILLETS                A02
 4 PULSES                 A03
 5 OILSEEDS               A04
 6 FIBRE CROPS            A05
 7 FORAGE CROPS           A06
 8 SUGAR CROPS            A07
 9 STARCH CROPS           A08
10 NARCOTICS(OTHER CROPS) A09
# ... with 14 more rows
However, because the menus are populated sequentially, I am not able to get the options for the next one: the "Crop" drop-down is empty until a crop group has been selected.
html_nodes(pg, "select[id='_ctl0_ContentPlaceHolder1_ddlCrop'] option")
{xml_nodeset (0)}
How can I scrape the data in this case?
One option is to use RSelenium:
- start the 'Selenium' server
- connect with the Selenium driver
- loop through the 'crpgp' values already extracted and send each one to the first drop-down to extract the corresponding 'crop' options
- output the results
- close the connection and stop the server afterwards
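A minimal sketch of these steps follows. It has not been run against the site and rests on several assumptions: that choosing an option in the crop group menu triggers the page refresh that fills the 'Crop' menu, that a fixed pause of a couple of seconds is enough for that refresh, and that Firefox and port 4545 are available locally. The function names get_options, scrape_crops and run_scrape, and the wait argument, are hypothetical and only for illustration; the element ids are the ones used in the question.

```r
library(RSelenium)
library(rvest)
library(tibble)

url <- "http://seednet.gov.in/SeedVarieties/Varietydetail.aspx"

# Hypothetical helper: read the <option> labels and values of one <select>
# from a page's HTML source.
get_options <- function(page_html, select_id) {
  nodes <- html_nodes(read_html(page_html),
                      sprintf("select[id='%s'] option", select_id))
  tibble(label = html_text(nodes), value = html_attr(nodes, "value"))
}

# Hypothetical helper: for each crop group value, click that option in the
# browser, wait for the page to refresh, then read the 'Crop' drop-down.
scrape_crops <- function(remDr, group_values, wait = 2) {
  out <- lapply(group_values, function(gp) {
    opt <- remDr$findElement(
      using = "css selector",
      value = sprintf(
        "select[id='_ctl0_ContentPlaceHolder1_ddlgroup'] option[value='%s']", gp))
    opt$clickElement()
    Sys.sleep(wait)  # assumption: a fixed pause is enough for the refresh
    crops <- get_options(remDr$getPageSource()[[1]],
                         "_ctl0_ContentPlaceHolder1_ddlCrop")
    crops$crpgp_value <- rep(gp, nrow(crops))
    crops
  })
  do.call(rbind, out)
}

# Start the server, run the loop, and always close the session afterwards.
run_scrape <- function(group_values) {
  rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
  remDr <- rD$client
  on.exit({
    remDr$close()
    rD$server$stop()
  }, add = TRUE)
  remDr$navigate(url)
  scrape_crops(remDr, group_values)
}

# Usage (drops the "--Select Crop Group--" placeholder row):
# crops <- run_scrape(crpgps$value[-1])
```

The same pattern could be repeated one level down (select a crop, then read the 'Variety name' menu) before clicking "Show features" and the export button, though the ids of those controls would need to be looked up in the page source.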