I have a question about my latest rvest scrape.
I want to scrape this page (and some other stocks as well):
http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1
I need a list of the Market Cap values, which is the first box in the second row.
This list should contain approx 50-100 stocks.
I am using rvest for that.
library(rvest)
html = read_html("http://www.finviz.com/quote.ashx?t=A")
cast = html_nodes(html, "table-dark-row")
The problem is that I cannot work out what to pass to html_nodes.
Any idea how to find the correct node for html_nodes?
I am using firebug/firefinder to check out the webpage.
I'm not sure this is what you want, because I cannot find a list with approx. 50-100 stocks.
But for what it's worth, using SelectorGadget I came up with the selector .table-dark-row:nth-child(2) .snapshot-td2:nth-child(2) to select the Market Cap (the first box in the second row of this page: http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1).
> library(rvest)
>
> html = read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1")
>
> cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
> cast
{xml_nodeset (1)}
[1] <td width="8%" class="snapshot-td2" align="left">\n <b>11.58B</b>\n</td>
>
If this is not exactly what you want, just use SelectorGadget to locate what you want.
Hope this helps.
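As for why the selector in the question returns nothing: "table-dark-row" without a leading dot is read as an element name, not a class. A minimal sketch of the difference, using a tiny made-up HTML snippet as a stand-in (the real finviz markup is more complex, and only the class names are taken from the page):

```r
library(rvest)

# Hypothetical stand-in for one row of the snapshot table
page <- read_html(paste0(
  '<table><tr class="table-dark-row">',
  '<td class="snapshot-td2">Market Cap</td>',
  '<td class="snapshot-td2"><b>11.58B</b></td>',
  '</tr></table>'
))

# Looks for <table-dark-row> elements, which do not exist
no_dot <- html_nodes(page, "table-dark-row")

# Looks for any element with class="table-dark-row"
with_dot <- html_nodes(page, ".table-dark-row")

length(no_dot)
length(with_dot)
```

The first call returns an empty node set and the second returns the row, so adding the dot is the first thing to check when html_nodes comes back empty.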
EDIT :
Here is a complete solution:
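library(rvest)
# Read the quote page for ticker AA
html = read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1")
# Market Cap cell: second cell of the second row in the snapshot table
cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
# Strip the "B" (billions) suffix and convert the text to a number
html_text(cast) %>%
  gsub(pattern = "B", replacement = "") %>%
  as.numeric()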
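To cover the 50-100 stocks mentioned in the question, the same selector could be applied in a loop. The following is only a sketch under some assumptions: that every ticker page has the same layout as the AA page, that values may end in K, M, B or T (only "B" was actually seen above), and that the tickers vector is a placeholder you would replace with your own symbols. The helper names parse_market_cap and get_market_cap are made up for illustration.

```r
library(rvest)

# Convert strings such as "11.58B" to plain numbers.
# The suffix table is an assumption about how the site abbreviates values.
parse_market_cap <- function(x) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)
  x <- trimws(x)
  suffix <- substr(x, nchar(x), nchar(x))
  has_suffix <- suffix %in% names(multipliers)
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  value <- suppressWarnings(as.numeric(digits))  # unparseable text becomes NA
  unname(value * ifelse(has_suffix, multipliers[suffix], 1))
}

# Fetch one ticker page and return its Market Cap as a number
get_market_cap <- function(ticker) {
  url <- paste0("http://www.finviz.com/quote.ashx?t=", ticker)
  cell <- html_nodes(read_html(url),
                     ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
  parse_market_cap(html_text(cell))
}

tickers <- c("AA", "A")  # hypothetical list; extend to the symbols you need
# caps <- sapply(tickers, get_market_cap)  # uncomment to query the site
```

Unlike the gsub call above, parse_market_cap multiplies by the suffix, so the result is in plain units rather than in billions. If the site changes its layout the selector will match nothing and get_market_cap will return an empty vector, so it is worth checking the result for each ticker.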