I am trying to write a JavaScript script that grabs the inner HTML of the top news story on the BBC website (http://www.bbc.co.uk/news) and puts it in a .txt file. I don't know much about JavaScript; I know more about .BAT and .VBS, but I know that they can't do this.
I'm not sure how to approach this. I thought of making it scan for a fixed piece of outer HTML and then copy the inner HTML to a .txt file.
However, I can't seem to find any outer HTML that stays the same every day. For example, this is today's title:
<span class="title-link__title-text">Benefit plan 'could hit young Britons'</span>
As you can see, the headline itself is part of it.
I'm using Firefox, if that makes a difference.
Any help would be much appreciated.
Regards,
Master-chip.
Do you want to download a .txt file with content taken from the HTML? If so, you can use this to create a .txt file and download it. If you want to get the text from all of the title spans, you need to do this:
Then write the txt variable to a file, as in the post I mentioned above.
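The original snippet is not shown here, so the following is only a minimal sketch of that step. It assumes the headline spans keep the class `title-link__title-text` from the question's example; the function name `collectTitleText` is made up for illustration.

```javascript
// Collect the text of every headline span found under `root`.
// `root` is anything that offers querySelectorAll (normally `document`).
// Assumption: the class name is taken from the question's example markup
// and may change whenever the BBC redesigns its page.
function collectTitleText(root) {
  var spans = root.querySelectorAll('span.title-link__title-text');
  var lines = [];
  for (var i = 0; i < spans.length; i++) {
    // textContent gives the text inside the span, without any tags.
    lines.push(spans[i].textContent.trim());
  }
  // One headline per line.
  return lines.join('\n');
}

// In the browser console on the news page:
// var txt = collectTitleText(document);
```

Selecting by class rather than by the full outer HTML is what sidesteps the problem in the question: the class stays the same while the headline text changes.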
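The linked post is not reproduced here, so as a rough sketch of one way to do the download step in the browser: build a `data:` URI from the text and click a temporary link that carries the `download` attribute. The names `toTextDataUri` and `downloadText` and the file name `headline.txt` are assumptions for illustration, not taken from that post.

```javascript
// Turn a string into a data: URI that the browser can save as plain text.
function toTextDataUri(text) {
  return 'data:text/plain;charset=utf-8,' + encodeURIComponent(text);
}

// Offer `text` as a file download. `doc` is normally `document`.
function downloadText(doc, filename, text) {
  var link = doc.createElement('a');
  link.href = toTextDataUri(text);
  link.download = filename;      // suggested file name for the saved file
  // Attach the link before clicking; some browsers ignore click() on
  // links that are not part of the page.
  doc.body.appendChild(link);
  link.click();
  doc.body.removeChild(link);
}

// In the browser, with `txt` holding the collected text:
// downloadText(document, 'headline.txt', txt);
```

This still needs the page to be open in the browser; it does not fetch the BBC page by itself.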
My thoughts -
JS can be used to get data/text from pages, but to save it to a file you have to use something on the back end, such as Python or PHP.
Why use JS? You can scrape the web very well using cURL. Use PHP cURL if that's easier for you.
You can scrape/download the webpage using -
Then use the function at your discretion-
Reference Links-
Web scraping with PHP and CURL
Scraping in PHP with CURL
You can scrape more precisely by targeting the DIVs and nodes of the HTML elements. Check these out - Part1 - Part2 - Part3
Hope it helps. Happy Coding!
Pure client Browser approach:
OK, I made this fiddle for you; it may help others too. This was interesting and challenging for me. Below are the points on how I arrived at a possible solution.
Javascript:
HTML:
Note:
Suggested Approach:
Use a Node.js server; you can modify the above script to run standalone.
Or use any server-side scripting framework, such as PHP or Java Spring.
Using the Node.js approach:
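As a rough illustration of the server-side route (a sketch, not the code from this answer), a standalone Node.js script could pull the first headline span out of the page's HTML and write it to a file. It assumes the class name from the question's example and Node 18 or later for the global `fetch`; the names `extractTopHeadline` and `fetchAndSave` are hypothetical, and the regular expression is deliberately crude, so a real HTML parser would be more robust.

```javascript
var fs = require('fs');

// Return the text of the first headline span in an HTML string, or null.
// Assumption: the span carries the class from the question's example.
// HTML entities are not decoded and nested tags are not handled.
function extractTopHeadline(html) {
  var pattern = /<span[^>]*class="[^"]*\btitle-link__title-text\b[^"]*"[^>]*>([\s\S]*?)<\/span>/;
  var match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Download the page and save the top headline to `path`.
// Assumption: global fetch is available (Node 18+).
function fetchAndSave(url, path) {
  return fetch(url)
    .then(function (res) { return res.text(); })
    .then(function (html) {
      var headline = extractTopHeadline(html);
      if (headline === null) {
        throw new Error('No headline span found; the markup may have changed.');
      }
      fs.writeFileSync(path, headline + '\n', 'utf8');
      return headline;
    });
}

// Example call:
// fetchAndSave('http://www.bbc.co.uk/news', 'headline.txt').then(console.log);
```

Running this on the server avoids the browser's cross-origin restrictions, which is the main reason the server-side route is simpler.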
Javascript:
Dependencies for the above code:
Hope it helps you and others too.