I want to make a Greasemonkey script that, while you are on URL_1, fetches and parses the whole HTML page of URL_2 in the background in order to extract a text element from it.
To be specific, I want to download the whole page's HTML (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName()[0] to extract the text I want from the element with class name "critic_consensus".
I found "HTML in XMLHttpRequest" on MDN, so I ended up with this, unfortunately non-working, code:
var xhr = new XMLHttpRequest();
xhr.onload = function() {
    alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
}
xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/",true);
xhr.responseType = "document";
xhr.send();
It shows this error message when I run it in Firefox Scratchpad:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.
PS. The reason why I don't use the Rotten Tomatoes API is that they've removed the critics consensus from it.
For cross-origin requests, where the fetched site has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines also provide this function.)
GM_xmlhttpRequest is expressly designed to allow cross-origin requests.
To get your target information, create a DOMParser on the result. Do not use jQuery methods, as this will cause extraneous images, scripts and objects to load, slowing things down or crashing the page.
Here's a complete script that illustrates the process:
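A minimal sketch of such a script follows, assuming a userscript engine that grants GM_xmlhttpRequest. The script name, the @include pattern (a placeholder for URL_1), and the helper names extractConsensus and fetchConsensus are illustrative assumptions, not taken from the original post.

```javascript
// ==UserScript==
// @name        Critic consensus fetcher (hypothetical name)
// @include     http://www.example.com/*
// @grant       GM_xmlhttpRequest
// ==/UserScript==
// The @include above is a placeholder for URL_1.

// Parse the fetched HTML text into an inert document and pull out the text.
// DOMParser does not load images or run scripts from the parsed page.
function extractConsensus(htmlText) {
    var doc = new DOMParser().parseFromString(htmlText, "text/html");
    var node = doc.getElementsByClassName("critic_consensus")[0];
    return node ? node.textContent.trim() : null;
}

// Fetch the page cross-origin and hand the extracted text (or null) to callback.
function fetchConsensus(url, callback) {
    GM_xmlhttpRequest({
        method: "GET",
        url: url,
        onload: function (response) {
            callback(extractConsensus(response.responseText));
        },
        onerror: function () {
            callback(null);
        }
    });
}

// Only start the request when running under a userscript engine.
if (typeof GM_xmlhttpRequest === "function") {
    fetchConsensus("http://www.rottentomatoes.com/m/godfather/", function (text) {
        console.log(text === null ? "critic_consensus not found" : text);
    });
}
```

Keeping the parsing in its own function means the extraction can be adjusted if the page's markup changes, without touching the request code.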
The JavaScript same-origin policy prevents you from accessing content that belongs to a different domain.
The reference for that policy also gives four techniques for relaxing this rule (CORS being one of them).
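To make "same origin" concrete: two URLs share an origin only when scheme, host, and port all match. The sketch below uses the standard URL class to compare origins; the helper name isSameOrigin and the www.example.com address are illustrative assumptions.

```javascript
// Two URLs are same-origin when scheme, host, and port are all equal.
function isSameOrigin(a, b) {
    return new URL(a).origin === new URL(b).origin;
}

// Same scheme and host, different path: same origin.
console.log(isSameOrigin(
    "http://www.rottentomatoes.com/m/godfather/",
    "http://www.rottentomatoes.com/"
)); // true

// Different host (placeholder page): cross-origin, so a plain XMLHttpRequest is blocked.
console.log(isSameOrigin(
    "http://www.example.com/",
    "http://www.rottentomatoes.com/m/godfather/"
)); // false
```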
The problem is: XMLHttpRequest cannot load http://www.rottentomatoes.com/m/godfather/. No 'Access-Control-Allow-Origin' header is present on the requested resource.
Because you are not the owner of the resource, you cannot set this header.
What you can do is set up a proxy on Heroku that forwards all requests to the Rotten Tomatoes website. Here is a small Node.js proxy: https://gist.github.com/igorbarinov/a970cdaf5fc9451f8d34
I modified the code from https://github.com/massive/firebase-proxy/ for this.
I published the proxy at http://peaceful-cove-8072.herokuapp.com/, and you can test it at http://peaceful-cove-8072.herokuapp.com/m/godfather
Here is a fiddle to test it: http://jsfiddle.net/uuw8nryy/
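The proxy idea described above can be sketched roughly as follows. This is not the code from the linked gist; it is a minimal illustration using only Node's built-in http module, and the names TARGET_HOST, addCorsHeader, createProxy, and the --listen flag are assumptions made for this example.

```javascript
const http = require("http");

// Upstream site from the question; every request path is forwarded to it.
const TARGET_HOST = "www.rottentomatoes.com";

// Copy the upstream headers and add the header the browser is asking for.
function addCorsHeader(headers) {
    return Object.assign({}, headers, { "access-control-allow-origin": "*" });
}

// Build a server that relays GET requests upstream and adds the CORS header.
function createProxy() {
    return http.createServer(function (clientReq, clientRes) {
        const upstream = http.request(
            {
                host: TARGET_HOST,
                path: clientReq.url,
                method: "GET",
                headers: { host: TARGET_HOST }
            },
            function (upstreamRes) {
                clientRes.writeHead(upstreamRes.statusCode, addCorsHeader(upstreamRes.headers));
                upstreamRes.pipe(clientRes);
            }
        );
        upstream.on("error", function () {
            clientRes.writeHead(502, addCorsHeader({}));
            clientRes.end("Upstream request failed");
        });
        upstream.end();
    });
}

// Start listening only when asked to, e.g. `node proxy.js --listen`.
if (process.argv.includes("--listen")) {
    createProxy().listen(process.env.PORT || 3000);
}
```

With such a proxy running, the page script requests the proxy's URL instead of the Rotten Tomatoes URL, and the browser accepts the response because the Access-Control-Allow-Origin header is now present.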