So, maybe I'm being paranoid.
I'm scraping my Facebook timeline for a hobby project using PhantomJS. Basically, I wrote a program that finds all of my ads by querying the page for the text Sponsored
with XPATH inside of phantom's page.evaluate
block. The text was being displayed as innerHTML of html a
elements.
Things were working great for a few days and it was finding tons of ads.
Then it stopped returning any results.
When I logged into Facebook manually to inspect the elements again, I found that the word Sponsored
was now appearing on the page in an ::after
pseudoclass element with the css property content: sponsored
. This means that an XPATH query for the text no longer yields any results. No joke, Facebook seemed to have changed the way they rendered this word after being scraped for a couple days.
Paranoid. I told you.
So, I offer this question to the community of Javascript, Web-Scraping, and PhantomJS developers out there. What the heck is going on. Can Facebook know what my PhantomJS program is doing inside of the page.evaluate
block?
If so, how? Would my phantom commands appear in a key logger program embedded in the page, for instance?
What are some of your theories?