This question already has an answer here:
- How do I prevent site scraping? [closed] 26 answers
There is a PHP library (something like cURL) that lets anybody attack my site by scraping it. To prevent this, I had an idea: use dynamic class names for my elements. Look at this:
<div class="<?php echo $ClassName; /* $ClassName is taken from the database */ ?>">anything</div>
Note: `$ClassName` will vary every time.
In this case, nobody knows which class name to use to select my element and copy my data. Now I have two problems:
- How can I link `$ClassName` in the markup to `.$ClassName` in the CSS file? In other words, how can I use a PHP variable for CSS class names (dynamic CSS classes)?
- Is it efficient to fetch all class names from the database?
With `$ClassName` as a randomly generated string, you don't need to connect to the database.

Update
Building on bishop's answer, you can also give your document a changeable DOM structure. Introduce two PHP variables, such as `$start` and `$close`. `$start` holds a random sequence of opening tags, such as `<span><div><p>`, and `$close` holds the matching closing tags, `</p></div></span>`. Then enclose your document between them.

Using the database to get the class name is not optimal when the same thing can be done locally. You should define an array of all class names and then pick one of them with `array_rand`.

Note: you can't use PHP code in a `.css` file, so you should write all the CSS that you want to be dynamic in your `.php` file, inside `<style> stuff </style>`.

Something like this: (full code)
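A minimal sketch of this approach, under stated assumptions: the pool of class names, the `color: red` rule, and the helper names `pickClassName`, `randomClassName`, and `renderBlock` are all hypothetical and only illustrate how the CSS rule and the element can be generated from the same PHP variable.

```php
<?php
// Hypothetical pool of class names (placeholders, not from the original answer).
// Each starts with a letter so it is usable as a CSS class selector without escaping.
const CLASS_NAMES = ['a8f3k', 'zq71x', 'm0p9d', 'k2v5t'];

// array_rand() returns a random key, so index back into the array for the value.
function pickClassName(array $pool): string
{
    return $pool[array_rand($pool)];
}

// Alternative: generate a name without any predefined list (requires PHP 7+).
function randomClassName(): string
{
    return 'c' . bin2hex(random_bytes(4)); // "c" plus 8 hex characters
}

// Emit the CSS rule and the element from the same variable so they always match.
function renderBlock(string $className, string $content): string
{
    $safeClass = htmlspecialchars($className, ENT_QUOTES);
    $safeContent = htmlspecialchars($content, ENT_QUOTES);
    return "<style>.{$safeClass} { color: red; }</style>\n"
         . "<div class=\"{$safeClass}\">{$safeContent}</div>\n";
}

$ClassName = pickClassName(CLASS_NAMES);
echo renderBlock($ClassName, 'anything');
```

Because the `<style>` block is produced by the same request as the markup, no external `.css` file ever needs to know the current class name.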
And for higher security, you can also put your content (here 'anything') in a dynamic tag of its own (in addition to the external dynamic tags). For example:
In this case, the tag adjacent to the data is also dynamic, which makes things harder for crawlers.
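The dynamic wrappers could be sketched roughly as follows. The helper `randomWrappers` and the tag list are assumptions made for illustration; the list is limited to `div`, `section`, and `article` because those can nest inside each other in any order, whereas a `<p>` cannot validly contain a `<div>`.

```php
<?php
// Hypothetical helper: builds a random chain of opening tags and the
// matching chain of closing tags in reverse order.
function randomWrappers(array $tags, int $depth): array
{
    $start = '';
    $close = '';
    for ($i = 0; $i < $depth; $i++) {
        $tag = $tags[array_rand($tags)];
        $start .= "<{$tag}>";           // append the opening tag
        $close = "</{$tag}>" . $close;  // prepend the closing tag so nesting stays valid
    }
    return [$start, $close];
}

// Assumed tag list: elements that may be nested in each other freely.
$tags = ['div', 'section', 'article'];

// External wrappers around the document, 1 to 3 levels deep.
[$start, $close] = randomWrappers($tags, random_int(1, 3));

// One extra dynamic tag directly adjacent to the data.
[$innerStart, $innerClose] = randomWrappers($tags, 1);

echo $start . $innerStart . 'anything' . $innerClose . $close . "\n";
```

Each request then yields a different nesting depth and different tag names around the same content.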
Finally, I must say that you can't prevent crawlers completely; you can only make their job harder. If you really want to protect your data, you can do things like these:
Sorry to say, but your effort will be wasted. Even if the class name changes randomly, your DOM can still be attacked positionally, with a selector like `div + div > span > a`.

But even if you rotated your positions (e.g. by adding spurious `div` and `span` elements), any scraper worth its salt isn't actually going to care: it's going to find the text on your page, then infer the intent from the nearest markup. That's how Google works, BTW.

You have one realistic approach to this problem. First, attach an IDS monitor to your web server. When the IDS detects a scan pattern, throttle or shut down the IP. Or, and this is my favorite, throw the scanner into a honeypot with faked content. I.e., if your actual text reads "Freds widgets are the best in the world", serve an alternate page that reads "Bobs gonads fell short of maritime bliss."
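As a rough illustration of the decoy idea, here is a toy sketch. It is not an IDS: the per-IP request counter, the limits, the decoy text, and the function names `isScanning` and `pageFor` are all assumptions, and the in-memory array stands in for whatever shared storage a real deployment would need.

```php
<?php
// Toy stand-in for an IDS: counts requests per IP inside a time window.
// $hits maps an IP address to a list of request timestamps (in seconds).
function isScanning(array &$hits, string $ip, int $now, int $limit = 20, int $window = 10): bool
{
    // Keep only the timestamps that still fall inside the window.
    $recent = array_filter($hits[$ip] ?? [], fn($t) => $t > $now - $window);
    $recent[] = $now;
    $hits[$ip] = array_values($recent);

    // More than $limit requests within $window seconds counts as a scan pattern.
    return count($hits[$ip]) > $limit;
}

// Serve the real text to normal visitors and faked content to suspected scanners.
function pageFor(bool $scanning): string
{
    return $scanning
        ? 'Decoy text served to suspected scrapers.'
        : 'Freds widgets are the best in the world';
}

$hits = []; // assumption: a real site would persist this between requests
$ip = $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1';
echo pageFor(isScanning($hits, $ip, time())) . "\n";
```

The design point is that the scanner keeps receiving successful responses, so it has no obvious signal that it has been detected.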
I deploy that latter tactic on a couple of my customers' sites, to hilarious results on Chinese copycats.
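To make the positional point from this answer concrete, here is a hedged sketch of how a scraper might pull the data without ever looking at a class name. The sample markup and the function name `scrapeByPosition` are hypothetical; the XPath expression is an equivalent of the CSS selector `div + div > span > a`, evaluated with PHP's built-in DOM extension.

```php
<?php
// Extracts link text purely by position in the DOM, ignoring class names.
function scrapeByPosition(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // A div whose immediately preceding sibling is a div, then span child, then a child.
    $query = '//div/following-sibling::*[1][self::div]/span/a';

    $texts = [];
    foreach ($xpath->query($query) as $a) {
        $texts[] = $a->textContent;
    }
    return $texts;
}

// Hypothetical page with a randomized class name on the target element.
$page = '<div>header</div>'
      . '<div><span class="x9f2q"><a href="/item">Freds widgets</a></span></div>';

print_r(scrapeByPosition($page));
```

Changing `x9f2q` to any other value leaves the result unchanged, which is why randomized class names alone do not stop this kind of extraction.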