Question:
A product I'm helping to develop will basically work like this:
- A Web publisher creates a new page on their site that includes a <script> from our server.
- When a visitor reaches that new page, that <script> gathers the text content of the page and sends it to our server via a POST request (cross-domain, using a <form> inside of an <iframe>).
- Our server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors until we receive another POST request with text content from the same URL, at which point we regenerate a "fresh" response. These POSTs only happen when our cached TTL expires, at which point the server signifies that and prompts the <script> on the page to gather and POST the text content again.
The problem is that this system seems inherently insecure. In theory, anyone could spoof the HTTP POST request (including the referer header, so we couldn't just check for that) that sends a page's content to our server. This could include any text content, which we would then use to generate the related content links for that page.
The primary difficulty in making this secure is that our JavaScript is publicly visible. We can't use any kind of private key or other cryptic identifier or pattern because that won't be secret.
Ideally, we need a method that somehow verifies that a POST request corresponding to a particular Web page is authentic. We can't just scrape the Web page and compare the content with what's been POSTed, since the purpose of having JavaScript submit the content is that it may be behind a login system.
Any ideas? I hope I've explained the problem well enough. Thanks in advance for any suggestions.
Answer 1:
There is no smoking gun for this. However, where big guns don't exist major annoyance can. Hackers like a challenge, but they prefer an easy target. Be annoying enough that they give up.
Google and others do this effectively with AdWords. Create an API token and have them send that. Have a "verification" process for sites using your script that requires the registrant to allow their site to be profiled before the script is used. You can then collect every bit of information about the server in question, and if the server profile does not match the one on record, can the request.
Get everything you can know about the browser and client and create a profile for it. If there is any chance it's browser spoofing, drop the request. If the profile repeats but the cookie is gone, ignore the input. If you get more than one request from the token in a short period (e.g. the rapid page refreshes typical of hack attempts), ignore the request.
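The "more than one request from the token in a short period" rule could be sketched as a sliding-window limiter. This is only an illustration: the class name, the limits, and the in-memory storage are assumptions, and a real deployment would likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque


class TokenRateLimiter:
    """Sliding-window limiter (hypothetical helper): at most `max_requests`
    accepted per `window_seconds` for each API token."""

    def __init__(self, max_requests=3, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # token -> timestamps of accepted requests

    def allow(self, token):
        now = self.clock()
        hits = self._hits[token]
        # Forget accepted requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: ignore this one
        hits.append(now)
        return True
```

A request handler would call `allow(token)` before doing any processing and quietly drop the request when it returns `False`.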
Then go one step further and ping the actual domain to verify that it exists and is an authorized domain. Even if the page is behind a login the domain will still respond. This in itself won't stop hackers, but it is done server side and therefore hidden.
Also, you might consider profiling the content for a page. If a site dedicated to kitchen utensils starts sending back content for adult dating, raise a red flag.
Lastly, when a bad request comes in that you've profiled as a bad request, send the JSONP from what would be a good request for that page based on data you know is good (a 24-hour-old version of the page, etc.). Don't tell the hacker you know they are there. Act as if everything is fine. It will take them quite a while to figure that one out!
None of these ideas fulfills the exact needs of your question, but hopefully it will inspire some insidious and creative thinking on your part.
Answer 2:
How about this? The <script/> tag that a third-party site includes has a dynamic src attribute. So, instead of loading some static JavaScript resource, the request comes to your server, which generates a unique key as an identifier for the website and sends it back in the JS response. You save the same key in the user session or your database. The form created and submitted by this JS code will submit this key parameter too. Your backend will reject any POST request whose key does not match the one in your db/session.
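A minimal server-side sketch of that idea, assuming an in-memory dict in place of the session or database, and with `PageKeyStore`, `issue`, and `redeem` as made-up names. Note that anyone who can request the script can also obtain a key, so this raises the effort needed rather than proving the content is authentic.

```python
import secrets


class PageKeyStore:
    """Hypothetical store for keys handed out with the dynamically generated script."""

    def __init__(self):
        self._issued = {}  # key -> page URL it was issued for (stand-in for db/session)

    def issue(self, page_url):
        # Called while generating the JS response; the key is embedded in that script.
        key = secrets.token_urlsafe(32)
        self._issued[key] = page_url
        return key

    def redeem(self, page_url, submitted_key):
        # Called when the POST arrives; a key is accepted once, for its own page only.
        if self._issued.get(submitted_key) != page_url:
            return False
        del self._issued[submitted_key]
        return True
```

Making each key single-use means a captured POST cannot simply be replayed with different content.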
Answer 3:
Give people keys on a per-domain basis.
Make people include in each request the hash value of [key string + request parameters]. (The hash value should be computed on the server.)
When they send you the request, you, knowing the parameters and the key, can verify the validity.
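One way to sketch this check uses HMAC-SHA256, which is a keyed hash swapped in here in place of hashing a plain concatenation of key and parameters. The key table, the parameter names, and the canonical encoding (sorted, URL-encoded pairs) are all assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical per-domain secrets; in practice these would come from a database.
DOMAIN_KEYS = {"example.com": b"hypothetical-secret-for-example.com"}


def sign_request(domain_key: bytes, params: dict) -> str:
    # Sort the parameters so both sides hash exactly the same byte string.
    canonical = urlencode(sorted(params.items())).encode("utf-8")
    return hmac.new(domain_key, canonical, hashlib.sha256).hexdigest()


def verify_request(domain: str, params: dict, signature: str) -> bool:
    key = DOMAIN_KEYS.get(domain)
    if key is None:
        return False  # unknown domain: nothing to verify against
    expected = sign_request(key, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The publisher's server would call `sign_request` when rendering the page, and the receiving server would call `verify_request` on each POST. The key itself must never appear in the client-side script.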
Answer 4:
The primary weakness with the system as you described it is that you are "given" the page content. Why not go and get the page content for yourself?
- A Web publisher creates a new page on their site that includes a script from your server.
- When a visitor reaches that new page, that script sends a GET request to your server.
- Your server goes and gets the content of the page (possibly by using the referrer header to determine the source of the request).
- Your server processes the text content and returns a response (via JSONP) that includes an HTML fragment listing links to related content around the Web. This response is cached and served to subsequent visitors from a server-side cache / proxy.
- When the TTL for the cached version expires, the proxy will forward the request on to your app and the whole cycle starts again from step 3.
This stops malicious content from being "fed" to your server and allows you to provide some form of API key that ties requests and domains or pages together (i.e. API key 123 only works for referrers on mydomain.com; anything else is obviously spoofed). Thanks to the caching / proxy, your app is also protected to some degree from DoS-type attacks, because the page content is only processed once each time the cache TTL expires (and you can handle increasing load by extending the TTL until you can bring additional processing capacity online). Now your client-side script is insanely small and simple. No more scraping content and posting it: just send an Ajax request and maybe populate a couple of parameters (API key / page).
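The "API key 123 only works for referrers on mydomain.com" rule might look like the sketch below. The lookup table and function name are hypothetical, and accepting subdomains is an assumption. Since the referrer header is supplied by the client, this works as a filter rather than as proof of origin.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from API key to the domain it was registered for.
API_KEY_DOMAINS = {"123": "mydomain.com"}


def referrer_allowed(api_key, referrer):
    allowed = API_KEY_DOMAINS.get(api_key)
    if allowed is None or not referrer:
        return False
    try:
        host = (urlsplit(referrer).hostname or "").lower()
    except ValueError:
        return False  # malformed referrer URL
    # Accept the registered domain and its subdomains (an assumption of this
    # sketch), but not look-alikes such as "evilmydomain.com".
    return host == allowed or host.endswith("." + allowed)
```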
Answer 5:
First of all, I would validate the domain (and maybe the "server profile") as suggested by others here, and obviously very strictly validate the content of the POST (as I hope you're already doing anyway).
If you make the URL for your script file point to something that's dynamically generated by your server, you can also include a time-sensitive session key to be sent along with the POST. This won't completely foil anyone, but if you're able to make the session expire quickly enough it will be a lot more difficult to exploit (and if I understand your application correctly, sessions should only need to last long enough for the user to enter something after loading a page).
After typing this, I realize it's basically what avlesh already suggested with the addition of an expiry.
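A time-limited key of this kind could be sketched without server-side storage by signing the issue time together with the page URL. The secret, the five-minute lifetime, and the function names are assumptions for illustration. A stored session key with an expiry column would serve the same purpose.

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"hypothetical-server-secret"  # never sent to the browser


def _mac(issued_at, page_url):
    message = f"{issued_at}|{page_url}".encode("utf-8")
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_token(page_url, now=None):
    # Embedded in the dynamically generated script and sent back with the POST.
    issued_at = int(time.time() if now is None else now)
    return f"{issued_at}.{_mac(issued_at, page_url)}"


def token_valid(token, page_url, max_age_seconds=300, now=None):
    current = time.time() if now is None else now
    issued_str, sep, mac = token.partition(".")
    if not sep or not (issued_str.isascii() and issued_str.isdigit()):
        return False  # not in the "<timestamp>.<mac>" shape
    issued_at = int(issued_str)
    if not 0 <= current - issued_at <= max_age_seconds:
        return False  # expired, or stamped in the future
    expected = _mac(issued_at, page_url)
    return hmac.compare_digest(expected.encode("utf-8"), mac.encode("utf-8"))
```

A shorter `max_age_seconds` narrows the window in which a copied token is useful, at the cost of rejecting slow visitors.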
Answer 6:
If you can add server-side code to the site pushing data to your site, you could use a MAC to at least prevent non-logged in users from sending anything.
If just anyone is allowed to use the page, then I can't think of a waterproof way of confirming the data without scraping the webpage. You can make sending arbitrary content somewhat more difficult with referer checks and whatnot, but not 100% impossible.
Answer 7:
You could have hashed keys specific to each client's IP address and compare that value on the server for each POST, using the IP in the POST header. The upside to this is that if someone spoofs their IP, the response will still be sent to the spoofed IP and not the attacker's. You might already know this, but I'd also suggest adding salt to your hashes.
With a spoofed IP, a proper TCP handshake can't take place, so the attacker's spoofed POST isn't completed.
There could be other security concerns I'm not aware of, but I think it might be an option.
Answer 8:
Can the web publisher also put a Proxy page on their server?
Then load the script through the proxy. Then you have a number of possibilities where you can control the connection between the two servers, add encryption and things like that.
What is the login system? What about using an SSO solution and keeping your scripts separate?
Answer 9:
You could scrape the site, and if you get a code 200 response that includes your script, just use that scrape. If not, you may resort to information from your "client proxy"; that way the problem is reduced to the sites that you can't scrape.
For raising the security in these cases you could have multiple users sending the page and filter out any information that is not present on a minimum number of the responses. That will also have the added benefit of filtering out any user specific content. Also make sure to register what user you ask to do the proxy work and verify that you only receive pages from users that you have asked to do the job. You could also try to make sure that very active users don't get a higher chance of doing the job, that will make it harder to "fish" for the job.
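The "minimum number of responses" filter could be sketched as below. Comparing submissions line by line is an assumption of this example, as are the function and parameter names. Any other unit, such as paragraphs or sentences, would work the same way.

```python
from collections import Counter


def consensus_lines(submissions, min_agreement):
    """Keep only lines that appear in at least `min_agreement` submissions."""
    counts = Counter()
    for text in submissions:
        # Count each distinct line once per submission, so one user repeating
        # a line many times cannot inflate its score.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})

    kept, seen = [], set()
    for text in submissions:
        for line in text.splitlines():
            line = line.strip()
            if line and line not in seen and counts[line] >= min_agreement:
                seen.add(line)
                kept.append(line)  # preserves order of first appearance
    return kept
```

For example, with three submissions that differ only in a personalised greeting and one injected line, a threshold of 2 keeps the shared body text and drops both the greetings and the injection.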
Answer 10:
How about:
Site A creates a nonce (basically a random string) and sends it to your site B, which puts it into the session. Then, when site A makes the POST request, it sends the nonce along with the request, and the request is only accepted if the nonce matches the one in site B's session.