How to anonymously identify a user and store that

Posted 2020-05-21 00:07

Question:

I need a simple user identification system for the purpose of allowing/prohibiting an action.

This is not a high-security requirement, and it is OK to make mistakes (e.g. the same user performing a disallowed action again from a different browser).

To be less abstract, let's look at StackOverflow voting and assume we want to allow voting by the public audience, but only once.

The simplest thing that can work is a cookie: either set a new cookie per answer, or store all votes in one cookie (or combine the two somehow).

This is somewhat unreliable because of limits on cookie size and on the number of cookies. The cookie will also be sent to the site with every request, while it is only needed for one action.
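A minimal Python sketch of the single-cookie variant, to make the size concern concrete. The cookie name `votes`, the `.`-separated encoding and the helper names are hypothetical; the 4096-byte budget is the minimum per-cookie size that RFC 6265 asks browsers to support.

```python
from http.cookies import SimpleCookie

# RFC 6265 asks browsers to support at least 4096 bytes per cookie
# (name + value + attributes); use that as a conservative budget.
COOKIE_BYTE_LIMIT = 4096


def encode_votes(answer_ids: set[int]) -> str:
    """Pack voted answer IDs into one '.'-separated value (hypothetical format)."""
    # '.' is a legal cookie character, so SimpleCookie will not quote the value.
    return ".".join(str(i) for i in sorted(answer_ids))


def decode_votes(value: str) -> set[int]:
    """Inverse of encode_votes; an empty value means no votes yet."""
    return {int(part) for part in value.split(".") if part}


def build_vote_cookie(answer_ids: set[int], name: str = "votes") -> str:
    """Return the cookie string, refusing to exceed the per-cookie budget."""
    cookie = SimpleCookie()
    cookie[name] = encode_votes(answer_ids)
    header_value = cookie[name].OutputString()
    if len(header_value.encode("utf-8")) > COOKIE_BYTE_LIMIT:
        raise ValueError("vote cookie would exceed the per-cookie size limit")
    return header_value
```

Each stored vote costs a few bytes, so a heavy voter eventually hits the limit, and the whole cookie still travels with every request to the site.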

So from this perspective I would like to avoid using cookies, but I don't see a better way of doing this over regular HTTP.
I am not considering IP/MAC addresses etc.

So, with the context above, the question is: how do I anonymously identify a user and store that information on the client?

Thanks.

Answer 1:

Anonymous user identification can certainly be done (and is being done) with a fairly high degree of accuracy. Rather than reprint the methodology here, here is some reading that lays it all out.

First is an older piece by the EFF on the mathematics of user privacy (specifically the entropy behind your data) on the internet. It is optional, but it explains the model we're looking at; skip it if the math behind identification doesn't interest you.

http://www.eff.org/deeplinks/2010/01/primer-information-theory-and-privacy
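The model in the primer boils down to surprisal: observing a fact shared by a fraction p of users reveals -log2(p) bits, and bits from independent facts add up. A small Python sketch; the attribute frequencies are made up for illustration and are not Panopticlick data.

```python
import math


def surprisal_bits(p: float) -> float:
    """Bits revealed by observing a value shared by a fraction p of users."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def combined_bits(probabilities: list[float]) -> float:
    """Total bits for several attributes, ASSUMING they are independent."""
    return sum(surprisal_bits(p) for p in probabilities)


def bits_to_single_out(population: int) -> float:
    """Bits needed to pick out one member of a population of this size."""
    return math.log2(population)


# Hypothetical frequencies: 1 in 4 share the browser's language,
# 1 in 256 its timezone and screen size, 1 in 2048 its font list.
example = combined_bits([1 / 4, 1 / 256, 1 / 2048])  # 2 + 8 + 11 = 21 bits
```

Once the accumulated bits reach about `bits_to_single_out(N)` for a population of N browsers, the combination is likely unique; that is how Panopticlick's "bits of identifying information" figure should be read. Real attributes are often correlated, so simply summing overstates the total.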

In a nutshell: using just the user-agent string, IP address and other data collected in one of their examples (panopticlick) http://panopticlick.eff.org/ you have a very high likelihood of uniquely identifying a user (as long as it is the same machine) without the need for cookies. More on their research into browser detection and uniqueness is available here:

http://panopticlick.eff.org/browser-uniqueness.pdf

Visit the panopticlick page and give it a try. It shows you what to look for (with examples and source code for how to go about it), while the .pdf details the uniqueness and specifics of the fingerprinting method.
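Panopticlick gathers many of its attributes in the browser with JavaScript; the server-side half of the idea can be sketched as hashing whatever attributes you collect into one stable ID. This is a simplified illustration, not EFF's method: the attribute names and values are hypothetical, and it ignores fingerprints that drift between visits (e.g. after a browser update).

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Hash a set of browser attributes into a stable hex identifier."""
    # Sort by key so the same attributes always produce the same digest,
    # regardless of the order in which they were collected.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes a server might collect for one visitor.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/76.0",
    "accept_language": "en-US,en;q=0.5",
    "ip": "203.0.113.7",
    "timezone": "UTC+2",
}
visitor_id = fingerprint(visitor)
```

Storing `visitor_id` next to each vote lets the server refuse a second vote with the same fingerprint, at the cost of occasionally merging two users who share a fingerprint or splitting one user whose attributes change.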

My system configuration, for example, is unique among the 1,301,578 browsers tested, with 20.38 bits of identifying information (reduction of entropy). Given their research, you can expect between 94.2% and 99.1% accuracy in identifying users between visits without any client-side tracking.



Answer 2:

Store a small unique identifier on the client side as a cookie, then use that identifier to look up whatever related session data you need on the server side.
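A minimal Python sketch of that pattern: the client only keeps a random ID (generated here with `secrets.token_urlsafe`), and the server decides whether a vote is allowed. `VoteStore` is a hypothetical in-memory stand-in for a real database table.

```python
import secrets


class VoteStore:
    """Hypothetical in-memory stand-in for a server-side votes table."""

    def __init__(self) -> None:
        # Maps an opaque client ID to the answer IDs that client voted on.
        self._votes: dict[str, set[int]] = {}

    def new_client_id(self) -> str:
        # 16 random bytes, URL-safe base64 encoded: small enough for a cookie.
        return secrets.token_urlsafe(16)

    def try_vote(self, client_id: str, answer_id: int) -> bool:
        """Record the vote and return True, or return False if already voted."""
        voted = self._votes.setdefault(client_id, set())
        if answer_id in voted:
            return False
        voted.add(answer_id)
        return True
```

On each vote request the server reads the ID from the cookie (issuing a new one if it is missing) and calls `try_vote`. The cookie stays a fixed, small size however many votes are cast; a user who clears cookies simply gets a fresh ID, which fits the "OK to make mistakes" requirement.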



Answer 3:

MAC address would be awesome, but not possible over HTTP unfortunately (and a good thing for user privacy).

Cookie and IP address are your only options.

You can delete the cookie when you're done with it, scope it to only the relevant path, or just have it expire after a few hours or days, whichever is appropriate.
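With Python's standard `http.cookies` module, scoping the cookie to one path, giving it a lifetime, and deleting it (by re-sending it with `Max-Age=0`) look roughly like this. The cookie name `voter`, the path and the lifetime are placeholders, and marking the cookie `HttpOnly` is an extra choice not mentioned in the answer.

```python
from http.cookies import SimpleCookie


def vote_cookie_header(client_id: str, path: str, max_age_seconds: int) -> str:
    """Set-Cookie header limited to `path` that expires after max_age_seconds."""
    cookie = SimpleCookie()
    cookie["voter"] = client_id
    cookie["voter"]["path"] = path  # browser only sends it to URLs under this path
    cookie["voter"]["max-age"] = max_age_seconds  # lifetime in seconds
    cookie["voter"]["httponly"] = True  # not readable from page JavaScript
    return cookie.output(header="Set-Cookie:")


def delete_cookie_header(path: str) -> str:
    """Set-Cookie header that tells the browser to drop the cookie now."""
    cookie = SimpleCookie()
    cookie["voter"] = ""
    cookie["voter"]["path"] = path  # must match the path it was set with
    cookie["voter"]["max-age"] = 0
    return cookie.output(header="Set-Cookie:")
```

Because of the `Path` attribute, the browser only sends the cookie with requests under that path rather than with every request to the site, which addresses the overhead concern in the question.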