Stateless web application, an urban legend?

Posted 2019-03-15 19:24

Question:

I have been trying to understand token-based authentication, which claims to be a stateless authentication method. Along the way I came across the concept of a stateless web application.

Below are some threads I have read:

  • Use Spring MVC for Stateless web application development (no response yet)
  • Stateless Spring MVC
  • How to make a java web application fully stateless
  • How do I make my Web Application stateless yet still do something useful?

At first, I was thrilled by this idea. But the more I think about it, the more "stateless" looks like a pseudo-proposition.

For example, suppose we use a client-stored token for authentication. How can we count the users who are online (suppose there are no logs)? Shall we store the token in the DB? Doesn't that mean we store state info on the server? And furthermore, is plain user info in the DB, such as name, age, etc., also some kind of state info?

I think the real question here is not how to make a web app stateless, but how to make the web app handle state info properly so that it won't jeopardize scalability.

It depends on how we interpret the word stateless:

  1. The web app doesn't have state.
  2. Or the web app doesn't store state itself.

I prefer 2 because there can always be some inevitable global state (quoted from @deceze's comment on his answer). Whether we store state info in HTML5 web storage, an HTTP header, hidden form fields, or a cookie, the state still exists. It is just stored somewhere other than on the server.

Am I missing something important? Could anybody shed some light on this and relieve me of this mental struggle?

ADD 1

I just read part of the book RESTful Web Services by Leonard Richardson. In chapter 4, at the end of the section on statelessness, it divides state into application state and resource state. So the plain user info and data I mentioned before, such as images, can be classified as resource state, and what "stateless" refers to is application state. Storing resource state on the server therefore doesn't violate statelessness.

But the book also mentions a scenario where an application key is used to restrict how many times a user can invoke a web service. It admits that such info cannot be stored on the client side, and that having to store it on the server side violates statelessness and introduces the issue of session affinity. It claims statelessness can avoid the session affinity issue but doesn't explain how. I really don't see how statelessness can handle this scenario. Could anyone shed some light here?

Answer 1:

The "state" only really refers to the state between the client and the server. Of course the server will store data, and technically you can see any modification of any data on the server as "altering state". Hence a "stateless" application in this sense makes absolutely no practical sense.

What "stateless" refers to is whether the server is, at any particular time, in a state to allow a particular client to send a particular request to it.

Consider: with a traditional cookie-based login session, the server is only in a state to accept requests from the client for a limited time window; for as long as the current session is valid. The client cannot predict for how long that is. At any time, a request from the client may fail, because some state on the server timed out. In this case, the client needs to reset the server's state by logging in again.

Contrast this with token-based authentication. The token must be valid indefinitely. It is essentially a substitution for a username and password. For the sake of discussion, just assume the client sends their username and password with every request. This means every request can be authenticated on its own merits, not requiring the server to be in some particular temporal "state".

The reason why you use tokens instead of usernames and passwords is twofold:

  1. you can authorise multiple clients using the same account, but each with their individually managed credentials
  2. you don't want to be sending the "master password" back and forth with every request

Of course the server will need to keep track of the created tokens and authenticate against some database with each request. That's an irrelevant implementation detail. In this respect it does not differ from using session cookies; however, since tokens are valid indefinitely, requests can potentially be cached more easily, instead of needing to replicate a temporary session store.
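
As a rough illustration of "every request authenticated on its own merits", here is a minimal Python sketch. Everything in it is an assumption made for the example: the in-memory `TOKENS` dict stands in for the token database, and `issue_token`, `revoke_token` and `handle_request` are hypothetical names, not part of any framework.

```python
import secrets

# Hypothetical token store: token -> account name.
# A dict stands in for the database table mentioned above.
TOKENS = {}


def issue_token(account):
    """Create a new long-lived token for one client of an account."""
    token = secrets.token_hex(16)
    TOKENS[token] = account
    return token


def revoke_token(token):
    """Actively revoke a single token; other tokens are unaffected."""
    TOKENS.pop(token, None)


def handle_request(headers, path):
    """Authenticate using only what this request carries.

    No login session or timeout is consulted; the only question is
    whether the presented token currently exists in the store.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    account = TOKENS.get(token) if scheme == "Bearer" else None
    if account is None:
        return 401, "unauthorized"
    return 200, f"{account} requested {path}"
```

Two clients of the same account would each call `issue_token("alice")` and get independent credentials; revoking one leaves the other working, which is the first of the two reasons listed above.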

One last potential argument that needs preemptive countering: what's the difference between an indefinite session and an indefinite token, and what's the difference when the session ends vs. when the token may be revoked?

When a session ends, it can be reestablished using some other "master credentials" (logging back in). A token can/should only end when actively revoked, which is akin to revoking the authorisation to access the service entirely for the master credentials, and is not something that is part of the regular application flow.


Speaking more generally: contrast the stateless HTTP protocol with a stateful protocol like FTP. In FTP, the server and client need to keep a shared state in sync. For instance the FTP protocol has, among many other things, the CWD command to change the current working directory. I.e., there is a notion of what directory a client "is in" at any given time. Subsequent commands behave differently depending on what directory one is in. That is stateful. You can't arbitrarily send commands without being aware of that state, else you won't be able to predict what the outcome will be.
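
The difference can be sketched in a few lines of Python. This is a toy model, not an FTP or HTTP implementation: `StatefulSession`, `cwd_command`, `retr` and `stateless_get` are names invented for the illustration.

```python
import posixpath


class StatefulSession:
    """FTP-like: the server remembers a working directory per connection."""

    def __init__(self):
        self.cwd = "/"

    def cwd_command(self, directory):
        # Changes server-side state that later commands depend on.
        self.cwd = posixpath.normpath(posixpath.join(self.cwd, directory))

    def retr(self, filename):
        # The file this resolves to depends on earlier CWD commands.
        return posixpath.join(self.cwd, filename)


def stateless_get(path):
    """HTTP-like: the request itself names the full resource path."""
    return posixpath.normpath(path)
```

The same `retr("report.txt")` resolves to `/report.txt` before `cwd_command("archive")` and to `/archive/report.txt` after it, whereas `stateless_get("/archive/report.txt")` means the same thing no matter what was sent before.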


Stateless client/server communication simplifies the client side first of all, since the client can assume at all times that it is able to request anything of the server, without needing to know the state of the server ("is my session still active or not?", "what directory will this action affect?"). It can also help scale out the server implementation, since only static information needs to be replicated between all servers, instead of a constantly changing pool of valid sessions and their associated data.


Architecturally, your goal should be to have as many stateless components as possible. This will simplify scaling out. For example, if your web server is keeping a local session store, that makes it very hard to scale out your web server to multiple instances behind a load balancer/CDN. One improvement is to centralise the session store to an independent database; now you can have several stateless web servers which know how to get data (including session data) from somewhere and can render templates, but are otherwise completely interchangeable.
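
A minimal sketch of that improvement, assuming a hypothetical `SessionStore` class as a stand-in for the independent database and a toy `WebServer` class for the instances behind the load balancer:

```python
class SessionStore:
    """Stand-in for an independent session database shared by all servers."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return a copy so callers cannot mutate the store by accident.
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class WebServer:
    """A web server that keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        # All session state is fetched from and written back to the store.
        session = self.store.get(session_id)
        session["visits"] = session.get("visits", 0) + 1
        self.store.put(session_id, session)
        return f"{self.name}: visit {session['visits']}"
```

Because neither server holds anything locally, a load balancer can send the first request of a session to `web-1` and the next one to `web-2` and the visit count still continues from where it left off.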

However, a session store must be kept in perfect sync across everyone trying to access it, which makes it hard to scale. Tokens improve this by making the data change less often (only when tokens are added or removed), which means you can use a distributed database or some other, simpler replication mechanism if you want to have several token stores in possibly multiple locations, and it also makes that data cacheable.



Answer 2:

OK, I don't think that the term stateless web application makes any sense. What does make sense is a stateless protocol. A stateless protocol is one that treats each request independently.

So in your case, if you send an auth token with each request, then it is stateless. That's how HTTP authentication is supposed to work.

On the other hand, if you sent the auth token only once and each subsequent request didn't have to carry it (for example because the server knows that this TCP connection is already authenticated), then each request would depend on the authentication request. This makes the protocol stateful.
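
To make the contrast concrete, here is a small Python sketch under invented assumptions: `VALID_TOKENS` is a hypothetical set of accepted tokens, `StatefulConnection` models a connection that remembers an earlier authentication, and `stateless_send` models a request that carries its token every time.

```python
# Hypothetical set of accepted tokens, for illustration only.
VALID_TOKENS = {"secret-token"}


class StatefulConnection:
    """Authenticate once; later messages rely on that remembered result."""

    def __init__(self):
        self.authenticated = False

    def send(self, message, token=None):
        if token is not None:
            self.authenticated = token in VALID_TOKENS
        # The outcome depends on what was sent earlier on this connection.
        return "ok" if self.authenticated else "denied"


def stateless_send(message, token):
    """Each message carries its token and is judged independently."""
    return "ok" if token in VALID_TOKENS else "denied"
```

In the stateful version the same `send("hello")` is denied on a fresh connection and accepted after a login message, so a proxy that moves the client to a new connection breaks it. In the stateless version the result depends only on the arguments of that one call.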

Stateless protocols are easier to scale, easier to proxy, etc.

Now, as for web applications, the term may or may not make sense depending on the definition. I don't know of any reasonable definition, though.

Side note: being stateful/stateless is unrelated to sharing data between client and server.



Answer 3:

I don't think stateless authentication and stateless applications are related in the way you think; the word stateless is being used in two different contexts here.

Stateless authentication is a method of identifying who a client is without carrying any information/state from a previous client request or interaction, unlike cookies for example.

Stateless web applications? Sure, they're possible, but it depends entirely on whether or not user data must be persisted; that is, it really depends on the application in question.