From what I've learned so far, the purpose of tokens is to prevent an attacker from forging a form submission.
For example, if a website had a form that added items to your shopping cart, an attacker could spam your cart with items you don't want.
This makes sense because there could be multiple valid inputs for the shopping cart form; all the attacker would have to do is know an item that the website is selling.
I understand how tokens work and add security in this case, because they ensure the user has actually filled in and pressed the "Submit" button of the form for each item added to the cart.
However, do tokens add any security to a user login form, which requires a username and password?
Since the username and password are unique to each user, the attacker would have to know both for the login forgery to work (even if you didn't have tokens set up), and if an attacker already knew them, he could just sign in to the website himself. Not to mention, a CSRF attack that makes the user log himself in wouldn't have any practical purpose anyway.
Is my understanding of CSRF attacks and tokens correct? And are they useless for user login forms as I suspect?
Yes, tokens do add security to a login form. In general, you need to secure your login forms against CSRF attacks just like any other form.
Otherwise your site is vulnerable to a sort of "trusted domain phishing" attack. In short, a CSRF-vulnerable login page enables an attacker to share a user account with the victim.
The vulnerability plays out like this:
- The attacker creates a host account on the trusted domain
- The attacker forges a login request in the victim's browser with this host account's credentials
- The attacker tricks the victim into using the trusted site, where they may not notice they are logged in via the host account
- The attacker now has access to any data or metadata the victim "created" (intentionally or unintentionally) while their browser was logged in with the host account
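The four steps above can be walked through with a toy in-memory model. This is only a sketch: `Site`, `Browser`, the account name `host_account`, and the password are all hypothetical names made up for illustration, and the "forged request" is modelled as a direct call to a login handler that checks no CSRF token.

```python
# Toy model of a site whose login endpoint has no CSRF check.
# Every name here (Site, Browser, "host_account") is hypothetical.

class Browser:
    def __init__(self):
        self.logged_in_as = None  # stands in for the session cookie


class Site:
    def __init__(self):
        self.passwords = {}  # username -> password
        self.history = {}    # username -> list of recorded activity

    def register(self, username, password):
        self.passwords[username] = password
        self.history[username] = []

    def login(self, browser, username, password):
        # No token is checked, so any request carrying valid credentials
        # succeeds, regardless of which page made the browser send it.
        if self.passwords.get(username) == password:
            browser.logged_in_as = username
            return True
        return False

    def watch(self, browser, video):
        # Activity is stored under whichever account the browser is using.
        if browser.logged_in_as is not None:
            self.history[browser.logged_in_as].append(video)


site = Site()
victim_browser = Browser()

# Step 1: the attacker creates a host account.
site.register("host_account", "attacker-known-password")

# Step 2: an attacker-controlled page makes the victim's browser submit
# the login form with the host account's credentials.
site.login(victim_browser, "host_account", "attacker-known-password")

# Step 3: the victim uses the site without noticing which account is active.
site.watch(victim_browser, "video-the-victim-watched")

# Step 4: the attacker logs in from their own browser and reads the history.
attacker_browser = Browser()
site.login(attacker_browser, "host_account", "attacker-known-password")
leaked = site.history[attacker_browser.logged_in_as]
```

In this model `leaked` ends up holding the victim's activity even though the attacker never learned the victim's own credentials, which is the part the question overlooks.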
As a pertinent example, consider YouTube. YouTube allowed users to see a record of "their own" viewing history, and their login form was CSRF-vulnerable! As a result, an attacker could set up an account with a password they knew, log the victim into YouTube using that account, and then see which videos the victim was watching.
There's some discussion in this comment thread that implies it could "only" be used for privacy violations like that. Perhaps, but to quote the section in Wikipedia's CSRF article:
Login CSRF makes various novel attacks possible; for instance, an
attacker can later log in to the site with his legitimate credentials
and view private information like activity history that has been saved
in the account.
Emphasis on "novel attacks". Imagine the impact of a phishing attack against your users, and then imagine said phishing attack working via the user's own trusted bookmark to your site! The paper linked in the aforementioned comment thread gives several examples that go beyond simple privacy attacks.
Your understanding is correct: the whole point of CSRF is that the attacker can forge a legitimate-looking request in advance. But this cannot be done with a login form unless the attacker knows the victim's username and password, in which case there are more efficient ways to attack (the attacker can simply log in directly).
Ultimately, the only thing an attacker can do is inconvenience your users by spamming failed logins, which might cause the security system to lock the user out for a period of time.
CSRF validation pre-login doesn't make too much sense IMHO.
Thanks to @squiddle for the link: seclab.stanford.edu/websec/csrf/csrf.pdf, we can read on the very first page:
The most popular CSRF defense is to include a secret
token with each request and to validate that the received
token is correctly bound to the user’s session,
preventing CSRF by forcing the attacker to guess the
session’s token.
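To make the quoted defense concrete, here is a minimal sketch of a token that is bound to a session and validated on each request. It assumes the server creates an anonymous session before login and keeps the tokens in memory; the helper names (`start_session`, `token_for`, `is_valid`) and the dictionary store are hypothetical, not taken from the paper or from any framework.

```python
import hmac
import secrets

# Hypothetical server-side store: session id -> CSRF token for that session.
_session_tokens = {}


def start_session():
    """Create an anonymous (pre-login) session and return its id."""
    session_id = secrets.token_urlsafe(32)
    _session_tokens[session_id] = secrets.token_urlsafe(32)
    return session_id


def token_for(session_id):
    """Token the server would embed in forms rendered for this session."""
    return _session_tokens[session_id]


def is_valid(session_id, submitted_token):
    """Accept a request only if its token matches the one bound to the session."""
    expected = _session_tokens.get(session_id)
    if expected is None or submitted_token is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode(), submitted_token.encode())


session_a = start_session()
session_b = start_session()

own_token_accepted = is_valid(session_a, token_for(session_a))
other_token_accepted = is_valid(session_a, token_for(session_b))
```

Under these assumptions `own_token_accepted` is true and `other_token_accepted` is false, because a token issued for one session is checked against a different session's stored value.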
If you attempt CSRF validation pre-login, then you give a potential attacker the opportunity to scrape a valid token from your web site! He/she would then be able to re-post the token, defeating the purpose.
Perhaps an attacker can then try to guess a username on your site. What I've done is this: if an IP address tries to guess, say, 10 usernames without success, I simply blacklist it.
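A rough sketch of that blacklisting rule might look like the following. The threshold of 10 comes from the description above; everything else is an assumption made for illustration: the function names are hypothetical, the counters live in memory, only distinct usernames are counted, and a successful login resets the count.

```python
from collections import defaultdict

MAX_FAILED_USERNAMES = 10  # threshold from the description above

# Hypothetical in-memory state; a real site would likely persist this.
failed_usernames = defaultdict(set)  # ip -> distinct usernames that failed
blacklist = set()


def record_failed_login(ip, username):
    """Remember a failed guess and blacklist the IP once it hits the threshold."""
    failed_usernames[ip].add(username)
    if len(failed_usernames[ip]) >= MAX_FAILED_USERNAMES:
        blacklist.add(ip)


def record_successful_login(ip):
    """Assumption: a successful login clears the failure count for that IP."""
    failed_usernames.pop(ip, None)


def is_blocked(ip):
    return ip in blacklist


# Example: one address (from a documentation-only range) guesses 10 usernames.
for i in range(10):
    record_failed_login("203.0.113.7", f"guess{i}")
```

After the loop, `is_blocked("203.0.113.7")` returns true, while an address that has made no failed attempts is not blocked.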