I've read tons of documentation related to this problem but I still can't get all the pieces together, so I'd like to ask a couple of questions.
First of all I'll briefly describe the authentication procedure as I understand it, as I may be mistaken in that regard: A client starts a connection, to which the server responds with a public key, some metadata, and the digital signature of a trusted authority. Then the client decides whether she trusts the server, encrypts a random session key with the public key, and sends it back. This session key can be decrypted only with the private key stored on the server. The server does this and then the HTTPS session begins.
So, if I'm correct above, the question is how the man-in-the-middle attack can occur in such scenario? I mean, even if somebody intercepts the server (e.g. www.server.com) response with public key and has some means to make me think that he is www.server.com, he still wouldn't be able to decrypt my session key without the private key.
Speaking about the mutual authentication, is it all about the server confidence about the client identity? I mean, the client can already be sure that she is communicating with the right server, but now the server wants to find out who the client is, right?
And the last question is about the alternative to the mutual authentication. If I act as a client in the situation described, what if I send a login/password in the HTTP header after the SSL session is established? As I see it, this information can't be intercepted because the connection is already secured and the server can rely on it for my identification. Am I wrong? What are the downsides of such an approach compared with mutual authentication (only security issues are important, not the implementation complexity)?
Everything you have said is correct except the part about the session key. The point of CAs is to defeat a man-in-the-middle attack -- everything else is done by SSL itself. Client authentication is an alternative to a username and password scheme.
The server responds with an X.509 certificate chain and a digital signature signed with its own private key.
Correct.
No. The client and server engage in a mutual session key generation process whereby the session key itself is never transmitted at all.
No.
No.
The TLS/SSL session begins, but there are more steps first.
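To illustrate how a session key can be agreed on without ever being transmitted, here is a minimal sketch of a Diffie-Hellman style exchange, which is one of the key agreement methods TLS can use. The modulus `P`, the base `G`, and the helper names are assumptions chosen for readability; they are toy values, not what a real TLS stack uses.

```python
import hashlib
import secrets

# Toy public parameters, assumed for this illustration only.
# Real TLS uses standardized groups or elliptic curves.
P = 2**127 - 1  # a Mersenne prime used as the modulus
G = 3           # base value; not a vetted generator


def make_keypair():
    """Return (private, public); only the public half is ever sent."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)


def derive_session_key(secret):
    """Hash the shared secret down to a 32-byte key (hypothetical KDF)."""
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

# Only client_public and server_public cross the wire.
client_key = derive_session_key(shared_secret(client_private, server_public))
server_key = derive_session_key(shared_secret(server_private, client_public))
```

Both sides end up with the same key even though neither sent it. Note that nothing in this exchange proves who the peer is, which is why the certificate check still matters.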
By masquerading as the server and acting as the SSL endpoint. The client would have to omit any authorization step. Sadly the only authorization step in most HTTPS sessions is a hostname check.
See above. There is no session key to decrypt. The SSL connection itself is secure; it's who you're talking to that may not be secure.
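As a rough sketch of what the authorization step looks like on the client, here is how Python's standard `ssl` module expresses it. The two function names are hypothetical; the second one shows the kind of configuration that would let someone masquerade as the server.

```python
import ssl


def make_verifying_client_context():
    """Client context that checks the certificate chain and the hostname."""
    ctx = ssl.create_default_context()  # loads the system's trusted CAs
    # Both settings are already the defaults; stated explicitly for clarity.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def make_unverified_client_context():
    """Client context that omits the authorization step (do not use)."""
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be relaxed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With the second context the connection is still encrypted, but the client will accept any certificate, so it has no idea who is at the other end.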
Correct.
No.
It's only as secure as the username/password, which are a lot easier to leak than a private key.
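For comparison, this is roughly how a server would demand a client certificate (mutual authentication) using Python's standard `ssl` module. The function name and its file path parameters are assumptions for illustration; a real deployment would pass its own certificate, key, and client CA files.

```python
import ssl


def make_mutual_auth_server_context(cert_file=None, key_file=None,
                                    client_ca_file=None):
    """Server-side context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Require the client to present a certificate during the handshake.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own certificate chain and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA(s) whose signatures on client certificates are accepted.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The client's private key never leaves the client, whereas a username and password are sent to the server on every login.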
Anyone on the road between client and server can stage a man-in-the-middle attack on HTTPS. If you think this is unlikely or rare, consider that there are commercial products that systematically decrypt, scan and re-encrypt all SSL traffic across an internet gateway. They work by sending the client an SSL cert created on the fly with the details copied from the "real" SSL cert, but signed with a different certificate chain. If this chain terminates with any of the browser's trusted CAs, this MITM will be invisible to the user. These products are primarily sold to companies to "secure" (police) corporate networks, and should be used with the knowledge and assent of users. Technically though, there's nothing stopping their use by ISPs or any other network carrier. (It would be safe to assume the NSA has at least one trusted root CA signing key.)
If you're serving a page, you can include an HTTP header indicating which public key the site's certificate should carry. This may help to alert users to a MITM of their "secure" connection, but it's a trust-on-first-use technique. If Bob doesn't already have a record of the "real" public key pin, Mallory just rewrites the pin header in the response. The list of web sites using this technology (HTTP Public Key Pinning, or HPKP, not to be confused with HSTS) is depressingly short. It includes Google and Dropbox, to their credit. Usually, an HTTPS-intercepting gateway will wave through pages from the few big trusted sites that use pinning. If you see a pinning error when you're not expecting it, be wary.
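To make the pinning header a bit more concrete, here is a minimal sketch of how a pin value is built for the `Public-Key-Pins` header defined in RFC 7469: the base64 of the SHA-256 hash of the key's DER-encoded SubjectPublicKeyInfo. The helper names and the placeholder key bytes are assumptions; real pins would be computed from the site's actual keys.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def public_key_pins_header(pins, max_age_seconds):
    """Build the value of a Public-Key-Pins response header."""
    parts = ['pin-sha256="%s"' % pin for pin in pins]
    parts.append("max-age=%d" % max_age_seconds)
    return "; ".join(parts)


# Placeholder bytes standing in for real SubjectPublicKeyInfo structures.
primary = spki_pin(b"placeholder primary SPKI bytes")
backup = spki_pin(b"placeholder backup SPKI bytes")

# A backup pin is included because the RFC expects one alongside the live key.
header_value = public_key_pins_header([primary, backup], 5184000)
```

A browser that has already stored these pins would refuse a certificate chain containing neither key, which is exactly the trust-on-first-use limitation described above.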
Regarding passwords, everything on an https connection is secured by https, except the domain name, which needs to be in the clear so the request can be routed. In general, it's recommended not to put passwords in the query string, where they can hang around in logs, bookmarks etc. But the query string is not visible unless https is compromised.
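As a small illustration of keeping credentials out of the query string, here is a sketch that places them in an `Authorization` header using HTTP Basic authentication. The function name and the example credentials are made up for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials for an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Hypothetical credentials; the header travels inside the encrypted session
# and does not end up in the URL, bookmarks, or typical access logs.
headers = {"Authorization": basic_auth_header("alice", "correct horse")}
```

Base64 here is only an encoding, not encryption, so this is only as private as the HTTPS connection carrying it.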
Man-in-the-middle attacks on SSL are really only possible if one of SSL's preconditions is broken. Here are some examples:
1. The server key has been stolen - means the attacker can appear to be the server, and there is no way for the client to know.
2. The client trusts an untrustworthy CA (or one that has had its root key stolen) - whoever holds a trusted CA key can generate a certificate pretending to be the server and the client will trust it. With the number of CAs pre-existing in browsers today, this may be a real problem. This means that the server certificate would appear to change to another valid one, which is something most clients will hide from you.
3. The client doesn't bother to validate the certificate correctly against its list of trusted CAs - anyone can create a CA. With no validation, "Ben's Cars and Certificates" will appear to be just as valid as Verisign.
4. The client has been attacked and a fake CA has been injected into its trusted root authorities - allows the attacker to generate any cert he likes, and the client will trust it. Malware tends to do this, for example, to redirect you to fake banking sites.
#2 is especially nasty: even if you pay for a highly trusted certificate, your site will not be locked to that certificate in any way. You have to trust all CAs in the client's browser, since any of them can generate a fake cert for your site that is just as valid. It also does not require access to either the server or the client.
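One way a custom client can guard against a certificate issued by a different CA is to compare the server certificate's fingerprint with a value it already knows. This is only a sketch under assumptions: the function names are hypothetical, and the pinned fingerprint would have to be obtained out of band beforehand.

```python
import hashlib
import hmac
import ssl


def cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def matches_pinned_fingerprint(cert_der: bytes, pinned_hex: str) -> bool:
    """True if the certificate is exactly the one we expect."""
    return hmac.compare_digest(cert_fingerprint(cert_der), pinned_hex.lower())


def fetch_server_cert_der(host: str, port: int = 443) -> bytes:
    """Fetch the certificate a server presents (needs network access).

    No CA validation is done here; the result is only meant to be
    compared against a known fingerprint.
    """
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

A substituted certificate, even one signed by a CA the browser trusts, would have a different fingerprint and fail the comparison. The cost is that the pinned value must be updated whenever the site legitimately replaces its certificate.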