Last night a customer called, frantic, because Google had cached versions of private employee information. The information is not available unless you log in.
They had done a Google search for their domain, e.g.:
site:example.com
and noticed that Google had crawled, and cached, some internal pages.
Looking at the cached versions of the pages myself:
This is Google's cache of https://example.com/(F(NSvQJ0SS3gYRJB4UUcDa1z7JWp7Qy7Kb76XGu8riAA1idys-nfR1mid8Qw7sZH0DYcL64GGiB6FK_TLBy3yr0KnARauyjjDL3Wdf1QcS-ivVwWrq-htW_qIeViQlz6CHtm0faD8qVOmAzdArbgngDfMMSg_N4u45UysZxTnL3d6mCX7pe2Ezj0F21g4w9VP57ZlXQ_6Rf-HhK8kMBxEdtlrEm2gBwBhOCcf_f71GdkI1))/ViewTransaction.aspx?transactionNumber=12345. It is a snapshot of the page as it appeared on 15 Sep 2013 00:07:22 GMT
I was confused by the long url. Rather than:
https://example.com/ViewTransaction.aspx?transactionNumber=12345
there was a long string inserted:
https://example.com/[...snip...]/ViewTransaction.aspx?transactionNumber=12345
It took me a few minutes to remember: that might be a symptom of ASP.net's "cookie-less sessions". If your browser does not support Set-Cookie, the web-site will embed a cookie in the URL.
Except our site doesn't use that.
And even if our site did have cookie-less sessions auto-detected, and Google managed to cajole the web-server into handing it a session in the url, how did it take over another user's session?
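To make the long URL easier to reason about, here is a minimal Python sketch that pulls the bracketed segment out of a URL and reports which one-letter marker it carries. The meanings in `MARKERS` are an assumption based on how ASP.NET cookieless URLs are commonly described (`S` for a session id, `F` for a forms authentication ticket, `A` for anonymous identification); check them against the framework version you run. The function name and the short token in the usage example are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed meanings of the one-letter markers (verify for your ASP.NET version).
MARKERS = {
    "S": "session id",
    "F": "forms authentication ticket",
    "A": "anonymous identification",
}

# A whole path segment such as "(F(abc))" or "(S(abc)F(def))".
SEGMENT = re.compile(r"^\((?:[A-Z]\([^()]*\))+\)$")
# One "X(value)" token inside such a segment.
TOKEN = re.compile(r"([A-Z])\(([^()]*)\)")


def split_cookieless_url(url):
    """Return (tokens, clean_path); tokens maps marker letter -> embedded value."""
    tokens = {}
    kept = []
    for segment in urlsplit(url).path.split("/"):
        if SEGMENT.match(segment):
            # Strip the outer parentheses, then collect each X(value) pair.
            tokens.update(TOKEN.findall(segment[1:-1]))
        else:
            kept.append(segment)
    return tokens, "/".join(kept)


if __name__ == "__main__":
    # Hypothetical short token; the real one in the cached URL is much longer.
    url = "https://example.com/(F(abc-123_x))/ViewTransaction.aspx?transactionNumber=12345"
    tokens, path = split_cookieless_url(url)
    for letter, value in tokens.items():
        print(letter, MARKERS.get(letter, "unknown marker"), value)
    print(path)
```

If the marker in a cached URL turns out to be `F` rather than `S`, that would suggest looking at the forms authentication settings as well as `<sessionState>`.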
Yes, Google, a non-malicious bot, hijacked a session
The site has been crawled by bots for years. And this past May 29 was no different.
Google usually starts its crawl by checking the robots.txt file (we don't have one). But nobody is allowed to read anything on the site (including robots.txt) without first being authenticated, so it fails:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /robots.txt         80                    302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 80                    302    ;use https please
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
All that time Google was looking for a robots.txt file. It never got one. Then it returns to try to crawl the root:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /                   80                    302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 80                    302    ;use https please
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
And another check of robots.txt on the secure site:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /robots.txt         443                   302    ;not authenticated, see /Account/Login.aspx
1:33:04  GET /Account/Login.aspx 443                   200    ;go ahead, try to login
And then the stylesheet on the login page:
Time     Uri                     Port User Name        Status
======== ======================= ==== ================ ======
1:33:04  GET /Styles/Site.css    443                   200
And that's how every crawl from GoogleBot, msnbot, and BingBot works. Robots, login, secure, login. Never getting anywhere, because it cannot get past WebForms Authentication. And all is well with the world.
Until one day; out of nowhere
Until one day, GoogleBot shows up, with a Session cookie in hand!
Time     Uri                       Port User Name           Status
======== ========================= ==== =================== ======
1:49:21  GET /                     443  jatwood@example.com 200    ;they showed up logged in!
1:57:35  GET /ControlPanel.aspx    443  jatwood@example.com 200    ;now they're crawling that user's stuff!
1:57:35  GET /Default.aspx         443  jatwood@example.com 200    ;back to the homepage
2:07:21  GET /ViewTransaction.aspx 443  jatwood@example.com 200    ;and here comes the private information
The user, jatwood@example.com, had not been logged in for over a day. (I was hoping that IIS had given the same session identifier to two simultaneous visitors, separated by an application recycle.) And our site (web.config) is not configured to enable cookieless sessions. And the server (machine.config) is not configured to enable cookieless sessions.
So:
- how did Google get ahold of a cookieless session?
- how did Google get ahold of a valid cookieless session?
- how did Google get ahold of a valid cookieless session that belonged to another user?
As recently as October 1 (4 days ago), the GoogleBot was still showing up, cookie in hand, logging in as this user, crawling, caching, and publishing, some of their private details.
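For anyone wanting to look for the same pattern in their own logs, here is a rough Python sketch that scans W3C extended log lines for requests where the user agent looks like a crawler but the username column is not empty. It assumes the log starts with a `#Fields:` directive that includes `cs-username` and `cs(User-Agent)`, and that an empty username is written as `-`. The crawler substrings, the function name, and the sample lines are my own made-up illustration, not excerpts from the real logs.

```python
# Substrings assumed to identify crawlers; adjust to taste.
CRAWLER_HINTS = ("googlebot", "bingbot", "msnbot")


def authenticated_crawler_hits(lines):
    """Return (time, username, uri) for crawler requests that carry a username."""
    fields = []
    hits = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)", "").lower()
        user = row.get("cs-username", "-")
        if user != "-" and any(hint in agent for hint in CRAWLER_HINTS):
            hits.append((row.get("time", ""), user, row.get("cs-uri-stem", "")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines in the assumed format.
    sample = [
        "#Fields: time cs-method cs-uri-stem s-port cs-username cs(User-Agent) sc-status",
        "01:33:04 GET /robots.txt 80 - Mozilla/5.0+(compatible;+Googlebot/2.1) 302",
        "01:49:21 GET / 443 jatwood@example.com Mozilla/5.0+(compatible;+Googlebot/2.1) 200",
    ]
    for hit in authenticated_crawler_hits(sample):
        print(hit)
```

A user agent string can be spoofed, so a hit only says that something calling itself a crawler arrived authenticated; checking the client IP would be a separate step.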
How is Google, a non-malicious web-crawler, bypassing WebForms authentication?
IIS7, Windows Server 2008 R2, single server.
Theories
The server is not configured to give out cookieless sessions. But ignoring that fact, how can Google bypass authentication?
- GoogleBot is visiting the web-site, and attempting random usernames and passwords (not likely, the logs show no attempts to login)
- GoogleBot decided to insert a random cookieless session into the url string, and it happened to match the session of an existing user (not likely)
- The user managed to figure out how to make an IIS web-site return a cookieless url (not likely), then pasted that url onto another web-site (not likely), where Google found the cookieless url and crawled it
- The user is running through a mobile proxy (which they're not). The proxy server doesn't support cookies, so IIS creates a cookieless session. That (e.g. Opera Mobile) caching server was breached (not likely) and all cached links posted on a hacker forum. GoogleBot crawled the hacker forum, and started following all links, including our jatwood@example.com cookieless session url.
- The user has a virus, which manages to cajole any IIS web-server into handing back a cookieless url. That virus then reports back to headquarters. The urls are posted onto a publicly accessible resource that GoogleBot crawls. GoogleBot then shows up at our server with the cookieless url.
None of these are really plausible.
How can Google, a non-malicious web-crawler, bypass WebForms authentication, and hijack a user's existing session?
What are you asking?
I don't even know how an ASP.net web-site that is not configured to give out cookieless sessions could give out a cookieless session. Is it possible to back-convert a cookie-based session id into a cookieless session id? I could quote the relevant <sessionState> section of web.config and machine.config, and show there is no presence of
<sessionState cookieless="true">
How does the web-server decide that the browser doesn't support cookies? I tried blocking cookies in Chrome, and I was never given a cookie-less session identifier. Can I simulate a browser that doesn't support cookies, in order to verify that my server is not giving out cookieless sessions?
Does the server decide cookieless sessions by User-Agent string? If so, I could set Internet Explorer with a spoofed UA.
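One way to try the "browser without cookies" test from a script is sketched below in Python. It sends requests with no cookie jar and a user agent string of your choosing, follows redirects by hand, and reports the first URL in the chain that contains a bracketed `(X(...))` segment. This assumes the server would reveal a cookieless identifier through a redirect `Location` header; it does not inspect links inside the returned HTML. The names `probe` and `fetch_once` and the default user agent are made up for this sketch.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import urljoin

# A path segment such as "/(S(abc))/" or "/(F(abc))/".
COOKIELESS = re.compile(r"/\((?:[A-Z]\([^()]*\))+\)/")


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # do not follow; the 3xx then surfaces as an HTTPError


def fetch_once(url, user_agent):
    """One GET with no cookie handling; returns (status, Location or None)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def probe(url, user_agent="CookielessProbe/0.1", fetch=fetch_once, max_hops=5):
    """Return the first URL in the redirect chain with a cookieless marker, else None."""
    for _ in range(max_hops):
        if COOKIELESS.search(url):
            return url
        status, location = fetch(url, user_agent)
        if status not in (301, 302, 303, 307, 308) or not location:
            return None
        url = urljoin(url, location)
    return url if COOKIELESS.search(url) else None


if __name__ == "__main__":
    # Replace with your own site and with whichever user agent you want to imitate.
    print(probe("https://example.com/"))
```

Running it with several user agent strings (a desktop browser, an old mobile browser, a crawler) would show whether the server's answer changes with the User-Agent header.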
Does session identity in ASP.net depend solely on the cookie? Can anyone, from any IP, with the cookie-url, access that session? Does ASP.net not, by default, also take the IP address into account?
If ASP.net does tie IP address with the session, wouldn't that mean that the session couldn't have originated from the employee at their home computer? Because then when the GoogleBot crawler tried to use it from a Google IP, it would have failed?
Have there been any instances anywhere (besides the one I linked) of ASP.net giving out cookieless sessions when it's not configured to? Is there a Microsoft Connect issue on this?
Is Web-Forms authentication known to have issues, and should it not be used for security?
Bonus Reading
- A guy on StackOverflow whose web-server is sometimes giving out cookieless urls when it's not configured to
Edit: Removed the name of Google, the bot that bypassed privilege, as people were confusing Google, the name of the crawler, with something else. I use Google, the name of the crawler, as a reminder that it was a non-malicious web-crawler that managed to crawl its way into another user's WebForms session. This is to contrast it with a malicious crawler that was trying to break into another user's session. Nothing like a pedant to bring out the aggravation.