I need to make a proxy script that can access a page hidden behind a login screen. I do not need the proxy to "simulate" logging in; instead, the login page HTML should be displayed to the user normally, and all the cookies and HTTP GET/POST data should flow through the proxy to the server, so that the login is authentic.
I don't want the login/password, I only need access to the HTML source code of the pages generated after logging in.
Does anybody here know how this can be accomplished? Is it easy?
If not, where do I begin? (I'm currently using PHP.)
What you are talking about is accessing pages for which you need to authenticate yourself.
Here are a few things that must be laid down:
- you can't view those pages without authenticating yourself.
- if the website (whose HTML code you want to see) only supports web login as an authentication method, you will need to simulate the login by sending a (username, password) pair via POST or GET, as the case may be
- if the website will let you authenticate yourself in other ways (like LDAP, Kerberos etc), then you should do that
The key point is that you cannot gain access without authenticating yourself first.
As for language, it is pretty doable in PHP. And as the tags on the question suggest, you are using the right tools to do that job already.
One thing I would like to know: why are you calling it a "proxy"? Do you want to serve the content to other users?
EDIT: [update after comment]
In that case, use PHProxy. It does what you want, along with a host of other features.
Have your PHP script request the URL you want, and rewrite all links and form actions to point back to your PHP script. When the script receives a request that has a URL parameter, forward it to the remote server and repeat.
You won't be able to catch all JavaScript requests (unless you implement a JavaScript portion of your "proxy").
E.g.: the user types http://example.com/login.php into your proxy form.
Send the user to http://yoursite.com/proxy.php?url=http://example.com/login.php
Make sure to urlencode the parameter "http://example.com/login.php".
In http://yoursite.com/proxy.php, you make an HTTP request to http://example.com/login.php:
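$url = isset($_REQUEST['url']) ? $_REQUEST['url'] : '';
// make sure we have an http(s) URL and not a file path;
// the pattern is anchored so "http://" must be at the start of the string
if (!preg_match('`^https?://`i', $url)) {
    die('Not a URL');
}
// make the HTTP request to the requested URL
$content = file_get_contents($url);
if ($content === false) {
    die('Request failed');
}
// parse all links and form actions and redirect them back to this script
// (the pattern and replacement below are placeholders, not working values)
$content = preg_replace("/some-smart-regex-here/i", "$1 or $2 smart replaces", $content);
echo $content;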
Note that /some-smart-regex-here/i is only a placeholder: you have to write the actual regular expression that matches links, form actions, and so on.
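As a rough illustration of the link rewriting step, here is a sketch that uses PHP's DOMDocument instead of a regex (a different technique from the placeholder above, chosen because HTML is awkward to match with regular expressions). The function names resolve_url() and rewrite_links() are made up for this example, the URL resolution is deliberately simplified (no "../" handling), and only a href and form action attributes are covered.

```php
<?php
// Hypothetical helpers; "proxy.php" and the "url" parameter follow the example above.

/** Turn a possibly relative link into an absolute URL (simplified). */
function resolve_url(string $base, string $link): string
{
    if (preg_match('~^https?://~i', $link)) {
        return $link;                          // already absolute
    }
    $p = parse_url($base);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if (strpos($link, '//') === 0) {
        return $p['scheme'] . ':' . $link;     // protocol-relative link
    }
    if (strpos($link, '/') === 0) {
        return $origin . $link;                // root-relative link
    }
    // Otherwise the link is relative to the directory of the current page.
    $path = $p['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $link;
}

/** Point every <a href> and <form action> back at the proxy script. */
function rewrite_links(string $html, string $pageUrl, string $proxy): string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                    // silence warnings about sloppy real-world HTML
    foreach (['a' => 'href', 'form' => 'action'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            if ($value === '' && $tag === 'form') {
                $value = $pageUrl;             // an empty action submits to the current page
            }
            if ($value === '' || preg_match('~^(#|mailto:|javascript:)~i', $value)) {
                continue;                      // nothing to proxy
            }
            $absolute = resolve_url($pageUrl, $value);
            $node->setAttribute($attr, $proxy . '?url=' . urlencode($absolute));
        }
    }
    return $doc->saveHTML();
}
```

In proxy.php this would be called as rewrite_links($content, $url, 'http://yoursite.com/proxy.php') in place of the preg_replace() line. One caveat: forms submitted with method GET replace the query string of their action, so the url parameter would be lost; such forms would need the target passed differently, for instance in a hidden input.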
The example only proxies the HTTP body; you may also want to proxy the HTTP headers. For that you can use fsockopen() or the PHP stream functions in PHP 5+ (stream_socket_client() etc.).
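A lighter-weight way to pass headers along, without dropping down to raw sockets, is to give file_get_contents() a stream context. The sketch below is only one possible approach: build_http_options() and extract_set_cookie() are hypothetical names, and it assumes that every cookie the browser sends to the proxy should be forwarded to the remote site.

```php
<?php
// Hypothetical helpers for forwarding cookies and POST data through the proxy.

/** Build the options array for stream_context_create(). */
function build_http_options(string $method, array $cookies, array $post): array
{
    $headers = [];
    if ($cookies) {
        $pairs = [];
        foreach ($cookies as $name => $value) {
            $pairs[] = $name . '=' . rawurlencode($value);
        }
        $headers[] = 'Cookie: ' . implode('; ', $pairs);
    }
    $http = [
        'method'          => $method,
        'ignore_errors'   => true,  // still return the body on 4xx/5xx responses
        'follow_location' => 0,     // keep redirects visible so their Set-Cookie headers are not lost
    ];
    if ($method === 'POST') {
        $headers[] = 'Content-Type: application/x-www-form-urlencoded';
        $http['content'] = http_build_query($post);
    }
    $http['header'] = implode("\r\n", $headers);
    return ['http' => $http];
}

/** Pick the Set-Cookie lines out of a list of raw response header lines. */
function extract_set_cookie(array $responseHeaders): array
{
    return array_values(array_filter($responseHeaders, function ($line) {
        return stripos($line, 'Set-Cookie:') === 0;
    }));
}

// Usage inside proxy.php, assuming $url has already been validated:
//   $options = build_http_options($_SERVER['REQUEST_METHOD'], $_COOKIE, $_POST);
//   $content = file_get_contents($url, false, stream_context_create($options));
//   foreach (extract_set_cookie($http_response_header) as $line) {
//       header($line, false);   // false = add the header instead of replacing earlier ones
//   }
```

Cookies that carry a Domain attribute for the remote site will be rejected by the browser when they arrive from the proxy's host, so a real proxy would probably have to strip or rewrite that attribute before relaying them.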
You could check out php-transparent-proxy (http://code.google.com/p/php-transparent-proxy/). I made it because I was asking myself that exact same question and decided to write one. It's under the BSD license, so have fun :)
I would recommend using cURL (a PHP extension that you might need to enable in your php.ini).
It's used to talk to remote websites, and it handles cookies and every HTTP parameter you need.
You'll have to tailor your proxy to the web pages you're hitting, but it will do the job.
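To give an idea of what that looks like, here is a minimal sketch using the cURL extension with a cookie jar file. The function names proxy_curl_options() and proxy_fetch() and the jar path are invented for this example; it assumes one jar file per proxy user, since a shared jar would mix everybody's sessions together.

```php
<?php
// Hypothetical helpers; requires the PHP cURL extension.

/** Collect the cURL options for one proxied request. */
function proxy_curl_options(string $url, string $method, array $post, string $cookieJar): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect that often follows a login
        CURLOPT_COOKIEJAR      => $cookieJar,  // cookies are written here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieJar,  // cookies are read from here on later requests
    ];
    if ($method === 'POST') {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($post);
    }
    return $options;
}

/** Perform the request and return the response body. */
function proxy_fetch(string $url, string $method, array $post, string $cookieJar): string
{
    $ch = curl_init();
    curl_setopt_array($ch, proxy_curl_options($url, $method, $post, $cookieJar));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;  // $ch goes out of scope here, which releases the handle
}

// Usage inside proxy.php, assuming $url has been validated and a session is started:
//   $jar = sys_get_temp_dir() . '/proxy_' . session_id() . '.txt';
//   echo proxy_fetch($url, $_SERVER['REQUEST_METHOD'], $_POST, $jar);
```

With this approach the remote site's cookies stay on the proxy server rather than in the user's browser, which is a different design from relaying Set-Cookie headers; which one fits depends on how the proxy is meant to be used.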