php cURL log into jsp website and return HTML

Posted 2019-09-01 15:32

I'm trying to use cURL to log into a JSP/Tomcat website (we'll call it https://unknown.com for privacy reasons) and return the HTML from a page. I've watched the Net panel in Firebug and the cookie panel in Firecookie to outline the manual steps below:

  1. Open the web root - https://unknown.com
  2. Redirected to https://unknown.com/common/frames.jsp - Cookie created: JSESSIONID
  3. Fill out j_username and j_password
  4. Post "j_username=user&j_password=pass&submit=logon" to https://unknown.com/common/j_security_check
  5. Redirected to https://unknown.com/common/frames.jsp
  6. User selects a link on the home page that leads to the page whose HTML should be returned.

I don't have much experience with cURL and I'm not having much luck. I really just need to understand the steps cURL requires to log in to the site and reach the destination page.

EDIT: Here is my code:

//user login information
$username = "user";
$password = "pass";

$postData = "j_username=".$username."&j_password=".$password."&logon=submit";

$cookie_file = "/tmp/curl_cookies.txt";

//$fp = fopen($cookie_file, "w");
//fclose($fp);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://unknown.com/common/j_security_check');
curl_setopt($ch, CURLOPT_POSTFIELDS,$postData);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_REFERER, "https://unknown.com/common/Frames.jsp");
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);
curl_close($ch);

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'https://unknown.com/claritymatch/ClarityBatchViewer.jsp?id=123');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$data = curl_exec($ch);

curl_close($ch);
echo $data;

It doesn't work the first time I run the .php file, but the second time it returns the destination HTML. How can I get it to work on the first run? Also, since I'm storing the JSESSIONID cookie in the file indicated above, won't I run into problems with that session ID not changing, or will it change as needed?

1 answer
Rolldiameter
#2 · 2019-09-01 15:52

Here are a few suggestions for your situation...

  • Re-use the same cURL handle for simplicity
    This avoids duplicating options for each request. Set the options that every request shares (cookie options, user agent, follow-location and so on) once, at the beginning.
    You can then set just the URL and request method for each individual request.
    You can also gain some performance this way: when the remote server supports keep-alive, a re-used handle can send several requests over the same connection instead of reconnecting each time. Adding a Keep-Alive header to the request makes that intent explicit.

  • Set CURLOPT_FOLLOWLOCATION to true and start from the beginning
    Try to follow exactly what you see the browser do. That is, request the web root first; if the site redirects you (in your steps, to /common/frames.jsp), cURL will follow that redirect and capture any cookies set along the way, such as JSESSIONID. One cURL request can result in several HTTP requests when a redirect is sent. Only then proceed to "fill out" the login form by posting to the security check URL.

  • Use http_build_query() for your post data
    There is nothing wrong with the way you build your post string, but the data must be URL-encoded. An array passed to http_build_query() is easier to manipulate and yields a URL-encoded string you can feed directly to cURL.
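
As an illustration of the last suggestion, here is a minimal sketch of building the login body with http_build_query(). The field names follow step 4 of the question (submit=logon) and should be checked against the real form; the password is a made-up value, chosen only to show why encoding matters.

```php
<?php
// Sketch only: field names come from step 4 of the question,
// and the password is a placeholder containing reserved characters.
$fields = [
    'j_username' => 'user',
    'j_password' => 'p&ss w=rd',
    'submit'     => 'logon',
];

// The explicit '&' separator avoids depending on the arg_separator.output ini setting.
$postData = http_build_query($fields, '', '&');

echo $postData, "\n";
// prints: j_username=user&j_password=p%26ss+w%3Drd&submit=logon
```

With plain string concatenation, the "&" and "=" inside the password would be read by the server as field separators, and the login would be sent with the wrong password.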

See also this answer I posted a couple of days ago for someone trying to do something similar. In it I posted references to other answers that contain full samples of requesting multiple URLs with cURL; looking through those should give you an idea of how to do what you want. See especially the first reference in that post, which shows how to log into Google by making several POST requests followed by a GET request.
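
Putting the three suggestions together, a rough sketch of the whole flow might look like the following. It is not a tested solution for your site: the host, paths and field names are copied from the question, the helper names newSession(), request() and fetchAfterLogin() are made up for illustration, and the user-agent string is a placeholder. One plausible reason your script only works on the second run is that the first POST to j_security_check is sent before any JSESSIONID exists; requesting the web root first is meant to address that.

```php
<?php
// Sketch only: URLs and field names are assumptions taken from the question;
// newSession(), request() and fetchAfterLogin() are hypothetical helper names.

// Create one handle and set the shared options a single time.
function newSession()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects and keep cookies set on the way
        CURLOPT_COOKIEFILE     => '',   // empty string: enable in-memory cookies, read no file
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-script)',
    ]);
    return $ch;
}

// Send a GET when $fields is null, otherwise a URL-encoded POST, on the shared handle.
function request($ch, string $url, ?array $fields = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($fields === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET after a POST
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields, '', '&'));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Mirror the browser: web root, then login POST, then the destination page.
function fetchAfterLogin(string $base, string $user, string $pass, string $path): string
{
    $ch = newSession(); // released automatically when $ch goes out of scope

    request($ch, $base . '/'); // steps 1-2: redirect sets JSESSIONID
    request($ch, $base . '/common/j_security_check', [ // steps 4-5: log in
        'j_username' => $user,
        'j_password' => $pass,
        'submit'     => 'logon',
    ]);
    return request($ch, $base . $path); // step 6: destination page
}

// Usage (placeholder values from the question):
// echo fetchAfterLogin('https://unknown.com', 'user', 'pass',
//     '/claritymatch/ClarityBatchViewer.jsp?id=123');
```

Because the cookies live only in memory on the one handle, every run starts a fresh session, which sidesteps the question of a stale JSESSIONID left in a cookie file. CURLOPT_HEADER is left off so the returned string is just the HTML, and certificate verification is left at its defaults rather than disabled.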
