I have a task to fetch HTML from a website, but I need to log in before I can reach that page.
I use the low-level URL Fetch service API. Here is my test code:
private String postPage(String loginPageHtml) throws IOException {
    String charset = "UTF-8";
    Document doc = Jsoup.parse(loginPageHtml);
    Iterator<Element> inputHiddensIter = doc.select("form").first().select("input[type=hidden]").iterator();
    String paramStr = "";
    paramStr += "Username" + "=" + URLEncoder.encode("username", charset) + "&";
    paramStr += "Password" + "=" + URLEncoder.encode("password", charset) + "&";
    paramStr += "ImageButton1.x" + "=" + URLEncoder.encode("50", charset) + "&";
    paramStr += "ImageButton1.y" + "=" + URLEncoder.encode("10", charset) + "&";
    while (inputHiddensIter.hasNext()) {
        Element ele = inputHiddensIter.next();
        String name = ele.attr("name");
        String val = ele.attr("value");
        paramStr += name + "=" + URLEncoder.encode(val, charset) + "&";
    }
    URL urlObj = new URL(LOG_IN_PAGE);
    URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
    HTTPRequest request = new HTTPRequest(urlObj, HTTPMethod.POST);
    HTTPHeader header = new HTTPHeader("Content-Type", "application/x-www-form-urlencoded");
    HTTPHeader header3 = new HTTPHeader("Content-Language", "en-US");
    HTTPHeader header4 = new HTTPHeader("User-Agent", DEFAULT_USER_AGENT);
    if (!cookie.isEmpty()) {
        request.addHeader(new HTTPHeader("Set-Cookie", cookie));
    }
    request.addHeader(header);
    request.addHeader(header3);
    request.addHeader(header4);
    request.setPayload(paramStr.getBytes());
    request.getFetchOptions().setDeadline(30d);
    HTTPResponse response = null;
    try {
        response = fetcher.fetch(request);
        byte[] content = response.getContent();
        int responseCode = response.getResponseCode();
        log.severe("Response Code : " + responseCode);
        List<HTTPHeader> headers = response.getHeaders();
        for (HTTPHeader h : headers) {
            String headerName = h.getName();
            if (headerName.equals("Set-Cookie")) {
                cookie = h.getValue();
            }
        }
        String s = new String(content, "UTF-8");
        return s;
    } catch (IOException e) {
        /* ... */
    }
    return "";
}
Here is my default user agent:
private static final String DEFAULT_USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1";
It works fine on my dev machine, but when I deploy to App Engine and test it, I get response code 500 and the following error:
Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster, ensure that configuration specifies the same validationKey and validation algorithm. AutoGenerate cannot be used in a cluster.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.Web.HttpException: Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster, ensure that configuration specifies the same validationKey and validation algorithm. AutoGenerate cannot be used in a cluster.
It seems the error occurs on the ASP.NET side.
Is there something wrong with my code, or is this a limitation of App Engine?
It looks like you are doing a POST to an aspx page. When an aspx page receives a POST request, it expects some hidden inputs that carry an encoded ViewState. If you browse to the page in question and "View Source", you should see some hidden fields just inside the <form /> tag, typically with names such as __VIEWSTATE.
Because you are submitting a POST request without these values present, the page has trouble decoding and validating them (which is what that error means, although it can also crop up for other reasons in other scenarios). There are a couple of possible solutions to this:
1 - If you have access to the code for the site, and the login page doesn't require ViewState, you could try switching it off at the page level within the @Page directive.
2 - You could do a double request: do a GET request on the login page to retrieve the values for any missing hidden fields, then include those values in your POST.
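The second option can be sketched roughly as follows. This is only an illustration, not the asker's code: `FormBodySketch` and `encode` are hypothetical names, and the field names and values are made up. It assumes the hidden fields have already been read from the GET response into a map. The point it shows is that both names and values are URL-encoded and that the exact values from the GET are sent back unchanged, since a Base64 ViewState value usually contains `+`, `/` and `=` characters that would otherwise be altered in transit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds an application/x-www-form-urlencoded body.
class FormBodySketch {

    // Encodes every name and value and joins the pairs with '&' (no trailing '&').
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up values standing in for what the GET response contained.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("__VIEWSTATE", "/wEP+A==");
        fields.put("Username", "username");
        System.out.println(encode(fields));
    }
}
```

The resulting string would then be passed to the request as bytes in an explicit charset, for example `body.getBytes("UTF-8")`, so the payload does not depend on the platform default encoding.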
EDIT: Ah yes, from your comment I can see that you're including the hidden form fields already - apologies!
In which case, another possibility is that the login page is in a load-balanced environment. Each server in that environment will have a different MachineKey value (which is used to encode/decode the ViewState). You may be reading from one and posting to the other. Some LBs inject ArrowPoint cookies into the response to ensure that you "stick" to the same server between requests.
I can see you're already including a cookie in your POST, but I can't see where it's defined. Is it from the first GET request, or is it a custom cookie? If you haven't tried it already, maybe try using the cookie from the original GET where you're retrieving the login page HTML? Other than that, I'm out of ideas - sorry!
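One thing worth checking when replaying the cookie from the original GET: a client sends cookies back in a `Cookie` request header, whereas `Set-Cookie` is the response header a server uses to issue them, and a response may carry several `Set-Cookie` headers. Below is a minimal sketch of turning the `Set-Cookie` values from the GET into a single `Cookie` value for the POST. `CookieHeaderSketch` and `toCookieHeader` are hypothetical names, the cookie values are made up, and attributes such as `path` and `HttpOnly` are simply dropped rather than honoured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: converts Set-Cookie response values into one Cookie request value.
class CookieHeaderSketch {

    // Keeps only the leading "name=value" of each Set-Cookie value and joins them with "; ".
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<String>();
        for (String value : setCookieValues) {
            int semi = value.indexOf(';');
            String pair = (semi >= 0 ? value.substring(0, semi) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Made-up values: a session cookie plus a load-balancer "sticky" cookie.
        List<String> fromGet = Arrays.asList(
                "ASP.NET_SessionId=abc123; path=/; HttpOnly",
                "lb=server2; path=/");
        System.out.println(toCookieHeader(fromGet));
    }
}
```

With the URL Fetch API used in the question, the result would be attached as `new HTTPHeader("Cookie", cookieHeader)`, after collecting every `Set-Cookie` header from the GET response instead of keeping only the last one.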