HtmlUnit: Determine if the “login” was successful

Posted 2019-07-22 15:22

Question:

I have developed a script whose sole purpose is to check whether a website/service is up and running. It does that by connecting to the page through its URL and logging in to the site with a user's credentials. If the login succeeds, the service is considered to be running fine.

The script is written in Java and uses HtmlUnit. Here lies my problem: how do I make sure that the HTML page returned after logging in (i.e. after filling out the form and clicking the login/sign-in button) is the post-login "Account home page"? In other words, how do I determine whether the login operation was successful?

Here is how I am doing it right now. Account pages usually show some user-related information. For instance, after I log in to Yahoo Mail the page shows "Welcome, Username" in the top-right corner, and it always contains "Compose" or "Inbox". I use the presence of such strings to test for success.
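
A minimal sketch of that keyword check in plain Java. It assumes the page text has already been extracted with HtmlUnit, for example with `HtmlPage.asText()` (`asNormalizedText()` in newer releases); the class name and the marker strings are illustrative only.

```java
import java.util.List;
import java.util.Locale;

/**
 * Sketch of the keyword check: the login counts as successful if the
 * post-login page text contains at least one of the given markers.
 */
final class KeywordLoginCheck {

    private KeywordLoginCheck() {}

    /** Returns true if any marker occurs in the page text, ignoring case. */
    static boolean containsAnyMarker(String pageText, List<String> markers) {
        if (pageText == null) {
            return false;
        }
        String haystack = pageText.toLowerCase(Locale.ROOT);
        for (String marker : markers) {
            if (haystack.contains(marker.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `containsAnyMarker(page.asText(), List.of("Compose", "Inbox"))`. The result can only be as good as the chosen markers.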

While testing the script, however, I have come across cases where this rule falls apart:

  • Sometimes the page returned after login is an error page asking you to
    check the credentials you entered.

  • Sometimes the returned page asks you to turn on JavaScript or enable
    cookies in your browser.

  • In one case the server returned the same pre-login page (with no
    explanation as to why).

  • Some web pages are dynamic, so their content changes from time to time.
    In such cases a keyword search may produce false negatives, which is why
    this approach of searching for a string hinges entirely on the choice of
    search strings/keywords.
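
One way to make those cases visible, instead of lumping them all into "not logged in", is to classify the returned page: check known failure markers first, then success markers, and keep an explicit "unknown" result for everything else. This is a sketch under assumptions: the enum, the class, and every marker string are hypothetical and would have to be taken from the site being monitored.

```java
import java.util.List;
import java.util.Locale;

/** Possible readings of the page returned after submitting the login form. */
enum LoginOutcome { LOGGED_IN, BAD_CREDENTIALS, NEEDS_JS_OR_COOKIES, UNKNOWN }

/** Sketch of a classifier; all marker strings are hypothetical examples. */
final class LoginOutcomeClassifier {

    private static final List<String> CREDENTIAL_ERRORS =
            List.of("incorrect password", "check your credentials", "invalid username");
    private static final List<String> BROWSER_PROMPTS =
            List.of("enable javascript", "enable cookies", "cookies are disabled");
    private static final List<String> SUCCESS_MARKERS =
            List.of("sign out", "inbox", "compose");

    private LoginOutcomeClassifier() {}

    static LoginOutcome classify(String pageText) {
        String text = pageText == null ? "" : pageText.toLowerCase(Locale.ROOT);
        // Failure markers are checked first, so an error banner on an otherwise
        // familiar-looking page is not mistaken for success.
        if (containsAny(text, CREDENTIAL_ERRORS)) {
            return LoginOutcome.BAD_CREDENTIALS;
        }
        if (containsAny(text, BROWSER_PROMPTS)) {
            return LoginOutcome.NEEDS_JS_OR_COOKIES;
        }
        if (containsAny(text, SUCCESS_MARKERS)) {
            return LoginOutcome.LOGGED_IN;
        }
        // Anything else (e.g. the pre-login page coming back unchanged) stays undecided.
        return LoginOutcome.UNKNOWN;
    }

    private static boolean containsAny(String text, List<String> markers) {
        for (String marker : markers) {
            if (text.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

The `UNKNOWN` bucket does not solve the problem of unforeseen pages, but it flags them for a human to look at instead of silently counting them as failures.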

My point is that coding for all of these cases up front is not realistic.

I also tried comparing the URLs of the pre-login and post-login pages, but found
that in many cases both are the same, so even this method is not conclusive.
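
Since the URL often does not change, a signal that does not depend on it is whether the response still contains a password field, which usually means the login form was sent back. A rough sketch, assuming the raw HTML comes from `page.getWebResponse().getContentAsString()` and that the site's form uses `type="password"`; with HtmlUnit's parsed DOM, `page.querySelectorAll("input[type=password]")` asks the same question less brittly. The class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch of a URL-independent hint that the login form was returned again. */
final class LoginFormDetector {

    // Matches an <input ...> tag whose type attribute is "password" (quoted or unquoted).
    private static final Pattern PASSWORD_INPUT = Pattern.compile(
            "<input\\b[^>]*\\btype\\s*=\\s*[\"']?password\\b",
            Pattern.CASE_INSENSITIVE);

    private LoginFormDetector() {}

    /** True if the HTML still contains a password input, i.e. probably still the login page. */
    static boolean stillShowsLoginForm(String html) {
        return html != null && PASSWORD_INPUT.matcher(html).find();
    }
}
```

This is only a hint: a post-login page may legitimately contain a password field (a "change password" box, for instance), and some sites ask for the username and the password on separate pages.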

I need a sure-shot way of determining whether the login was successful. I am not a professional web developer. Does the server return a status code with the new page that I could check? Does HtmlUnit have some built-in way to test for success or failure?
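
On the status-code part: HtmlUnit exposes the status of a response through `page.getWebResponse().getStatusCode()`. By default a failing status such as 404 or 500 makes HtmlUnit throw `FailingHttpStatusCodeException`; after `webClient.getOptions().setThrowExceptionOnFailingStatusCode(false)` you get the page back instead. Because HtmlUnit follows redirects by default, the code you see is usually that of the final page. A status code can tell you that the server responded, but not that the credentials were accepted, since many sites answer a failed login with `200` and an error page. A minimal sketch of that distinction (the class name is illustrative):

```java
/** Sketch of what an HTTP status code can and cannot tell a monitoring script. */
final class StatusCodeCheck {

    private StatusCodeCheck() {}

    /**
     * True for success (2xx) and redirect (3xx) codes. This only says the
     * service answered; it does NOT say the login itself succeeded.
     */
    static boolean serverResponded(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }
}
```

In the monitoring script this covers the "is the service up" half of the question; whether the login worked still has to be read from the page content.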

I appreciate your help and comments. Thank you!

Answer 1:

Well... this is kind of a tricky question, because you have no control over the server. If you ask for A you will probably receive A, but you should be prepared to receive B, C and D... and you will probably still miss E.

> I need a sure shot way of determining if the login was successful.

Based on your comments, looking for the "Welcome <Username>" string should be quite reliable. In other (more programmatic) words: if that string is in the result page, then you are logged in. There is your sure shot.
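
A slightly stricter form of that check looks for the greeting next to the actual username rather than for "Welcome" alone, tolerating a comma or extra spaces between the two. A sketch in plain Java; the class and method names are illustrative, and the page text is assumed to come from HtmlUnit (e.g. `page.asText()`).

```java
import java.util.regex.Pattern;

/** Sketch: does the page greet the expected user by name? */
final class GreetingCheck {

    private GreetingCheck() {}

    /**
     * True if the text contains "Welcome" followed by the username, with only
     * non-word characters (spaces, commas, ...) in between, e.g. "Welcome, jdoe".
     * The negative lookahead stops "jdoe" from matching inside "jdoe2".
     */
    static boolean showsGreetingFor(String pageText, String username) {
        if (pageText == null || username == null || username.isEmpty()) {
            return false;
        }
        Pattern greeting = Pattern.compile(
                "welcome\\W*" + Pattern.quote(username) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return greeting.matcher(pageText).find();
    }
}
```

A reworded greeting makes this return `false` even when the login worked, which is exactly the false-negative risk mentioned below; a match, on the other hand, is hard to get by accident.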

Now, you mentioned that there are cases in which you try to log in and don't receive that string. In those cases, judging by your examples, you are almost always not logged in.

However, as you said, the site can change that string from "Welcome, <Username>" to "There you are again!", and then you will get false negatives. It is unlikely, though, that this logic will ever give you a false positive.

So, is there any way to be 100% right both when guessing that the user is logged in and when guessing that they are not? No, there isn't. The simplest way to understand this is to use the web the way a human does:

Scenario 1:

  1. Try to log in
  2. You see a red label saying "User and password are incorrect"
  3. You infer you are not logged in. You are right

Scenario 2:

  1. Try to log in
  2. You see the "Welcome <Username>"
  3. You infer you are logged in. You are right

Scenario 3:

  1. Try to log in
  2. You get the "Enable javascript" message
  3. You infer you are not logged in. However, this was just advice from the server and you are actually logged in: you refresh the page and then you see "Welcome <Username>". So here you failed even as a human

Scenario 4:

  1. Try to log in
  2. You get a timeout
  3. You infer you are not logged in. However, the login request reached the server and logged you in, but when the server answered with the HtmlPage, your internet connection, your ISP, or just the Internet broke down for a millisecond and your packet got lost. So here you failed even as a human
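
Scenarios 3 and 4 are transient: the human fix is to refresh or try again, and a monitoring script can do the same by retrying the whole login check a few times before raising an alarm. A minimal sketch, assuming the check is wrapped in a `Callable<Boolean>`; the class name is hypothetical and the attempt count and pause are arbitrary values chosen by the caller.

```java
import java.util.concurrent.Callable;

/**
 * Sketch: retry a login check so that a single timeout or an interstitial
 * page does not immediately count as an outage.
 */
final class RetryingCheck {

    private RetryingCheck() {}

    /**
     * Runs the check up to maxAttempts times, sleeping pauseMillis between attempts.
     * Returns true as soon as one attempt returns true. An exception thrown by the
     * check (e.g. a timeout) counts as a failed attempt.
     */
    static boolean succeedsWithin(Callable<Boolean> check, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (Boolean.TRUE.equals(check.call())) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status and give up
                return false;
            } catch (Exception e) {
                // Any other failure (timeout, I/O error, ...) is an unsuccessful attempt.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In the monitoring script, the `Callable` would wrap the whole HtmlUnit sequence (load the page, fill in and submit the form, inspect the result), so each retry starts from a fresh page, much like a human hitting refresh. Retries reduce false alarms from one-off glitches; they do not remove the underlying uncertainty.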

Those are just a few scenarios, and there are many more. Now think of this: even a human head cannot be 100% sure of the result of a login attempt... how can we expect a headless browser to be? :)