I had code similar to the code in this question. Extending it as shown in the accepted answer worked for me too.
Until now, I had used this kind of code and never hit any exception.
Now, my questions are:
- Why should I set a User-Agent?
- Why did it become necessary in my program?
- Is it necessary in every program?
- If yes, how did my program run so well before?
- If no, why do I have to handle this now?
- How is the string "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" generated? (I want to know the exact format.)
Note that:
I use the program I fixed every day, and it never had any issue before.
Many web administrators want to keep bots off their sites, because bots typically scrape data at regular intervals while the owner earns no ad revenue from those hits: they bring no obvious benefit but still consume resources. For this reason they block anything that doesn't look like a browser used by a human. As you have seen, it is trivial to make your program pretend to be a different client, so this technique is not effective against anyone who knows what they are doing. In general, though, it is considered polite not to pretend to be something you're not (internet etiquette).
User agent strings can technically be anything you want, but most applications follow a common pattern such as $product/$version. You can see some examples here.
For more information, check out the Wikipedia article on the matter.
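To make the $product/$version pattern concrete, here is a minimal sketch in Python (chosen only for illustration; your program may well be in another language). The helper names `build_user_agent` and `split_user_agent` are hypothetical, and the regular expression only handles the simple "product/version (comment)" shape, not every user agent string seen in the wild.

```python
import re


def build_user_agent(product, version, comment=None):
    """Return a 'product/version' string, optionally followed by '(comment)'."""
    user_agent = f"{product}/{version}"
    if comment:
        user_agent += f" ({comment})"
    return user_agent


def split_user_agent(user_agent):
    """Split a simple 'product/version (a; b; c)' string into its pieces.

    Returns (product, version, comment_parts). This is an illustrative
    parser for the simple shape only, not a general-purpose one.
    """
    match = re.fullmatch(r"([^/\s]+)/(\S+)(?:\s+\((.*)\))?", user_agent)
    if match is None:
        raise ValueError(f"not in product/version form: {user_agent!r}")
    product, version, comment = match.groups()
    parts = [part.strip() for part in comment.split(";")] if comment else []
    return product, version, parts


if __name__ == "__main__":
    # The string from the question, taken apart.
    print(split_user_agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"))
    # A made-up, honest identifier for your own program.
    print(build_user_agent("MyFetcher", "1.0"))
```

Read this way, the string from the question is the product `Mozilla`, the version `4.0`, and a parenthesised comment whose parts are separated by semicolons.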
So quick summary:
- You should use it because servers expect every client to send one.
- The library probably has a default user agent (e.g. JavaLib/1.1), but you had to set your own for the reasons stated above.
- It is not necessary for every program, but pretending to be a browser is useful for bots. Just remember that it is considered impolite. For example, wget works 99% of the time for me without modification, but some sites block its user agent.
- The string is not generated; it is simply copied from an existing browser, IE 6.0 in this case, and the server you're connecting to seems to accept it.
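As a rough illustration of the second point, the sketch below uses Python's standard `urllib.request` (again only as an example; your own library will have its own default and its own way to override it). It reads the default User-Agent that the library would send and builds a request carrying a custom one. The URL and the `MyFetcher/1.0` value are placeholders, and the helper names are hypothetical. Nothing is sent over the network here.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib's default opener adds to requests."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request.
    return dict(opener.addheaders).get("User-agent")


def make_request(url, user_agent):
    """Build (but do not send) a request with an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL and user agent, chosen only for this example.
request = make_request("https://example.com/", "MyFetcher/1.0")

print(default_user_agent())                # the library's own default
print(request.get_header("User-agent"))    # the value we set explicitly

# Passing `request` to urllib.request.urlopen() would actually send it.
```

Note that `urllib` stores header names in capitalized form, which is why the lookup uses `"User-agent"`. If a site rejects the library default, an honest identifier of your own is the polite first thing to try before copying a browser's string.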