WebClient forbids opening wikipedia page?

2019-05-13 19:08发布

Here's the code I'm trying to run:

var wc = new WebClient();
var stream = wc.OpenRead(
             "http://en.wikipedia.org/wiki/List_of_communities_in_New_Brunswick");

But I keep getting a 403 forbidden error. Don't understand why. It worked fine for other pages. I can open the page fine in my browser. How can I fix this?

1条回答
We Are One
2楼-- · 2019-05-13 19:39

I wouldn't normally use OpenRead(), try DownloadData() or DownloadString() instead.

Also it might be that wikipedia is deliberately blocking your request because you have not provided a user agent string:

WebClient client = new WebClient();
client.Headers.Add("user-agent", 
    "Mozilla/5.0 (Windows; Windows NT 5.1; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4");

I use WebClient quite often, and learned quite quickly that websites can and will block your request if you don't provide a user agent string that matches a known web browser. Also, if you make up your own user agent string (eg "my super cool web scraper") you will also be blocked.

[Edit]

I changed my example user agent string to that of a modern version of Firefox. The original example I gave was the user agent string for IE6 which is not a good idea. Why? Some websites may perform filtering based on IE6 and send anyone with that browser a message or to a different page that says "Please update your browser" - this means you will not get the content you wanted to get.

查看更多
登录 后发表回答