I use htmlagility to get webpage data but I tried everything with page using www.cloudflare.com protection for ddos. The redirect page is not possible to handle in htmlagility because they don't redirect with meta nor js I guess, they check if you have already being checked with a cookie that I failed to simulate with c#. When I get the page, the html code is from the landing cloadflare page.
相关问题
- Sorting 3 numbers without branching [closed]
- Graphics.DrawImage() - Throws out of memory except
- Why am I getting UnauthorizedAccessException on th
- 求获取指定qq 资料的方法
- How to know full paths to DLL's from .csproj f
Use
WebClient
to get html of the page,I wrote following class which handles cookies too,
Just pass
CookieContainer
instance in constructor.USAGE:
I also encountered this problem some time ago. The real solution would be solve the challenge the cloudflare websites gives you (you need to compute a correct answer using javascript, send it back, and then you receive a cookie / your token with which you can continue to view the website). So all you would get normally is a page like
In the end, I just called a python-script with a shell-execute. I used the modules provided within this github fork. This could serve as a starting point to implement the circumvention of the cloudflare anti-dDoS page in C# aswell.
FYI, the python script I wrote for my personal usage just wrote the cookie in a file. I read that later again using C# and store it in a
CookieJar
to continue browsing the page within C#.EDIT: To repeat this, this has only LITTLE to do with cookies! Cloudflare forces you to solve a REAL challenge using javascript commands. It's not as easy as accepting a cookie and using it later on. Look at https://github.com/Anorov/cloudflare-scrape/blob/master/cfscrape/init.py and the ~40 lines of javascript emulation to solve the challenge.
Edit2: Instead of writing something to circumvent the protection, I've also seen people using a fully-fledged browser-object (this is not a headless browser) to go to the website and subscribe to certain events when the page is loaded. Use the
WebBrowser
class to create an infinetly small browser window and subscribe to the appropiate events.Edit3: Alright, I actually implemented the C# way to do this. This uses the JavaScript Engine Jint for .NET, available via https://www.nuget.org/packages/Jint
The cookie-handling code is ugly because sometimes the
HttpResponse
class won't pick up the cookies, although the header contains aSet-Cookie
section.The function will return a webclient with the solved challenges and cookies inside. You can use it as follows:
A "simple" working method to bypass Cloudflare if you don't use libraries (that sometimes does not work).
Make sure the UserAgent for both WebBrowser and WebClient are identical. Cloudflare will give you a 503 if a mismatch there on the WebClient aftwerwards.
You will need to search here on stack on how to get cookies from WebBrowser and how to modify WebClient so you can set its cookiecontainer + modify the UserAgent on 1 or both so they are identical.
Since the cookies from Cloudflare seems to never expire, you can then serialize the cookies to somewhere temporary and load it each time you run your app, maybe a verification and refetch if failing.
Been doing this for a while and it works quite well. Could not get the C# libs to work for a specific Cloudflare site while they worked on others. No clue to why yet.
This also works behind the scenes on an IIS server, but you will have to set up "frowned upon" settings. That is, run the app pool as SYSTEM or ADMIN and set it to Classic mode.