Headless browser for C# (.NET)? [closed]

Posted 2019-01-01 13:42

Question:

I am (was) a Python developer who is building a GUI web scraping application. Recently I've decided to migrate to the .NET Framework and write the same application in C# (this decision wasn't mine).

In Python, I've used the Mechanize library. However, I can't seem to find anything similar in .NET. What I need is a browser that runs in headless mode and can fill out forms, submit them, etc. A JavaScript parser is not a must, but it would be quite useful.

Answer 1:

There are some options:

  • WebKit.Net (free)

  • Awesomium
    It is based on Chrome/WebKit and works like a charm. There is a free license available but also a commercial one and if need be you can buy the source code :-)

  • HTML Agility Pack (free)
    This helps with extracting information from HTML etc. and might be useful in your case (possibly in combination with HttpWebRequest)
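
As a rough sketch of the HTML Agility Pack route mentioned above (not a drop-in Mechanize replacement), the snippet below reads the named input fields from an HTML string and fills in two of them. The markup, the field names (`token`, `user`, `pass`) and the `FormScraper` class are made up for illustration; the only library calls used are `HtmlDocument.LoadHtml`, `SelectNodes` and `GetAttributeValue`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FormScraper
{
    // Collects name/value pairs of every named <input> in the document.
    // Assumption: the page holds a single form; a page with several forms
    // would need a narrower XPath expression.
    public static Dictionary<string, string> ReadFormFields(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var fields = new Dictionary<string, string>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null)
        {
            return fields;
        }

        foreach (var input in inputs)
        {
            string name = input.GetAttributeValue("name", "");
            string value = input.GetAttributeValue("value", "");
            fields[name] = value;
        }
        return fields;
    }

    public static void Main()
    {
        // Hypothetical login page markup, inlined so the sketch runs offline.
        const string html = @"<form action='/login' method='post'>
  <input type='hidden' name='token' value='abc123'>
  <input type='text' name='user'>
  <input type='password' name='pass'>
</form>";

        var fields = ReadFormFields(html);

        // Fill out the form; hidden fields such as the token keep their values.
        fields["user"] = "alice";
        fields["pass"] = "secret";

        foreach (var pair in fields)
        {
            Console.WriteLine(pair.Key + "=" + pair.Value);
        }
    }
}
```

The resulting dictionary could then be wrapped in `FormUrlEncodedContent` and sent with `HttpClient.PostAsync` (or written to an `HttpWebRequest` stream), using a `CookieContainer` to keep the session. Note that HTML Agility Pack only parses markup; it does not execute JavaScript.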



Answer 2:

More solutions:

  • PhantomJS - a full-featured headless web browser. It is often paired with Selenium, which lets you drive the browser from a .NET application.
  • Optimus (NuGet package) - a lightweight headless web browser. It's in beta, but it is sufficient for some cases.

I used to use both for web testing, but they are also suitable for web scraping.
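
For the PhantomJS plus Selenium pairing, a minimal sketch might look like the following. It assumes an older Selenium.WebDriver release that still ships the `OpenQA.Selenium.PhantomJS` namespace (later releases dropped it, as far as I know) and a `phantomjs` executable that the driver can find. The URL and the field name `q` are placeholders, not a real site's form, so replace them before running.

```csharp
using System;
using OpenQA.Selenium;            // NuGet package: Selenium.WebDriver (older 2.x/3.x release assumed)
using OpenQA.Selenium.PhantomJS;  // PhantomJS bindings shipped with those releases

public static class HeadlessFormDemo
{
    public static void Main()
    {
        // Assumes the phantomjs executable is on the PATH or next to this program.
        // Disposing the driver shuts the headless browser down.
        using (IWebDriver driver = new PhantomJSDriver())
        {
            // Placeholder URL: point this at the page that holds the form.
            driver.Navigate().GoToUrl("https://example.com/search");

            // Placeholder field name: FindElement throws NoSuchElementException
            // if the page has no element with this name.
            IWebElement box = driver.FindElement(By.Name("q"));
            box.SendKeys("headless browser");

            // Submits the form that contains this element.
            box.Submit();

            // After submission the driver exposes the resulting page.
            Console.WriteLine(driver.Title);
        }
    }
}
```

Because the page runs inside a real browser engine, scripts on the page execute as usual, which covers the "JavaScript would be useful" part of the question at the cost of a heavier dependency.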



Answer 3:

You may be after TrifleJS (currently in beta), or something similar using the .NET WebBrowser class which communicates with IE via a windowless ActiveX/COM API.

You'll essentially be running a fully fledged browser (not an HTTP request wrapper) using Internet Explorer's Trident engine. If you are not interested in the JavaScript API (a port of PhantomJS), you may still be able to use some of the C# codebase to get your head around key concepts (custom headers, cookies, script execution, screenshot rendering, etc.).

Note that this can also emulate different versions of IE depending on what you have installed.