I am (was) a Python developer who is building a GUI web scraping application. Recently I've decided to migrate to the .NET Framework and write the same application in C# (this decision wasn't mine).
In Python, I've used the Mechanize library. However, I can't seem to find anything similar in .NET. What I need is a browser that runs in headless mode and can fill out forms, submit them, etc. A JavaScript engine is not a must, but it would be quite useful.
More solutions:
- PhantomJS: a full-featured headless web browser. It is often paired with Selenium, which lets you control the browser from a .NET application.
- Optimus (NuGet package): a lightweight headless web browser. It's in beta, but it is sufficient for some cases.
I used to use both for web testing, but they are also suitable for web scraping.
You may be after TrifleJS (currently in beta), or something similar built on the .NET WebBrowser class, which communicates with IE through a windowless ActiveX/COM API.
You'll essentially be running a fully fledged browser (not an HTTP request wrapper) using Internet Explorer's Trident engine. Even if you are not interested in the JavaScript API (a port of PhantomJS), you may still be able to use parts of the C# codebase to see how it handles key concepts (custom headers, cookies, script execution, screenshot rendering, etc.).
Note that this can also emulate different versions of IE depending on what you have installed.