I need a library (preferably in C#) that works as a web crawler and can fetch files over both HTTP and FTP. To start with, reading HTML is enough, but I want to extend it later to PDF, Word, and other formats.
A basic open-source project to start from would be fine, or at least some pointers to relevant documentation.
I have developed the Crawler Engine of the Crawler-Lib Framework. It is a workflow-enabled crawler that can easily be extended to perform any kind of request or processing you need.
Here is the engine: http://www.crawler-lib.net/crawler-lib-engine
Here are some YouTube videos showing how the Crawler-Lib Engine works: http://www.youtube.com/user/CrawlerLib
I know this project is not open source, but there is a free version.
Check out the NCrawler project.