C# web and ftp crawler library

Posted 2019-05-10 22:17

I need a library (preferably in C#) that works as a web crawler and can access files over both HTTP and FTP. To start with, reading HTML is enough, but I want to extend it later to PDF, Word, and other formats.

An open-source project suitable for a beginner would be ideal, or at least some pointers to documentation.

Tags: c# web-crawler

2 Answers

太酷不给撩

Reply #2 · 2019-05-10 22:29

I have developed the Crawler Engine of the Crawler-Lib Framework. It is a workflow-enabled crawler that can easily be extended to perform any kind of request or processing you need.

Here is the engine: http://www.crawler-lib.net/crawler-lib-engine

Here are some YouTube videos showing how the Crawler-Lib engine works: http://www.youtube.com/user/CrawlerLib

I know this project is not open source, but there is a free version.


干净又极端

Reply #3 · 2019-05-10 22:41

Check out the NCrawler project.

It is a simple and very efficient multithreaded web crawler with pipeline-based processing, written in C#. It contains HTML, text, PDF, and IFilter document processors, plus language detection (Google). It is easy to add pipeline steps to extract, use, and alter information.
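
To make "pipeline based processing" a bit more concrete, here is a minimal sketch of the general idea in plain C#. It does not use NCrawler's actual API: `CrawlItem`, `IPipelineStep`, `TextStep`, `LinkStep`, and `MiniCrawler` are hypothetical names made up for this illustration, and the example.com URLs are placeholders. The sketch assumes UTF-8 content and uses a naive regular expression for links, which a real crawler would replace with a proper HTML parser. `WebClient` is used for the download because it accepts `http://`, `https://`, and `ftp://` URIs through the same call; on newer .NET versions it is marked obsolete, so treat that choice as an assumption too.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical container for one downloaded resource (not an NCrawler type).
public sealed class CrawlItem
{
    public Uri Uri { get; }
    public byte[] Content { get; }
    public string Text { get; set; } = "";
    public List<Uri> Links { get; } = new List<Uri>();

    public CrawlItem(Uri uri, byte[] content)
    {
        Uri = uri;
        Content = content;
    }
}

// Hypothetical step contract: each step reads and enriches the item.
public interface IPipelineStep
{
    void Process(CrawlItem item);
}

// Decodes the raw bytes as UTF-8 (assumption: no charset detection).
public sealed class TextStep : IPipelineStep
{
    public void Process(CrawlItem item)
    {
        item.Text = Encoding.UTF8.GetString(item.Content);
    }
}

// Collects http, https, and ftp links found in href attributes.
public sealed class LinkStep : IPipelineStep
{
    private static readonly Regex Href =
        new Regex("href\\s*=\\s*[\"']([^\"']+)[\"']", RegexOptions.IgnoreCase);

    public void Process(CrawlItem item)
    {
        foreach (Match m in Href.Matches(item.Text))
        {
            // Resolve relative links against the page's own address.
            if (Uri.TryCreate(item.Uri, m.Groups[1].Value, out Uri link) &&
                (link.Scheme == Uri.UriSchemeHttp ||
                 link.Scheme == Uri.UriSchemeHttps ||
                 link.Scheme == Uri.UriSchemeFtp))
            {
                item.Links.Add(link);
            }
        }
    }
}

public static class MiniCrawler
{
    // Real download; the same call handles HTTP and FTP addresses.
    public static byte[] Download(Uri uri)
    {
        using (var client = new WebClient())
        {
            return client.DownloadData(uri);
        }
    }

    // Fetches one resource and runs every step over it, in order.
    public static CrawlItem Run(Uri uri, Func<Uri, byte[]> fetch, params IPipelineStep[] steps)
    {
        var item = new CrawlItem(uri, fetch(uri));
        foreach (var step in steps)
        {
            step.Process(item);
        }
        return item;
    }
}

public static class Program
{
    public static void Main()
    {
        // Offline stub so the sketch runs without a network;
        // pass MiniCrawler.Download instead to fetch for real.
        Func<Uri, byte[]> stub = _ => Encoding.UTF8.GetBytes(
            "<a href=\"a.pdf\">A</a> <a href='ftp://example.com/b.doc'>B</a>");

        CrawlItem item = MiniCrawler.Run(
            new Uri("http://example.com/docs/index.html"),
            stub, new TextStep(), new LinkStep());

        foreach (Uri link in item.Links)
        {
            Console.WriteLine(link);
        }
    }
}
```

Supporting PDF or Word in a design like this would mean adding another step that turns `Content` into `Text` for those formats, which is roughly the role the document processors play in the description above.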

