Apache Nutch 2.1 different batch id (null)

Posted 2019-03-19 19:20

I am crawling a few sites with Apache Nutch 2.1.

While crawling, I see the following message for a lot of pages, e.g.:
Skipping http://www.domainname.com/news/subcategory/111111/index.html; different batch id (null).

What causes this error?
How can I resolve it? The pages reported with "different batch id (null)" are not stored in the database.

The site that I crawled is based on Drupal, but I have also tried many other, non-Drupal sites.

Tags: apache nutch web-crawler

1 answer

做个烂人

#2 · 2019-03-19 20:16

I don't think the message indicates a problem. A batch_id has not been assigned to every URL, so when a URL's batch_id is null, that URL is skipped. The URL is generated once a batch_id has been assigned to it.
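
To illustrate the kind of check the answer describes, here is a minimal, self-contained sketch. It is not Nutch's actual code: the class `BatchIdSketch`, the methods `shouldProcess` and `selectForBatch`, and the batch id values are all hypothetical names made up for this example. It only assumes that each URL carries a batch id that may be null, and that a job handles only URLs whose id matches the batch it is running.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the batch id check; not Nutch's real classes.
class BatchIdSketch {

    // A URL is processed only if it has a batch id and it matches the current batch.
    static boolean shouldProcess(String pageBatchId, String currentBatchId) {
        return pageBatchId != null && pageBatchId.equals(currentBatchId);
    }

    // Returns the URLs that belong to the current batch and logs the ones that are skipped.
    static List<String> selectForBatch(Map<String, String> batchIdByUrl, String currentBatchId) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : batchIdByUrl.entrySet()) {
            if (shouldProcess(entry.getValue(), currentBatchId)) {
                selected.add(entry.getKey());
            } else {
                // Mirrors the wording of the message quoted in the question.
                System.out.println("Skipping " + entry.getKey()
                        + "; different batch id (" + entry.getValue() + ")");
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order and allows null values.
        Map<String, String> batchIdByUrl = new LinkedHashMap<>();
        batchIdByUrl.put("http://www.domainname.com/", "batch-1");
        // No batch id assigned yet, so this URL is skipped in this run.
        batchIdByUrl.put("http://www.domainname.com/news/subcategory/111111/index.html", null);

        System.out.println("Processed: " + selectForBatch(batchIdByUrl, "batch-1"));
    }
}
```

In this sketch, a URL with a null batch id is simply left for a later run rather than treated as a failure, which matches the answer's reading that the message by itself is not an error.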
