How to detect the main article tag like Evernote c

Posted 2019-03-10 08:00

While trying the Evernote Clipper extension, I noticed a very useful feature: when I click "Article", it picks out the main content of the page very accurately. Here is the result when I used Evernote Clipper on the page https://developer.chrome.com/extensions/api_index (screenshot: extracted article in a page).

I looked at the main article that Evernote filtered out on several pages, and the article is in fact extracted from the first article tag. However, Evernote Clipper still works well on pages that don't use that kind of tag.

I wonder how Evernote Clipper does that. Is there any JS library that can detect the tag containing the main content of a page? Could you give me some advice on how to do it?

Thank you in advance!

Tags: javascript html5 evernote

1 answer

Bombasti

#2 · 2019-03-10 08:10

As far as I know, there is no universal JS library that does this. Evernote Clipper uses its own method to extract the "interesting" content from a web page. You can read the code of the Evernote Clipper extension to try to understand the process.

On my Mac, the path to the Chrome extension is:

~/Library/Application Support/Google/Chrome/Default/Extensions/pioclpoplcdbaefihamjohnefbikjilc/6.2_0/

Here's another tool that works in much the same way: https://www.readability.com/

You can also check this thread: "What algorithm does Readability use for extracting text from URLs?"

Or search Google for terms like 'content extraction js lib'. (I found this one: https://github.com/hatena/extract-content-javascript)
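To give an idea of what such a heuristic can look like, here is a minimal sketch. It is not Evernote's actual code and not the exact Readability algorithm, just a simplified guess in the same spirit: use the first article tag if there is one (as observed in the question), otherwise credit each paragraph's non-link text to its parent and grandparent and keep the best-scoring container. The names findMainContent, SKIPPED_TAGS and MIN_PARAGRAPH_LENGTH, as well as the threshold and the weights, are assumptions made for illustration.

```javascript
// Tags treated as page chrome and skipped entirely (an assumed list).
const SKIPPED_TAGS = new Set(['NAV', 'HEADER', 'FOOTER', 'ASIDE', 'SCRIPT', 'STYLE', 'FORM']);
// Paragraphs shorter than this are ignored (arbitrary threshold).
const MIN_PARAGRAPH_LENGTH = 25;

// Length of a node's text after collapsing whitespace.
function textLength(node) {
  return (node.textContent || '').replace(/\s+/g, ' ').trim().length;
}

// Total length of the text that sits inside <a> descendants.
function linkTextLength(node) {
  let total = 0;
  for (const child of Array.from(node.children || [])) {
    const tag = (child.tagName || '').toUpperCase();
    total += tag === 'A' ? textLength(child) : linkTextLength(child);
  }
  return total;
}

// Returns the element most likely to hold the main content, or null.
// Works on DOM Elements, or on any object with tagName, children, textContent.
function findMainContent(root) {
  let firstArticle = null;
  const scores = new Map();
  const addScore = (node, points) => scores.set(node, (scores.get(node) || 0) + points);

  function visit(node, parent, grandparent) {
    const tag = (node.tagName || '').toUpperCase();
    if (SKIPPED_TAGS.has(tag)) return;
    if (tag === 'ARTICLE' && firstArticle === null) firstArticle = node;
    if (tag === 'P' && parent) {
      const length = textLength(node);
      const points = length - linkTextLength(node); // link text does not count
      if (length >= MIN_PARAGRAPH_LENGTH && points > 0) {
        addScore(parent, points);
        if (grandparent) addScore(grandparent, points / 2);
      }
    }
    for (const child of Array.from(node.children || [])) visit(child, node, parent);
  }
  visit(root, null, null);

  // Step 1: an explicit <article> wins.
  if (firstArticle !== null) return firstArticle;

  // Step 2: otherwise take the container with the highest paragraph score.
  let best = null;
  let bestScore = 0;
  for (const [node, score] of scores) {
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}
```

In a content script you could try it with findMainContent(document.body). Real extractors also look at things like class names, ids and comma counts, so treat this only as a starting point.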

Hope this helps
