How do web crawlers handle JavaScript?

Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how web crawlers like Google's handle it. Are they aware of JavaScript? Do they have a built-in JavaScript engine? Or do they simply ignore all JavaScript-generated content in the page (which I guess is quite unlikely)? Do people use specific techniques to get content indexed that would otherwise only be available to a normal Internet user through background AJAX requests?

Tags: javascript web-crawler

6 answers

我想做一个坏孩纸

#2 · 2020-01-28 03:34

Crawlers don't parse JavaScript to find out what it does.

They may be built to recognise some classic snippets like onchange="window.location.href=this.options[this.selectedIndex].value;" or onclick="window.location.href='blah.html';", but they don't bother with things like content fetched using AJAX. At least not yet, and content fetched like that will always be secondary anyway.

So, JavaScript should be used only for additional functionality. The main content that you want the crawlers to find should still be plain text in the page, with regular links that the crawlers can easily follow.
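To make the point concrete, here is a minimal sketch of what a crawler that does not execute JavaScript might see. It is not how any real search engine works; the class name, function name, and regular expression are made up for illustration. It collects regular links plus the "classic" inline window.location snippet mentioned above, and anything a script would add later stays invisible.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern for the "classic" inline redirect with a literal URL,
# e.g. window.location.href='blah.html' or window.location = "blah.html".
LOCATION_ASSIGN = re.compile(
    r"""window\.location(?:\.href)?\s*=\s*(['"])(.*?)\1"""
)


class StaticLinkExtractor(HTMLParser):
    """Collects URLs that are visible without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if tag == "a" and name == "href":
                # A regular link: trivially crawlable.
                self.links.append(value)
            elif name in ("onclick", "onchange"):
                # Only a literal URL can be recognised; a computed value
                # such as this.options[...].value cannot be resolved.
                match = LOCATION_ASSIGN.search(value)
                if match:
                    self.links.append(match.group(2))


def extract_links(html):
    """Return the URLs a non-JavaScript crawler could discover in html."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    return parser.links
```

Feeding it a page whose only other link is written by a script returns just the static link and the literal onclick target, which is why the main content should not depend on scripts.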


smile是对你的礼貌

#3 · 2020-01-28 03:36

Most of them don't handle JavaScript in any way. (At least, none of the major search engines' crawlers do.)

This is why it's still important for your site to handle navigation gracefully without JavaScript.


姐就是有狂的资本

#4 · 2020-01-28 03:41

Precisely what Ben S said. And anyone accessing your site with Lynx won't execute JavaScript either. If your site is intended for general public use, it should generally be usable without JavaScript.

Also, related: if there are pages that you would want a search engine to find, and which would normally arise only from JavaScript, you might consider generating static versions of them, reachable by a crawlable site map, where these static pages use JavaScript to load the current version when hit by a JavaScript-enabled browser (in case a human with a browser follows your site map). The search engine will see the static form of the page, and can index it.
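As a rough sketch of the "crawlable site map" part, assuming you go with an XML sitemap in the sitemaps.org format (the answer above could just as well mean a plain HTML page of links), something like this would list the static versions. The function name and the example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

You would call it with the URLs of the generated static pages, for example build_sitemap(["https://example.com/static/item-1.html"]), and serve the result from a location the crawler can reach.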


家丑人穷心不美

#5 · 2020-01-28 03:45

I have tested this by putting pages on my site that were reachable only through JavaScript and then checking whether they showed up in search indexes.

Those pages were subsequently indexed by Google.

The content was reached through JavaScript using the 'classic' technique of constructing a URL and setting window.location accordingly.


看我几分像从前

#6 · 2020-01-28 03:45

Crawlers can handle JavaScript or AJAX calls if they are built on a framework such as HtmlUnit or Selenium.
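For example, with Selenium's Python bindings a crawler could read the DOM after the browser has run the page's scripts. This is only a sketch: it assumes the selenium package and a local Chrome are installed, and both helper names are invented here.

```python
def fetch_rendered_html(driver, url):
    """Load url in a browser driver and return the DOM as it stands
    once the page has loaded and its scripts have had a chance to run."""
    driver.get(url)
    return driver.page_source


def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium and Chrome exist)."""
    # Imported lazily so fetch_rendered_html works with any driver-like object.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

Typical use would be to create the driver with make_headless_chrome(), pass it to fetch_rendered_html() for each URL, and call driver.quit() at the end. Content that arrives through slow AJAX calls may need an explicit wait before reading page_source.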


疯言疯语

#7 · 2020-01-28 03:48

JavaScript is handled by both Bing and Google crawlers. Yahoo uses the Bing crawler data, so it should be handled as well. I didn't look into other search engines, so if you care about them, you should look them up.

Bing published guidance in March 2014 as to how to create JavaScript-based websites that work with their crawler (mostly related to pushState) that are good practices in general:

Avoid creating broken links with pushState
Avoid creating two different links that link to the same content with pushState
Avoid cloaking. (Here's an article Bing published about their cloaking detection in 2007)
Support browsers (and crawlers) that can't handle pushState.

Google later published guidance in May 2014 on how to create JavaScript-based websites that work with their crawler, and their recommendations are also worth following:

Don't block the JavaScript (and CSS) in the robots.txt file.
Make sure you can handle the load of the crawlers.
It's a good idea to support browsers and crawlers that can't handle (or users and organizations that won't allow) JavaScript.
Tricky JavaScript that relies on arcane or specific features of the language might not work with the crawlers.
If your JavaScript removes content from the page, it might not get indexed.
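The first Google point is easy to check yourself. Below is a small sketch using Python's standard urllib.robotparser; the function name and the sample rules are made up, and the standard-library parser only does simple prefix matching, so it may not agree with Googlebot on wildcard rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that robots_txt forbids user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]
```

Passing in the text of your robots.txt together with the URLs of your script and stylesheet files gives back the ones a crawler would be denied; ideally the list is empty.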
