Why do search engine crawlers not run JavaScript?

Posted 2019-02-16 10:12

Question:

I have been working on some advanced JavaScript applications that use a lot of AJAX requests to render the page. To make the applications crawlable (by Google), I have to follow https://developers.google.com/webmasters/ajax-crawling/?hl=fr . This tells us to do things like redesigning our links and creating HTML snapshots to make the site searchable.

I wonder why crawlers don't run JavaScript to get the rendered page and index that. Is there a reason behind this? Or is it a missing feature that search engines may add in the future?

Answer 1:

GoogleBot actually does handle sites written in JS. The big problem with AJAX sites remains even if GoogleBot can execute JS and handle AJAX requests.

The web crawler can't know exactly when the page has finished loading. Because of that, a crawler could load a page and index it before the page has even started its AJAX requests. Say a script only runs on page scroll: it's very likely that GoogleBot will not trigger every possible event.
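To make the "when is the page finished?" problem concrete, here is a minimal sketch of one heuristic a crawler could use: take the snapshot once the network has been quiet for a fixed window. This is not GoogleBot's actual logic; `settle_time`, `missed_requests`, `quiet_window` and `timeout` are names made up for the illustration, and the request timings are invented.

```python
def settle_time(request_spans, quiet_window=0.5, timeout=10.0):
    """Time (seconds after navigation) at which a hypothetical crawler
    using a quiet-window rule would snapshot the page.

    request_spans: list of (start, end) times of the page's AJAX requests.
    """
    idle_since = 0.0  # the initial HTML is assumed to be loaded at t=0
    for start, end in sorted(request_spans):
        if start > idle_since + quiet_window:
            break  # the network stayed quiet long enough; snapshot already taken
        idle_since = max(idle_since, end)
    return min(idle_since + quiet_window, timeout)


def missed_requests(request_spans, snapshot_time):
    """Requests that had not finished when the snapshot was taken."""
    return [span for span in request_spans if span[1] > snapshot_time]


# A page that loads data right away, then again on a late timer or scroll event.
spans = [(0.25, 0.5), (3.0, 3.5)]
snapshot = settle_time(spans)
print(snapshot, missed_requests(spans, snapshot))
```

With these made-up timings the crawler snapshots the page before the second request has even started, so whatever that request renders never reaches the index. A longer window catches more content but makes every page slower to crawl.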

The other problem is navigation.

Since navigation can happen without a page reload, one URL can map to multiple views. For that reason, Google asks developers to keep static copies of the pages that would otherwise be inaccessible. Those copies are what get indexed.
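For reference, the AJAX crawling scheme linked in the question (which Google has since deprecated) gives each view its own address by having the crawler rewrite `#!` URLs into `_escaped_fragment_` URLs, which the server answers with a static HTML snapshot. Below is a rough sketch of that rewrite. The function name `escaped_fragment_url` and the example URLs are made up, and the escaping is only an approximation of the rules in the spec.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Map a '#!' URL to the '_escaped_fragment_' URL a crawler would request."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # no hashbang, so there is nothing to rewrite
    # Approximate escaping: keep '=' and '/', percent-encode e.g. '&' and '#'.
    state = quote(parts.fragment[1:], safe="=/")
    param = "_escaped_fragment_=" + state
    query = parts.query + "&" + param if parts.query else param
    # The fragment is dropped; the view state now travels in the query string.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://example.com/app#!key=value"))
# → http://example.com/app?_escaped_fragment_=key=value
```

The point of the rewrite is that a fragment is never sent to the server, while a query parameter is, so the server gets a chance to return the snapshot for that particular view.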

If every page of your site is accessible through its own fully qualified URL, you shouldn't have a problem getting your site indexed.

That said, scripts are going to get run. It's just not certain that the crawler will wait until all scripts have finished before it indexes the page.

Here's a link:

GoogleBot smarter: it was written in 2010, and we can expect that web crawlers have gotten much smarter since then.



Answer 2:

Reading pure HTML is way faster than calling JavaScript functions, waiting for them, and only then working out how the page is put together. I think that's the main reason.

Another might be that the whole crawling process is automated, so, again, reading a static page is a lot easier and makes a lot more sense. With JavaScript, the content of the page might change every second, leaving the crawler "confused".

Considering that this has not yet been implemented in search engines, I don't think it will come in the near future.



Answer 3:

Pages with scripts are harder for crawlers to read, because it's all about dynamically changing content. And crawlers care about more than the first visit to a site: they recheck indexed pages every week or two in a fast mode, simply comparing the old and new versions, "find 10 differences" style, for content and link changes. Rechecking pages with scripts would be too painful and costly for crawlers across the whole web.
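As a rough illustration of that kind of fast recheck, the sketch below fingerprints the visible text and links of a static HTML page and compares two fetches. It is only a guess at the general idea, not how any real search engine does it; `fingerprint`, `has_changed` and the sample pages are invented for the example.

```python
import hashlib
from html.parser import HTMLParser


class _Extractor(HTMLParser):
    """Collects visible text and link targets, ignoring script/style bodies."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fingerprint(html: str) -> str:
    """Hash of the page's static text and links, without running any scripts."""
    parser = _Extractor()
    parser.feed(html)
    parser.close()
    payload = "\n".join(parser.text) + "\n--links--\n" + "\n".join(sorted(parser.links))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def has_changed(old_html: str, new_html: str) -> bool:
    return fingerprint(old_html) != fingerprint(new_html)


old = '<div id="app"></div><script>render("Hello")</script>'
new = '<div id="app"></div><script>render("Bye")</script>'
print(has_changed(old, new))  # → False
```

The two sample pages would show different content in a browser, but a cheap static comparison sees no difference. Catching the change would mean executing the scripts on every recheck, which is the cost this answer is talking about.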