“Single-page” JS websites and SEO

Posted 2020-01-27 09:27

There are a lot of cool tools for making powerful "single-page" JavaScript websites nowadays. In my opinion, this is done right by letting the server act as an API (and nothing more) and letting the client handle all of the HTML generation stuff. The problem with this "pattern" is the lack of search engine support. I can think of two solutions:

  1. When the user enters the website, let the server render the page exactly as the client would upon navigation. So if I go to http://example.com/my_path directly the server would render the same thing as the client would if I go to /my_path through pushState.
  2. Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.

The first solution is discussed further here. I have been working on a website doing this and it's not a very nice experience. It's not DRY and in my case I had to use two different template engines for the client and the server.

I think I have seen the second solution for some good ol' Flash websites. I like this approach much more than the first one and with the right tool on the server it could be done quite painlessly.

So what I'm really wondering is the following:

  • Can you think of any better solution?
  • What are the disadvantages with the second solution? If Google in some way finds out that I'm not serving the exact same content for the Google bot as a regular user, would I then be punished in the search results?

8 answers
地球回转人心会变
Reply #2 · 2020-01-27 09:38

So, it seems that the main concern is being DRY.

  • If you're using pushState, have your server send the exact same code for all URLs that don't contain a file extension (those are images, etc.): "/mydir/myfile", "/myotherdir/myotherfile", or the root "/" all receive the same response. You need some kind of URL rewrite engine for this. You can also serve a tiny bit of HTML and load the rest from your CDN (using require.js to manage dependencies -- see https://stackoverflow.com/a/13813102/1595913).
  • (Test the link's validity by converting it to your URL scheme and checking whether content exists for it by querying a static or dynamic source. If it's not valid, send a 404 response.)
  • When the request is not from a Google bot, you just process it normally.
  • If the request is from a Google bot, you use phantom.js, a headless WebKit browser ("A headless browser is simply a full-featured web browser with no visual interface."), to render the HTML and JavaScript on the server and send the bot the resulting HTML. As the bot parses the HTML it can follow your other "pushState" links, such as <a href="/someotherpage">mylink</a>; the server rewrites each URL to your application file, loads it in phantom.js, sends the resulting HTML to the bot, and so on...
  • For your HTML I'm assuming you're using normal links with some kind of hijacking (e.g. with backbone.js: https://stackoverflow.com/a/9331734/1595913).
  • To avoid confusion with any links, move the API code that serves JSON onto a separate subdomain, e.g. api.mysite.com.
  • To improve performance you can pre-process your site's pages for search engines ahead of time, during off hours, by creating static versions of the pages using the same phantom.js mechanism and then serving those static pages to Google bots. Preprocessing can be done with a simple app that parses <a> tags. In this case handling 404s is easier, since you can simply check for the existence of a static file whose name contains the URL path.
  • If you use the #! (hash bang) syntax for your site links, a similar scenario applies, except that the server's URL rewrite engine would look for _escaped_fragment_ in the URL and convert it to your URL scheme.
  • There are a couple of integrations of node.js with phantom.js on GitHub, and you can use node.js as the web server to produce the HTML output.
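The routing decision described in the bullets above could be sketched roughly like this. It is only an illustration: the function names (classifyRequest, snapshotFileFor), the bot pattern, and the snapshot naming scheme are all assumptions, not something taken from phantom.js or any particular server.

```javascript
// Illustrative, non-exhaustive list of crawler user agents (an assumption).
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider/i;

// Decide how the server should answer a request.
//   'asset'     -> the path ends in a file extension, serve the file itself
//   'prerender' -> a crawler asked, serve HTML rendered ahead of time
//   'app-shell' -> everyone else gets the same single-page app code
function classifyRequest(path, userAgent) {
  if (/\.[a-z0-9]+$/i.test(path)) return 'asset';
  if (BOT_PATTERN.test(userAgent || '')) return 'prerender';
  return 'app-shell';
}

// Map a URL path to the file name of a pre-rendered static snapshot,
// so a missing file can be turned into a 404 for the crawler.
function snapshotFileFor(path) {
  const trimmed = path.replace(/^\/+|\/+$/g, '');
  const name = trimmed === '' ? 'index' : trimmed.replace(/\//g, '__');
  return name + '.html';
}
```

Checking for the snapshot file on disk (and returning 404 when it is absent) would then be a single existence test on the name returned by snapshotFileFor.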

Here are a couple of examples using phantom.js for SEO:

http://backbonetutorials.com/seo-for-single-page-apps/

http://thedigitalself.com/blog/seo-and-javascript-with-phantomjs-server-side-rendering

仙女界的扛把子
Reply #3 · 2020-01-27 09:40

Interesting. I have been searching around for viable solutions but it seems to be quite problematic.

I was actually leaning more towards your 2nd approach:

Let the server provide a special website only for the search engine bots. If a normal user visits http://example.com/my_path the server should give him a JavaScript heavy version of the website. But if the Google bot visits, the server should give it some minimal HTML with the content I want Google to index.

Here's my take on solving the problem. Although it is not confirmed to work, it might provide some insight or ideas for other developers.

Assume you're using a JS framework that supports "push state" functionality, and your backend framework is Ruby on Rails. You have a simple blog site and you would like search engines to index all your article index and show pages.

Let's say you have your routes set up like this:

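# RESTful routes handled by ArticlesController
resources :articles
# Catch-all: any other path renders the client-side app template
match "*path", to: "main#index", via: :all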

Ensure that every server-side controller renders the same template that your client-side framework requires to run (HTML/CSS/JavaScript/etc.). If no controller matches the request (in this example we only have a RESTful set of actions for the ArticlesController), then the wildcard route matches anything else, renders the template, and lets the client-side framework handle the routing. The only difference between hitting a controller and hitting the wildcard matcher is that a controller can render content based on the requested URL for JavaScript-disabled devices.

From what I understand, it is a bad idea to render content that isn't visible to browsers. If Google indexes it and people arriving from Google find a page without that content, you're probably going to be penalised. What comes to mind is rendering the content in a div node that you hide with display: none in CSS.

However, I'm pretty sure it doesn't matter if you simply do this:

<div id="no-js">
  <h1><%= @article.title %></h1>
  <p><%= @article.description %></p>
  <p><%= @article.content %></p>
</div>

And then remove it with JavaScript, which doesn't get run when a JavaScript-disabled device opens the page:

$("#no-js").remove(); // jQuery

This way, Google and anyone with a JavaScript-disabled device would see the raw/static content. The content is physically there and visible to them.

But when a user visits the same page and actually has JavaScript enabled, the #no-js node will be removed so it doesn't clutter up your application. Then your client-side framework will handle the request through its router and display what a user should see when JavaScript is enabled.
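For anyone not using jQuery, the same removal can be done with plain DOM calls. This is just a sketch: the function name removeNoJsFallback is made up, and it takes the document as a parameter only so it can be tried outside a browser.

```javascript
// Remove the server-rendered fallback node, if present.
// Returns true when a node was removed, false otherwise.
function removeNoJsFallback(doc) {
  const node = doc.getElementById('no-js');
  if (!node || !node.parentNode) return false;
  node.parentNode.removeChild(node);
  return true;
}

// In a browser, run it once the HTML has been parsed, before the
// client-side router renders the JavaScript version of the page.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    removeNoJsFallback(document);
  });
}
```

Since the script never runs on a JavaScript-disabled device, the fallback markup simply stays in place there.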

I think this might be a valid and fairly easy technique to use. Although that might depend on the complexity of your website/application.

Though, please correct me if it isn't. Just thought I'd share my thoughts.

Melony?
Reply #4 · 2020-01-27 09:44

I think you need this: http://code.google.com/web/ajaxcrawling/

You can also install a special backend that "renders" your page by running javascript on the server, and then serves that to google.

Combine both things and you have a solution without programming things twice. (As long as your app is fully controllable via anchor fragments.)
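Under that AJAX crawling scheme, the crawler asks for a URL with an _escaped_fragment_ query parameter in place of the #! fragment, so the server has to map it back before rendering. A minimal sketch of that mapping, assuming Node's built-in URL class; the function name toHashBangUrl is hypothetical:

```javascript
// Convert a crawler-style URL (?_escaped_fragment_=...) back into the
// #! URL that the client-side app understands.
// Returns null when the request is not a crawler-style request.
function toHashBangUrl(requestUrl) {
  const url = new URL(requestUrl);
  // searchParams.get() already percent-decodes the value.
  const fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) return null;

  // Keep any other query parameters, drop the crawler-specific one.
  url.searchParams.delete('_escaped_fragment_');
  const query = url.searchParams.toString();

  return url.origin + url.pathname + (query ? '?' + query : '') + '#!' + fragment;
}
```

The resulting URL is what you would hand to the server-side renderer, which then returns the generated HTML to the crawler.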

forever°为你锁心
Reply #5 · 2020-01-27 09:46

Use NodeJS on the server side, browserify your client-side code, and route each HTTP request's URI (except for static HTTP resources) through a server-side client to provide the first 'bootsnap' (a snapshot of the page's state). Use something like jsdom to handle jQuery DOM operations on the server. After the bootsnap has been returned, set up the websocket connection. It's probably best to differentiate between a websocket client and a server-side client by making some kind of wrapper connection on the client side (the server-side client can communicate with the server directly). I've been working on something like this: https://github.com/jvanveen/rnet/
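To give a rough idea of what returning such a first snapshot might look like, here is a sketch that wraps already-rendered HTML and the matching state into one page. It is not taken from rnet: the names renderBootSnapshot, serializeState, window.__BOOT_STATE__ and /bundle.js are all assumptions.

```javascript
// Serialize state for embedding in an inline <script>.
// Escaping "<" keeps a value such as "</script>" from closing the tag early.
function serializeState(state) {
  return JSON.stringify(state).replace(/</g, '\\u003c');
}

// Build the first response: server-rendered markup plus the state the
// client-side code needs in order to take over without re-fetching.
function renderBootSnapshot(bodyHtml, state) {
  return [
    '<!DOCTYPE html>',
    '<html><body>',
    '<div id="app">' + bodyHtml + '</div>',
    '<script>window.__BOOT_STATE__ = ' + serializeState(state) + ';</script>',
    '<script src="/bundle.js"></script>',
    '</body></html>',
  ].join('\n');
}
```

The browserified bundle would read window.__BOOT_STATE__ on startup and only then open the websocket connection.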

手持菜刀,她持情操
Reply #6 · 2020-01-27 09:48

To take a slightly different angle, your second solution would be the correct one in terms of accessibility...you would be providing alternative content to users who cannot use javascript (those with screen readers, etc.).

This would automatically add the benefits of SEO and, in my opinion, would not be seen as a 'naughty' technique by Google.

Anthone
Reply #7 · 2020-01-27 09:49

If you're using Rails, try poirot. It's a gem that makes it dead simple to reuse mustache or handlebars templates client and server side.

Create a file in your views like _some_thingy.html.mustache.

Render server side:

<%= render :partial => 'some_thingy', object: my_model %>

Put the template in your head for client-side use:

<%= template_include_tag 'some_thingy' %>

Render client side:

html = poirot.someThingy(my_model)
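The point of sharing templates is that one template string produces the same markup wherever it is rendered. The toy renderer below only illustrates that idea: it is not poirot's API and not a real mustache implementation (no sections, partials, or unescaped tags), and the names renderTemplate and escapeHtml are made up.

```javascript
// Escape a value for safe insertion into HTML text or attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Replace each {{name}} tag with the escaped value of data[name].
// Unknown names render as an empty string.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, key) {
    const found = Object.prototype.hasOwnProperty.call(data, key);
    return escapeHtml(found ? data[key] : '');
  });
}

// The same template and data give identical HTML under Node and in a browser.
const someThingy = '<h1>{{title}}</h1><p>{{description}}</p>';
const html = renderTemplate(someThingy, { title: 'A & B', description: 'Demo' });
```

Because the function has no dependency on the DOM or on Node, the same file can be loaded on both sides, which is what keeps the templates DRY.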