Single Page App + Amazon S3 + Amazon CloudFront +

Posted 2019-03-25 04:20

Question:

  1. I have a single-page app built with Backbone.js.
  2. I host the app (it consists of static files only) on Amazon S3.
  3. I use CloudFront as a CDN in front of the bucket.
  4. The app is accessed via https://myapp.com -> https://abcdefgh34545.cloudfront.net -> https://myBucket.s3-eu-west-1.amazonaws.com/index.html

How can I use the Prerender.io service with this stack? I have to somehow detect that a web spider or robot is accessing the page and redirect it to prerender.io...

Answer 1:

It's hard to use Prerender.io with a static Amazon S3 site.

You could stand up an nginx/Apache server in front of S3: https://myapp.com -> https://mynginx-server.com -> https://myBucket.s3-eu-west-1.amazonaws.com/index.html

This solution is less than ideal because you lose the closest-location benefit of CloudFront.

This is a good article about a custom solution: http://www.dave.cx/post/23/prerendering-angular-s3/

David was able to generate the static HTML and save it in S3, then use CloudFlare to detect _escaped_fragment_ in the URL and redirect the request to the static HTML on S3.
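As a rough illustration of the _escaped_fragment_ idea (not David's actual code), the mapping from a crawler URL to a pre-generated snapshot could look like the sketch below. The `snapshots/` prefix and the `snapshotKeyFor` name are assumptions made up for this example.

```javascript
'use strict';

// Hypothetical layout: pre-generated pages live under "snapshots/" in the bucket.
const SNAPSHOT_PREFIX = 'snapshots';

// Returns the S3 key of the snapshot for a crawler request, or null when the
// URL carries no _escaped_fragment_ parameter (a normal browser request).
function snapshotKeyFor(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null;
  }
  // With hashbang URLs the route is in the parameter value; with pushState
  // URLs the parameter is empty and the route is the path itself.
  const fragment = url.searchParams.get('_escaped_fragment_');
  const route = fragment || url.pathname;
  const trimmed = route.replace(/^\/+|\/+$/g, '');
  return `${SNAPSHOT_PREFIX}/${trimmed || 'index'}.html`;
}
```

Whatever sits in front of S3 would then redirect or rewrite the request to that key whenever the function returns a non-null value.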



Answer 2:

I managed to do this without using Prerender at all, by creating an AWS Lambda function that is mapped via an API Gateway catch-all proxy and does the following:

  • Requests the origin page from CloudFront (it is actually always the same index.html)
  • Studies the path and figures out what resource the page is about (in my case it is simply /user/{name}, so I only have one use case to handle)
  • Makes a REST API request to get the dynamic data for the user
  • Regex-replaces the existing meta fields with the dynamic ones
  • Returns the new index file with the new metas
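The core of those steps can be sketched roughly as follows. This is not the original function: the `og:title` and `og:description` properties, the `user.name` and `user.bio` fields, and all function names are assumptions for illustration, and the two network calls (fetching index.html and the user data) are left out so the sketch stays self-contained.

```javascript
'use strict';

// Extracts the user name from a path such as /user/alice; null otherwise.
function userFromPath(path) {
  const match = /^\/user\/([^/]+)\/?$/.exec(path);
  return match ? decodeURIComponent(match[1]) : null;
}

// Escapes text for use inside a double-quoted HTML attribute.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replaces the content of an existing meta tag.
// Assumes the tag is written as <meta property="NAME" content="...">.
function replaceMeta(html, property, value) {
  const pattern = new RegExp(
    `(<meta\\s+property="${property}"\\s+content=")[^"]*(")`, 'i');
  return html.replace(pattern, (_, open, close) => open + escapeAttr(value) + close);
}

// Builds an API Gateway proxy response. In a real function, indexHtml would be
// fetched from CloudFront and user from the REST API; here both are passed in.
function renderUserPage(path, indexHtml, user) {
  const headers = { 'Content-Type': 'text/html' };
  if (userFromPath(path) === null || !user) {
    return { statusCode: 200, headers, body: indexHtml };
  }
  let body = replaceMeta(indexHtml, 'og:title', user.name);
  body = replaceMeta(body, 'og:description', user.bio);
  return { statusCode: 200, headers, body };
}
```

Passing the replacement through a callback rather than a plain string avoids surprises when the user data contains `$` sequences, and escaping keeps quotes in the data from breaking the attribute.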

Configure a new origin (the new Lambda function) and a behaviour (map /user/* requests to this new origin). Be sure to use the "HTTPS only" Origin Protocol Policy for the origin: API Gateway is HTTPS-only, and a redirect at this point will cause the hostname to change.

(If you used the redirect by accident, you will need to invalidate "/*", because due to some CloudFront bug the configuration change alone will not help; I spent multiple hours debugging this last night.)



Answer 3:

You can use Lambda@Edge to configure CloudFront to send crawler HTTP requests directly to prerender.io.

The basic idea is to have a viewer-request handler that sets custom HTTP headers on requests that should be sent to prerender.io. For example, this Lambda@Edge code:

    'use strict';
    /* change the version number below whenever this code is modified */
    exports.handler = (event, context, callback) => {
        const request = event.Records[0].cf.request;
        const headers = request.headers;
        const user_agent = headers['user-agent'];
        const host = headers['host'];
        if (user_agent && host) {
            // Case-insensitive match: real crawlers send e.g. "Twitterbot" or "Slackbot".
            if (/baiduspider|Facebot|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator/i.test(user_agent[0].value)) {
                // '${PrerenderToken}' is a placeholder; substitute your Prerender.io token.
                headers['x-prerender-token'] = [{ key: 'X-Prerender-Token', value: '${PrerenderToken}'}];
                // Keep the original host so the origin-request handler can build the target URL.
                headers['x-prerender-host'] = [{ key: 'X-Prerender-Host', value: host[0].value}];
            }
        }
        callback(null, request);
    };

The CloudFront distribution must be configured to forward the X-Prerender-Host and X-Prerender-Token headers to the origin.
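As a sketch of what that configuration might look like, assuming the distribution uses the legacy ForwardedValues settings rather than cache policies, the relevant part of a cache behaviour could be built like this. The `forwardedHeaders` helper and the `QueryString`/`Cookies` values are illustrative assumptions; adjust them to your app.

```javascript
'use strict';

// Builds the Headers part of ForwardedValues in the shape used by the
// CloudFront API: a Quantity plus an Items list.
function forwardedHeaders(names) {
  return { Quantity: names.length, Items: names.slice() };
}

// Fragment of a cache behaviour; merge it into your own distribution config.
const prerenderCacheBehaviorPatch = {
  ForwardedValues: {
    QueryString: false,           // assumption: the app does not need query strings
    Cookies: { Forward: 'none' }, // assumption: no cookies are needed at the origin
    Headers: forwardedHeaders(['X-Prerender-Token', 'X-Prerender-Host']),
  },
};
```

Forwarded headers become part of the cache key, so prerendered responses for crawlers are cached separately from the normal index.html served to browsers.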

Finally, an origin-request handler changes the origin server if X-Prerender-Token is present:

    'use strict';
    /* change the version number below whenever this code is modified */
    exports.handler = (event, context, callback) => {
        const request = event.Records[0].cf.request;
        const headers = request.headers;
        // Both headers are set by the viewer-request handler for crawler requests only.
        if (headers['x-prerender-token'] && headers['x-prerender-host']) {
            request.origin = {
                custom: {
                    domainName: 'service.prerender.io',
                    port: 443,
                    protocol: 'https',
                    readTimeout: 20,
                    keepaliveTimeout: 5,
                    customHeaders: {},
                    // TLSv1.2 added on the assumption that the origin may reject older versions.
                    sslProtocols: ['TLSv1', 'TLSv1.1', 'TLSv1.2'],
                    // CloudFront appends the request URI to this origin path.
                    path: '/https%3A%2F%2F' + headers['x-prerender-host'][0].value
                }
            };
        }
        callback(null, request);
    };
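To make the effect of the origin path concrete: CloudFront puts the origin path in front of the request URI, so the URL that ends up being requested from the new origin can be sketched as below. The `prerenderOriginPath` and `prerenderUrl` helpers are hypothetical and exist only for this illustration.

```javascript
'use strict';

// Mirrors the origin path built in the origin-request handler.
function prerenderOriginPath(host) {
  return '/https%3A%2F%2F' + host;
}

// The origin path is prepended to the request URI, so this is the URL the
// prerender service is expected to receive for a given host and URI.
function prerenderUrl(host, uri) {
  return 'https://service.prerender.io' + prerenderOriginPath(host) + uri;
}

console.log(prerenderUrl('myapp.com', '/user/bob'));
// prints "https://service.prerender.io/https%3A%2F%2Fmyapp.com/user/bob"
```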

There's a fully worked example at: https://github.com/jinty/prerender-cloudfront



Answer 4:

Have a look at the full solution here, which creates snapshots of your website with Grunt and serves them to search engines with nothing more than Amazon S3:

AngularJS SEO for static webpages (S3 CDN)



Answer 5:

As mentioned, it seems the easiest way to do this is to configure CloudFront/Lambda@Edge to proxy requests to a prerender service. I've found a repo that seems to take care of quite a bit of the aforementioned work for you: https://github.com/sanfrancesco/prerendercloud-lambda-edge

It uses Lambda@Edge to prerender your app and is deployed via a make deploy command. Unfortunately, it uses prerender.cloud, NOT prerender.io. Hopefully this isn't a blocker.