Using a pushState-enabled page, you normally redirect SEO bots using the _escaped_fragment_ convention. You can read more about that here.
The convention assumes that you will be using a hashbang (#!) prefix before all of your URIs on a single-page application. SEO bots will escape these fragments by replacing the hashbang with their own recognizable convention, _escaped_fragment_, when making a page request.
//Your page
http://example.com/#!home
//Requested by bots as
http://example.com/?_escaped_fragment_=home
This allows the site administrator to detect bots, and redirect them to a cached prerendered page.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^(.*)$ https://s3.amazonaws.com/mybucket/$1 [P,QSA,L]
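Note that with a hashbang URL the page name ("home" above) arrives in the query string rather than the path, so it's the RewriteCond's capture (%1), not the path capture ($1), that holds it. A variant that maps the fragment value to the bucket key might look like this (a sketch, assuming mod_proxy is enabled and the bucket objects are named after the fragments):

```apache
# Sketch: proxy a bot request for ?_escaped_fragment_=home
# to the prerendered copy at mybucket/home.
# %1 refers to the group captured in the RewriteCond.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^ https://s3.amazonaws.com/mybucket/%1 [P,L]
```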
The problem is that the hashbang is being phased out quickly thanks to widely adopted pushState support. It's also really ugly and isn't very intuitive to a user.
So what if we used HTML5 mode where pushState guides the entire user application?
//Your index is using pushState
http://example.com/
//Your category is using pushState (not a folder)
http://example.com/category
//Your category/subcategory is using pushState
http://example.com/category/subcategory
Can rewrite rules guide bots to your cached version using this newer convention? Related, but it only accounts for the index edge case. Google also has an article that suggests using an opt-in method for this single edge case, placing <meta name="fragment" content="!"> in the <head> of the page. Again, this is for a single edge case. Here we are talking about handling every page as an opt-in scenario.
http://example.com/?_escaped_fragment_=
http://example.com/category?_escaped_fragment_=
http://example.com/category/subcategory?_escaped_fragment_=
I'm thinking that _escaped_fragment_ could still be used as an identifier for SEO bots, and that I could extract everything between the domain and this identifier to append to my bucket location, like:
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
# High-level example -- I have no idea how to do this.
# I want to extract "category/subcategory" from
# http://example.com/category/subcategory?_escaped_fragment_=
# so that, e.g., $1 holds "category/subcategory" below:
RewriteRule ^(.*)$ https://s3.amazonaws.com/mybucket/$1 [P,QSA,L]
What's the best way to handle this?
Had a similar problem on a single-page web app.
The only solution I found to this problem was effectively creating static versions of pages so that Google (and other) bots can navigate the site.
You could do this yourself, but there are also services that do exactly this and create your static cache for you (and serve the snapshots to the bots over their CDN).
I ended up using SEO4Ajax, although other similar services are available!
I was having the exact same problem. For now, I've modified .htaccess like so:
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^$ /snapshots/index.html? [L,NC]
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
RewriteRule ^(.*)$ /snapshots/$1.html? [L,NC]
Not sure if there's a better solution, but it's working for me so far. Just be sure to have the directory structure for your snapshots match the URL structure.
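For illustration, a snapshot tree matching the rules above might be laid out like this (the category/subcategory paths are taken from the question's example URLs, not from the original answer):

```shell
# Build a snapshot tree that mirrors the pushState URL structure,
# so /category/subcategory rewrites to /snapshots/category/subcategory.html.
mkdir -p snapshots/category
touch snapshots/index.html                  # serves http://example.com/
touch snapshots/category.html               # serves http://example.com/category
touch snapshots/category/subcategory.html   # serves .../category/subcategory
find snapshots -name '*.html' | sort
```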
I'm using Symfony2, and although other devs tell me that Googlebot and Bingbot execute JavaScript well enough to generate their own HTML snippets, I don't feel confident. I also feel that serving static resources is a better alternative for people running with JS turned off (however unlikely that is), so I'm interested in serving HTML snippets anyway, as long as it's not a hassle. Below is a method I'm thinking of using but haven't tried:
Here are other SO questions that are similar (one is mine).
Angularjs vs SEO vs pushState
HTML snippets for AngularJS app that uses pushState?
Here's a solution I posted in that question and am considering for myself in case I want to send HTML snippets to bots. This would be a solution for a Symfony2 backend:
- Use prerender or another service to generate static snippets of all your pages. Store them somewhere accessible by your router.
- In your Symfony2 routing file, create a route that matches your SPA. I have a test SPA running at localhost.com/ng-test/, so my route would look like this:
# Adding a trailing / to this route breaks it. Not sure why.
NgTestReroute:
    path: /ng-test/{one}/{two}/{three}/{four}
    defaults:
        _controller: DriverSideSiteBundle:NgTest:ngTestReroute
        one: null
        two: null
        three: null
        four: null
    methods: [GET]
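For clarity, here's how this catch-all route would bind parameters for one of the question's example URLs (the values are illustrative, not from the original answer):

```yaml
# GET /ng-test/category/subcategory matches the route above with:
#   one:   category
#   two:   subcategory
#   three: null   # unset slots fall back to their defaults
#   four:  null
```

The controller can then reassemble the snapshot path from whichever parameters are non-null.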
- In your Symfony2 controller, check the user agent to see whether it's Googlebot or Bingbot. You should be able to do this with the code below, then use this list to target the bots you're interested in (http://www.searchenginedictionary.com/spider-names.shtml)...
if (isset($_SERVER['HTTP_USER_AGENT'])
    && stripos($_SERVER['HTTP_USER_AGENT'], 'googlebot') !== false)
{
    // serve the pre-rendered HTML snippet instead of the SPA
}
- If your controller finds a match to a bot, send it the HTML snippet. Otherwise, as with my AngularJS app, just send the user to the index page and Angular will correctly do the rest.
Also, if your question has been answered, please select an answer so I and others can tell what worked for you.
I'm using PhantomJS to generate static snapshots of my pages. My directory structure is only one level deep (root and /projects), so I have two .htaccess files, in which I redirect to a PHP file (index-bots.php) that starts a PhantomJS process pointed at my SPA's index.html and prints out the rendered static page.
The .htaccess files look like this:
/.htaccess
# redirect search engine bots to index-bots.php
# in order to serve rendered HTML via phantomjs
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !^/index-bots\.php [NC]
RewriteRule ^(.*)$ index-bots.php?url=%{REQUEST_URI} [L,QSA]
/projects/.htaccess
# redirect search engine bots to index-bots.php
# in order to serve rendered HTML via phantomjs
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|slurp|spider) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ ../index-bots.php?url=%{REQUEST_URI} [L,QSA]
A couple of notes:
- The !-f RewriteCond is critical! Since .htaccess applies RewriteRules to all requests, assets on your page would each be rewritten to the PHP file, spinning up multiple instances of PhantomJS and bringing your server to its knees.
- It's also important to exempt index-bots.php from the rewrites to avoid an endless loop.
- I strip out the JS in my PhantomJS runner script, to ensure the JS doesn't do anything when bots that support it come across the 'static' pages.
- I'm no .htaccess wizard, so there's probably a better way to do this. I'd love to hear it if so.