I am setting up a Drupal site and would like to make it scalable on OpenShift (Bronze plan, Small.highcpu gears). Two questions in this respect:
a) Background tasks
It would be great if someone could explain point 3 below in more detail. From https://github.com/openshift/drupal-quickstart/blob/master/README.md:
Because none of your application code is checked into Git and lives entirely in your data directory, if this application is set to scalable the new gears will have empty data directories and won't serve requests properly. If you'd like to make the app scalable, you'll need to:
- Check the contents of php/* into your Git repository (in the php/* dir)
- Only install new modules via Drush from the head gear, and then commit those changes to the Git repo
- Use a background task to copy file contents from gear to gear
All of the scripts used to deploy and configure Drupal are located in the build and deploy hooks.
b) Additional filesystem
Here the poster says that a more persistent filesystem (e.g. S3) is needed to scale: https://groups.drupal.org/node/297403. Is that really necessary for a site serving around 30-50 pages per second at peak time? What are the benefits of adding S3?
In a scalable OpenShift app, you want all gears to behave identically. In the case of Drupal, each gear needs to have the core Drupal files, modules, and any additional data to be served by the gear (images, etc.).
The guide recommends checking the core PHP files and any extra modules (installed via Drush) into Git so each gear has them.
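A minimal sketch of that workflow; the module name, SSH address, and paths are placeholders following the quickstart's php/* layout, so adjust them to your app:

```bash
# 1. On the head gear (ssh in using the app's SSH URL): install
#    and enable a module with Drush.
drush dl views        # "views" is only an example module
drush en -y views

# 2. In your local clone of the app's Git repo: copy the module
#    out of the gear's data directory into php/* so it is under
#    version control.
scp -r APP_UUID@myapp-mydomain.rhcloud.com:app-root/data/sites/all/modules/views \
    php/sites/all/modules/

# 3. Commit and push; the build/deploy hooks roll the module out
#    to every gear, head and secondary alike.
git add php/sites/all/modules/views
git commit -m "Add Views module installed via Drush"
git push
```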
Background tasks and S3 are two approaches to the same problem: making sure every gear serves the same data.
a. Background tasks
One way to realize "a background task to copy file contents from gear to gear" is to use OpenShift cron on the head gear, which scp's the data files to the remaining gears at regular intervals.
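A minimal sketch of such a cron job, assuming the cron cartridge is installed and the head gear's SSH key is authorized on the other gears; the marker file and gear addresses are placeholders you would maintain yourself (e.g. from the output of rhc app show --gears):

```bash
#!/bin/bash
# .openshift/cron/minutely/sync_files
# The cron cartridge runs this on every gear; only the head gear
# should push files, so the others bail out early. The marker
# file is an assumption: create it once on the head gear.
[ -f "$OPENSHIFT_DATA_DIR/.head_gear" ] || exit 0

# SSH addresses of the secondary gears (placeholders).
GEARS="
GEAR1_UUID@myapp-mydomain.rhcloud.com
GEAR2_UUID@myapp-mydomain.rhcloud.com
"

for gear in $GEARS; do
  # Copy the Drupal files directory (uploads, images, ...) so
  # each secondary gear serves the same content as the head.
  scp -r "$OPENSHIFT_DATA_DIR/sites/default/files" \
      "$gear:app-root/data/sites/default/" \
    || echo "sync to $gear failed" >&2
done
```

For large file trees, rsync over SSH would be a better fit than scp -r, since it only transfers what changed, but the idea is the same.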
b. Additional filesystem
The other way to have all gears serve the same content is to have them all point to external storage such as S3. If you use S3, you don't need background jobs to copy data between gears. And if the bottleneck in serving 30-50 pages per second is I/O in reading data, S3 may well help by offloading those reads to its servers.
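As a sketch of what that looks like in practice, assuming the contrib s3fs module; the variable names below are illustrative, so check the module's documentation for the exact settings it expects:

```bash
# On the head gear: download and enable the s3fs module, which
# exposes the bucket through an s3:// stream wrapper.
drush dl s3fs
drush en -y s3fs

# Point Drupal's file system at the bucket (these variable names
# are assumptions; consult the s3fs docs for the real ones).
drush vset s3fs_bucket my-drupal-files
drush vset s3fs_region us-east-1
drush vset file_default_scheme s3

# As before, commit the module code to Git so every gear loads
# the same configuration and reads file content straight from S3.
```

With files in S3, a freshly added gear has nothing to copy: its code arrives via Git and its file content is fetched from the bucket.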