I'm using Lesti FPC on a Magento site with 10 customer groups and a lot of categories/products.
I've created a shell script that reads sitemap.xml and fetches each URL with wget overnight to build the site's cache. This works great for guests, but when a user in a customer group logs in, they end up building the cache themselves (if they are the first visitor of the day).
Does anyone know how to write a shell script that could log itself in and then crawl the site? Is it even possible for a shell script to hold its own session/cookie information to stay logged in? If not, any other ideas?
Many thanks
So thanks to some Googling and lots of trial and error, I've found a solution which I thought I'd share.
You can use wget to hold session/cookie information by saving and loading the cookies. Magento has its own restriction: you need to establish a session cookie before you log in, or the script will be redirected to the 'enable-cookies' page rather than being logged in. Here is the script:
#!/bin/bash
# Establish a session and nab the cookie
wget --save-cookies cookies.txt \
http://www.yourmagentourl.co.uk/
# Post your user credentials to login and update the cookie
wget --save-cookies cookies.txt \
--load-cookies cookies.txt \
--post-data 'login[username]=USERNAME&login[password]=PASSWORD' \
http://www.yourmagentourl.co.uk/customer/account/loginPost/
# Load the cookie for each page you want to WGET to maintain the session
wget --load-cookies cookies.txt \
-p http://www.yourmagentourl.co.uk/some-category.html
That's the basis, so it's now easy to load all the URLs from a sitemap.xml and build the logged-in versions of the cache.
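For anyone who wants the two pieces joined up, here is a rough sketch of how the login steps and the sitemap loop might fit together. The function names, the three command-line arguments and the `<loc>`-based URL extraction are my own assumptions for illustration, and `--keep-session-cookies` is added only in case your store issues cookies without an expiry time, which wget does not save by default.

```shell
#!/bin/bash
# Sketch only: warm the logged-in page cache from sitemap.xml.
# Assumed (not part of the original script): the function names, the
# cookies.txt location and the BASE_URL USERNAME PASSWORD arguments.

COOKIES="cookies.txt"

# Print every <loc> URL found in sitemap XML read from stdin, one per line.
extract_urls() {
    grep -Eo '<loc>[^<]+</loc>' | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Establish a session first, then post the credentials and update the cookie.
# Note: username and password must be URL-encoded if they contain
# characters such as & or +.
login() {
    local base_url="$1"
    local username="$2"
    local password="$3"
    wget --quiet --keep-session-cookies --save-cookies "$COOKIES" \
        --output-document /dev/null "$base_url/"
    wget --quiet --keep-session-cookies \
        --load-cookies "$COOKIES" --save-cookies "$COOKIES" \
        --post-data "login[username]=${username}&login[password]=${password}" \
        --output-document /dev/null "$base_url/customer/account/loginPost/"
}

# Log in, then fetch each sitemap URL with the saved cookie and discard it.
warm_cache() {
    local base_url="$1"
    login "$@" || return 1
    wget --quiet --output-document - "$base_url/sitemap.xml" \
        | extract_urls \
        | wget --quiet --load-cookies "$COOKIES" --delete-after --input-file -
}

# Only run when all three arguments are given, so the file can also be
# sourced to reuse the functions.
if [ "$#" -eq 3 ]; then
    warm_cache "$@"
fi
```

Saved as, say, warm-logged-in.sh, it would be called as `./warm-logged-in.sh http://www.yourmagentourl.co.uk USERNAME PASSWORD`, presumably once with an account from each customer group you want cached. If the login just bounces back to the login page, your Magento version may also expect the `form_key` field from the login form in the POST data; I haven't covered that here.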
Props to Grafista for a steer on saving cookie info.
Happy caching!
EDIT - AS PER REQUEST TO SHOW THE ORIGINAL CODE
Here's the code to cycle through the sitemap and load each page to build the cache for guests. Save this as cachewarm.sh and create a cron job to run it each night (don't forget to delete or expire your page cache first).
#!/bin/bash
# Pixie Media https://www.pixiemedia.co.uk
# Use the sitemap and reload the Page Cache by accessing each page once
#
# Fetch the sitemap, pull out every URL, then request each one and discard the download
wget --quiet http://YOUR-URL.co.uk/sitemap.xml --output-document - \
  | grep -Eo "http://YOUR-URL.co.uk/[^<]+" \
  | wget -q --delete-after -i -
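For the nightly cron job, an entry along these lines (added via `crontab -e`) should do. The 2 a.m. schedule and the /path/to/ location are placeholders rather than my actual setup, and the script needs to be executable first (`chmod +x cachewarm.sh`).

```
# minute hour day-of-month month day-of-week command
0 2 * * * /path/to/cachewarm.sh
```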