I'm running a Virtual Private Server where, every day at midnight, all files are backed up automatically by the VPS provider.
So I want to export the Solr index to a file, so that if something ever goes wrong, I can easily import it back into Solr.
How can I do this?
The Solr database IS a file, or rather a set of files.
They live in a folder that looks something like this:
root@vs210044:/home/solr/apache-solr-1.4.0/example/solr/data/index# ls
segments.gen _xzy.tii _y26.tii _y4f.tii _y6o.tii _y8n.tii _y9i.tis _y9k.fdt _y9l.fdx _y9m.fnm
segments_uud _xzy.tis _y26.tis _y4f.tis _y6o.tis _y8n.tis _y9j.fdt _y9k.fdx _y9l.fnm _y9m.frq
_xzy_2n.del _y26_20.del _y4f_1z.del _y6o_21.del _y8n_2.del _y9i.fdt _y9j.fdx _y9k.fnm _y9l.frq _y9m.nrm
_xzy.fdt _y26.fdt _y4f.fdt _y6o.fdt _y8n.fdt _y9i.fdx _y9j.fnm _y9k.frq _y9l.nrm _y9m.prx
_xzy.fdx _y26.fdx _y4f.fdx _y6o.fdx _y8n.fdx _y9i.fnm _y9j.frq _y9k.nrm _y9l.prx _y9m.tii
_xzy.fnm _y26.fnm _y4f.fnm _y6o.fnm _y8n.fnm _y9i.frq _y9j.nrm _y9k.prx _y9l.tii _y9m.tis
_xzy.frq _y26.frq _y4f.frq _y6o.frq _y8n.frq _y9i.nrm _y9j.prx _y9k.tii _y9l.tis
_xzy.nrm _y26.nrm _y4f.nrm _y6o.nrm _y8n.nrm _y9i.prx _y9j.tii _y9k.tis _y9m.fdt
_xzy.prx _y26.prx _y4f.prx _y6o.prx _y8n.prx _y9i.tii _y9j.tis _y9l.fdt _y9m.fdx
HOWEVER: it would suffice to save this folder. You could just as well back up your entire Solr installation using incremental rsync or something similar. Once Solr is started again, only the caches would need to be warmed up again.
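A minimal Python sketch of "just save this folder", assuming Solr is stopped (or at least receives no updates) while the copy runs. The function name backup_index, the timestamped naming scheme and the backup root are hypothetical illustrations, not part of Solr. rsync -a would do the same job incrementally; this sketch simply makes a full copy each time.

```python
import shutil
import time
from pathlib import Path


def backup_index(index_dir: str, backup_root: str) -> Path:
    """Copy the Solr index folder into a new timestamped directory under backup_root.

    Assumption: nothing writes to index_dir during the copy (e.g. Solr is stopped);
    otherwise the copy may mix segment files from different points in time.
    """
    src = Path(index_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"index directory not found: {src}")
    # Hypothetical naming scheme: index-YYYYmmdd-HHMMSS
    dest = Path(backup_root) / time.strftime("index-%Y%m%d-%H%M%S")
    # copytree creates missing parent directories and refuses to overwrite
    # an existing destination, so an older backup is never clobbered
    shutil.copytree(src, dest)
    return dest
```

For example, backup_index("/home/solr/apache-solr-1.4.0/example/solr/data/index", "/home/solr/backups") would copy the folder listed above; /home/solr/backups is a made-up location. Restoring is the reverse: stop Solr, put the copied files back into data/index, and start Solr again.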
BUT: I hope Solr is not your primary database? It's meant to be a search engine, not a replacement for a database, and certainly not a backup!
It's just like MySQL replication: nice for load balancing, but useless as a backup...
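Why? Because the same query that empties the master is replicated to every slave, so one bad query can leave you with empty tables everywhere. It's just the same with Solr/Lucene: one bad delete and you end up with an empty index. ...Not to mention many, many other reasons that far more brilliant people have already discussed.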
Keeping that in mind, I wish you a good day!
Please see my other answer about taking hot backups using Solr's ReplicationHandler. You can just wget a URL and Solr will safely snapshot your data directory. I would not take a snapshot using cp, since Solr may be writing or merging segment files while the copy runs.
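The "just wget a URL" step, sketched in Python. It assumes a ReplicationHandler is registered at /replication in solrconfig.xml and that Solr listens on the example Jetty address http://localhost:8983/solr; in a multicore setup the core name would go in the path. The command=backup parameter is the ReplicationHandler's backup command; the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

DEFAULT_SOLR = "http://localhost:8983/solr"  # assumption: example Jetty setup


def backup_url(base_url: str = DEFAULT_SOLR) -> str:
    """Build the URL that asks the ReplicationHandler to snapshot the index.

    Assumes the handler is mapped to /replication in solrconfig.xml.
    """
    return f"{base_url.rstrip('/')}/replication?{urlencode({'command': 'backup'})}"


def trigger_backup(base_url: str = DEFAULT_SOLR, timeout: float = 30.0) -> bytes:
    """Equivalent of wget on backup_url(): send the request, return Solr's raw reply.

    The reply only confirms the request; the snapshot directory itself may be
    written in the background, so check the data directory afterwards.
    """
    with urlopen(backup_url(base_url), timeout=timeout) as resp:
        return resp.read()
```

Running trigger_backup() from cron shortly before the provider's midnight run would mean the nightly file backup picks up a consistent snapshot rather than a live index.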
If you are concerned about keeping incremental states, there are a number of shell scripts that can be configured to run, either scheduled via cron or after commits and optimizes.
Find out more at http://wiki.apache.org/solr/SolrOperationsTools
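Scheduled snapshots pile up, so those scripts also cover cleaning old ones away. A rough Python equivalent of such a cleanup step, assuming snapshots are directories named snapshot.<timestamp> inside the data directory with a zero-padded timestamp (so sorting by name is chronological). Verify the naming on your Solr version; prune_snapshots and its keep parameter are hypothetical.

```python
import shutil
from pathlib import Path


def prune_snapshots(data_dir: str, keep: int = 7, prefix: str = "snapshot.") -> list[str]:
    """Delete all but the `keep` newest snapshot directories in data_dir.

    Assumption: names are prefix + zero-padded timestamp, e.g.
    snapshot.20100301120000, so lexical order equals chronological order.
    Returns the names of the removed directories, oldest first.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    snapshots = sorted(
        (p for p in Path(data_dir).iterdir() if p.is_dir() and p.name.startswith(prefix)),
        key=lambda p: p.name,
    )
    # snapshots[:-0] would be empty, so keep == 0 is handled separately
    doomed = snapshots[:-keep] if keep else snapshots
    for path in doomed:
        shutil.rmtree(path)
    return [p.name for p in doomed]
```

Calling prune_snapshots("/home/solr/apache-solr-1.4.0/example/solr/data", keep=7) from the same cron job as the backup would keep roughly a week of nightly snapshots; the index folder itself is never touched because its name lacks the prefix.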
One thing I would note: while Solr is typically used not as the primary "System of Record" but as an auxiliary to some other data store, nothing requires that!
There are many use cases where losing your Solr indexes would mean losing your data. Think of a site that crawls the internet for specific data: the only copy of each crawl result might be in Solr, and with appropriate backups, I think that is okay!