I'm still wrapping my head around Kubernetes and how that's supposed to work. Currently, I'm struggling to understand how to model something like a PostgreSQL cluster with streaming replication, scaling out and automatic failover/failback (pgpool-II
, repmgr
, pick your poison).
My main problem with the approach is the dual nature of a PostgreSQL instance, configuration-wise -- it's either a master or a cold/warm/hot standby. If I increase the number of replicas, I'd expect them all to come up as standbys, so I'd imagine creating a postgresql-standby
replication controller separately from a postgresql-master
pod. However I'd also expect one of those standbys to become a master in case current master is down, so it's a common postgresql
replication controller after all.
The only idea I've had so far is to put the replication configuration on an external volume and manage the state and state changes outside the containers.
(in case of PostgreSQL the configuration would probably already be on a volume inside its data
directory, which itself is obviously something I'd want on a volume, but that's beside the point)
Is that the correct approaach, or is there any other cleaner way?
There's an example in OpenShift: https://github.com/openshift/postgresql/tree/master/examples/replica The principle is the same in pure Kube (it's not using anything truly OpenShift specific, and you can use the images in plain docker)
You can give PostDock a try, either with docker-compose or Kubernetes. Currently I have tried it in our project with docker-compose, with the schema as shown below:
pgmaster (primary node1) --|
|- pgslave1 (node2) --|
| |- pgslave2 (node3) --|----pgpool (master_slave_mode stream)----client
|- pgslave3 (node4) --|
|- pgslave4 (node5) --|
I have tested the following scenarios, and they all work very well:
- Replication: changes made at the primary (i.e., master) node will be replicated to all standby (i.e., slave) nodes
- Failover: stops the primary node, and a standby node (e.g., node4) will automatically take over the primary role.
- Prevention of two primary nodes: resurrect the previous primary node (node1), node4 will continue as the primary node, while node1 will be in sync but as a standby node.
As for the client application, these changes are all transparent. The client just points to the pgpool node, and keeps working fine in all the aforementioned scenarios.
Note: In case you have problems to get PostDock up running, you could try my forked version of PostDock.
Pgpool-II with Watchdog
A problem with the aforementioned architecture is that pgpool is the single point of failure. So I have also tried enabling Watchdog for pgpool-II with a delegated virtual IP, so as to avoid the single point of failure.
master (primary node1) --\
|- slave1 (node2) ---\ / pgpool1 (active) \
| |- slave2 (node3) ----|---| |----client
|- slave3 (node4) ---/ \ pgpool2 (standby) /
|- slave4 (node5) --/
I have tested the following scenarios, and they all work very well:
- Normal scenario: both pgpools start up, with the virtual IP automatically applied to one of them, in my case, pgpool1
- Failover: shutdown pgpool1. The virtual IP will be automatically applied to pgpool2, which hence becomes active.
- Start failed pgpool: start again pgpool1. The virtual IP will be kept with pgpool2, and pgpool1 is now working as standby.
As for the client application, these changes are all transparent. The client just points to the virtual IP, and keeps working fine in all the aforementioned scenarios.
You can find this project at my GitHub repository on the watchdog branch.
Kubernetes's statefulset is a good base for setting up the stateful service. You will still need some work to configure the correct membership among PostgreSQL replicas.
Kubernetes has one example for it. http://blog.kubernetes.io/2017/02/postgresql-clusters-kubernetes-statefulsets.html
You can look at one of the below postgresql open-source tools
1 Crunchy data postgresql
- Patroni postgresql
.