I have a working Cluster on Azure AKS, with services that all respond behind a Helm-installed NGINX Ingress. (This ended up being Azure specific.)
My question is: why does my connection to the services / pods in this cluster periodically get severed (apparently by some sort of idle timeout), and why does that severing appear to coincide with my Az AKS Browse UI connection getting cut?
This is an effort to get a final answer on what exactly triggers the time-out that causes the local 'Browse' proxy UI to disconnect from my Cluster (more background on why I'm asking follows below).
When working with Azure AKS from the Az CLI, you can launch the local Browse UI from the terminal using:
az aks browse --resource-group <resource-group> --name <cluster-name>
This works fine and pops open a browser window that looks something like this (yay):
In your terminal you will see something along the lines of:
- Proxy running on http://127.0.0.1:8001/
- Press CTRL+C to close the tunnel...
- Forwarding from 127.0.0.1:8001 -> 9090
- Forwarding from [::1]:8001 -> 9090
- Handling connection for 8001
- Handling connection for 8001
- Handling connection for 8001
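For context, that output (together with the portforward.go error below) suggests az aks browse is essentially wrapping a kubectl port-forward to the kubernetes-dashboard pod. A rough hand-rolled equivalent would look something like this (the kube-system namespace and the k8s-app label are assumptions based on a default AKS install):

# find the dashboard pod (namespace / label assumed from a default AKS install)
kubectl get pods -n kube-system -l k8s-app=kubernetes-dashboard

# forward local port 8001 to the dashboard's port 9090, like the Browse UI does
kubectl port-forward -n kube-system <dashboard-pod-name> 8001:9090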
If you leave the connection to your Cluster idle for a few minutes (i.e. you don't interact with the UI), you should see the following printed to indicate that the connection has timed out:
E0605 13:39:51.940659 5704 portforward.go:178] lost connection to pod
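If you just need to keep the tunnel open while investigating, one crude workaround is to poke the local proxy on an interval so it never sits idle. A minimal sketch, assuming the timeout really is idle-based (the 60-second interval is a guess, not a documented value):

# hit the local proxy every 60 seconds until the port-forward actually drops
while curl -s -o /dev/null http://127.0.0.1:8001/; do
  sleep 60
done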
One thing I still don't understand is whether OTHER activity inside the Cluster can prolong this timeout, but regardless, once you see that 'lost connection to pod' message you are essentially in the same place I am... which means we can talk about the fact that all of my other outbound connections from pods in that Cluster also appear to have been closed by whatever timeout process is responsible for cutting ties with the AKS Browse UI.
So what's the issue?
The reason this is a problem for me is that I have a Service running a Ghost blog pod which connects to a remote MySQL database using an npm package called 'Knex'. As it happens, the newer versions of Knex have a bug (which has yet to be addressed) whereby, if the connection between the Knex client and the remote database server is cut and needs to be restored, it doesn't reconnect and requests just load forever.
NGINX Error 503 Gateway Time-out
In my situation that resulted in the NGINX Ingress giving me an Error 503 Gateway time-out: Ghost stopped responding once the idle timeout cut the Knex connection, and the buggy Knex version never restored the broken connection to the database server.
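If you want to confirm it is the upstream (Ghost) that stopped answering rather than the Ingress itself, tailing the controller logs while you reproduce the error is usually enough. A sketch (the namespace and label are assumptions; adjust them to wherever and however your Helm release installed the controller):

# find the ingress controller pod (namespace / label assumed from a default Helm install)
kubectl get pods -n kube-system -l app=nginx-ingress

# tail its logs and watch for upstream timeouts / 5xx responses while reproducing the 503
kubectl logs -n kube-system <nginx-ingress-controller-pod> --tail=100 -f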
Fine. I rolled back Knex and everything works great.
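For reference, the rollback itself is just pinning the older package wherever your Ghost install resolves its dependencies. A sketch, with the version left as a placeholder because the right pin depends on your Ghost release:

# pin Knex back to the last known-good release for your Ghost version
npm install knex@<last-known-good-version> --save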
But why the heck are my pods' connections to my database being severed to begin with?
Hence this question, in the hope of saving some future person days of troubleshooting phantom issues that trace back to Kubernetes (maybe Azure specific, maybe not) cutting connections after a service / pod has been idle for some time.