Is there a way to update the TLS certificates in a

Posted 2019-04-24 00:48

Question:

I have a simple https server serving a simple page like so (no error handling for brevity):

package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
)

func main() {
    mux := http.NewServeMux()

    mux.HandleFunc("/", func(w http.ResponseWriter, req *http.Request) {
        fmt.Fprintf(w, "hello!")
    })

    xcert, _ := tls.LoadX509KeyPair("cert1.crt", "key1.pem")

    tlsConf := &tls.Config{
        Certificates: []tls.Certificate{xcert},
    }

    srv := &http.Server{
        Addr:      ":https",
        Handler:   mux,
        TLSConfig: tlsConf,
    }

    srv.ListenAndServeTLS("", "")
}

I want to use a Let's Encrypt TLS certificate to serve the content over https. I would like to be able to do certificate renewals and update the certificate in the server without any downtime.

I tried running a goroutine to update the tlsConf:

// Requires "time" in the import block above.
go func(c *tls.Config) {
    xcert, _ := tls.LoadX509KeyPair("cert2.crt", "key2.pem")

    select {
    case <-time.After(3 * time.Minute):
        c.Certificates = []tls.Certificate{xcert}
        c.BuildNameToCertificate()
        fmt.Println("cert switched!")
    }
}(tlsConf)

However, that doesn't work because the server does not "read in" the changed config. Is there any way to ask the server to reload the TLSConfig?

Answer 1:

There is: you can use tls.Config’s GetCertificate member instead of populating Certificates. First, define a data structure that encapsulates the certificate and reload functionality (on receiving the SIGHUP signal in this example):

// Imports needed: "crypto/tls", "log", "os", "os/signal", "sync", "syscall".

// keypairReloader holds the current certificate and the paths it is loaded from.
type keypairReloader struct {
        certMu   sync.RWMutex
        cert     *tls.Certificate
        certPath string
        keyPath  string
}

// NewKeypairReloader loads the key pair once and then reloads it on every SIGHUP.
func NewKeypairReloader(certPath, keyPath string) (*keypairReloader, error) {
        result := &keypairReloader{
                certPath: certPath,
                keyPath:  keyPath,
        }
        cert, err := tls.LoadX509KeyPair(certPath, keyPath)
        if err != nil {
                return nil, err
        }
        result.cert = &cert
        go func() {
                c := make(chan os.Signal, 1)
                signal.Notify(c, syscall.SIGHUP)
                for range c {
                        log.Printf("Received SIGHUP, reloading TLS certificate and key from %q and %q", certPath, keyPath)
                        if err := result.maybeReload(); err != nil {
                                log.Printf("Keeping old TLS certificate because the new one could not be loaded: %v", err)
                        }
                }
        }()
        return result, nil
}

// maybeReload replaces the certificate only if the new key pair loads cleanly.
func (kpr *keypairReloader) maybeReload() error {
        newCert, err := tls.LoadX509KeyPair(kpr.certPath, kpr.keyPath)
        if err != nil {
                return err
        }
        kpr.certMu.Lock()
        defer kpr.certMu.Unlock()
        kpr.cert = &newCert
        return nil
}

// GetCertificateFunc returns a callback suitable for tls.Config.GetCertificate.
func (kpr *keypairReloader) GetCertificateFunc() func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
        return func(clientHello *tls.ClientHelloInfo) (*tls.Certificate, error) {
                kpr.certMu.RLock()
                defer kpr.certMu.RUnlock()
                return kpr.cert, nil
        }
}

Then, in your server code, use:

kpr, err := NewKeypairReloader(*tlsCertPath, *tlsKeyPath)
if err != nil {
    log.Fatal(err)
}
srv.TLSConfig.GetCertificate = kpr.GetCertificateFunc()

I recently implemented this pattern in RobustIRC.
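
To see the effect without real certificate files, here is a minimal sketch (not the RobustIRC code) that swaps a certificate behind a running listener and checks what a new client is served. The names certStore, selfSigned, serve and peerName are made up for this illustration, the certificates are throwaway self-signed ones generated in memory, and atomic.Pointer (Go 1.19+) stands in for the RWMutex above only to keep the example short.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"log"
	"math/big"
	"net"
	"sync/atomic"
	"time"
)

// certStore (hypothetical name) holds the certificate currently being served.
type certStore struct {
	cur atomic.Pointer[tls.Certificate]
}

// Set swaps in a new certificate; handshakes started afterwards will use it.
func (s *certStore) Set(c *tls.Certificate) { s.cur.Store(c) }

// GetCertificate has the signature of the tls.Config.GetCertificate hook.
func (s *certStore) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	return s.cur.Load(), nil
}

// selfSigned creates a throwaway in-memory certificate (demo only).
func selfSigned(cn string) (*tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return &tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// serve starts a TLS listener on a random local port; every handshake asks
// store for the certificate to present.
func serve(store *certStore) (net.Listener, error) {
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{GetCertificate: store.GetCertificate})
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			conn.(*tls.Conn).Handshake() // finish the handshake, then hang up
			conn.Close()
		}
	}()
	return ln, nil
}

// peerName dials addr and returns the common name of the certificate served.
// InsecureSkipVerify is used only because the demo certificates are self-signed.
func peerName(addr string) (string, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	return conn.ConnectionState().PeerCertificates[0].Subject.CommonName, nil
}

func main() {
	store := &certStore{}
	ln, err := serve(store)
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()
	for _, cn := range []string{"old.example", "new.example"} {
		cert, err := selfSigned(cn)
		if err != nil {
			log.Fatal(err)
		}
		store.Set(cert) // swap in place; the listener keeps running
		name, err := peerName(ln.Addr().String())
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println("server presented:", name)
	}
}
```

The listener is created once and never restarted; only the pointer behind GetCertificate changes. Connections that completed their handshake before the swap keep the certificate they negotiated, and only new handshakes see the new one.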



Answer 2:

You'd have to stop and restart the Listener, which in and of itself counts as 'downtime'.

If "no downtime" is a must-have, one option is to build in graceful restarts by spinning up a child instance:

http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/

But in reality, this gives a false sense of security. If you run only one instance and try to guarantee that it stays stable, that instance is a single point of failure, and you cannot guarantee uptime. Servers reboot, apps panic, connections drop.

Instead, consider setting up a web farm of at least 2 or 3 nodes to distribute the traffic.

Hear me out for a moment...

Amazon AWS has "Elastic Beanstalk" (among other similar offerings). Windows Azure has "Websites." Both of these managed options allow for Rolling Updates. Let go of the SSH access, and just sit back and let it be managed.

What are rolling updates? Say you have two instances on version 1. You want to deploy version 2.

  1. You "update" the package and AWS starts the deployment by spinning up a 3rd instance.
  2. Once the VM is in "ready" state, with your Version 2 code deployed and running, AWS will start to direct TCP traffic to it by changing the ELB (load balancer).
  3. AWS will then stop directing traffic to one of the older nodes on Version 1. It won't shut it down just yet, just stops sending new connections.
  4. Once all TCP connections to this old version 1 instance have been drained, AWS then shuts down that instance.
  5. AWS now spins up a 4th instance, on Version 2, and at Ready state starts directing traffic.
  6. AWS stops traffic to the last old version 1 instance, waiting for existing connections to finish.
  7. Once the connections are drained, AWS shuts down the last instance of the old version.
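
The ordering of those steps can be sketched as a toy simulation. This is not how AWS implements it and makes no API calls; the instance type and the rollingUpdate function are invented for illustration, and instance IDs are assumed to run from 1 to n. It only shows that adding the new instance before retiring an old one keeps capacity from dropping below the starting level.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for one VM behind the load balancer.
type instance struct {
	ID      int
	Version string
}

// rollingUpdate replaces every instance in pool with one running newVersion,
// one at a time. It returns the final pool and the smallest number of
// instances that were in service at any point during the rollout.
func rollingUpdate(pool []instance, newVersion string) ([]instance, int) {
	pool = append([]instance(nil), pool...) // work on a copy of the caller's slice
	toReplace := len(pool)
	lowest := len(pool)
	nextID := len(pool) + 1 // assumes existing IDs are 1..n
	for i := 0; i < toReplace; i++ {
		// Steps 1-2: start a new instance and route traffic to it once ready.
		pool = append(pool, instance{ID: nextID, Version: newVersion})
		nextID++
		// Steps 3-4: stop routing to the oldest instance, drain it, shut it down.
		fmt.Printf("added %v, retiring %v\n", pool[len(pool)-1], pool[0])
		pool = pool[1:]
		if len(pool) < lowest {
			lowest = len(pool)
		}
	}
	return pool, lowest
}

func main() {
	final, lowest := rollingUpdate([]instance{{1, "v1"}, {2, "v1"}}, "v2")
	fmt.Println("final pool:", final, "lowest capacity:", lowest)
}
```

Starting from two instances on v1, the simulated pool ends with instances 3 and 4 on v2 and never has fewer than two instances in service.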

Zero downtime. Zero connections dropped. Zero TCP packets lost. Fully automated. Rolling upgrades of your SSL certs, as you want.

This is of course fully configurable. For example, you can use Blue/Green deployments (spin up several new instances first, then direct all new traffic to the new environment, which is best for DB schema changes). You can also do canary testing with a small share of traffic, etc.