SSH Multiplexing with ControlMaster
A typical Deployer run fires somewhere between twenty and a hundred SSH commands per host. Pull from git, install
vendors, run migrations, swap a symlink, restart a container, clean up old releases. If each of those opened a fresh SSH
connection, deploys would feel painful. They do not, because of a small OpenSSH feature called ControlMaster. This
post is about how it works, and how Deployer wires it up.
The cost of an SSH handshake
Every SSH connection goes through a handshake before it can run a single command: TCP three-way handshake, version exchange, key exchange (Diffie-Hellman or ECDH), host key verification, user authentication. Even on a fast network, this takes 150 to 500 milliseconds. On a transatlantic link it is easily over a second.
Run a quick experiment against your own server:
$ time ssh deploy@example.com true
real 0m0.412s
$ time ssh deploy@example.com true
real 0m0.398s
Now imagine a deploy that runs fifty commands against five hosts. That is 250 handshakes. At 400ms each, you have spent 100 seconds doing nothing but SSH handshake negotiation, before any actual work happened. For a tool whose entire job is to run commands over SSH, that is the dominant cost.
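The arithmetic is easy to redo for your own fleet. A minimal sketch, where handshake_overhead_s is a hypothetical helper name and not part of Deployer or OpenSSH:

```shell
# Hypothetical helper: total time spent on handshakes, in whole seconds.
# $1 = commands per host, $2 = number of hosts, $3 = handshake cost in ms
handshake_overhead_s() {
    echo $(( $1 * $2 * $3 / 1000 ))
}

handshake_overhead_s 50 5 400   # prints 100
```

Plug in the handshake time you measured with the time ssh experiment above to get a rough estimate for your own setup.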
What ControlMaster does
OpenSSH has a feature called connection sharing, configured through three options:
ControlMaster auto
ControlPath ~/.ssh/control-%r@%h:%p
ControlPersist 60
The first SSH connection to a given user, host, and port becomes the "master". It opens a Unix domain socket at the
ControlPath location. Every subsequent SSH connection with the same target sees that socket, skips the handshake
entirely, and multiplexes a new channel inside the existing SSH session.
The numbers change dramatically:
$ time ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p \
    -o ControlPersist=60 deploy@example.com true
real 0m0.412s
$ time ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p \
    -o ControlPersist=60 deploy@example.com true
real 0m0.022s
400ms to 22ms. The second invocation does not negotiate a key, does not authenticate, does not even open a TCP connection. It just sends a message to the local socket asking the running master to open a new channel.
ControlPersist=60 tells the master to stay alive for 60 seconds after the last channel closes. So if you run
dep deploy followed by dep cleanup a few seconds later, the second command reuses the master from the first.
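To see whether a master is still around, OpenSSH's -O flag sends a control command over the socket instead of opening a session. A small sketch; the function names are hypothetical, and the ControlPath has to match the one the master was started with:

```shell
# Hypothetical wrappers around OpenSSH's -O control commands.
# $1 = control socket path, $2 = target such as deploy@example.com

# Exit status 0 if a master is listening on the socket.
mux_check() {
    ssh -O check -o ControlPath="$1" "$2"
}

# Ask the master to exit now instead of waiting for ControlPersist to expire.
mux_exit() {
    ssh -O exit -o ControlPath="$1" "$2"
}
```

For the experiment above, that would be something like mux_check '~/.ssh/cm-%r@%h:%p' deploy@example.com, run within 60 seconds of the last command.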
How Deployer turns it on
Multiplexing is on by default in v8. Look at Host::connectionOptions():
if ($this->has('ssh_multiplexing') && $this->getSshMultiplexing()) {
    $options = array_merge($options, [
        '-o', 'ControlMaster=auto',
        '-o', 'ControlPersist=60',
        '-o', 'ControlPath=' . $this->getSshControlPath(),
    ]);
}
These three options get appended to every ssh invocation Deployer makes. The control path is generated per host:
private function generateControlPath(): string
{
    $C = $this->getHostname();
    if ($this->has('remote_user')) {
        $C = $this->getRemoteUser() . '@' . $C;
    }
    if ($this->has('port')) {
        $C .= ':' . $this->getPort();
    }
    if (getenv('CI') && is_writable('/dev/shm')) {
        return "/dev/shm/$C";
    }
    return "~/.ssh/$C";
}
Two interesting bits in there.
First, the path is keyed on user@host:port, not just hostname. If you connect to the same machine as two different
users (say deploy for releases, root for provisioning), each gets its own master. That is what you want, because the
underlying SSH session is authenticated as one specific user.
Second, on CI, the socket goes into /dev/shm instead of ~/.ssh. There are two reasons for this. /dev/shm is
tmpfs, so writes never hit disk and the socket is gone when the runner terminates. And inside some container
configurations, ~/.ssh is read-only or simply does not exist for the CI user, while /dev/shm is reliably writable.
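The branching is easy to try outside PHP. Here is a shell sketch of the same decision logic, not Deployer's code: control_path is a hypothetical name, and treating any non-empty CI variable as "on CI" is a simplification of PHP's truthiness rules.

```shell
# Shell sketch of the path selection described above.
# $1 = hostname, $2 = remote user (optional), $3 = port (optional)
control_path() {
    cp_name=$1
    if [ -n "${2:-}" ]; then cp_name="$2@$cp_name"; fi
    if [ -n "${3:-}" ]; then cp_name="$cp_name:$3"; fi
    # On CI, prefer tmpfs when it is writable.
    if [ -n "${CI:-}" ] && [ -w /dev/shm ]; then
        echo "/dev/shm/$cp_name"
    else
        # Literal tilde on purpose: ssh expands it in ControlPath.
        echo "~/.ssh/$cp_name"
    fi
}
```

Calling control_path example.com deploy 22 on a workstation yields ~/.ssh/deploy@example.com:22, so deploy and root on the same machine end up with separate sockets.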
rsync gets it for free
rsync runs over SSH when you give it a remote source or destination, and it accepts an -e flag to set the SSH
command. Deployer builds that command from the same connectionOptions() array:
$rsh = quote(rsync_rsh($host->connectionOptions()));
runLocally("rsync {$rsyncFlags} -e $rsh ... '$src/' '{$host->connectionString()}:$dst/'");
So when your deploy:update_code task does an rsync upload, it runs over the multiplexed connection that the previous
run() calls already opened. No second handshake. The same is true for the local_archive upload step, the rsync
recipe in contrib/, and any custom task that calls upload() or download().
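The same trick works by hand. rsync's -e option takes a full command string, so the three multiplexing options can ride along. A sketch with hypothetical helper names (mux_rsh, mux_upload) and a deliberately simple assumption that the socket path contains no spaces:

```shell
# Hypothetical helper: print an ssh command line that reuses a control socket.
# $1 = control socket path (no spaces in this simple sketch)
mux_rsh() {
    echo "ssh -o ControlMaster=auto -o ControlPersist=60 -o ControlPath=$1"
}

# Hypothetical wrapper: upload a directory over the shared connection.
# $1 = local dir, $2 = user@host, $3 = remote dir, $4 = control socket path
mux_upload() {
    rsync -az -e "$(mux_rsh "$4")" "$1/" "$2:$3/"
}
```

If a master for that socket is already running, the rsync transfer opens a channel on it instead of starting a new connection.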
Real numbers
On a recent deploy of a mid-size Laravel app to three hosts, with multiplexing off:
deploy:update_code 28.4s
deploy:vendors 7.2s
deploy:writable 5.1s
artisan:migrate 3.8s
artisan:storage:link 0.9s
deploy:symlink 1.2s
deploy:cleanup 2.3s
Same recipe with multiplexing back on:
deploy:update_code 4.1s
deploy:vendors 6.7s
deploy:writable 0.6s
artisan:migrate 2.2s
artisan:storage:link 0.2s
deploy:symlink 0.3s
deploy:cleanup 0.4s
Not every heavy task benefits: vendors is dominated by Composer doing actual work and barely moves. The clearest wins
are in the dozens of small tasks that each fire one or two SSH commands. Those go from "a noticeable pause" to "instant".
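Adding up the two lists makes the gap concrete. A quick sketch that sums only the tasks shown above (sum_seconds is a hypothetical helper name):

```shell
# Sum a list of per-task timings given in seconds.
sum_seconds() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

sum_seconds 28.4 7.2 5.1 3.8 0.9 1.2 2.3   # multiplexing off: prints 48.9
sum_seconds 4.1 6.7 0.6 2.2 0.2 0.3 0.4    # multiplexing on:  prints 14.5
```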
When it bites
ControlMaster is not free. A few things to know.
Socket path length. Unix domain socket paths are limited to 108 bytes on Linux and 104 on macOS. If your
hostname is long, the default control path can blow past that. The symptom is a confusing error like "ControlPath too
long". Fix with Host::setSshControlPath() and a shorter location, often something like /tmp/dep-%h.sock.
host('very-long-hostname.region.cloud.example.com')
    ->setSshControlPath('/tmp/dep-%h.sock');
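You can check a candidate path before a deploy trips over it. A rough sketch: control_path_fits is a hypothetical helper, it expects a fully expanded path (no ~ or % tokens), and it defaults to the smaller macOS limit. OpenSSH may need a few extra bytes for a temporary name while it creates the socket, so treat a narrow pass as a warning rather than a guarantee.

```shell
# Hypothetical check: does a control path fit in a Unix socket address?
# $1 = fully expanded path, $2 = limit in bytes (default 104)
control_path_fits() {
    # Count bytes, not characters; $(( )) strips the padding some wc builds add.
    len=$(( $(printf '%s' "$1" | wc -c) ))
    # The address also holds a terminating NUL, so the path must be shorter than the limit.
    [ "$len" -lt "${2:-104}" ]
}
```

For example, control_path_fits "$HOME/.ssh/deploy@very-long-hostname.region.cloud.example.com:22" || echo "pick a shorter ControlPath".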
Stale sockets. If the master process crashes (kill -9, OOM, container restart), the socket file is left behind and
looks valid until you try to use it. With ControlMaster=auto, OpenSSH detects the failure and creates a new master,
but you may see a one-time error like "mux_client_request_session: read from master failed". A re-run usually resolves it.
If it does not, just delete the socket file.
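If you would rather not delete sockets by hand, the check can be scripted. A cautious sketch, assuming a fully expanded socket path; clean_stale_socket is a hypothetical name and not something Deployer ships:

```shell
# Hypothetical cleanup helper: remove a control socket whose master is gone.
# $1 = fully expanded control socket path, $2 = target such as deploy@example.com
clean_stale_socket() {
    # Nothing to do unless a socket file exists at that path.
    [ -S "$1" ] || return 0
    # "ssh -O check" exits non-zero when no master answers on the socket.
    if ! ssh -O check -o ControlPath="$1" "$2" 2>/dev/null; then
        rm -f -- "$1"
        echo "removed stale socket $1"
    fi
}
```

The -S guard means the function only ever removes socket files, so a typo in the path cannot delete a regular file.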
Jump hosts and bastions. If your SSH config uses ProxyJump, multiplexing still works, but the master holds the
proxied connection open. That is usually what you want. If your bastion enforces short session timeouts on the jump leg,
you may see deploys fail mid-task because the master died on the jump host. Disable multiplexing for those hosts:
host('production')
    ->set('ssh_multiplexing', false);
Disabling it
You can turn multiplexing off globally:
set('ssh_multiplexing', false);
Per host:
host('weird-server')
    ->set('ssh_multiplexing', false);
Or just for one run:
dep deploy -o ssh_multiplexing=false
Why this is in a deploy tool
OpenSSH has supported ControlMaster since version 3.9 in 2004, and ControlPersist since 5.6 in 2010. Anyone deploying
over SSH could have wired up these three options in their ~/.ssh/config and gotten the same speedup, more than a decade
ago. Many people did not. Some did and discovered the socket-path issue, gave up, and turned it off.
Deployer turns it on by default, picks a sensible socket location, handles the CI case, and keeps the lifecycle short enough that stale sockets are rare. None of that is interesting on its own. Together it is the difference between a deploy that takes 25 seconds and a deploy that takes 90.
The lesson, if there is one, is that good defaults compound. A tool that asks you to read a config doc to make it fast is almost always slower than the same tool with the right defaults baked in.
