
SSH Multiplexing with ControlMaster

· 6 min read
Anton Medvedev
Deployer Maintainer

A typical Deployer run fires somewhere between twenty and a hundred SSH commands per host: pull from git, install vendors, run migrations, swap a symlink, restart a container, clean up old releases. If each of those opened a fresh SSH connection, deploys would be painfully slow. They are not, thanks to a small OpenSSH feature called ControlMaster. This post covers how it works and how Deployer wires it up.

The cost of an SSH handshake

Every SSH connection goes through a handshake before it can run a single command: the TCP three-way handshake, protocol version exchange, key exchange (Diffie-Hellman or ECDH), host key verification, and user authentication. Even on a fast network this takes roughly 150 to 500 milliseconds. Over a transatlantic link it can easily exceed a second.

Run a quick experiment against your own server:

$ time ssh deploy@example.com true
real 0m0.412s

$ time ssh deploy@example.com true
real 0m0.398s

Now imagine a deploy that runs fifty commands against each of five hosts. That is 250 handshakes. At 400ms each, you have spent 100 seconds (summed across hosts) on key exchange and authentication before any actual work happened. For a tool whose entire job is to run commands over SSH, that is the dominant cost.

What ControlMaster does

OpenSSH has a feature called connection sharing, configured through three options:

ControlMaster auto
ControlPath ~/.ssh/control-%r@%h:%p
ControlPersist 60

The first SSH connection to a given user, host, and port becomes the "master". It opens a Unix domain socket at the ControlPath location. Every subsequent SSH connection with the same target sees that socket, skips the handshake entirely, and multiplexes a new channel inside the existing SSH session.
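To make the keying concrete, here is a small PHP sketch of how the %r, %h and %p tokens in that ControlPath resolve. The expandControlPath() function is hypothetical and handles only these tokens plus the literal %%. Real OpenSSH understands more tokens (%C, %L, %n and others) and expands a leading ~ itself.

```php
<?php

// Hypothetical helper: expands the subset of ssh_config tokens used in the
// ControlPath above (%r = remote user, %h = host, %p = port, %% = literal %).
// OpenSSH supports more tokens and also expands "~"; this sketch does not.
function expandControlPath(string $template, string $user, string $host, int $port): string
{
    // strtr() with an array never rescans text it has already replaced,
    // so "%%" turns into "%" without being expanded a second time.
    return strtr($template, [
        '%%' => '%',
        '%r' => $user,
        '%h' => $host,
        '%p' => (string) $port,
    ]);
}

echo expandControlPath('~/.ssh/control-%r@%h:%p', 'deploy', 'example.com', 22), PHP_EOL;
// ~/.ssh/control-deploy@example.com:22
```

Because the user and port are part of the path, deploy@example.com:22 and root@example.com:22 resolve to two different sockets, and therefore to two different masters.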

The numbers change dramatically:

$ time ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p \
-o ControlPersist=60 deploy@example.com true
real 0m0.412s

$ time ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p \
-o ControlPersist=60 deploy@example.com true
real 0m0.022s

400ms to 22ms. The second invocation does not negotiate a key, does not authenticate, does not even open a TCP connection. It just sends a message to the local socket asking the running master to open a new channel.
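A rough cost model makes the difference for the fifty-commands-on-five-hosts example concrete. This is a sketch with two assumptions. Without multiplexing, every command pays a full ~400ms handshake. With multiplexing, only the first command per host does, and the rest cost ~22ms each, as in the measurements above. sshOverheadMs() is a hypothetical helper, and its results are summed across hosts rather than wall-clock time.

```php
<?php

// Hypothetical cost model: total SSH setup overhead in milliseconds, summed
// across hosts (not wall-clock time).
function sshOverheadMs(int $hosts, int $commandsPerHost, float $handshakeMs, ?float $muxMs = null): float
{
    if ($hosts <= 0 || $commandsPerHost <= 0) {
        return 0.0; // nothing runs, nothing to pay for
    }
    $commands = $hosts * $commandsPerHost;
    if ($muxMs === null) {
        // No multiplexing: every command performs a full handshake.
        return $commands * $handshakeMs;
    }
    // Multiplexing: the first command per host starts the master (full
    // handshake); every later command opens a channel over the local socket.
    return $hosts * $handshakeMs + ($commands - $hosts) * $muxMs;
}

printf("without multiplexing: %.1f s\n", sshOverheadMs(5, 50, 400.0) / 1000);
printf("with multiplexing:    %.1f s\n", sshOverheadMs(5, 50, 400.0, 22.0) / 1000);
```

With these inputs the overhead drops from 100 seconds to about 7.4 seconds. The real ratio depends on your network, your auth method, and how much work each command does.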

ControlPersist=60 tells the master to stay alive for 60 seconds after the last channel closes. So if you run dep deploy followed by dep cleanup a few seconds later, the second command reuses the master from the first.
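You can ask a master whether it is still alive with ssh -O check, or tell it to shut down right away with ssh -O exit. Both talk to the control socket instead of opening a new session. The sketch below only assembles that command line. controlCommand() is a hypothetical helper, and the ControlPath you pass must match the one the master was started with.

```php
<?php

// Hypothetical helper: builds the argv for `ssh -O <command>`, which sends a
// control request to a running master over its socket.
function controlCommand(string $controlPath, string $target, string $command): array
{
    // A subset of the requests `ssh -O` accepts: "check" asks whether the
    // master is alive, "exit" shuts it down, "stop" stops new multiplexing.
    $allowed = ['check', 'exit', 'stop'];
    if (!in_array($command, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported control command: $command");
    }
    return ['ssh', '-o', 'ControlPath=' . $controlPath, '-O', $command, $target];
}

// Quoting keeps the shell from touching "~" and "%"; ssh expands them itself.
$argv = controlCommand('~/.ssh/cm-%r@%h:%p', 'deploy@example.com', 'check');
echo implode(' ', array_map('escapeshellarg', $argv)), PHP_EOL;
```

When a master is up, running the printed command should exit with status 0. If nothing is listening on that path, ssh reports an error and exits non-zero.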

How Deployer turns it on

Multiplexing is on by default in v8. Look at Host::connectionOptions():

if ($this->has('ssh_multiplexing') && $this->getSshMultiplexing()) {
    $options = array_merge($options, [
        '-o', 'ControlMaster=auto',
        '-o', 'ControlPersist=60',
        '-o', 'ControlPath=' . $this->getSshControlPath(),
    ]);
}

These three options get appended to every ssh invocation Deployer makes. The control path is generated per host:

private function generateControlPath(): string
{
    $C = $this->getHostname();
    if ($this->has('remote_user')) {
        $C = $this->getRemoteUser() . '@' . $C;
    }
    if ($this->has('port')) {
        $C .= ':' . $this->getPort();
    }

    if (getenv('CI') && is_writable('/dev/shm')) {
        return "/dev/shm/$C";
    }

    return "~/.ssh/$C";
}
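To try this logic outside Deployer, here is the same decision tree as a plain function. controlPathFor() and its parameters are hypothetical. They stand in for the Host getters, the CI environment variable, and the is_writable('/dev/shm') check.

```php
<?php

// Hypothetical standalone version of the generateControlPath() logic above,
// with Host state and environment checks passed in as plain parameters.
function controlPathFor(string $hostname, ?string $user, ?int $port, bool $onCi, bool $shmWritable): string
{
    $name = $hostname;
    if ($user !== null) {
        $name = $user . '@' . $name;   // key on the remote user as well
    }
    if ($port !== null) {
        $name .= ':' . $port;          // and on the port, when one is set
    }
    // On CI, prefer tmpfs when it is writable; otherwise use ~/.ssh.
    return ($onCi && $shmWritable) ? "/dev/shm/$name" : "~/.ssh/$name";
}

echo controlPathFor('example.com', 'deploy', 22, false, false), PHP_EOL; // ~/.ssh/deploy@example.com:22
echo controlPathFor('example.com', 'root', null, true, true), PHP_EOL;    // /dev/shm/root@example.com
```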

Two interesting bits in there.

First, the path is keyed on user@host:port, not just hostname. If you connect to the same machine as two different users (say deploy for releases, root for provisioning), each gets its own master. That is what you want, because the underlying SSH session is authenticated as one specific user.

Second, on CI, the socket goes into /dev/shm instead of ~/.ssh. There are two reasons for this. /dev/shm is tmpfs, so writes never hit disk and the socket is gone when the runner terminates. And inside some container configurations, ~/.ssh is read-only or simply does not exist for the CI user, while /dev/shm is reliably writable.

rsync gets it for free

rsync runs over SSH when you give it a remote source or destination, and it accepts an -e flag to set the SSH command. Deployer builds that command from the same connectionOptions() array:

$rsh = quote(rsync_rsh($host->connectionOptions()));
runLocally("rsync {$rsyncFlags} -e $rsh ... '$src/' '{$host->connectionString()}:$dst/'");

So when your deploy:update_code task does an rsync upload, it runs over the multiplexed connection that earlier run() calls already opened. No second handshake. The same is true for the local_archive upload step, the rsync recipe in contrib/, and any custom task that calls upload() or download(), since those go through rsync as well.
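The quoting is the fiddly part of this. rsync splits the -e string on spaces and honours single and double quotes (but not backslashes). So each ssh argument has to be quoted, and then the whole string has to be quoted again for the local shell. This PHP sketch shows one way to do that. buildRsh() is a hypothetical stand-in, since the implementation of rsync_rsh() is not shown here, and the paths are placeholders.

```php
<?php

// Hypothetical stand-in for rsync_rsh(): joins "ssh" plus its options into
// one string for rsync's -e flag, single-quoting each argument (POSIX).
// escapeshellarg() encodes embedded single quotes with a backslash, which
// rsync does not understand, so this assumes the options contain none.
function buildRsh(array $sshOptions): string
{
    return implode(' ', array_map('escapeshellarg', array_merge(['ssh'], $sshOptions)));
}

$options = [
    '-o', 'ControlMaster=auto',
    '-o', 'ControlPersist=60',
    '-o', 'ControlPath=~/.ssh/deploy@example.com:22',
];

// Quote the whole rsh string once more so the local shell passes it to
// rsync as a single -e argument.
$cmd = sprintf(
    'rsync -az -e %s %s %s',
    escapeshellarg(buildRsh($options)),
    escapeshellarg('build/'),
    escapeshellarg('deploy@example.com:/var/www/app/')
);
echo $cmd, PHP_EOL;
```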

Real numbers

On a recent deploy of a mid-size Laravel app to three hosts, with multiplexing off:

deploy:update_code 28.4s
deploy:vendors 7.2s
deploy:writable 5.1s
artisan:migrate 3.8s
artisan:storage:link 0.9s
deploy:symlink 1.2s
deploy:cleanup 2.3s

Same recipe with multiplexing back on:

deploy:update_code 4.1s
deploy:vendors 6.7s
deploy:writable 0.6s
artisan:migrate 2.2s
artisan:storage:link 0.2s
deploy:symlink 0.3s
deploy:cleanup 0.4s

The savings are not in tasks dominated by real work (deploy:vendors is mostly Composer doing actual work). They are in tasks built from many short SSH commands, from deploy:update_code down to the dozens of small tasks that each fire only one or two. Those go from "a noticeable pause" to "instant".
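Tallying the two runs makes the pattern easier to see. This sketch only reuses the numbers listed above and prints the per-task reduction. The seven listed tasks add up to 48.9s without multiplexing and 14.5s with it.

```php
<?php

// Timings from the two runs above, in seconds (three hosts).
$off = [
    'deploy:update_code'   => 28.4,
    'deploy:vendors'       => 7.2,
    'deploy:writable'      => 5.1,
    'artisan:migrate'      => 3.8,
    'artisan:storage:link' => 0.9,
    'deploy:symlink'       => 1.2,
    'deploy:cleanup'       => 2.3,
];
$on = [
    'deploy:update_code'   => 4.1,
    'deploy:vendors'       => 6.7,
    'deploy:writable'      => 0.6,
    'artisan:migrate'      => 2.2,
    'artisan:storage:link' => 0.2,
    'deploy:symlink'       => 0.3,
    'deploy:cleanup'       => 0.4,
];

foreach ($off as $task => $before) {
    $after = $on[$task];
    // Share of the original time removed by turning multiplexing on.
    printf("%-22s %5.1fs -> %4.1fs  (-%2.0f%%)\n", $task, $before, $after, 100 * (1 - $after / $before));
}
printf("%-22s %5.1fs -> %4.1fs\n", 'total of listed tasks', array_sum($off), array_sum($on));
```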

When it bites

ControlMaster is not free. A few things to know.

Socket path length. A Unix domain socket path must fit in the kernel's sun_path field: 108 bytes on Linux and 104 on macOS, including the terminating NUL byte. If your hostname is long, the default control path can blow past that, and the symptom is a confusing error like "ControlPath too long". Fix it with Host::setSshControlPath() and a shorter location under /tmp. Keep a host token such as %h in the name so that different hosts do not end up sharing one master socket:

host('very-long-hostname.region.cloud.example.com')
    ->setSshControlPath('/tmp/dep-%h.sock');
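A cheap pre-flight check can catch this before ssh does. The sketch assumes the path has already been expanded (no ~ or % tokens left), because the limit applies to the real filesystem path. controlPathFits() and its $reserve parameter are hypothetical. $reserve is headroom for anything ssh might append while creating the master socket, and how much to leave is an assumption you should check against your OpenSSH version.

```php
<?php

// Hypothetical check against the sun_path limit: 104 bytes on macOS/BSD,
// 108 on Linux (used as the default here). The path must also leave room
// for the terminating NUL byte, hence the "+ 1".
function controlPathFits(string $expandedPath, string $osFamily = PHP_OS_FAMILY, int $reserve = 0): bool
{
    $limit = match ($osFamily) {
        'Darwin', 'BSD' => 104,
        default => 108,
    };
    return strlen($expandedPath) + 1 + $reserve <= $limit;
}

$home = getenv('HOME') ?: '/home/deploy';
$path = $home . '/.ssh/deploy@very-long-hostname.region.cloud.example.com:22';
var_dump(controlPathFits($path));
```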

Stale sockets. If the master process crashes (kill -9, OOM, container restart), the socket file is left behind and looks valid until you try to use it. With ControlMaster=auto, OpenSSH detects the failure and creates a new master, but you may see a one-time error like "mux_client_request_session: read from master failed". Re-run usually resolves it. If it does not, just delete the socket file.
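If stale sockets keep turning up, for example on long-lived CI runners, a small pre-deploy cleanup can remove them. This is a sketch, not something Deployer does. removeStaleControlSocket() is hypothetical, expects an already-expanded absolute path, and treats a failed connect as "no master listening". ssh -O check is the gentler probe if you would rather not open the socket directly.

```php
<?php

// Hypothetical cleanup helper: removes a control socket file that nothing
// is listening on any more. Returns true only when a stale socket was deleted.
function removeStaleControlSocket(string $path): bool
{
    if (!file_exists($path) || filetype($path) !== 'socket') {
        return false; // nothing there, or not a socket: leave it alone
    }
    // A live master accepts the connection; a dead one leaves a socket file
    // that refuses it. Other failures (e.g. permissions) are treated as stale
    // too, in which case unlink() will usually fail as well.
    $conn = @stream_socket_client('unix://' . $path, $errno, $errstr, 1.0);
    if ($conn !== false) {
        fclose($conn); // master is alive, keep its socket
        return false;
    }
    return unlink($path);
}

// Usage: removeStaleControlSocket('/dev/shm/deploy@example.com:22');
```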

Jump hosts and bastions. If your SSH config uses ProxyJump, multiplexing still works, but the master holds the proxied connection open. That is usually what you want. If your bastion enforces short session timeouts on the jump leg, the connection underneath the master can be cut mid-deploy, and tasks fail partway through. Disable multiplexing for those hosts:

host('production')
    ->set('ssh_multiplexing', false);

Disabling it

You can turn multiplexing off globally:

set('ssh_multiplexing', false);

Per host:

host('weird-server')
    ->set('ssh_multiplexing', false);

Or just for one run:

dep deploy -o ssh_multiplexing=false

Why this is in a deploy tool

OpenSSH has supported ControlMaster since 2005. Anyone deploying over SSH could have wired up these three options in their ~/.ssh/config and gotten the same speedup years ago. Many people did not. Some did, ran into the socket-path issue, gave up, and turned it off.

Deployer turns it on by default, picks a sensible socket location, handles the CI case, and keeps the lifecycle short enough that stale sockets are rare. None of that is interesting on its own. Together it is the difference between a deploy that takes 25 seconds and a deploy that takes 90.

The lesson, if there is one, is that good defaults compound. A tool that asks you to read a config doc to make it fast is almost always slower than the same tool with the right defaults baked in.

