The atomic symlink swap

· 7 min read
Anton Medvedev
Deployer Maintainer

If you have ever wondered why a Deployer release is "atomic", the answer is one syscall: rename(2). The whole zero-downtime story is built on the fact that, on a single filesystem, swapping a symlink with mv -T is indivisible from the kernel's point of view. Either every request sees the old release, or every request sees the new one. Never half of each. This post explains exactly how that works, and what Deployer does on the systems where it does not.

The directory layout

Every Deployer-managed deploy looks like this on disk:

/var/www/app/
├── current -> releases/42
├── releases/
│   ├── 40/
│   ├── 41/
│   └── 42/
├── shared/
│   ├── .env
│   └── storage/
└── .dep/
    └── releases_log

Your web server points its document root at current. Each deploy unpacks code into a brand new numbered directory under releases/, prepares it (vendors, migrations, asset builds, shared file links), and only at the very end repoints current to the new directory. Old releases stick around for a few generations so you can roll back without redeploying.

The interesting question is the last step. How do you change where current points without ever, even for a microsecond, having current point at nothing?

What does not work

The obvious thing is to delete the old symlink and create a new one:

rm current
ln -s releases/42 current

This is the classic broken approach. There is a window, however small, between the rm and the ln where current does not exist. A request that hits the web server during that window gets a 404 or a 500. On a busy site, real requests will land in that window sooner or later.

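The next attempt usually involves ln -sf, with -n added so that ln replaces current itself instead of following it into the old release directory:

ln -sfn releases/42 current

The -f flag means "force, overwrite if exists". This sounds atomic, but it is not something you can rely on. Traditionally, ln -sf is implemented as unlink(current) followed by symlink(target, current). Same window, same problem, just hidden inside a single command.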
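Separately from atomicity, ln has a trap of its own when the destination is a symlink to a directory: without -n it follows the link. The sketch below is only an illustration in a scratch directory, assuming an ln that supports -n (GNU, BusyBox and BSD all do); the paths are made up for the demo.

```shell
# Scratch layout: two releases and a current symlink pointing at the old one.
demo=$(mktemp -d)
cd "$demo"
mkdir -p releases/41 releases/42
ln -s releases/41 current

# Without -n, ln follows current into releases/41 and creates the link there.
ln -sf releases/42 current
after_sf=$(readlink current)     # current itself was not touched

# With -n, ln treats current as the name to replace.
ln -sfn releases/42 current
after_sfn=$(readlink current)

echo "after ln -sf:  current -> $after_sf"
echo "after ln -sfn: current -> $after_sfn"
```

The first call leaves a stray link at releases/41/42, which is why the -n matters whenever ln is pointed at current.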

What actually works: rename(2)

POSIX defines rename(oldpath, newpath) with a strong guarantee: if newpath already exists, it is atomically replaced. Any process opening newpath either gets the old file or the new file, never an error from "file does not exist", and never a partially constructed result. The kernel updates the directory entry in one step.

The catch: this only works inside a single filesystem. Across filesystems, rename(2) fails with EXDEV, and mv falls back to copy-then-delete, which is not atomic. For Deployer this is not a problem in the default setup, because release and current are both entries directly inside deploy_path.
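If you have customized paths and want to check the single-filesystem assumption, comparing device numbers is a rough way to do it. This is a sketch, not something Deployer runs: same_filesystem is a hypothetical helper, and it assumes GNU or BusyBox stat (BSD stat spells the same query stat -f %d).

```shell
# Hypothetical helper: succeed only when the parent directories of both
# paths report the same device number, i.e. the same mounted filesystem.
same_filesystem() {
    dev_a=$(stat -c %d "$(dirname "$1")") || return 2
    dev_b=$(stat -c %d "$(dirname "$2")") || return 2
    [ "$dev_a" = "$dev_b" ]
}

deploy_path=$(mktemp -d)    # stand-in for the real deploy_path
if same_filesystem "$deploy_path/release" "$deploy_path/current"; then
    echo "same filesystem: a rename can replace current in place"
else
    echo "different filesystems: mv would have to copy and delete"
fi
```

Bind mounts can share a device number and still refuse a rename across them, so treat a positive result as a hint, not a proof.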

The useful part: rename works on directory entries and does not follow symlinks. So if release is a symlink and current is a symlink, rename("release", "current") atomically repoints the directory entry for current at the symlink inode that release used to name, and the name release disappears. From the perspective of any reader, current always exists as a symlink, and that symlink always resolves to a real release directory.

Why mv -T and not just mv

The shell command for rename(2) is mv. So you would think this works:

ln -s releases/42 release
mv release current

It does not, in a subtle way. If current is a symlink to a directory, plain mv treats it as the directory and does the equivalent of mv release current/release, which moves the new symlink inside the old release. Now you have a release inside a release, and the original symlink is unchanged.

GNU mv has a flag for this: -T, also spelled --no-target-directory. With -T, mv treats the destination as a path to overwrite, not a directory to move into. The actual call becomes rename("release", "current") with no target-is-a-directory check, and the swap is atomic.
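The difference is easy to reproduce in a scratch directory. A minimal sketch, assuming GNU mv (or another mv that accepts -T); the directory names are placeholders for the demo:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p releases/41 releases/42
ln -s releases/41 current

# Plain mv follows current and drops the new symlink inside the old release.
ln -s releases/42 release
mv release current
plain_target=$(readlink current)      # still the old release

# Remove the stray link, then repeat the swap with -T.
rm releases/41/release
ln -s releases/42 release
mv -T release current
swapped_target=$(readlink current)

echo "plain mv: current -> $plain_target"
echo "mv -T:    current -> $swapped_target"
```

After the second mv the name release is gone and current carries the new target, which is the behavior the deploy relies on.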

This is the line in Deployer's deploy:symlink task:

run("mv -T {{deploy_path}}/release {{current_path}}");

A few steps earlier, in deploy:release, Deployer creates that intermediate release symlink:

run("{{bin/symlink}} $releasePath {{deploy_path}}/release");

So at the moment mv -T runs, the layout is:

deploy_path/
├── current -> releases/41   (the old release)
├── release -> releases/42   (the new release, fully prepared)
└── releases/
    ├── 41/
    └── 42/

After mv -T:

deploy_path/
├── current -> releases/42   (the new release, atomically swapped)
└── releases/
    ├── 41/
    └── 42/

There is no intermediate state visible to anyone. The kernel's directory entry for current flips from pointing at one symlink inode to pointing at another in a single operation.
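You can poke at this claim empirically. The following is a rough stress sketch, assuming GNU mv -T and a throwaway directory: one background loop keeps resolving current while the foreground swaps it 200 times. The iteration count and file names are arbitrary choices for the demo.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p releases/41 releases/42
ln -s releases/41 current

# Reader: resolve current in a tight loop until the stop file appears,
# counting every lookup that fails.
(
    misses=0
    while [ ! -e stop ]; do
        [ -d current/. ] || misses=$((misses + 1))
    done
    echo "$misses" > misses
) &
reader=$!

# Writer: flip current between the two releases with release + mv -T.
i=0
while [ "$i" -lt 200 ]; do
    ln -s "releases/$((41 + i % 2))" release
    mv -T release current
    i=$((i + 1))
done

touch stop
wait "$reader"
misses=$(cat misses)
echo "failed lookups during 200 swaps: $misses"
```

A run like this cannot prove atomicity, but swapping the writer for rm plus ln -s makes a handy comparison.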

The fallback

Not every system has GNU mv -T. macOS without coreutils installed does not. Some BusyBox builds do not. Deployer probes for support at runtime:

set('use_atomic_symlink', function () {
    return commandSupportsOption('mv', '--no-target-directory');
});

task('deploy:symlink', function () {
    if (get('use_atomic_symlink')) {
        run("mv -T {{deploy_path}}/release {{current_path}}");
    } else {
        run("cd {{deploy_path}} && {{bin/symlink}} {{release_path}} {{current_path}}");
        run("cd {{deploy_path}} && rm release");
    }
});

The fallback runs ln -nfs with the new release path and current, followed by rm release. This is the same unlink + symlink race we already said does not work, and it is what Deployer settles for when mv -T is unavailable. The window is microseconds, and removing the temporary release link is a separate step from the swap itself, so current is missing only for the instant between the unlink and the symlink.
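For readers who think in shell, the same two-branch logic might look roughly like this. It is a sketch and not Deployer's code: swap_current is a hypothetical function, and instead of probing help output it simply tries mv -T and falls back if that fails.

```shell
# Hypothetical shell rendering of the two branches. The body runs in a
# subshell so the cd does not leak into the caller.
swap_current() (
    deploy_path=$1
    new_release=$2      # relative to deploy_path, e.g. releases/42
    cd "$deploy_path" || exit 1
    ln -nfs "$new_release" release
    if mv -T release current 2>/dev/null; then
        echo atomic
    else
        # No usable mv -T: accept the unlink + symlink window.
        ln -nfs "$new_release" current && rm -f release
        echo fallback
    fi
)

deploy_path=$(mktemp -d)
mkdir -p "$deploy_path/releases/41" "$deploy_path/releases/42"
ln -s releases/41 "$deploy_path/current"

mode=$(swap_current "$deploy_path" releases/42)
echo "swapped via $mode path; current -> $(readlink "$deploy_path/current")"
```

Either branch leaves current pointing at the new release with no leftover release link; only the atomic branch does it without a gap.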

If you are deploying to a target where atomicity matters (and if you are reading this, it does), install GNU coreutils so mv -T is available. On macOS that is brew install coreutils; note that Homebrew installs the command as gmv unless you put its gnubin directory on PATH. On Alpine it is apk add coreutils. On any standard Linux distribution it is already there.

Detection at runtime

The commandSupportsOption helper is just a man / --help scrape:

function commandSupportsOption(string $command, string $option): bool
{
    $man = run("(man $command 2>&1 || $command -h 2>&1 || $command --help 2>&1) | grep -- $option || true");
    return !empty($man) && str_contains($man, $option);
}

Crude but reliable. It runs once per host and the result is cached in the host config. The cost is a single extra SSH command on the first deploy.
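If you want to run the same probe by hand on a server, a shell version might look like the sketch below. command_supports_option is a hypothetical name; the pipeline mirrors the man / -h / --help chain above, with the option passed to grep as a fixed string.

```shell
# Hypothetical shell equivalent of the probe: search the man page, then -h,
# then --help output for the literal option string.
command_supports_option() {
    { man "$1" 2>&1 || "$1" -h 2>&1 || "$1" --help 2>&1; } |
        grep -F -- "$2" >/dev/null
}

if command_supports_option mv --no-target-directory; then
    echo "mv advertises --no-target-directory"
else
    echo "mv does not advertise --no-target-directory; expect the fallback"
fi
```

Like the original, this only tells you the option is documented, not that it behaves as expected.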

The same helper detects whether ln supports --relative:

set('use_relative_symlink', function () {
    return commandSupportsOption('ln', '--relative');
});

set('bin/symlink', function () {
    return get('use_relative_symlink') ? 'ln -nfs --relative' : 'ln -nfs';
});

--relative is worth a sentence on its own. By default, ln -s stores the target exactly as you typed it, so ln -s /var/www/app/releases/42 /var/www/app/current creates a symlink with an absolute target. If you ever move /var/www/app to a new location (different disk, snapshot restore, container migration), every absolute symlink under it is broken until rewritten. --relative makes ln compute the relative path from the link's own directory to the target, so current becomes releases/42 instead of /var/www/app/releases/42. Move the whole tree, the symlinks still resolve.
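The effect of moving the tree is easy to see in a scratch directory. This sketch writes the relative target by hand, so it does not depend on GNU ln --relative being present; current_abs and current_rel are names made up for the comparison.

```shell
base=$(mktemp -d)
mkdir -p "$base/app/releases/42"

# Absolute target: the full path is stored inside the link.
ln -s "$base/app/releases/42" "$base/app/current_abs"
# Relative target, typed by hand; this is the string --relative would compute.
ln -s releases/42 "$base/app/current_rel"

# Relocate the whole tree, as a disk move or snapshot restore would.
mv "$base/app" "$base/moved"

if [ -d "$base/moved/current_rel/." ]; then rel_ok=yes; else rel_ok=no; fi
if [ -d "$base/moved/current_abs/." ]; then abs_ok=yes; else abs_ok=no; fi
echo "relative link resolves: $rel_ok, absolute link resolves: $abs_ok"
```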

What about open file descriptors

A common worry: what happens to a long-running PHP-FPM worker that opened a file under the old release just before the swap?

Nothing. It keeps reading the file it had open. On Unix, an open file descriptor refers to an inode, not a path. The directory entry that pointed at that inode can be renamed, deleted, or replaced, and the inode stays alive until the last descriptor is closed. New requests will resolve current to the new release and open files there; in-flight requests keep reading the files they already opened in the old release.

This is also why old release directories are never deleted immediately. The cleanup task keeps the most recent releases around, so any worker that was still running code from a release at swap time can keep opening files there until it finishes.
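The descriptor behavior can be demonstrated with nothing but the shell. A small sketch, assuming GNU mv -T and a scratch directory; it deletes the old release right after the swap, which the real cleanup does not do, to make the point as bluntly as possible.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p releases/41 releases/42
echo "old release" > releases/41/index.txt
echo "new release" > releases/42/index.txt
ln -s releases/41 current

# A "worker" opens a file through current before the swap (descriptor 3).
exec 3< current/index.txt

# Swap to the new release, then delete the old one outright.
ln -s releases/42 release
mv -T release current
rm -rf releases/41

# The open descriptor still reads the old inode; a fresh open through
# current lands in the new release.
read -r in_flight <&3
read -r fresh < current/index.txt
exec 3<&-

echo "in-flight read: $in_flight"
echo "fresh read:     $fresh"
```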

The whole picture

Put together, a Deployer release does this:

  1. Prepare the new release in releases/N/ from scratch. Take as long as needed; nothing visible has changed.
  2. Create release -> releases/N/ as a side-channel symlink. Still not visible to the web server.
  3. mv -T release current. Single rename(2) syscall. Web server now sees the new release.
  4. Old releases/N-1/, N-2/, etc. linger for keep_releases deploys, then get cleaned up.

Step 3 is the only step that anyone outside the deploy box can observe. It is one syscall, and the kernel guarantees it is indivisible.
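The four steps above can be sketched as one shell function. This is an illustration under stated assumptions, not what Deployer executes: deploy_release, its arguments, and the echo standing in for the build are all hypothetical, and it assumes GNU mv -T and numeric release names.

```shell
# Hypothetical end-to-end sketch. The body runs in a subshell so the cd
# stays local to the function.
deploy_release() (
    deploy_path=$1
    n=$2
    keep=$3
    cd "$deploy_path" || exit 1

    # 1. Prepare the new release out of sight.
    mkdir -p "releases/$n"
    echo "build $n" > "releases/$n/index.txt"    # stand-in for the real build

    # 2. Stage the symlink under a side name.
    ln -nfs "releases/$n" release

    # 3. Publish with a single rename.
    mv -T release current

    # 4. Remove the oldest releases beyond the newest $keep.
    total=$(ls releases | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        ls releases | sort -n | head -n "$excess" | while read -r old; do
            rm -rf "releases/$old"
        done
    fi
)

deploy_path=$(mktemp -d)
for n in 40 41 42 43; do
    deploy_release "$deploy_path" "$n" 3
done
echo "current -> $(readlink "$deploy_path/current")"
ls "$deploy_path/releases"
```

Everything before the mv -T line can be slow or fail without visitors noticing, which is the reason for staging the link under a second name.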

That is the whole zero-downtime story. Three lines of bash and one POSIX guarantee.
