Problems with escapeshellarg()
If you have ever passed a user-supplied string into a shell command from PHP, you have probably written something like this:
$cmd = 'grep ' . escapeshellarg($pattern) . ' file.log';
shell_exec($cmd);
It looks right, and most of the time it works. But escapeshellarg() has a few quirks that matter a lot when you are
running commands across many hosts, with arbitrary user input, in arbitrary locales. Deployer v8 replaces every internal
call to escapeshellarg() with a new quote() function. This post explains why.
What escapeshellarg() actually does
On Unix, escapeshellarg() wraps the input in single quotes and escapes any single quote inside the string by closing
the quoted block, inserting a literal escaped quote, and reopening:
escapeshellarg("it's");
// 'it'\''s'
That output is correct shell, but it has three well-known problems.
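For reference, the close-escape-reopen rule described above can be sketched in a few lines of plain PHP. The helper name posix_single_quote() is made up for this illustration, and it covers only the quoting rule, not the locale handling that the real escapeshellarg() performs.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: mimics only the Unix quoting rule of escapeshellarg().
function posix_single_quote(string $arg): string
{
    // Close the quoted block, emit an escaped quote, reopen: ' becomes '\''
    return "'" . str_replace("'", "'\\''", $arg) . "'";
}

echo posix_single_quote("it's"), PHP_EOL; // 'it'\''s'
```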
Problem 1: it strips bytes in non-UTF-8 locales
This is the one that bites people in production. From the PHP source, escapeshellarg() on Unix silently drops any byte
sequence that is not a valid character in the current locale. If your server runs in the C (POSIX) locale, that means
every byte with the high bit set (0x80 and above) is discarded.
setlocale(LC_CTYPE, 'C');
echo escapeshellarg('héllo');
// 'hllo'
The é is gone. No error. No warning. The command runs against a different string than the one you passed in.
This has been a known issue for over a decade. The workaround is to make sure
your locale is UTF-8 aware before any call to escapeshellarg(). But that is a global, process-wide setting, and "make
sure the locale is right" is a deeply unsatisfying answer when you are deploying to dozens of hosts with no guarantee of
how their PHP CLI is configured.
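If you cannot control the locale, you can at least detect the loss. The sketch below assumes the Unix output format of escapeshellarg(). Both helper names are hypothetical. It reverses the quoting and compares the result with the input.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: reverses the Unix form of escapeshellarg() output.
function posix_single_unquote(string $quoted): string
{
    // Drop the outer quotes, then turn each '\'' back into a single quote.
    return str_replace("'\\''", "'", substr($quoted, 1, -1));
}

// Hypothetical guard: true when escapeshellarg() did not preserve the input.
function escapeshellarg_is_lossy(string $arg): bool
{
    return posix_single_unquote(escapeshellarg($arg)) !== $arg;
}

// Plain ASCII survives in any locale.
var_dump(escapeshellarg_is_lossy("it's")); // bool(false)

// For non-ASCII input the answer depends on the current LC_CTYPE setting.
var_dump(escapeshellarg_is_lossy('héllo'));
```

A guard like this turns a silent mismatch into something you can log or throw on, but it does not fix the underlying behavior.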
Problem 2: it produces ugly output
Even when it works, the output is hard to read in logs:
escapeshellarg("it's a test");
// 'it'\''s a test'
When you are debugging a deploy that failed three commands deep, that visual noise adds up.
Problem 3: Windows is a completely different function
On Windows, escapeshellarg() does not use single quotes at all. It wraps in double quotes and does its own escaping.
Code that quotes correctly on Linux can produce invalid commands on Windows and vice versa. Deployer almost always runs
against Unix hosts, but PHP's behavior differing by host platform is still surprising.
ANSI-C quoting
Bash and zsh support a string syntax called ANSI-C quoting. It looks like this:
echo $'hello\nworld'
The $'...' form treats the contents like a C string literal. Backslash escapes are processed (\n, \t, \\, \',
\0, and the rest), and everything else is taken verbatim. Critically, it does not depend on the locale, and it gives
you a clean way to encode every byte, including control characters and embedded quotes.
That is what Deployer's new quote() function uses. The whole implementation is small enough to read in one screen:
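function quote(string $arg): string
{
    // An empty argument still has to show up as an argument.
    if ($arg === '') {
        return "\$''";
    }
    // A shell argument cannot carry a NUL byte, so fail loudly.
    if (str_contains($arg, "\0")) {
        throw new \InvalidArgumentException('quote(): null byte is not allowed in shell arguments');
    }
    // Safe characters pass through unquoted. The D modifier stops `$` from
    // matching before a trailing newline, so "foo\n" is still quoted.
    if (preg_match('/^[\w\/.\-+@:=,%]+$/D', $arg)) {
        return $arg;
    }
    // Everything else is wrapped in $'...' with backslash escapes.
    return "\$'" . strtr($arg, [
        '\\' => '\\\\',
        "'" => "\\'",
        "\f" => '\\f',
        "\n" => '\\n',
        "\r" => '\\r',
        "\t" => '\\t',
        "\v" => '\\v',
    ]) . "'";
}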
Apart from rejecting null bytes, there are three branches:
- Empty string becomes $''.
- Strings made up entirely of safe characters (alphanumerics and underscore, plus /.-+@:=,%) pass through unquoted. Most paths, hostnames, and version numbers fall in this bucket.
- Anything else gets wrapped in $'...' with a small set of escapes.
The pass-through case is what keeps logs readable. A command like git log --format=%H stays as git log --format=%H
instead of becoming git $'log' $'--format=%H'.
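To get a feel for which arguments land in the pass-through bucket, here is a small sketch that applies the same character class to a few sample values. The helper name is_shell_safe() is invented for this example. The sketch anchors the pattern with the D modifier so that a trailing newline cannot slip through as "safe".

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the pass-through check in quote().
function is_shell_safe(string $arg): bool
{
    // D (PCRE_DOLLAR_ENDONLY) keeps `$` from matching before a trailing newline.
    return preg_match('/^[\w\/.\-+@:=,%]+$/D', $arg) === 1;
}

$samples = ['v1.2.3', 'user@example.com:/var/www', '--format=%H', 'hello world', '$(id)', "main\n"];

foreach ($samples as $arg) {
    // json_encode() makes whitespace and newlines visible in the listing.
    $shown = json_encode($arg, JSON_UNESCAPED_SLASHES);
    printf("%-30s %s\n", $shown, is_shell_safe($arg) ? 'pass through' : 'needs quoting');
}
```

The first three samples pass through untouched. The last three contain a space, shell metacharacters, or a newline, so they would go to the $'...' branch.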
Side by side
quote(''); // $''
quote('hello'); // hello
quote('/usr/local/bin'); // /usr/local/bin
quote('hello world'); // $'hello world'
quote("it's"); // $'it\'s'
quote("line1\nline2"); // $'line1\nline2'
quote('héllo wörld'); // $'héllo wörld'
quote('$(cat /etc/passwd)'); // $'$(cat /etc/passwd)'
The Unicode case matters. With escapeshellarg() in a C locale, that string would have come out as 'hllo wrld'.
With quote(), you get exactly what you put in, regardless of the locale.
The injection case also matters. $'$(cat /etc/passwd)' is a literal string of eighteen characters, not a command
substitution. The shell does not expand $(...) inside $'...' quotes, the same way it does not expand inside regular
single quotes. The user's payload is data, not code.
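One way to convince yourself that nothing is lost or reinterpreted is a round-trip check in plain PHP. The sketch below copies the escape table from the listing above. ansi_c_quote() and ansi_c_unquote() are hypothetical names, and the decoder only understands the escapes that the encoder emits, not the full set a real shell accepts.

```php
<?php
declare(strict_types=1);

// Escape table copied from the quote() listing above.
const ANSI_C_ESCAPES = [
    '\\' => '\\\\',
    "'"  => "\\'",
    "\f" => '\\f',
    "\n" => '\\n',
    "\r" => '\\r',
    "\t" => '\\t',
    "\v" => '\\v',
];

// Hypothetical stand-in for the quoting branch of quote().
function ansi_c_quote(string $arg): string
{
    return "\$'" . strtr($arg, ANSI_C_ESCAPES) . "'";
}

// Hypothetical decoder: undoes ansi_c_quote() for the escapes listed above.
function ansi_c_unquote(string $quoted): string
{
    // Strip the leading $' and the trailing ', then reverse each escape.
    $body = substr($quoted, 2, -1);
    return strtr($body, array_flip(ANSI_C_ESCAPES));
}

$payload = "\$(cat /etc/passwd); it's \\n not a newline";
$quoted  = ansi_c_quote($payload);

echo $quoted, PHP_EOL;
var_dump(ansi_c_unquote($quoted) === $payload); // bool(true)
```

Because strtr() scans left to right and never rescans its own output, a literal backslash followed by n stays distinct from a real newline in both directions.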
Using it
You call quote() exactly where you used to call escapeshellarg():
// Before:
run('echo ' . escapeshellarg($message));
// After:
run('echo ' . quote($message));
Inside templates, there is also a quote filter, so you can keep the recipe readable:
set('message', "it's deployed");
run('echo {{ message | quote }}');
The filter calls the same function, so it has the same guarantees.
A few caveats
The $'...' syntax is a Bash and zsh feature. It is also defined in the upcoming POSIX revision, and most modern shells
implement it (including dash on recent Debian and Ubuntu), but a strict /bin/sh from an older system might not.
Deployer assumes a Bash-compatible shell on the remote host, which has been the case since the beginning. If you need to
override this per host, v8 also adds Host::setShellPath().
quote() only handles UTF-8 byte sequences correctly when they are valid UTF-8 to begin with. If you pass in a binary
blob with arbitrary bytes, the output will still be a syntactically valid ANSI-C string, but whether your remote shell
interprets it the way you expect depends on its locale settings. For shell arguments, this is almost never a real
concern. For piping binary data, use the standard input stream instead.
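As a rough illustration of the stdin approach, here is a local sketch using PHP's proc_open(). It is not how Deployer feeds input to remote commands. The helper name pipe_through_stdin() is hypothetical, and the example assumes a Unix system with cat on the PATH.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: runs a local command and feeds $input to its stdin.
function pipe_through_stdin(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // the child reads its stdin from us
        1 => ['pipe', 'w'], // the child writes its stdout to us
    ];

    // Passing the command as an array (PHP 7.4+) bypasses the shell entirely,
    // so no quoting is involved at all.
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('could not start process');
    }

    // Fine for small payloads. Large ones need interleaved reads and writes
    // to avoid filling a pipe buffer and blocking.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $output = (string) stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);

    return $output;
}

$blob = "\x00\xff\x80 raw bytes\n";
var_dump(pipe_through_stdin(['cat'], $blob) === $blob);
```

The payload here contains a null byte and invalid UTF-8, both of which would be a problem as a shell argument but travel through a pipe unchanged.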
Why bother
Most of the time, escapeshellarg() is fine. The locale bug is rare in practice, and most servers run in a UTF-8 locale
these days. But Deployer is a tool people trust to run privileged commands across production servers, often with values
that came from CI environment variables, git refs, or user-supplied config. A function that silently drops bytes under
conditions you cannot easily detect is not a great fit for that job.
quote() is small, deterministic, and produces output that is easy to read in a deploy log. It does the same job with
fewer surprises. That seemed like a good trade.
