The curious case of shell commands, or how "this bug is required by POSIX" (2021) (notes.volution.ro)
1vuio0pswjnm7 4 minutes ago [-]
"bash, obviously for scripting;"

It's possible to use bash for both interactive use and scripting. For example, this author claims to use bash for scripting.

But Debian and the popular Debian-derived distributions do not use bash for scripting.

The interactive shell may be bash, but the scripting shell is not bash.

https://www.man7.org/linux/man-pages/man1/dash.1.html

https://wiki.ubuntu.com/DashAsBinSh

https://wiki.archlinux.org/title/Dash

https://www.oreilly.com/library/view/shell-scripting-expert/... ^1

https://www.baeldung.com/linux/dash-vs-bash-performance

https://en.wikipedia.org/wiki/Almquist_shell

https://lwn.net/Articles/343924/

https://scriptingosx.com/2020/06/about-bash-zsh-sh-and-dash-... ^2

I use an Almquist shell, not bash, for both interactive use and scripting. I often write scripts interactively. I use the same scripts on Linux and BSD. I restored tabcomplete and the fc builtin to dash so it feels more like the shell from which it was derived: NetBSD sh.

1. "This makes it smaller, lighter and faster than bash."

2. "... this is strong indicator that Apple eventually wants to use dash as the interpreter for sh scripts."

zahlman 3 minutes ago [-]
> No, let's just try it out (I've put both the Python and the plain sh -c invocations):

  > python2 -c 'import os; os.system("-x")'
  > sh -c -x
  sh: -c: option requires an argument
I can't reproduce this in Python (including my local 2.7 build), only using sh directly. Going through Python, `sh` correctly tells me that the `-x` command isn't found.

But now I'm wondering: how is one supposed to use `which` (or `type`, or `file`) for a program named `-x`, supposing one had it?

o11c 4 hours ago [-]
This is woefully misguided. Half the time passing it to the shell is explicitly a feature, e.g. `popen("gzip > foo.gz")`. If you have user input you should always sanitize it regardless of API.

But `ssh` does deserve all the shame. It's a pity the real problems are hard to find in an article full of nonsense.

Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

akdev1l 3 hours ago [-]
> Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

With the assumption that:

1. The person knows to do this weird thing

2. They do it consistently every time

3. They never forget

Also not sure how to use those solutions for the popen() example you provided.

The correct way is:

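    import subprocess

    # Argument vector: no shell is involved, so user_input is never parsed as shell syntax.
    with open("foo.gz", "wb") as out:
        subprocess.run(["gzip", "--to-stdout", user_input], stdout=out)

And now I don’t have to worry about any of these weird things.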
panzi 3 hours ago [-]
You need to ensure that user_input doesn't start with `-`. You can do that by forcing an absolute path. Some programs accept `--` as a marker that any arguments after that are non-options.
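A minimal sketch of both mitigations, assuming the target program follows the getopt convention of treating `--` as the end of options (GNU gzip does; not every program does). The helper name `safe_gzip_argv` is made up for illustration:

```python
import os

def safe_gzip_argv(user_input):
    """Build an argv for gzip that treats user_input strictly as a file name."""
    # An absolute path cannot begin with "-", so it cannot be mistaken for an option.
    path = os.path.abspath(user_input)
    # "--" additionally asks getopt-style programs to stop option parsing here.
    return ["gzip", "--to-stdout", "--", path]

argv = safe_gzip_argv("-rf")
print(argv)
```

Either measure alone covers the leading-dash case; using both is cheap and does not depend on which one the called program happens to respect.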
theamk 1 hours ago [-]
The idea is you have some 3rd-party app, which might accept a parameter and pass it to popen as-is, something like "--data-handler=command"

In this case, current "popen" semantics - a single shell command - works pretty well. You can pass it a custom process:

    --data-handler="scripts/handle_data.py"
or a shell fragment:

    --data-handler="gzip -c > last_data.gz"
or even a mini shell script:

    --data-handler "jq .contents | mail -s 'New data incoming' data-notify"
This is where the "shell command" arguments really shine. Note that you cannot simulate all of this functionality with a command vector.
hello_computer 3 hours ago [-]
> article full of nonsense

Pls elaborate. Seems like a decent list of shell gotchas to me.

bee_rider 3 hours ago [-]
It is kind of a journey in an annoying way. Like, do we really need to know all the stuff about: this man page says to sanitize input, this one doesn’t, blah blah blah.

Or, let’s just look at an excerpt, here’s the section “proper solution:”

I’ve emphasized the actionable advice.

> The proper solution would be dropping that broken tool immediately, securely erasing it from your hard-drive, then running and screaming that tool's name out-loud in shame... (Something akin to Game of Throne's walk of atonement...)

Joke

> I'm not kidding... This kind of broken tools are the cause of many stupid bugs, ranging from the funny ups-rm-with-spaces (i.e. rm -Rf / some folder with spaces /some-file), to serious security issues like the formerly mentioned shellshock...

Joke/contentless stakes raising.

> So, you say someone holds you at gun point, thus you must use that tool? Check if the broken tool doesn't have a flag that disables calling sh -c, and instead properly executes the given command and arguments directly via execve(2). (For example watch has the -x flag as mentioned.)

Here it is, the paragraph that has something!

> Alternatively, given that most likely the tool in question is an open-source project written by someone in his spare time, perhaps open a feature request describing the issue, and if possible contribute with a patch that solves it.

This doesn’t seem practically actionable, at least in the short term—most projects might ignore your patch, or maybe it will take multiple years to get pushed out to distros.

> Still no luck? Make some popcorn and prepare for the latest block-buster "convoluted solutions for simple problems in UNIX town"...

Dramatic buildup/joke.

steamrolled 1 hours ago [-]
The original post asserted the article is nonsense; you're trying to justify that by saying you don't like the author's writing style. Two separate things...

The article is mostly correct, although it makes some weird claims (e.g., the Shellshock bug had nothing to do with the class of bugs the article is complaining about - it was a vulnerability in the shell itself). It definitely has a "newcomer hates things without understanding why they are the way they are" vibe, but you actually need that every now and then. The old-timers tend to say "it was originally done this way for a reason and if you're experienced enough, you know how to deal with it", but what made sense 30-40 years ago might not make much sense today.

theamk 1 hours ago [-]
I dunno, "mostly" normally means some large fraction, maybe 50% or 90% depending on the person.. Given that executing commands by itself is neither a bug nor a security vulnerability (those only occur from bad/lack-of quoting), the majority of the article is wrong.
panzi 3 hours ago [-]
Yeah, system() should definitely be deprecated, and you should never use it in any new program. At least there are exec*() and posix_spawn() under POSIX. Under Windows there is no such thing, and every program might parse the command-line string differently. You can't naively write a generic posix_spawn()-like interface for Windows; see this related Rust CVE: https://blog.rust-lang.org/2024/04/09/cve-2024-24576/ Why is it a CVE in Rust, but not in any other programming language? Did other languages handle it better? Dunno, I just know that Rust has a big fat warning about this in its documentation (https://doc.rust-lang.org/std/process/struct.Command.html#me...), but e.g. Java doesn't (https://docs.oracle.com/javase/8/docs/api/java/lang/ProcessB...).
steamrolled 2 hours ago [-]
The main reason system() exists is that people want to execute shell commands; some confused novice developers might mix it up with execl(), but this is not a major source of vulnerabilities. The major source of vulnerabilities is "oh yeah, I actually meant to execute shell".

So if you just take away the libcall, people will make their own version by just doing execl() of /bin/sh. If you want this to change, I think you have to ask why do people want to do this in the first place.

And the answer here is basically that because of the unix design philosophy, the shell is immensely useful. There are all these cool, small utilities and tricks you can use in lieu of writing a lot of extra code. On Windows, command-line conventions, filesystem quirks, and escaping gotchas are actually more numerous. It's just that there's almost nothing to call, so you get fewer bugs.

The most practical way to make this class of bugs go away is to make the unix shell less useful.

Galanwe 3 hours ago [-]
I don't understand what you are complaining about. I don't understand what the article is complaining about either.

exec* are not "better replacements" of the shell, they are just used for different use cases.

The whole article could be summarized to 3 bullet points:

1) Sanitize your inputs

2) If you want to execute a specific program, exec it after 1), no need for the shell

3) Allow the shell if there is no injection risk

jcranmer 2 hours ago [-]
The article spends a lot of time dancing around its central points rather than addressing them directly, but the basic problems with shell boil down to this:

There are two ways to think of "running a command":

1. A list of strings containing an executable name (which may or may not be a complete path) and its arguments (think C's const char **argv).

2. A single string which is a space-separated list of arguments, with special characters in arguments (including spaces) requiring quoting to represent correctly.

Conversion between these two forms is nontrivial. And the basic problem is that there are a lot of tools that incorrectly convert the former to the latter by just concatenating all of the arguments into a single string and inserting spaces. Part of the problem is that shell script itself makes doing the conversion difficult, but the end effect is that if you have to work with commands whose inputs have special characters (including, but not limited to, spaces), you end up just going slowly insane trying to figure out how to get the quoting right to work around the broken tools.

In my experience, the world is so much easier if your own tools just break everything up into the list-of-strings model and you never try to use an API that requires the single-string model.

What GP is referring to is the fact that that solution doesn't work as well on Windows, because the OS's native idea of a command line isn't list-of-strings but rather a single-string, and how that single string is broken up into a list-of-strings is dependent on the application being invoked.
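To make the two models concrete, here is a small Python sketch. It uses `shlex.split` as a stand-in for how a POSIX shell breaks a string into words, which is an approximation rather than a full shell parser:

```python
import shlex

argv = ["ls", "-la", "file with spaces"]

naive = " ".join(argv)     # the broken conversion: just insert spaces
quoted = shlex.join(argv)  # quote each argument, then join

# shlex.split approximates POSIX shell word splitting.
print(shlex.split(naive))   # ['ls', '-la', 'file', 'with', 'spaces']
print(shlex.split(quoted))  # ['ls', '-la', 'file with spaces']
```

Only the quoted form survives the round trip back to the original list of strings.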

theamk 1 hours ago [-]
I think the "non trivial" and "slowly going insane" parts only happen if you don't have the right tools, or are not using a POSIX-compatible system.

In python you have "shlex.quote" and "shlex.join". In bash, you have "${env@Q}". I've found those to work wonderfully for me - and I did crazy things like quote arguments, embed into shell script, quote script again for ssh, and quote 3rd time to produce executable .sh file.

In other languages.. yeah, you are going to have a bad time. Especially on Windows, where I'd just give up and move to WSL.

jcranmer 46 minutes ago [-]
To be honest, I've never heard of Bash's @Q solution before today--I can't find it in https://tldp.org/LDP/abs/html/, which is my usual go-to guide for "how do I do $ADVANCED_FEATURE in bash?"
panzi 3 hours ago [-]
I'd say: Don't use the shell if what you want to do is to execute another program.

You don't need to handle any quoting with exec*(). You still need to handle options, yes. But under Windows you always have to handle the quoting yourself, and it is more difficult than for the POSIX shell and it is program dependent. Without knowing what program is executed you can't know what quoting syntax you have to use, and as such a standard library cannot write a generic interface to pass arguments to another process in a safe way under Windows.

I just felt it sounded like POSIX is particularly bad in that context, while in fact it is better than Windows here. Still, the system() function is a mistake. Use posix_spawn(). (Note: Do not use _spawn*() under Windows. That just concatenates the arguments with a space between and no quoting whatsoever.)

oguz-ismail 2 hours ago [-]
>Still, the system() function is a mistake. Use posix_spawn().

They are entirely different interfaces though. If you'd implemented system() using posix_spawn() it'd be just as bad as system()

panzi 2 hours ago [-]
Why would you implement system() at all?
theamk 1 hours ago [-]
parse commands from config file? command-line arguments for hooks?

https://news.ycombinator.com/item?id=44239036

oguz-ismail 2 hours ago [-]
Because I don't want to implement a shell???
panzi 1 hours ago [-]
If you want to run a shell script, run a shell script. I.e. a text file with the executable bit set and a shebang. If you want to generate a shell script on the fly to then run it, take a step back and think about what you're doing.
chubot 3 hours ago [-]
The article mentioned printf '%q ', but it is a bit hard to find. Here is a handy way to remember it.

First, define this function:

    quote-argv() { printf '%q ' "$@"; }
    # (uses subtle vectorization of printf over args)
Now this works correctly:

    ssh example.com "$(quote-argv ls 'file with spaces')"
    ls: cannot access 'file with spaces': No such file or directory
In contrast to:

    $ ssh example.com ls 'file with spaces'
    ls: cannot access 'file': No such file or directory
    ls: cannot access 'with': No such file or directory
    ls: cannot access 'spaces': No such file or directory
And yes the "hidden argv join" of ssh is VERY bad, and it is repeated in shell's eval builtin.

They should both only take a SINGLE arg.

It is basically a self-own because spaces are an OPERATOR in shell! (the operator that separates words)

When you concatenate operators and variables, then you are mixing code and data, which is a security problem.

---

As for the exec workaround, I think this is also a deficiency of shell. Oils will probably grow an 'invoke' builtin which generalizes 'command' and 'builtin', which are non-orthogonal.

'command true' means "external or builtin" (disabling shell function lookup), but there should be something that means "external only".

o11c 2 hours ago [-]
Use ' %q' and you also fix the problem of program names starting with a dash.
chubot 43 minutes ago [-]
Ah yes, that's clever:

    $ sh -c "$(quote-argv -echo 'file with spaces')"
    sh: 0: Illegal option -h

    $ sh -c "$(quote-argv-left -echo 'file with spaces')"
    sh: 1: -echo: not found
Over ssh:

    $ ssh example.com "$(quote-argv-left -dashtest 'file with spaces')"
    -dashtest
    file with spaces
hackernudes 3 hours ago [-]
I think this is a topic that every Linux user eventually stumbles into. It is indeed quite frustrating.

I found the article hard to follow, but maybe because I was already familiar with the problem and was just skimming. Skip to "Some experiments..." for the actual useful examples.

I disagree with the conclusion, though. I think there should just be more obvious ways to escape the input so one can keep their sanity with nested 'sh -c' invocation. Maybe '${var@Q}' and printf '%q' are enough (can't believe I didn't know those existed!)

mrspuratic 1 hours ago [-]
A Long Time Ago I used to admin Apache httpd (back when "Apache" meant "httpd") before it could self-chroot. One of the issues when you did a manual chroot was that piped logs (|rotatelogs) were invoked via "/bin/sh -c". I wrote a stub "sh" that allowed only "sh -c command ..." which it passed to execv(). Just primitive [ \t] argument splitting, no funny business, and ideally statically linked. Also worked well with PHP (e.g. SquirrelMail invoking, er, sendmail).
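A rough Python sketch of that kind of primitive splitting, not the original stub (which is not shown here). The function name `stub_split` and the example command line are made up for illustration:

```python
import re

def stub_split(command):
    """Split on runs of spaces/tabs only: no quoting, globbing, or expansion."""
    return [word for word in re.split(r"[ \t]+", command) if word]

argv = stub_split("rotatelogs /var/log/access.%Y%m%d 86400; rm -rf /tmp/x")
print(argv)
# The ';' stays glued to '86400;' as plain data instead of ending a command.
# A real stub would then pass the list to execv() with the program's full path.
```

The trade-off is that arguments containing spaces cannot be expressed at all, which is acceptable for fixed commands like a log rotator.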
panzi 3 hours ago [-]
I knew printf '%q' but not ${var@Q}, so that is good to know!
Wicher 3 hours ago [-]
For SSH specifically (ssh user@host "command with args") I've written this workaround pseudoshell that makes it easy to pass your argument vector to execve unmolested.

https://crates.io/crates/arghsh

theamk 1 hours ago [-]
Note that at least in python, you can use "shlex.quote" instead - it's in stdlib and does not need any extra tools.

    >>> import subprocess
    >>> import shlex
    >>> subprocess.run(['ssh', 'host', shlex.join(['ls', '-la', 'a filename with spaces'])])
    ls: cannot access 'a filename with spaces': No such file or directory
works nested, too

    >>> layer2 = ['ls', '-la', 'a filename with spaces']
    >>> layer1 = ['ssh', 'host1', shlex.join(layer2)]
    >>> layer0 = ['ssh', 'host0', shlex.join(layer1)]
    >>> subprocess.run(layer0)
(I am not sure if Rust has equivalent, but if it does not, it's probably easy to implement.. Python version is only a few lines long)
eternauta3k 45 minutes ago [-]
This just confirms my habit of switching to python as soon as a shell script reaches any level of complexity
a_t48 2 hours ago [-]
I've definitely gone down the rabbit hole of trying/being forced to fix issues like this. It starts off as just someone taking a shortcut of doing a little shell scripting in a python program or whatever. Generally the best tool I've found for fixing this is python's shlex.quote - https://docs.python.org/3/library/shlex.html but YMMV (multiple levels may be needed). The real best solution is not to shell out from your program when possible. :)
blueflow 2 hours ago [-]
It's not the manual pages that are ambiguous on this, it's the author who used the word "command" but seemingly had a mental model of it as if it was an argument vector. A command and an argument vector are different things....
endiangroup 2 hours ago [-]
AD: Huh! I just wrote a utility cmd [1] this weekend to deal with restricting ssh keys to executing only commands that match a rule set via `ForceCommand` in `sshd_config` or `Command=""` in `authorized_keys`. I'm curious to see how susceptible it is to the aforementioned issues, it does delegate to `<shell> -c '<cmd>'` under the hood [2], but there are checks to ensure only a single command option argument `--` is passed (to mitigate metacharacter expansions) [3].

Note this tool is only intended to be another layer in security.

[1] https://github.com/endiangroup/cmdjail [2] https://github.com/endiangroup/cmdjail/blob/main/main.go#L30... [3] https://github.com/endiangroup/cmdjail/blob/main/config.go#L...

hello_computer 3 hours ago [-]
There are so many neo-shells that go crazy with colors, autocompletions, & SQL-like features while the most basic problems (like handling of newlines/spaces/international chars) are mostly swept under the rug with -null/-print0, which is more hack than solution. I think Tom Duff's rc shell was an excellent start in that direction, which sadly went nowhere.
chubot 2 hours ago [-]
YSH addresses the "string safety" problem:

What is YSH? https://oils.pub/ysh.html

I am writing a quoting module now, but the key point is that it's a powerful enough language to do so. It is more like Python or JS; you don't have to resort to sed to parse and quote strings.

I posted the quote-argv solution above -- in YSH it will likely be:

    var argv = :| ls 'arg with space' |   # like bash argv=()
    ssh example.com $[quote.sh(argv)]
But you can write such a function NOW if you like

---

quote.sh follows the (subtle) idiom of replacing a single quote ' with

    '\''
 
which means it works on systems with remote POSIX sh, not just YSH !

e.g. "isn't" in POSIX shell is quoted as

    'isn'\''t'
which is these three word parts:

    'isn' \' 't'
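For readers who want to try the idiom outside YSH, here is a short Python sketch. `sh_quote` is a hypothetical helper, not the YSH `quote.sh` implementation, and `shlex.split` is used only to approximate how a POSIX shell would read the result back:

```python
import shlex

def sh_quote(arg):
    # Wrap in single quotes; each embedded ' becomes: close quote, \', reopen quote.
    return "'" + arg.replace("'", "'\\''") + "'"

quoted = sh_quote("isn't")
print(quoted)               # 'isn'\''t'
print(shlex.split(quoted))  # ["isn't"]
```

Python's own `shlex.quote` handles the same case but spells the embedded quote differently, as `'"'"'`; both forms are valid POSIX shell.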
YSH also has:

- JSON, which can correctly round trip every Unicode string, without writing your own parsing functions

- JSON8, an optional extension that can round trip every byte string you get from the Unix kernel

https://oils.pub/release/latest/doc/j8-notation.html

hello_computer 2 hours ago [-]
I like it. Hope it gets some traction.
chubot 52 minutes ago [-]
It also fixes the problem with eval that is shared with ssh:

    ysh-0.29$ eval ls $dir
      eval ls $dir
      ^~~~
    [ interactive ]:11: 'eval' requires exactly 1 argument
And it fixes word evaluation

YSH Doesn't Require Quoting Everywhere - https://www.oilshell.org/blog/2021/04/simple-word-eval.html

https://oils.pub/release/latest/doc/simple-word-eval.html

o11c 2 hours ago [-]
To be fair, most of those "basic problems" have basic solutions as long as you're not trying to avoid GNU tools.

The nastiest case is probably `globasciiranges`.

degamad 4 hours ago [-]
(2021)
kouteiheika 53 minutes ago [-]
> Wall of shame: Ruby's backtick feature -- provides easy access to system(3);

It also provides easy access to escape whatever arguments you want to pass:

    out = `bash -c #{arg.shellescape}`
...here "arg" will always be passed as a single argument.