TL;DR: Yes. But beware that you seem to misunderstand what's going on, and this may be a source of vulnerabilities.
An argument of a program can be an arbitrary byte string that doesn't contain a null byte. The limitation with null bytes comes from the operating system. Depending on your programming language and its standard library, a null byte may be rejected, or it may be passed to the operating system which will truncate the argument at the first null byte.
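As an illustration of the "rejected by the standard library" case, here is a minimal Python sketch. It assumes CPython on a Unix system, where, as far as I know, the subprocess module refuses an argument containing a null byte before the child is even started. The child command is just a stand-in that would echo its first argument.

```python
import subprocess
import sys

def try_spawn_with_nul() -> str:
    """Try to pass an argument containing a null byte to a child process."""
    try:
        # The child would simply print its first argument. The argument list
        # is expected to be rejected while it is being prepared for the OS.
        subprocess.run(
            [sys.executable, "-c", "import sys; print(sys.argv[1])", "abc\0def"],
            check=True,
        )
    except ValueError as exc:
        return f"rejected: {exc}"
    return "accepted"

result = try_spawn_with_nul()
print(result)
```

Other languages may behave differently: a C program calling execve directly would silently see the argument end at the null byte, since arguments are null-terminated strings.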
Bash has no limitations on the content of arguments or more generally on the content of string variables, other than null bytes, which won't reach it anyway. However, beware that when using a locale with a multibyte character encoding such as UTF-8, string operations may give unexpected results on operands that contain byte sequences that aren't valid in the chosen encoding. ASCII strings (containing only code points 1–127) are always safe. Strings containing arbitrary sequences of non-null bytes are safe as long as you set LC_ALL=C or LC_CTYPE=C before manipulating them.
In bash, remember to always use double quotes around variable expansions and command substitutions (i.e. "$foo", not $foo). When you pass an argument to a program that takes command line options following Unix conventions, pass -- before the first non-option argument. See "Why does my shell script choke on whitespace or other special characters?" and "Security implications of forgetting to quote a variable in bash/POSIX shells".
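The -- convention matters for any caller, not only for bash scripts. Here is a small Python sketch, assuming a Unix system with a standard cat. The file name -n is a hypothetical hostile value chosen because it looks like an option.

```python
import pathlib
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical untrusted file name that looks like an option.
    name = "-n"
    (pathlib.Path(tmp) / name).write_text("hello\n")

    # Without "--", cat would treat "-n" as an option rather than a file.
    # With "--", everything after it is taken as an operand (a file name).
    result = subprocess.run(
        ["cat", "--", name], cwd=tmp, capture_output=True, text=True
    )
    content = result.stdout

print(content, end="")
```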
There is absolutely no problem with passing the argument <user_input&&malicious_command> to a bash script. That's a perfectly ordinary string. There is, however, a problem if you interpolate this string into a shell script. Passing an argument to a program and interpolating a string into a script are completely different operations. If you call your program through a function that wraps around exec (perhaps combined with fork as in spawn), this passes a list of strings as arguments to the program and no interpolation is done. On the other hand, functions like system and popen take a single string as argument and they call a shell on that string. If you build this string by combining the name of a program and the argument(s) that you pass to this program, you need to be careful with the content of the arguments. This has nothing to do with the program that you're calling, bash script or otherwise: the problem is that the intermediate shell parses the string. It's about how you call the program, not about the program you call.
If you call exec("foo.bash", my_string), everything is fine as long as my_string is a character string that doesn't contain nulls. If you call system("foo.bash " + my_string) (where + is string concatenation), this only works to pass the value of my_string as an argument if it doesn't contain characters that have a special meaning to the intermediate shell created by system.
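To make the difference concrete, here is a rough Python sketch, assuming a Unix system. printf stands in for foo.bash, and echo INJECTED stands in for the malicious command. Passing a list to subprocess.run is the exec-style call, while shell=True is the system-style call.

```python
import subprocess

# Hypothetical untrusted input, modeled on the example in the question.
my_string = "user_input&&echo INJECTED"

# exec-style: the program receives my_string as one argument, unparsed.
safe = subprocess.run(
    ["printf", "%s\\n", my_string], capture_output=True, text=True
)

# system-style: an intermediate shell parses the concatenated string,
# so "&&" acts as an operator and the second command runs.
unsafe = subprocess.run(
    "printf '%s\\n' " + my_string, shell=True, capture_output=True, text=True
)

print(repr(safe.stdout))
print(repr(unsafe.stdout))
```

In the first call the program sees the whole string as data. In the second, it only sees user_input, and the shell runs the rest as a separate command.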
The following characters have a special meaning to Unix shells, at least in some circumstances: null bytes, tab, newline (LF), space, !"#$&'()*;<=>?[\]^`{|}~. In particular, strings containing only digits, letters and - are safe. A safe way to protect a character string that doesn't contain null bytes is to replace each single quote ' by the 4-character sequence '\'' and surround the result with single quotes '…'. This applies to Unix systems (including Linux, macOS, etc.), not to Windows, which has a completely different shell. In this paragraph, “safe” refers solely to the shell; as mentioned above, strings beginning with - are problematic when you invoke typical programs that take options following Unix conventions.
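The single-quote rule is short enough to write by hand. Below is a sketch in Python, assuming a POSIX shell behind shell=True. sh_quote is a hypothetical helper name (Python's standard library offers shlex.quote for the same purpose), and the test string is made up to include several special characters.

```python
import subprocess

def sh_quote(s: str) -> str:
    # Wrap s in single quotes; each embedded ' becomes the 4 characters '\''
    if "\0" in s:
        raise ValueError("null bytes cannot be passed as arguments")
    return "'" + s.replace("'", "'\\''") + "'"

# Made-up input containing quotes, $, backticks, &&, ; and a glob character.
tricky = "it's; echo $HOME && `id` \"x\" *"

# The intermediate shell now sees one quoted word and passes it on unchanged.
command = "printf '%s' " + sh_quote(tricky)
out = subprocess.run(command, shell=True, capture_output=True, text=True).stdout

print(sh_quote("it's"))
print(out == tricky)
```

This only protects against the shell. If the string can begin with -, the called program may still take it for an option.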
Avoid using functions that spawn an intermediate shell if you can. To call an external program, prefer functions that take the path to the program and the list of arguments separately.