Shell (POSIX)¶
POSIX defines a Shell Command Language.
The interpreter should be in /bin/sh, so executable shell scripts start with #!/bin/sh.
Given that the POSIX definition is rather minimal, there are POSIX-compatible shells like bash and zsh which offer more features.
Shells have been rather stable, provide a simple interface, and don’t use much space on disk - they should also start faster than interpreters of heavier languages.
Note
Various “programs” you call in your shell might not actually run executables, but are builtins of your shell. Sometimes because they have to be (accessing state of the shell itself), sometimes for performance reasons.
Shell builtins don’t have real manpages - they are usually documented in the manpage of their shell.
They also don’t have a --help option or similar.
bash has a help builtin though which can be used to show a short help for a builtin.
POSIX (unix) basics¶
Starting a process (i.e. running/calling an executable) is done using execve in C (or some variant of it). This takes the path to an executable, an array of arguments, and a list of environment variables.
The kernel will clear most of the current state of a process (memory, threads) and then load the new executable. The argument and environment lists and their (NUL-terminated) strings are copied to the new process. File descriptors (including network sockets) are kept by default (unless FD_CLOEXEC was set).
Note
Often you want to run a new process as child process of yourself; so usually one starts with fork, and then runs execve in the child process (where fork() returned 0).
Shells fork a new process to run (external, non-builtin) commands by default; avoid this by prefixing the command with exec, which will replace the shell with a new program (this will skip EXIT traps though).
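The effect of exec is easy to observe; a small sketch using sh -c as a throwaway shell:

```shell
# sh -c gives us a throwaway shell; exec replaces that shell process,
# so the second echo is never reached
sh -c 'exec echo "replaced the shell"; echo "never reached"'
```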
Note
$PATH evaluation to locate executables is done by the variants with a p in their name, like execvpe.
When you run cp a b in a shell, the shell has to split this into the argument array (["cp", "a", "b"]), and then runs execvp("cp", ["cp", "a", "b"]), which should end up calling something like execve("/usr/bin/cp", ["cp", "a", "b"], environ).
The target program (cp) will then receive ["cp", "a", "b"] as int argc, char **argv arguments to main.
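You can observe this argument passing from a shell; with sh -c, the word after the inline script is passed as $0 (i.e. argv[0]) and the remaining words as $1, $2, …:

```shell
# "prog" becomes $0 of the inline script, "a b" and "c" become $1 and $2;
# printf repeats its format string for each remaining argument
sh -c 'printf "argv0=%s\n" "$0"; printf "arg=%s\n" "$@"' prog "a b" c
```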
Warning
The name of the called program in argv[0] (here: cp) is provided by the caller, and isn’t necessarily the name of the binary. Depending on the context this shouldn’t be trusted.
As the strings are NUL-terminated, neither argument nor environment variables can contain NUL-bytes, and are therefore not binary safe.
Filenames can never contain NUL though, so this is usually not a problem.
Note
Windows handles this differently; CreateProcess is similar to a combination of fork and execve, but takes a single commandline string instead of an argument list. The new process can use CommandLineToArgvW to split that string into an argument list, and the escaping rules are rather crazy (a \ only escapes a following \ if the block of backslashes is followed by a ", …).
Windows doesn’t provide fork itself, but cygwin provides a way to emulate it.
Filedescriptors¶
Each open file (disk, special devices, network sockets, …) is assigned a file descriptor (“fd”), which is an integer. The kernel will use the smallest free fd for a new file, so they can be used as array indices.
By convention fd 0 is stdin, fd 1 is stdout and fd 2 is stderr.
On linux you can check the list of open files in your current shell with ls -l /proc/$$/fd - usually stdin, stdout and stderr point to the pseudo terminal (TTY).
Argument handling¶
After receiving a list of arguments the exact interpretation is up to the program.
Often one splits the arguments into “options” (which might have a fixed number of additional value parameters; an option without a value is often called a “flag”) and positional arguments (filenames, commands to run “within”).
Options often have “short names” (a single character) and long names.
Option handling often is done like this:
- Short options are prefixed with -, e.g. -s
- Long options are prefixed with --, e.g. --long-opt1
- Multiple short flags can be joined and prefixed by a single -, like -stv
    - The last option in such a combined list sometimes is allowed to have a value, e.g. tar -xf archive.tar ...
- Options taking a single value might support the following styles:
    - -s value, --long-opt value
    - -svalue (unless value is the empty string)
    - -s=value, --long-opt=value
- -- ends all option handling; the remaining parameters are treated as positional arguments, never as options. This should be used to separate options from untrusted input like filenames.
- Programs should complain when unknown options (anything starting with -) are passed, and not treat them like positional arguments.
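POSIX sh provides a getopts builtin implementing a subset of these conventions (short options, combined flags, option values); a minimal sketch with illustrative option and variable names:

```shell
#!/bin/sh
# parse -v (a flag) and -o value (an option taking a value)
verbose=0
outfile=""
while getopts "vo:" opt; do
    case "$opt" in
        v) verbose=1 ;;
        o) outfile="$OPTARG" ;;
        *) exit 2 ;;            # getopts already printed an error message
    esac
done
shift $((OPTIND - 1))           # drop the parsed options
echo "verbose=${verbose} outfile=${outfile} positional=$*"
```

Called as script -vo out.txt file1 file2, this prints verbose=1 outfile=out.txt positional=file1 file2.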
Common options are:
- -?, --help: ask for short help
    - -h often is another alias for this, but some programs also use it as short name for options like --host(name) to connect to
- --version: print version and other meta data (author, upstream contact, build configuration)
- --verbose, -v: verbose logging
- --debug, -d: debug logging
Conditions¶
Often you will see if statements like if [ -f "/etc/my/config.txt" ]; then ... fi.
if actually executes a passed program; if the program returns with exit code 0, it is considered “successful” (i.e. true). Codes different from 0 are treated as failures.
[ is a “normal” executable, often the same as test (both might be builtins in your shell for performance reasons). In the example above the shell might run the executable /usr/bin/[ with the arguments ["[", "-f", "/etc/my/config.txt", "]"].
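The special parameter $? holds the exit code of the last command, so you can watch this convention directly:

```shell
true;  echo "true returned: $?"     # 0: success
false; echo "false returned: $?"    # 1: failure
[ "a" = "a" ]; echo "[ returned: $?"
```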
Warning
Treating 0 as success / true and everything else as failure / false is unintuitive for many programmers (as many other programming languages will do the opposite).
But this is actually the standard way to handle exit codes. The idea here is that to encode a failure reason in the exit code you need many codes for failures, but only one for success.
Proper languages don’t let you treat integers as booleans in the first place.
Variables¶
Variables in a shell contain simple (NUL-terminated) strings; unset variables expand to empty strings.
Variables are inherited from the environment, and shells often set certain variables explicitly.
Variables are not exported automatically to the environment; when calling other executables only environment variables are passed on.
Exporting can be done either with export VAR (combined with setting the value: export VAR="...") or exporting a variable just for a single command as in VAR=value command arg1 arg2.
Expanding a variable name VAR is done with the construct ${VAR}; expansion is done outside of quotes and inside of double quoted strings, but not in single-quoted strings. In most cases the curly braces could be omitted, but try the example - for consistency I recommend always using curly braces.
VAR=value
echo test 1: ${VAR}
echo test 2: "${VAR}"
echo test 3: '${VAR}'
echo test 4: "${VAR}_suffix"
echo test 5: "$VAR_suffix"
Depending on the shell, echo interprets some arguments as options. This doesn’t matter if the echoed string doesn’t start with an untrusted variable, but otherwise prefer something like printf "%s\n" "${VAR}" over echo "${VAR}".
Shells often support additional features in ${...} constructs like default values (and assignment), replacing patterns and more - check the manpage of your shell.
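A few common constructs that even POSIX sh already supports (the variable and file names are just examples):

```shell
unset NAME
echo "${NAME:-fallback}"   # default value if NAME is unset or empty
file="archive.tar.gz"
echo "${file%.gz}"         # strip shortest matching suffix: archive.tar
echo "${file#archive.}"    # strip shortest matching prefix: tar.gz
echo "${#file}"            # length of the string: 14
```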
Quoting¶
The shell splits by whitespace in many places, so quoting is needed. In double-quoted strings variables are expanded (unless $ is escaped by \$), in single-quoted strings no expansion is done.
Note
I recommend quoting by default (double-quoted - unless otherwise needed).
Sometimes you might want to gather arguments for a command like this:
args=""
if [ ... ]; then
args="${args} --flag1"
fi
if [ ... ]; then
args="${args} --flag2"
fi
runcommand ${args}
In this case you want the shell to split the gathered arguments by whitespace - runcommand "${args}" would quite likely not work.
This means the collected arguments must not contain spaces themselves, or they need to be carefully quoted / escaped again.
Note
zsh doesn’t split words in unquoted parameter expansion by default, but sh and bash do.
To build proper argument lists you need arrays:
Arrays¶
Not supported by POSIX sh, except for the array of passed arguments:
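A sketch in POSIX sh: the positional parameters can be (re)set with set -- and expanded element-wise with "$@":

```shell
# replace the positional parameters; one element contains a space
set -- "first element" second
for elem in "$@"; do
    printf "Element: %s\n" "$elem"
done
echo "Number of entries: $#"
```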
bash and zsh support named arrays:
pa() { local elem; for elem in "$@"; do echo "Element: ${elem}"; done; } # print each parameter on a single line
# create an array with zero elements
a=()
# append elements to an array
text="abc xyz"
a+=("$text" Z)
# expand each array entry to a separate parameter
pa "${a[@]}"
echo "Number of entries: ${#a[@]}"
If you need two dimensional arrays you have to use some way to “flatten” it - if the second dimension is fixed this is easy, otherwise you could store offset and length of each subarray in the flattened array.
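A bash sketch with a fixed second dimension (bash arrays are 0-indexed, zsh indexes from 1; the names are illustrative):

```shell
# 2 rows x 3 columns flattened row-major
matrix=(a b c  d e f)
cols=3
get() { echo "${matrix[$(( $1 * cols + $2 ))]}"; }  # usage: get ROW COL
get 0 1   # prints: b
get 1 2   # prints: f
```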
Associative arrays are also supported:
- One has to explicitly tell the shell that a variable is an associative array.
- If the variable was a normal array before, declare -A would fail - unset fixes that.
You can get a list of all keys with "${!t[@]}" in bash and "${(k)t[@]}" in zsh. The list of all entries works the same as in normal arrays.
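A small bash sketch (zsh uses typeset -A instead of declare -A; the names are illustrative):

```shell
declare -A translate            # required: mark the variable as associative
translate[cat]="Katze"
translate[dog]="Hund"
echo "cat -> ${translate[cat]}"
echo "number of entries: ${#translate[@]}"
```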
set -e - “abort on error”¶
Abort shell scripts if a command fails. It doesn’t trigger in anything called from if, elif, while and until conditions, and only in the last component of && and || expressions (read the man page for details of your shell).
Given the rather complex restrictions this should only be used in simple scripts.
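A small demonstration of both behaviours (the abort and the && exception):

```shell
sh -e -c '
false && echo "not reached"  # the failing false is not the last command
                             # of the && list, so -e does not trigger
echo "still running"
false                        # a plain failing command aborts the script
echo "never printed"
'
echo "exit code: $?"
```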
Warning
See https://mywiki.wooledge.org/BashFAQ/105 for more examples of what can go wrong.
Info
Shells usually accept -e as parameter too, like #!/bin/bash -e
- POSIX set builtin
- bash set
- Part of the zshoptions “sh/ksh emulation set” (ERR_EXIT)
Subshells¶
There is probably much more to say on this subject, so here are just some small examples.
Subshells can be created with (commandlist), also asynchronous commands are executed in a subshell. Changes in subshells don’t affect the parent:
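For example, variable assignments and directory changes are lost when the subshell exits:

```shell
VAR=outer
( VAR=inner; cd /; echo "in subshell: VAR=${VAR}, pwd=$(pwd)" )
echo "after subshell: VAR=${VAR}"
```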
Commands in a pipe may be executed in a subshell (bash does this for all commands in a pipe, zsh for all but the last one), so this works in zsh but not in bash:
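```shell
# bash runs read in a subshell here, so line stays empty in the parent;
# in zsh the read runs in the current shell and line=hello
echo hello | read line
echo "line=[${line}]"
```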
Subshells inherit everything (including non-exported variables) from the parent shell apart from the trap functions which are reset to their default behaviour.
Redirects¶
Note
POSIX only requires support for file descriptors 0 - 9 - for example dash and zsh are limited to single digit file descriptors.
This is only a short list - for details look up some shell man pages (this list is from the dash man page).
- [n]> file: Redirect standard output (or n) to file; fails if the “noclobber” option (-C) is active and the file exists and is a regular file.
- [n]>| file: Same, but ignores noclobber.
- [n]>> file: Append standard output (or n) to file.
- [n]< file: Redirect standard input (or n) from file.
- [n]<&m: Duplicate standard input (or n) from file descriptor m.
- [n]<&-: Close standard input (or n). Not POSIX.
- [n]>&m: Duplicate standard output (or n) to m.
- [n]>&-: Close standard output (or n). Not POSIX.
- [n]<> file: Open file for reading and writing on standard input (or n).
Reference
Special redirects¶
“Executing” redirects (exec with only redirections and no command) applies the redirects to the current (sub)shell.
Redirect stderr to a logfile:
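A sketch applying the redirect to the current shell (the logfile name is just an example):

```shell
# from here on, everything this shell writes to stderr goes to the logfile
exec 2>>error.log
echo "something went wrong" >&2   # ends up in error.log
```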
Redirect stderr to stdout
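2>&1 duplicates stderr onto whatever stdout currently points to; a sketch capturing both streams of a command group:

```shell
# both streams end up in the command substitution
both=$( { echo "to stdout"; echo "to stderr" >&2; } 2>&1 )
printf "%s\n" "${both}"
```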
Moving filedescriptors (bash only)¶
Move fd 5 to fd 6. Same as exec 6>&5 5>&-:
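A sketch: open fd 5, then move it to fd 6 (the filename is illustrative):

```shell
exec 5>target.log   # open fd 5
exec 6>&5-          # duplicate fd 5 onto fd 6, then close fd 5
echo "written via fd 6" >&6
```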
Reverse pipe, process substitution (bash, zsh)¶
Above there was an example with subshells that fails in bash. Here is how to fix it:
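A sketch using process substitution instead of the pipe (note the space between the two <):

```shell
# read now runs in the current shell; the data comes from the pipe
# behind <(...) instead of a regular pipeline
read line < <(echo hello)
echo "line=[${line}]"
```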
<(...) (and >(...)) is replaced by a filename to a pipe which is connected to the specified command.
Redirect multiple pipes to one process:
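For example, cat receives two pipe filenames as arguments and reads them in order:

```shell
cat <(echo "from the first command") <(echo "from the second command")
```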
Can be quite useful to find the difference in the output of two commands:
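For example (seq just generates some lines to compare):

```shell
# each <(...) expands to a pipe filename, so diff can compare the two outputs
diff <(seq 1 3) <(seq 1 4)
```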
Tip
Also read Process Substitution.