Bash redirections
File descriptors
An open file requires a "file descriptor", which is a special kind of data structure stored at the kernel level and mapped to some integer value for processes that have access to it.
For various reasons (e.g. performance), file descriptors are stored in specialized, fixed-size data structures, so only a limited number can be open at any one time, which is why it's important to remember to close files when you're finished with them.
Traditionally, Unix processes are spawned with at least three open file descriptors: 0 is the standard input (a.k.a. stdin), 1 is the standard output (a.k.a. stdout), and 2 is a secondary output stream meant for error reporting (a.k.a. stderr). Most processes are created using a variation on the fork system call, which copies all file descriptors into the new process. In the case of simple Bash commands, this means that by default each of the three standard streams of a command you run is the corresponding stream of the current Bash session (which is generally what you want for interactive programs).
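A minimal sketch of that inheritance, assuming a Bash shell and a scratch file created with mktemp (the file, the message and the choice of descriptor 3 are made up for illustration; the redirection syntax itself is covered in the sections below). Running exec with nothing but a redirection applies that redirection to the current shell, so the parent opens file descriptor 3 and the child can write to it without opening anything itself:

```shell
scratch=$(mktemp)                          # hypothetical scratch file for the demo
exec 3>"$scratch"                          # open fd 3 for writing in the current shell
bash -c 'echo "written by the child" >&3'  # the child process inherits fd 3 as-is
exec 3>&-                                  # close fd 3 again in the parent
inherited=$(cat "$scratch")
rm -f "$scratch"
echo "$inherited"
```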
Pipes
The most common form of redirection is the pipe: |. The expression proc_a | proc_b will create a special 'pipe' file, then start process proc_a with that file as its standard output and process proc_b with that file as its standard input. Note that the stderr of proc_a is not redirected, which means it is the same as the parent's (the Bash process that runs this command). For example:
$ (echo "this goes to stdout"; echo "this goes to stderr" >&2) | tr '[:lower:]' '[:upper:]'
this goes to stderr
THIS GOES TO STDOUT
$
As you can see, only the text sent to stdout is piped through the tr command (which in this case turns any lowercase letter into the corresponding uppercase letter). If you're wondering about the >&2 form, you're in luck, as that is what the section after next is about. Don't skip the next one, though.
Redirecting to a file
Pipes are great for chaining programs together, but they don't get rid of the ephemeral nature of standard input and output. Sometimes, it's nice to save the output of a command to a file, or to drive a program from an existing file rather than having to type things out.
The most common file redirections are > and <, which will, in their naked form, respectively redirect stdout to, and stdin from, the given file. We can illustrate this with the rev program, which reads its standard input one line at a time and prints each line to its standard output in reverse:
$ rev
hello
olleh
this is not a palindrome
emordnilap a ton si siht
$ rev > out
hello
this is not a palindrome
$ cat out
olleh
emordnilap a ton si siht
$ <out rev
hello
this is not a palindrome
$ rev < out
hello
this is not a palindrome
$
Note that, while it does not matter to Bash, it is usually considered better form to put the redirections after the command.
The > and < operators actually take arguments that make them a lot more versatile than you might think at first. Obviously, from the examples above, they take an argument to their right, which is a path to the file you may want to open. There are, however, a couple variations here.
First off, it's probably better to think of them as "open for reading" and "open for writing" than as "redirect input" and "redirect output". They both can take another argument, to their left, to indicate which file descriptor they are opening; < just happens to default to 0 and > to 1. So you could start a process by making its file descriptor 0 (stdin) write-only (echo "hello" 0>file), or its file descriptor 1 (stdout) read-only (echo "hello" 1<file). Neither of these is a good idea in most circumstances because most programs are written under the assumption that they can read from 0 and write to 1. So that's not very useful so far.
This syntax is, however, useful for redirecting the stderr of a program. If you recall from the introduction of this post that stderr is file descriptor 2, you can now understand the notation 2>error.log as meaning "start this program with file descriptor 2 pointing to the file error.log in write-only mode". Quite frankly this is by far the most common use of this "first argument" of the > and < "redirections" (and the only one I have ever used), but I can imagine scenarios where opening other file descriptors may be useful, assuming the program you are running is designed to expect, say, a special file on file descriptor 3, e.g. 3>/tmp/trace_level_log or something.
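As a small sketch of the common case, the following sends the two output streams of one command to two different files. The emit function is a hypothetical stand-in for a real program, and the log directory is a throwaway one from mktemp:

```shell
# emit is a hypothetical stand-in for a program that writes to both streams.
emit() {
  echo "regular output"
  echo "something went wrong" >&2
}

logdir=$(mktemp -d)                            # throwaway directory for the demo
emit >"$logdir/out.log" 2>"$logdir/error.log"  # fd 1 and fd 2 each get their own file
out_content=$(cat "$logdir/out.log")
err_content=$(cat "$logdir/error.log")
rm -rf "$logdir"
printf 'out.log:   %s\nerror.log: %s\n' "$out_content" "$err_content"
```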
Finally, it is worth noting that the > operator will truncate the given file if one already exists. This means that any existing content in the file is lost. If you want to instead append to an existing file, you can use >> instead.
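The difference is easy to check for yourself. A quick sketch, assuming a scratch file from mktemp:

```shell
log=$(mktemp)               # hypothetical scratch file
echo "first"  > "$log"
echo "second" > "$log"      # > truncates: "first" is gone
after_truncate=$(cat "$log")
echo "third" >> "$log"      # >> appends: "second" is kept
after_append=$(cat "$log")
rm -f "$log"
printf 'after >:\n%s\nafter >>:\n%s\n' "$after_truncate" "$after_append"
```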
Redirecting to a file descriptor
It is sometimes convenient to map a file descriptor to another, existing one. The syntax for this uses the & symbol followed by a number instead of a file name. For example, 2>&1 will redirect stderr to stdout. Note that this is actually done by duplicating the file descriptor for stdout, so that both numbers refer to the same open file; it is not "piping" anything written to file descriptor 2 through to file descriptor 1.
This is important because it means that further modifications of file descriptor 1 are not propagated to file descriptor 2. Witness:
$ (echo "stdout"; echo "stderr" >&2) >out_first 2>&1
$ (echo "stdout"; echo "stderr" >&2) 2>&1 >out_second
stderr
$ cat out_first
stdout
stderr
$ cat out_second
stdout
$
You can see that, in the second case, because stdout is changed after stderr has been set, stderr goes to the "old" stdout, i.e. the terminal instead of the file.
Note that the same syntax works for read redirection too, i.e. 4<&7 would create a file descriptor 4 as a copy of the existing file descriptor 7 (which is expected to be open for reading), but I have never had a need for that. Also, using a - instead of a number after the & closes the file descriptor on the left (such that accessing it is an error):
$ echo hello 1>&-
bash: echo: write error: Bad file descriptor
$ echo hello 2>&-
hello
$ echo hello 0<&-
hello
$
Because echo hello does not try to write to stderr or read from stdin, closing them is not an issue. Closing stdout, however, does make it fail with a write error.
Note that you can create file descriptors that are not used by the program, with no other adverse effect than an open file descriptor that won't be closed until the end of life of that process. (Remember, file descriptors are a precious resource.) This can be used, for example, to swap stdout and stderr:
$ (echo stdout; echo stderr >&2) 3>&1 1>&2 2>&3 | sed 's/std//'
stdout
err
$
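In this case we have created file descriptor 3 just to hold on to whatever stdout originally pointed to, so we can swap stdout and stderr. The stdout line reaches the terminal untouched through the original stderr, while the stderr line now travels down the pipe and loses its std prefix in sed.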
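The same three-step swap is sometimes used to capture only the stderr of a command in a variable. The following is a sketch under assumptions: emit is a hypothetical function standing in for a real program, and its regular output is simply thrown away via /dev/null (described at the end of this post) to keep the demo quiet.

```shell
# emit is a hypothetical stand-in for a program that writes to both streams.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Inside $(...), fd 1 is the capture pipe. The outer 2>/dev/null is set up
# first, then the swap runs left to right:
#   3>&1  fd 3 = capture pipe
#   1>&2  fd 1 = /dev/null
#   2>&3  fd 2 = capture pipe
#   3>&-  close the temporary fd 3
captured=$( { emit 3>&1 1>&2 2>&3 3>&-; } 2>/dev/null )
echo "captured: $captured"
```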
At this point, you may be wondering: if > creates a write-only file descriptor and < creates a read-only one, wouldn't it also be useful to have a way to create a read-write file descriptor? If so, you're in luck. Sort of. Bash does have a way to create a read-write file descriptor, using the <> operator, which takes the same arguments using the same syntax as the other two. I've never had a use for it, though, so I'm not entirely sure about how useful it is.
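For what it's worth, one place <> does show up is together with exec, which applies a redirection to the current shell when it is given no command, to keep a single descriptor open for both reading and writing. A rough sketch, assuming a scratch file from mktemp and an arbitrary choice of descriptor 3:

```shell
scratch=$(mktemp)            # hypothetical scratch file
echo "line one" > "$scratch"

exec 3<>"$scratch"           # open fd 3 read-write; unlike >, this does not truncate
read -r first <&3            # read the existing line through fd 3
echo "line two" >&3          # write through the same fd, at the current offset
exec 3>&-                    # close fd 3

contents=$(cat "$scratch")
rm -f "$scratch"
printf 'read back: %s\nfile now contains:\n%s\n' "$first" "$contents"
```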
Finally, because the form >file 2>&1 is so common, there is a shorthand for it: >&file (where file is neither a number nor a dash). Or &>file; both are equivalent.
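A quick way to convince yourself that the long form and the shorthand do the same thing is to compare the files they produce. In this sketch, emit is a hypothetical stand-in for a real program and both files are scratch files from mktemp:

```shell
# emit is a hypothetical stand-in for a program that writes to both streams.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

long_form=$(mktemp)
short_form=$(mktemp)
emit >"$long_form" 2>&1      # the explicit form
emit &>"$short_form"         # the shorthand

if cmp -s "$long_form" "$short_form"; then same=yes; else same=no; fi
combined=$(cat "$short_form")
rm -f "$long_form" "$short_form"
echo "identical files: $same"
```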
Subshells as files
Because everything is a file, we can use an entire subshell as a file descriptor (Bash calls this process substitution). This is the same idea as a pipe, except that the pipe is strictly defined as connecting the file descriptor 1 of one process to the file descriptor 0 of another.
Many programs use other files than the three default ones. As a very simple example, the cat command takes a file name and displays its contents:
$ cat out
line 1
line2
line3
$
In any situation where you need a file to pass into a program for reading, you can substitute a subshell using the syntax <():
$ cat <(echo "hello" | tr e z)
hzllo
$
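A somewhat more practical use is feeding the output of two commands to a program that expects two file names. The sketch below assumes the standard comm utility, which with -12 prints only the lines common to two sorted files; the fruit lists are made up for illustration.

```shell
# Each <(...) is replaced by a file name that comm can open and read.
common=$(comm -12 <(printf '%s\n' apple banana cherry) \
                  <(printf '%s\n' banana cherry date))
echo "$common"
```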
Obviously the benefit of using cat in this way is limited, but hopefully you get the idea. This also works for output redirection, using the >() syntax. For example, the tee command will copy its stdin to its stdout as well as to any number of file names given as arguments. This is very useful for extracting intermediate logs from long pipe expressions. Here is an example of using tee to illustrate the >() syntax:
$ echo "hello" | tee >(cat) >(cat | tr '[:lower:]' '[:upper:]') log > out
hello
HELLO
$ cat log
hello
$ cat out
hello
$
The first hello output line is the result of the first argument to tee, namely >(cat). The second output line, HELLO, is the result of the second file tee writes to: the subshell that pipes cat through tr. Finally, the third argument of tee instructs it to write to the log file, while the stdout of tee is redirected to the out file.
HERE documents and inline strings
Bash can also turn plain strings into input files. The <<< notation will pipe a string through to stdin; for example:
$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
1+3
4
^D$ bc <<< "1 + 3"
4
$ echo "1 + 3" | bc
4
$
As you can see, piping the 1 + 3 expression using echo is the same as creating it directly as stdin using <<< and (minus the banner) also the same as just typing it manually as interactive input to the program.
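The two forms are not always interchangeable, though. With Bash's default settings, each side of a pipe runs in its own subshell, which matters when the receiving command is a builtin such as read that is supposed to set a variable. A small sketch (the variable names are made up for illustration):

```shell
# Here string: read runs in the current shell, so the variables survive.
read -r first second <<< "hello world"

# Pipe: read runs in a subshell here, so the value it sets is gone
# once the pipeline finishes.
piped=""
echo "hello world" | read -r piped

echo "first=$first second=$second piped=$piped"
```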
If your string is a bit longer, you can use what is known as the "HERE document" notation, for reasons that will hopefully become clear soon enough. The syntax starts with the symbol << followed by a word, and ends with the same word alone on a single line. The word used is arbitrary, but HERE and EOF are the most common ones I've seen.
$ tr '[:lower:]' '[:upper:]' <<HERE
> first line
> second line
> third line
> HERE
FIRST LINE
SECOND LINE
THIRD LINE
$
There are a couple things to note about HERE documents. First, the document actually starts on the next line. So you can have more content on the original line:
$ tr '[:lower:]' '[:upper:]' <<HERE | sed 's/I/A/g'
> first line
> second line
> third line
> HERE
FARST LANE
SECOND LANE
THARD LANE
$
Second, by default, the HERE document behaves like a double-quoted string, meaning you can use Bash variables and subshells within it. If that is not what you want, you can put single quotes around the word that follows << (the closing word stays unquoted):
$ var="replace me"
$ cat <<HERE
> var: $var
> HERE
var: replace me
$ cat <<'HERE'
> var: $var
> HERE
var: $var
$
Honorable mention: /dev/null
This is not a redirection feature, but it is often used with redirections, so I think it is worth mentioning. On any Unix system, there is a special device file called /dev/null that will accept any write and just discard it immediately. This is useful when you are running a program and only care about a subset of its (possibly many) output streams. For example:
$ du -hs /* 2>/dev/null | sort -h
In this case, because we are trying to collect information about /, it is very likely there will be files du cannot read. Normally, it would print a line on stderr for every such file. However, in this case, I don't really care about that and I accept that the final result may not be entirely accurate due to such errors.
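Another common pattern is to discard all output and only look at the exit status of a command. A minimal sketch, using the command -v builtin to check whether a program exists (sh is just a convenient example of a program that should be present on any Unix system):

```shell
# Both stdout and stderr go to /dev/null; only the exit status is used.
if command -v sh >/dev/null 2>&1; then
  have_sh=yes
else
  have_sh=no
fi
echo "sh available: $have_sh"
```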
If you had a program that requires multiple files to write to, you could also use it in combination with the subshell redirection feature:
$ ./annoying-program --info-logs /var/log/keep-this \
--debug-logs >(cat > /dev/null) \
--trace-logs >(cat > /dev/null)
Other useful devices are /dev/random and /dev/zero, which will both accept any read request and respond to it with, respectively, random bytes and zeroed bytes.
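As a small sanity check of the /dev/zero claim, the sketch below reads 16 bytes with dd and counts how many of them survive after deleting every zero byte with tr. The byte count of 16 is arbitrary.

```shell
# Read 16 bytes from /dev/zero; dd's own statistics on stderr are discarded.
total=$(dd if=/dev/zero bs=1 count=16 2>/dev/null | wc -c | tr -d ' ')

# Same read, but strip every zero byte before counting what is left.
nonzero=$(dd if=/dev/zero bs=1 count=16 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')

echo "bytes read: $total, non-zero bytes: $nonzero"
```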