Programming, hacking, Linux and beyond: tee

Process substitution

Process substitution takes the form of <(list) or >(list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list. When available, process substitution is performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.
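The description above can be checked with a small bash sketch. It assumes bash on a system that supports /dev/fd or named pipes (Linux, for example). The file name printed for <(true) is system-dependent, so treat it as illustrative only.

```shell
#!/usr/bin/env bash
# <(list) expands to a file name; on Linux this is typically /dev/fd/63,
# but the exact name depends on the system.
fname=$(echo <(true))
echo "<(true) expanded to: $fname"

# Reading that file yields the output of the list, so any program that
# expects a file name can consume a command's output.
count=$(awk 'END { print NR }' <(printf 'one\ntwo\nthree\n'))
echo "awk counted $count lines"

# Writing to >(list) feeds the list's standard input. The list inherits the
# shell's stdout, which here is the $( ) capture, so $( ) only returns
# after tr has finished writing.
upper=$(echo "hello" > >(tr '[:lower:]' '[:upper:]'))
echo "tr produced: $upper"
```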

So what does this mean? Basically, it lets you use a command's output wherever a program expects a file to read (or a command's input wherever it expects a file to write). Imagine you want to compare the output of two commands using diff; you could do something like

$ diff <(echo afile) <(echo anotherfile)
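A slightly more practical sketch of the same idea: check whether two files contain the same lines regardless of order, without writing sorted copies to disk. The files and their contents are made up for the example.

```shell
#!/usr/bin/env bash
# Two scratch files (hypothetical data) with the same lines in a different order.
tmp=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$tmp/a.txt"
printf 'cherry\nbanana\napple\n' > "$tmp/b.txt"

# A plain diff reports differences because the order differs.
if diff "$tmp/a.txt" "$tmp/b.txt" > /dev/null; then plain=same; else plain=different; fi

# Each <(sort ...) is handed to diff as a file name, so no intermediate
# sorted files are needed.
if diff <(sort "$tmp/a.txt") <(sort "$tmp/b.txt") > /dev/null; then sorted=same; else sorted=different; fi

echo "plain diff: $plain, sorted diff: $sorted"
rm -r "$tmp"
```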

The real power, though, comes from combining process substitution with tee. Just look at the next command and be stunned by how useful this nifty little feature is.

$ ps aux | tee >(grep '^root' > /tmp/rootps) >(grep '^simon' > /tmp/simon)

This lists all processes on the terminal, while writing the processes owned by root to one file and those owned by simon to another.
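Since ps output differs from machine to machine, here is a reproducible sketch that feeds a few made-up ps-style lines (user, PID, command) through the same kind of tee fan-out, counting lines per user instead of saving them. One caveat: the processes inside >(...) run asynchronously, so a file they write may not be complete the instant the prompt returns. In this sketch each awk prints to standard output instead. The substituted processes inherit the pipe to sort, so sort (and the surrounding $( )) only finishes once both awk processes have exited, while tee's own copy of the input is discarded with > /dev/null.

```shell
#!/usr/bin/env bash
# Made-up ps-style lines: user, PID, command (hypothetical sample data).
sample='root 1 /sbin/init
simon 812 -bash
root 420 /usr/sbin/sshd
simon 901 vim notes.txt
simon 950 top'

# tee copies the stream into both process substitutions; each awk counts the
# lines whose first field is its user. tee's stdout goes to /dev/null, while
# the awk processes inherit the pipe to sort.
counts=$(printf '%s\n' "$sample" \
  | tee >(awk '$1 == "root" { n++ } END { print "root:", n + 0 }') \
        >(awk '$1 == "simon" { n++ } END { print "simon:", n + 0 }') \
        > /dev/null \
  | LC_ALL=C sort)

echo "$counts"
```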

Convenient, huh?

5 Linux shell commands you should know about

Over the years I have grown used to some very nifty command-line tools that I use more or less every day. Let's go through five of them.

This article assumes you have basic knowledge of the bash shell (there is a good tutorial at http://mywiki.wooledge.org/BashGuide).

#1 tee

The `tee' command copies standard input to standard output and also to any files given as arguments. This is useful when you want not only to send some data down a pipe, but also to save a copy.

$ ./longrunningprocess | tee data.log

That is probably the simplest way to use tee. Since it passes the stream on to stdout, you can also put it in the middle of a pipeline, like a T-junction in a pipe: it stores the data in a file and passes it on to the next command.
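tee also accepts several file names at once and, with -a, appends to files instead of overwriting them. A small sketch with hypothetical file names, run in a scratch directory:

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
cd "$dir"

# Write the same stream to two files while still passing it on downstream.
lines=$(printf 'first\nsecond\n' | tee copy1.txt copy2.txt | awk 'END { print NR }')

# -a appends instead of truncating; > /dev/null hides tee's stdout copy.
printf 'third\n' | tee -a copy1.txt > /dev/null

echo "downstream saw $lines lines"
cat copy1.txt   # three lines: first, second, third
cat copy2.txt   # two lines: first, second
```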

$ cat data | awk '{print $1+1}' | tee plusone | awk '{print $1-2}' > minusone

That line takes the file data, which contains a number on each row, and passes it to awk, which adds one to each number. tee stores the modified stream in the file plusone and passes it on to a second awk, which subtracts two and writes the result to the file minusone.
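To try the pipeline without a real data file, this sketch creates a hypothetical data file holding 1, 2 and 3 in a scratch directory and runs the same command on it.

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
cd "$dir"

# Hypothetical input: one number per row.
printf '1\n2\n3\n' > data

# Same pipeline as in the text: add one, save a copy with tee, subtract two.
cat data | awk '{print $1+1}' | tee plusone | awk '{print $1-2}' > minusone

# Columns: original value, value + 1, value - 1.
paste data plusone minusone
```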

Another, more interesting way to do the same thing is

$ cat data | tee >(awk '{print $1+1}' > plusone) >(awk '{print $1-1}' > minusone)

which copies the contents of data into two process substitutions, each running its own awk script. tee also writes the data to standard output, so it shows up on the terminal as well.
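Here is a sketch that checks the fan-out on the same kind of hypothetical data file. Because the awk processes inside >(...) run asynchronously, this version has them print tagged lines to standard output (which they inherit as the pipe to sort) instead of writing files, and discards tee's own copy with > /dev/null. Sort only finishes after both awk processes have exited.

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
cd "$dir"
printf '1\n2\n3\n' > data   # hypothetical input, one number per row

# Each awk reads its own copy of data from tee and tags its output so the
# results can be told apart; LC_ALL=C sort gives a stable order.
results=$(tee >(awk '{print "plusone", $1+1}') \
              >(awk '{print "minusone", $1-1}') \
              < data > /dev/null \
          | LC_ALL=C sort)

echo "$results"
```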

#2 wget

Wget is one of the tools I use the most, and combining it with tee makes it even more powerful.

$ wget www.kernel.org/pub/linux/kernel/v3.0/testing/linux-3.4-rc4.tar.bz2 -O - | tee kernel.tar.bz2 | tar xjvf -

That downloads the Linux kernel source; tee stores the compressed archive in kernel.tar.bz2 while tar extracts it on the fly.
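The same save-and-extract pattern can be tried offline. In this sketch (all assumptions for illustration) a small local archive stands in for the download, `cat` stands in for `wget URL -O -`, and gzip replaces bzip2 only because it is installed almost everywhere.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)

# Build a small local archive to play the role of the remote file.
mkdir -p "$work/src/project" "$work/out"
echo "hello kernel" > "$work/src/project/README"
tar -czf "$work/remote.tar.gz" -C "$work/src" project

# cat stands in for wget: tee saves the compressed stream while tar reads
# the same bytes from stdin (-f -) and extracts them in one pass.
cat "$work/remote.tar.gz" | tee "$work/saved.tar.gz" | tar -xzf - -C "$work/out"

if cmp -s "$work/remote.tar.gz" "$work/saved.tar.gz"; then saved=identical; else saved=different; fi
readme=$(cat "$work/out/project/README")
echo "saved copy: $saved; extracted README: $readme"
rm -rf "$work"
```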

#3 awk

GNU awk is invaluable to me. I won't go in depth on it here, but take a look at my earlier post covering awk: http://simonslinuxworld.blogspot.se/2012/04/awk-tutorial-by-example.html

#4 sed

Sed is a stream editor: you pass a stream to it, tell it what to edit and how, and then store the output or do something equally useful with it.
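Two minimal sed invocations illustrate the idea (the input text is made up): a substitution with the s command, and printing a single line with -n and p.

```shell
#!/usr/bin/env bash
# s/old/new/ replaces the first match on each line of the stream.
greeting=$(printf 'hello world\n' | sed 's/world/there/')
echo "$greeting"

# -n turns off sed's automatic printing; the 2p command prints only line 2.
second=$(printf 'one\ntwo\nthree\n' | sed -n '2p')
echo "$second"
```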

Let's say I know a guy who knows a guy who once downloaded a season of a TV show illegally. He told me he needed to rename the episode files to a specific naming convention so that his XBMC media center could download information about the show from some website. In this case (and many others) sed comes in handy.

$ ls | sed -nE 's/^(.+)_(.+)\.mkv$/mv "&" "Series (2012) \2.mkv"/p' | bash

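Piece of cake: list the files, let sed turn each name such as series_S01EXX.mkv into the command mv series_S01EXX.mkv "Series (2012) S01EXX.mkv", and pass the result to bash for evaluation. It's a good idea to run the command without the final | bash first, to check the generated commands.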

If you want your BitTorrent client to keep finding the original files, create symlinks (ln -s) instead of moving the files (that's what he did).
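A sketch of the symlink variant in a scratch directory. The episode file names are hypothetical, and notes.txt is there to show that -n with the p flag keeps non-matching names from being turned into commands.

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
cd "$dir"
# Hypothetical episode files plus an unrelated file that must be left alone.
touch series_S01E01.mkv series_S01E02.mkv notes.txt

# Preview: only lines where the substitution matched are printed.
ls | sed -nE 's/^(.+)_(.+)\.mkv$/ln -s "&" "Series (2012) \2.mkv"/p'

# Once the preview looks right, let bash run the generated commands.
ls | sed -nE 's/^(.+)_(.+)\.mkv$/ln -s "&" "Series (2012) \2.mkv"/p' | bash
```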

#5 xargs

xargs reads items from stdin, delimited by blanks or newlines, and executes the command one or more times with any initial arguments followed by items read from standard input. Blank lines on the standard input are ignored.
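A short sketch of that behaviour with made-up input: items are split on blanks and newlines, the blank line is ignored, and -n limits how many items go into each invocation. When no command is given, xargs runs echo.

```shell
#!/usr/bin/env bash
# No command given, so xargs runs echo with all items: the blank line is skipped.
joined=$(printf 'a b\n\nc\n' | xargs)
echo "$joined"

# -n 2 passes at most two items per invocation, so echo runs three times.
batches=$(printf '1\n2\n3\n4\n5\n' | xargs -n 2 echo batch:)
echo "$batches"
```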

Example: recursively remove all files matching the pattern *~ (Emacs backup files).

$ find . -name "*~" -type f -print0 | xargs -0 rm -f
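With plain -print, xargs splits its input on blanks, so a backup stored as sub dir/draft copy.txt~ would reach rm as several separate arguments, none of which exists. Pairing find -print0 with xargs -0 (supported by both GNU and BSD find/xargs) keeps such names intact. A sketch in a scratch directory, with hypothetical file names:

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
mkdir -p "$dir/sub dir"
# Hypothetical files: two backups (one with spaces in its path) and a file to keep.
touch "$dir/notes.txt~" "$dir/sub dir/draft copy.txt~" "$dir/notes.txt"

# -print0 and -0 separate names with NUL bytes, so blanks in names are safe.
find "$dir" -name '*~' -type f -print0 | xargs -0 rm -f

backups_left=$(find "$dir" -name '*~' -type f | awk 'END { print NR }')
echo "backup files left: $backups_left"
```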

If you need more control over where the items are inserted into the command, use -I with a placeholder such as {}; xargs replaces the placeholder with the actual item each time it runs the command. For example:

$ find . -name "*~" -type f -print0 | xargs -0 -I{} mv {} /tmp

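Because mv expects the destination directory last, we use -I{} to tell xargs to substitute each item for {}, which puts the file name before /tmp. -I also makes xargs run the command once per input item, so a separate -n 1 is unnecessary (GNU xargs warns that the two options are mutually exclusive).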
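A sketch of the -I{} form in a scratch directory: the trash directory stands in for /tmp and the file names are hypothetical. -maxdepth 1 (supported by GNU and BSD find, though not POSIX) is an addition here so that find does not descend into the destination and see the moved files a second time.

```shell
#!/usr/bin/env bash
set -eu
dir=$(mktemp -d)
trash="$dir/trash"   # hypothetical destination standing in for /tmp
mkdir "$trash"
touch "$dir/a.txt~" "$dir/b file.txt~"

# -I{} runs mv once per item, substituting the item for {}.
find "$dir" -maxdepth 1 -name '*~' -type f -print0 | xargs -0 -I{} mv {} "$trash"

moved=$(find "$trash" -type f | awk 'END { print NR }')
echo "moved $moved files"
```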

Final words

I hope you have enjoyed this little informative post about useful Linux commands. They have many more options than I've covered here, so I suggest you check out their man pages to make full use of them.

2012-04-24
