Tar Pipe

This is the first in a series of posts I began this summer and only now have time to finish.

Every once in a while, I find myself needing to copy a large number of files from one Linux machine to another, ideally as fast as possible.  There are a lot of ways to do this, but the most common method usually goes something like this:

  • Tar everything up (with some form of compression if your network connection is slow).
  • (S)FTP/SCP the file to the new server.
  • Move the file to the new location, making directories as needed.
  • Extract the tar file into the new directory.
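
Spelled out as commands, the manual version looks something like this (just a sketch, with made-up paths and a made-up host):

  tar -czf files.tar.gz /data/myfiles                  # 1. tar it up, gzip-compressed
  scp files.tar.gz admin@newserver.example.com:/tmp/   # 2. copy the archive over
  ssh admin@newserver.example.com \
    "mkdir -p /data/incoming && tar -xzf /tmp/files.tar.gz -C /data/incoming"   # 3-4. make the directory and extract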

This is all well and good, and it works fine in most cases–it’s just kind of laborious.  I prefer a simpler method that wraps everything up into a single step, affectionately known as a tar pipe.  The (admittedly somewhat complex) command follows.

  SRCDIR=  # fill in with your source directory
  DESTDIR= # fill in with your destination directory--note that your
           # uploaded directory will appear inside this one
  USER=    # fill in with your remote user name
  HOST=    # fill in with your remote host name
  tar -cvzf - $SRCDIR | ssh $USER@$HOST "mkdir -p $DESTDIR; tar -xz -C $DESTDIR"
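
For instance, with the blanks filled in (made-up paths and host, of course), the expanded command looks something like this:

  tar -cvzf - /var/www/mysite | ssh admin@newserver.example.com \
    "mkdir -p /var/www/backups; tar -xz -C /var/www/backups"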

The variables are just to make things a little easier to read (feel free to ignore them if you like), and I do recommend using a full path for the DESTDIR directory, but the basic process is ridiculously easy.  Here’s the breakdown of how the whole thing works.

  1. The tar -cvzf - $SRCDIR very obviously tars everything up, just like you normally would.  The key difference from the normal tar procedure is the fact that the “file” you’re creating with tar is actually sent to stdout (by the -f - option) instead of being written to the file system.  We’ll see why later.
  2. The | (pipe) passes everything on stdout on as stdin for the next command, just like normal.
  3. The ssh command is the fun part.
    1. We start an ssh session to $HOST as $USER.
    2. Once that’s established, we run two commands.
      1. mkdir -p $DESTDIR to make the destination directory, if needed.
      2. tar -xz -C $DESTDIR to untar something. What, we’re not sure yet.

What it untars is a bit of a mystery, as we don’t really tell it what it’s supposed to work on.  Or do we?  As it turns out, ssh passes whatever it receives on stdin on to the command it runs on the server.  That is, all that stuff we just tarred up gets passed along, through the magic of piping, from the local machine to the remote machine, then extracted on the fly once it arrives.
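
You can see this stdin forwarding with a much simpler pipe (purely illustrative, with a made-up host):

  # whatever we feed to ssh's stdin becomes stdin for the remote command
  echo "hello from the laptop" | ssh admin@newserver.example.com "cat > /tmp/hello.txt"

The tar pipe is exactly this trick, with tar producing the stream on one end and consuming it on the other.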

You can see the benefit of this, I trust–instead of that whole four-command process we detailed above, including manually logging into the remote server to extract the new file, we have one fairly simple command that handles tarring, uploading, and extracting for us, with the added benefit of never creating an intermediate archive file on either machine.  That’s kind of cool, right?

Note:  I’ve seen other implementations of the tar pipe, but this is the one I’ve been using recently.  It’s worked for me on Red Hat 5, but your mileage may vary.

2 thoughts on “Tar Pipe”

  1. I’m rather partial to:

    tar c . | pv -s (size) | ssh -C user@host 'cd (destdir); tar xp'

    Pv gives you a progress meter, but you need to know the size of the data to calculate percentages.

    • Cool! The size shouldn’t be that big of a problem, as we can just grab a rough estimate from `ls` or `du`.  We’d lose the benefits of gzip compression that way, though–hopefully ssh -C will make up some of the slack.
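
      A sketch of that idea (made-up paths and host; du -sb reports the size in bytes, which is what pv -s needs for its percentages):

      SIZE=$(du -sb /data/myfiles | cut -f1)
      tar -cf - -C /data/myfiles . | pv -s $SIZE | ssh -C admin@newserver.example.com \
        "cd /data/incoming && tar xpf -"   # -f - makes the stdin/stdout ends explicit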
