Sync Up Files with rsync

This command is great for backing up complicated file structures that are modified frequently. It’s more elegant and smarter (it skips unchanged files) than using portable storage or scp to transfer the files.

Note that an SSHFS mount may also fit your needs.

Introduction

Here is how Wikipedia describes rsync: rsync is a utility for efficiently transferring and synchronizing files between a computer and an external hard drive and across networked computers by comparing the modification times and sizes of files.

rsync will use SSH to connect. Once connected, it will invoke the remote host’s rsync and then the two programs will determine what parts of the local file need to be transferred so that the remote file matches the local one.

rsync can also operate in a daemon mode, serving and receiving files in the native rsync protocol (using the rsync:// syntax). Here I only cover the SSH way.

  • How to Exclude Files and Directories with Rsync

  • Rsync Command in Linux with Examples

Usage

To get rsync working between two hosts, the rsync program must be installed on both the source and destination, and you’ll need a way to access one machine from the other.

Copy files to remote home or from remote to local

rsync files remote:
rsync files user@remote:
rsync user@remote:source dest

If rsync isn’t in the remote path but is on the system, use --rsync-path=path to manually specify its location. Unless you supply extra options, rsync copies only files; for each directory you pass, you will see:

skipping directory xxx

To transfer entire directory hierarchies, complete with symbolic links, permissions, modes, and devices, use the -a option.

# -n: dry run, vital when you are not sure what the command will do
# -P: same as --partial --progress (keep partially transferred files, show progress)
# -v: verbose mode
# -z: compress during transfer
# -a: archive mode, equals -rlptgoD
# here rsync a file and a dir
rsync -n -P -vza file dir user@remote:<path>

# -q: quiet
# -e: choose a different remote shell
# for example remote ssh uses a port other than 22
rsync -q -e "ssh -p 2322" file user@remote:<path>

To make an exact replica of the source directory, you must delete files in the destination directory that do not exist in the source directory:

# --delete: delete extraneous files from dest dirs
rsync -v --delete -a dir user@remote:

Run the command with -n (dry run) first to see what would be deleted before performing it for real.

Be particularly careful with a trailing slash after dir:

# dir vs dir/
rsync -a dir/ user@remote:dest

This copies everything under dir into the dest folder on the remote, instead of copying dir itself into dest.

You can also use --exclude=PATTERN / --include=PATTERN and --exclude-from=PATTERN_FILE / --include-from=PATTERN_FILE in the command.

To speed operation, rsync uses a quick check to determine whether any files on the transfer source are already on the destination. The quick check uses a combination of the file size and its last-modified date.

When the files on the source side are not identical to the files on the destination side, rsync transfers the source files and overwrites any files that exist on the remote side. The default behavior may be inadequate, though, because you may need additional reassurance that files are indeed the same before skipping over them in transfers, or you may want to put in some extra safeguards:

  • --checksum (abbreviation: -c) Computes checksums (mostly unique signatures) of the files to see if they’re the same. This consumes additional I/O and CPU resources during transfers, but if you’re dealing with sensitive data or files that often have uniform sizes, this option is a must. (It compares file content, not the timestamp.)

  • --ignore-existing Doesn’t clobber files already on the target side.

  • --backup (abbreviation: -b) Doesn’t clobber files already on the target but rather renames these existing files by adding a ~ suffix to their names before transferring the new files.

  • --suffix=s Changes the suffix used with --backup from ~ to s.

  • --update (abbreviation: -u) Doesn’t clobber any file on the target that has a later date than the corresponding file on the source.

For example, I sync my code repo from the local host to a remote host for development and testing; after verifying the changes, I sync it back to the local host to check in:

# forward sync source proj folder itself to dest
# result in remote: /home/chengdol/proj
rsync -vza \
./proj \
remote_user@remote:/home/chengdol

# then coding and editing

# backward sync remote proj folder itself to current directory
# result: ./proj
# the --exclude options skip folders inside the remote proj
# (a comment line must not sit between backslash-continued lines,
# otherwise it cuts the command short)
rsync -vza \
--exclude .terraform \
--exclude output \
--exclude utils/__pycache__ \
--exclude deployment/__pycache__ \
remote_user@remote:/home/chengdol/proj \
.

# you will see the incremental transferred files, as well as the .git changes