R2sync protocol version 1

About r2sync

R2sync is a program to (automatically or interactively) synchronize two directories. It is similar to and inspired by Unison. The directories may be on different machines, as long as they are reachable by ssh or a similar program. R2sync logs its runs and only copies files that have changed since the previous synchronization. Additionally, it uses the rsync algorithm to avoid copying parts of files that haven't changed.

R2sync server protocol

R2sync runs three processes: the client (which interacts with the user) and two servers (which may be on remote machines). Typically, those servers are instances of r2sync itself, running in daemon mode (option --daemon), but they may be other programs, including long-running ones. The servers interact with the file system on behalf of the client.

The protocol between the client and the servers is text-based. The text must be encoded in UTF-8. Each command from the client to the server and each reply consists of one or more text lines. Each line ends with a line feed (U+000A) or with a carriage return (U+000D) and a line feed. The line feed and carriage return are optional on the last line before the connection is closed (end-of-file). The text must not contain a NUL character (U+0000).
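The line rules above can be illustrated with a minimal Python sketch. The names read_line, write_line and read_all_lines are hypothetical, and the sketch assumes the connection is available as a binary file-like object (for instance the pipes of an ssh subprocess). It accepts both LF and CR LF, allows a missing terminator on the last line, rejects NUL and always sends a bare line feed.

```python
import io
from typing import BinaryIO, List, Optional


def read_line(stream: BinaryIO) -> Optional[str]:
    """Read one protocol line; return None at end-of-file."""
    raw = stream.readline()  # up to and including b"\n", or the rest of the data at EOF
    if not raw:
        return None
    if raw.endswith(b"\n"):
        raw = raw[:-1]
    if raw.endswith(b"\r"):
        raw = raw[:-1]
    if b"\0" in raw:
        raise ValueError("NUL character in protocol line")
    return raw.decode("utf-8")  # raises UnicodeDecodeError for invalid UTF-8


def write_line(stream: BinaryIO, text: str) -> None:
    """Send one protocol line, terminated by a single line feed."""
    if any(c in text for c in "\0\r\n"):
        raise ValueError("a line must not contain NUL, CR or LF")
    stream.write(text.encode("utf-8") + b"\n")


def read_all_lines(data: bytes) -> List[str]:
    """Convenience wrapper: split a complete byte string into protocol lines."""
    stream = io.BytesIO(data)
    lines = []
    while (line := read_line(stream)) is not None:
        lines.append(line)
    return lines
```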

Both the client and the server can send an error message instead of data in certain circumstances, as indicated below. Such error messages take the form of a line of text of the form

? code some text

where code is a decimal number indicating the error and some text is a short description of the error, e.g., ‘Unknown command’. (See the list of error codes at the end of this page.)
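A client or server has to distinguish error lines from ordinary replies. A minimal Python sketch follows; ErrorLine, parse_error and format_error are hypothetical names, and the sketch tolerates an empty description.

```python
import re
from typing import NamedTuple, Optional


class ErrorLine(NamedTuple):
    code: int  # decimal error code, see the list of error codes
    text: str  # short human-readable description


_ERROR_RE = re.compile(r"\? ([0-9]+)(?: (.*))?")


def parse_error(line: str) -> Optional[ErrorLine]:
    """Return the code and text if line is an error message, otherwise None."""
    m = _ERROR_RE.fullmatch(line)
    if m is None:
        return None
    return ErrorLine(int(m.group(1)), m.group(2) or "")


def format_error(code: int, text: str) -> str:
    """Build an error line to send to the other side."""
    return f"? {code} {text}"
```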

The commands to the server and the replies are as follows:

(At start of connection)

When a connection is opened to a server, the server must immediately send a globally unique identifier for the computer on which it runs and the highest protocol version it understands, as a line of the form

ready ID N

or

ready ID N nocase

where ID is a 32-character, lowercase hexadecimal string and N is an integer > 0 in decimal notation. If the server runs on a case-insensitive file system (file ‘foo’ and file ‘FOO’ are the same file), it must include ‘nocase’. E.g., if it implements the protocol defined here and runs on Linux, it must send back

ready ID 1

The client must use a protocol version no higher than N in its subsequent commands, but it may use a lower one (see version below).

The ID must not depend on the IP address or host name of the computer, but it may depend on the user under whose name the server runs. A good ID to use is the ID generated by systemd in /etc/machine-id (if the computer is a Linux computer running under systemd).
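On the client side, the greeting can be checked against the rules above with a small Python sketch. Ready and parse_ready are hypothetical names; the sketch accepts exactly the two forms of the ready line shown above and rejects N = 0.

```python
import re
from typing import NamedTuple


class Ready(NamedTuple):
    server_id: str  # 32 lowercase hexadecimal characters
    version: int    # highest protocol version the server understands
    nocase: bool    # True if the server's file system is case-insensitive


_READY_RE = re.compile(r"ready ([0-9a-f]{32}) ([0-9]+)( nocase)?")


def parse_ready(line: str) -> Ready:
    """Parse the line a server sends when the connection opens."""
    m = _READY_RE.fullmatch(line)
    if m is None or int(m.group(2)) == 0:
        raise ValueError(f"invalid ready line: {line!r}")
    return Ready(m.group(1), int(m.group(2)), m.group(3) is not None)
```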

version N [noshortcuts]

This command specifies that the client is going to use protocol number N, where N must be a decimal number > 0.

If the keyword noshortcuts is present after the version number (separated from it by a space), the server must change how it responds to subsequent delta commands: it must not compute and return whole-file checksums and digests.

The server must reply with

OK

if it understands that protocol version, or with an error.
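Choosing the version amounts to taking the lower of the client's own highest version and the N from the ready line. A minimal Python sketch; CLIENT_MAX_VERSION and version_command are hypothetical names.

```python
CLIENT_MAX_VERSION = 1  # hypothetical: the highest version this client implements


def version_command(server_max: int, noshortcuts: bool = False) -> str:
    """Build the version command for a server that announced 'ready ID server_max'."""
    version = min(CLIENT_MAX_VERSION, server_max)  # never exceed what the server offered
    if version < 1:
        raise ValueError("no common protocol version")
    return f"version {version}" + (" noshortcuts" if noshortcuts else "")
```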

remote target

This command identifies the other server the client is communicating with. The server uses this in its logs to find the previous synchronization with the same target.

target must be separated from remote by a single space and consists of zero or more characters. (I.e., spaces are allowed, as are all other characters except the NUL character, carriage return and line feed.)

The target is an arbitrary string, but different targets must have different strings. E.g., a good target could consist of the ID that the other server sent in its ready message and the path that is synchronized on that server.

E.g., a target could look similar to this:

19ec56762ecb838762c287eb00000004 /files/robert/
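A client following that suggestion could build the command as in this Python sketch (remote_command is a hypothetical name); it only guards against the characters a line may not contain.

```python
def remote_command(server_id: str, path: str) -> str:
    """Build a remote command whose target is the other server's ID and path."""
    target = f"{server_id} {path}"
    if any(c in target for c in "\0\r\n"):
        raise ValueError("a target must not contain NUL, CR or LF")
    return "remote " + target
```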

The server must answer with a line

OK

if it successfully accepted the command. Otherwise it must answer with an error.

TODO: List the possible errors.

local path

This command identifies the file or directory on the server that is going to be synchronized. There must be exactly one space before the path, and the path may contain spaces (but no newlines). The server must send back one of the following three replies, or an error:

file realpath

The server must send this reply if the path identifies a file. realpath is what the server considers the canonical name for the file. The goal is that, if there are different ways to refer to some file, all of them yield the same realpath. E.g., this could be the absolute path with all symbolic links resolved (as returned by the realpath(3) function in the C library).

directory realpath

The server must send this reply if the path identifies a directory. As above, realpath is the canonical name for path.

other realpath

The server must send this reply if the path appears to exist but identifies something other than a file or a directory. E.g., the path might point to a device.

The server must return an error if it cannot find path (e.g., if it cannot read its directory).
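A server written in Python might answer the local command roughly as follows. This is a sketch under stated assumptions: local_reply is a hypothetical name, os.path.realpath stands in for realpath(3), and failures are reported as errno + 500, as described in the list of error codes.

```python
import os
import stat


def local_reply(path: str) -> str:
    """Return the reply line for 'local path'."""
    real = os.path.realpath(path)  # absolute path, symbolic links resolved
    try:
        st = os.stat(real)
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # system error: errno + 500
    if stat.S_ISREG(st.st_mode):
        kind = "file"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    else:
        kind = "other"  # device, FIFO, socket, ...
    return f"{kind} {real}"
```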

reset

This command tells the server to remove the log for the current remote and local targets. I.e., for the next list command using these targets it will be as if all files are new.

It is an error if the server receives this command before it has received both a remote and a local command.

The server must answer with a line

OK

if it successfully removed the log (or made it empty), or with an error otherwise.

list

This command directs the server to check if the file or directory tree identified by the most recent local command has changed since the last synchronization.

The server checks the file, or all files in the directory and its subdirectories, against its log to see if any file has changed. It may also detect new files or files that have been deleted.

The server's reply must either be a single line with an error or start with one of these lines:

comparing

This reply indicates the server has compared, or is still comparing, the file(s) against its logs.

creating

This reply indicates the server has no logs (e.g., because of a preceding reset command) and will assume all files are new or updated.

E.g., the server may reply with an error if it receives this command before it has received both a local command and a remote command.

If the server didn't send an error, it must then report the status of each file, with lines of the form

s mode time size path

path is the path of a file, relative to the path given by the local command. s is a single character, ‘=’, ‘u’, ‘n’, ‘m’ or ‘d’, indicating, respectively, that the file has not changed, has changed, is new, has had its permission bits changed (but has not otherwise changed), or has been deleted. mode is the mode in octal (as given by the lstat() system call, i.e., combining the type of resource and the permission bits). time is the last modification time of the file in decimal, as the number of seconds since the epoch (1 Jan 1970). size is the size in bytes of the file in decimal. If s is ‘d’, the mode, time and size must still be present, but are ignored. (They could be 0, e.g.)
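Because the path comes last and may contain spaces, a client can split each status line at most four times. A minimal Python sketch; ListEntry, parse_list_line and parse_list_body are hypothetical names, and parse_list_body expects the lines that follow the comparing or creating line.

```python
from typing import Iterable, List, NamedTuple

STATUSES = {"=": "unchanged", "u": "changed", "n": "new", "m": "mode changed", "d": "deleted"}


class ListEntry(NamedTuple):
    status: str  # one of '=', 'u', 'n', 'm', 'd'
    mode: int    # file type and permission bits
    mtime: int   # seconds since the epoch
    size: int    # bytes
    path: str    # relative to the path given by 'local'; may contain spaces


def parse_list_line(line: str) -> ListEntry:
    # split at most 4 times so that spaces inside the path are preserved
    status, mode, mtime, size, path = line.split(" ", 4)
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    return ListEntry(status, int(mode, 8), int(mtime), int(size), path)


def parse_list_body(lines: Iterable[str]) -> List[ListEntry]:
    """Parse file lines up to and including the terminating '.'."""
    entries = []
    for line in lines:
        if line == ".":
            return entries
        entries.append(parse_list_line(line))
    raise ValueError("list reply ended without '.'")
```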

The list of files must end with a line consisting of a single dot:

.

lstat path

This command tells the server to find the file type and mode bits, the modification time and the size of the file given by path.

The server must reply with a line of the form

= mode time size

where mode is the file type and mode in octal, as defined by the st_mode field in struct stat, time is the last modified time of the file in decimal, as seconds since the epoch (0:00 UTC on Jan. 1, 1970), and size is the size of the file in bytes, as a decimal number.

If the file doesn't exist, the directory cannot be read, or another error occurred, the server must instead return an error.

E.g., if the server receives this command before it has received a local command, it may reply with an error.
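In Python, the reply maps directly onto os.lstat. This sketch assumes a hypothetical helper lstat_reply that receives the root from the local command, rounds the modification time down to whole seconds, and reports failures as errno + 500.

```python
import os


def lstat_reply(root: str, path: str) -> str:
    """Reply to 'lstat path', where path is relative to the 'local' root."""
    try:
        st = os.lstat(os.path.join(root, path))  # does not follow a final symlink
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # system error: errno + 500
    seconds = st.st_mtime_ns // 1_000_000_000  # whole seconds since the epoch
    return f"= {st.st_mode:o} {seconds} {st.st_size}"
```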

update blocksize mode time size checksum digest path

This command tells the server to update the file so that it has the given metadata. The server may have a local file with the correct size and checksums, or it can use the rsync algorithm: return a signature (a series of checksums) of the file given by path, then wait for and read a delta (a series of patches) from the client. When successful, it logs the new modification time and size of the file.

mode specifies the mode of the updated file in octal (see the chmod(2) system call). time is the last modification time that should be set on the file. size is the size of the updated file in bytes. blocksize is a decimal number > 0 indicating the block size for the rsync algorithm. checksum is the ‘rolling checksum’ of the updated file, and digest is its MD5 hash. The server must reply with an error if the line is malformed.
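A server might validate the update line as in the Python sketch below. Several details are assumptions not fixed by the text: checksum is read as hexadecimal (as in signature lines), mode must not exceed 07777, and each problem is mapped to the closest code in the list of error codes. CommandError, Update and parse_update are hypothetical names.

```python
import re
from typing import NamedTuple

_DEC = re.compile(r"[0-9]+")
_OCT = re.compile(r"[0-7]+")
_HEX8 = re.compile(r"[0-9a-fA-F]{1,8}")   # assumption: checksum in hexadecimal
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


class CommandError(Exception):
    """An error to be sent back as '? code text'."""

    def __init__(self, code: int, text: str):
        super().__init__(f"{code} {text}")
        self.code, self.text = code, text


class Update(NamedTuple):
    blocksize: int
    mode: int
    mtime: int
    size: int
    checksum: int
    digest: str
    path: str


def parse_update(line: str) -> Update:
    # at most 7 splits: the path (last field) may contain spaces
    fields = line.split(" ", 7)
    if len(fields) != 8 or fields[0] != "update":
        raise CommandError(400, "Syntax error")
    _, blocksize, mode, mtime, size, checksum, digest, path = fields
    if not _DEC.fullmatch(blocksize) or int(blocksize) == 0:
        raise CommandError(403, "Missing or incorrect block size")
    if not _OCT.fullmatch(mode) or int(mode, 8) > 0o7777:
        raise CommandError(406, "Illegal value for file mode")
    if not re.fullmatch(r"-?[0-9]+", mtime):
        raise CommandError(407, "Missing time value")
    if not (_DEC.fullmatch(size) and _HEX8.fullmatch(checksum) and _HEX32.fullmatch(digest)):
        raise CommandError(400, "Syntax error")
    return Update(int(blocksize), int(mode, 8), int(mtime), int(size),
                  int(checksum, 16), digest.lower(), path)
```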

If the server receives the update command before it has received both a local command and a remote command (see above), it may also reply with an error.

If the server has a local file of the same size and with the same checksum and digest, it may update the path by copying that local file and then doesn't need anything from the client. In that case it sends an error message with code 200. This is only an error in the sense that it indicates to the client that the server will not send a signature and doesn't expect a delta. But it is also a message of success, in the sense that it indicates to the client that the file has been updated as requested.

If the server sent an error, it must go back to waiting for the next command. Otherwise it continues sending lines as defined below.

The update uses the rsync algorithm. For each block of blocksize bytes of the file (the last block may be shorter), the server must return a 4-byte ‘rolling’ checksum and a 16-byte MD5 checksum.

The rolling checksum is a simple but fast checksum, defined as follows. Assume $d_0 \ldots d_{N-1}$ are $N$ bytes of data for which a checksum is to be computed. Let

$$A = \left( \sum_{i=0}^{N-1} d_i \right) \bmod 2^{16}$$

and let

$$B = \left( \sum_{i=0}^{N-1} (N-i)\, d_i \right) \bmod 2^{16}.$$

Then the checksum is $C = 2^{16} B + A$. (It is the same one as in rsync 3.1.2 with rsync protocol version 31. It is based on Adler-32, but with computations modulo $2^{16}$ instead of modulo 65521.)
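The definition can be written directly in Python. The sketch below treats each $d_i$ as an unsigned byte value from 0 to 255; the text does not say whether bytes are signed, so that is an assumption. It also shows why the checksum is called rolling: when an $N$-byte window slides one byte to the right, dropping $d_k$ and adding $d_{k+N}$, the new values follow from the old ones as $A' = A - d_k + d_{k+N}$ and $B' = B - N d_k + A'$ (both modulo $2^{16}$), without rescanning the window. The names checksum_parts, rolling_checksum and roll are hypothetical.

```python
M = 1 << 16  # all sums are taken modulo 2**16


def checksum_parts(block: bytes) -> tuple:
    """Return (A, B) as defined above for one block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * d for i, d in enumerate(block)) % M
    return a, b


def rolling_checksum(block: bytes) -> int:
    a, b = checksum_parts(block)
    return (b << 16) | a  # equals 2**16 * B + A because A < 2**16


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b
```

Printed in hexadecimal with f"{c:x}", the result has between 1 and 8 digits, as required in signature lines.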

The server must send a reply that consists of zero or more lines like this:

checksum digest blocksize

where checksum is the checksum in hexadecimal (between 1 and 8 digits), digest is the MD5 hash in hexadecimal (exactly 32 digits) and blocksize is the size of the block in decimal. All except the last line must have a blocksize equal to the blocksize passed in the update command. The last blocksize may be smaller. The three items are separated by single spaces.

The last line of the reply consists of a single dot to indicate that the signature was successfully computed:

.

or an error to indicate that an error occurred.
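The complete signature reply for a file's contents could be produced as follows. This Python sketch (signature_lines is a hypothetical name) reads the whole file into memory, uses lowercase hexadecimal, and again assumes unsigned bytes in the rolling checksum.

```python
import hashlib


def signature_lines(data: bytes, blocksize: int) -> list:
    """Return the 'checksum digest blocksize' lines plus the final '.'."""
    lines = []
    for start in range(0, len(data), blocksize):
        block = data[start:start + blocksize]  # the last block may be shorter
        n = len(block)
        a = sum(block) % 65536
        b = sum((n - i) * d for i, d in enumerate(block)) % 65536
        checksum = (b << 16) | a
        digest = hashlib.md5(block).hexdigest()  # exactly 32 hex digits
        lines.append(f"{checksum:x} {digest} {n}")
    lines.append(".")
    return lines
```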

If it sent a ‘.’ (but not if it sent an error message), the server must then wait for a delta sent by the client.

The client, if it received a ‘.’, must send either an error or zero or more lines together containing a complete delta, ending with a line with a single dot:

.

The delta completely describes a new file and consists of lines that are either of the form

base64-encoded

or

*blocknumber(+N)? (blocknumber(+N)?)*

The base64-encoded lines must be a multiple of 4 characters long and decode to a series of bytes, which are to be written literally into the replacement for the file path.

The lines may be of any length (as long as it is a multiple of 4). Longer lines are more efficient, but shorter lines are easier for debugging.

A line that starts with an asterisk indicates that the next bytes to copy into the new file consist of the block numbered blocknumber of the old file path. (The first block is block number 1.) Each block is blocksize bytes long, except possibly for the last.

If the block number is followed by a ‘+’ and a number N, it means that the bytes after that are to be copied from blocks blocknumber+1, blocknumber+2,… up to and including blocknumber+N of the file path.

If this is followed by a space and another block number, it means the bytes after that are to be copied from that block number. That block number may be followed by a ‘+’ and a number, and by another space and another block number, etc.

blocknumber and N are decimal numbers.

Note that blocknumber+0 is the same as blocknumber and that blocknumber+N is the same as listing the increasing block numbers individually. E.g., *12+5 is the same as *12 13 14 15 16 17.
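A receiver can expand such a line into the list of block numbers it denotes. A minimal Python sketch; expand_block_line is a hypothetical name, and it rejects block number 0 because blocks are numbered from 1.

```python
import re

_ITEM_RE = re.compile(r"([0-9]+)(?:\+([0-9]+))?")


def expand_block_line(line: str) -> list:
    """Turn e.g. '*12+5' into [12, 13, 14, 15, 16, 17]."""
    if not line.startswith("*"):
        raise ValueError("not a block-reference line")
    blocks = []
    for item in line[1:].split(" "):
        m = _ITEM_RE.fullmatch(item)
        if m is None:
            raise ValueError(f"invalid block reference: {item!r}")
        first = int(m.group(1))
        if first < 1:
            raise ValueError("block numbers start at 1")
        extra = int(m.group(2) or 0)  # '+N' means N more consecutive blocks
        blocks.extend(range(first, first + extra + 1))
    return blocks
```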

The client should send the shortest representation of block numbers possible (use ‘+N’ when possible but omit ‘+0’, put multiple block numbers on one line separated by spaces). But for easier debugging it may limit the length of the lines.
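Writing a run of consecutive blocks as ‘first+N’ is never longer than listing them (for two blocks, ‘+1’ is at most as long as the second number), so collapsing every run gives the shortest form. A Python sketch with the hypothetical name encode_blocks; it puts all references on one line and does not limit the line length.

```python
def encode_blocks(blocks: list) -> str:
    """Encode block numbers as one '*' line, using 'first+N' for runs."""
    if not blocks:
        raise ValueError("need at least one block number")
    items = []
    i = 0
    while i < len(blocks):
        j = i
        while j + 1 < len(blocks) and blocks[j + 1] == blocks[j] + 1:
            j += 1  # extend the run of consecutive block numbers
        run = j - i
        items.append(f"{blocks[i]}+{run}" if run else str(blocks[i]))  # omit '+0'
        i = j + 1
    return "*" + " ".join(items)
```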

The server tries to use this delta to update the file given by path.

The server must give the updated file the mode mode from the update command.

The server must reply either with a line:

OK

to indicate the patch was successful, or reply with an error.

The server should log a successful patch, so that it knows for the next list command (possibly in the next session) whether or not the file has changed again since this patch. If the update succeeded except for setting the mode, the server should still log the new state of the file, but using the actual mode of the file, not the value of mode.
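Reconstructing the new contents from the old file and a delta can be sketched in Python as below. apply_delta is a hypothetical name; it works on in-memory bytes, raises ValueError for malformed deltas or out-of-range block numbers, and leaves writing the file, setting its mode and logging to the caller. A server would presumably translate such failures into one of the error codes listed at the end of this page.

```python
import base64


def apply_delta(old: bytes, blocksize: int, delta_lines) -> bytes:
    """Build the new file from the old one and delta lines ending with '.'."""
    out = bytearray()
    nblocks = (len(old) + blocksize - 1) // blocksize
    for line in delta_lines:
        if line == ".":
            return bytes(out)
        if line.startswith("*"):
            for item in line[1:].split(" "):
                first, _, extra = item.partition("+")
                start = int(first)  # ValueError if empty or not a number
                for k in range(start, start + int(extra or 0) + 1):
                    if not 1 <= k <= nblocks:
                        raise ValueError(f"no block {k} in the old file")
                    out += old[(k - 1) * blocksize:k * blocksize]  # blocks start at 1
        else:
            # literal data; binascii.Error (a ValueError) if not valid base64
            out += base64.b64decode(line, validate=True)
    raise ValueError("delta not terminated by '.'")
```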

update0 blocksize mode time size path

This command is like the update command, except that no checksum and digest are passed, so the server cannot use the shortcut of copying a local file with the same checksum and digest.

delta blocksize path

This command tells the server to compute a delta (i.e., a series of patches, using the rsync algorithm) of the given file relative to the signature that is given in the next lines.

If the last version command included the noshortcuts keyword, the server must respond with either an error or a line consisting of

OK

Otherwise, the server must respond with either an error or a line of the form

checksum digest

where checksum is the ‘rolling checksum’ in hexadecimal (1 to 8 digits) of the file indicated by path and digest is the MD5 hash (exactly 32 hexadecimal digits).

If the server didn't send an error, the client must then send zero or more lines together containing a signature, in exactly the format the server returns in reply to the update command, followed by a line with one dot:

.

Unless the client sent an error, the server must reply with a delta (a series of patches, see under update for the format), ending with a line with one dot:

.

if it successfully computed the delta, or with an error.

del path

This command directs the server to delete the file indicated by path, which is relative to the root given by the earlier local command.

The server must reply with a line

OK

to indicate it successfully deleted the file (or that the file did not exist in the first place). Otherwise it must reply with an error.
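The rule that a missing file still counts as success can be expressed in Python by catching FileNotFoundError separately. del_reply is a hypothetical name; other failures are reported as errno + 500.

```python
import os


def del_reply(root: str, path: str) -> str:
    """Reply to 'del path'; a file that is already gone counts as success."""
    try:
        os.unlink(os.path.join(root, path))
    except FileNotFoundError:
        pass  # nothing to delete: still reply OK
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # system error: errno + 500
    return "OK"
```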

chmod mode path

This command tells the server to set the permission bits of a file. mode is an octal number between 0 and 07777 (see the Unix manual for chmod). path is a path relative to the path given by the local command.

The server must reply with a line

OK

if it successfully set the mode on the file, or with an error.
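A Python sketch of the server side, assuming a hypothetical helper chmod_reply that gets the full command line and the root from the local command. It uses codes 400, 406 and 410 from the list of error codes; refusing anything but regular files follows the wording of code 410 and is an assumption about when that code applies.

```python
import os
import stat


def chmod_reply(root: str, line: str) -> str:
    """Handle a 'chmod mode path' command line."""
    parts = line.split(" ", 2)  # the path may contain spaces
    if len(parts) != 3 or parts[0] != "chmod":
        return "? 400 Syntax error"
    _, mode_text, path = parts
    try:
        mode = int(mode_text, 8)
    except ValueError:
        return "? 406 Illegal value for file mode"
    if not 0 <= mode <= 0o7777:
        return "? 406 Illegal value for file mode"
    full = os.path.join(root, path)
    try:
        if not stat.S_ISREG(os.lstat(full).st_mode):
            return "? 410 Tried to change mode of something other than a regular file"
        os.chmod(full, mode)
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # system error: errno + 500
    return "OK"
```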

symlink time path

This command tells the server to make path into a symlink, if it isn't already. If path doesn't exist, the server must create it as a symlink. If the path already exists, and isn't a directory, the server must try to delete it and recreate it as a symlink.

path is relative to the path given by the preceding local command. time is the last modification time (in seconds since 1 Jan. 1970, as a decimal number) that the symlink (not the target of the link) should be set to.

The client must follow this command with a line containing the path that the symlink must point to. This is an arbitrary string. It need not indicate an existing file or directory.

The server must return either a line

OK

if path is a symlink to the path given in the second line, or the server succeeded in making it so. Otherwise, the server must return an error.
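A Python sketch of this command follows; symlink_reply is a hypothetical name. The text leaves open what happens when path is an existing directory, so this sketch simply refuses with EISDIR + 500, which is an assumption. Setting the time of the link itself requires os.utime with follow_symlinks=False, which not every platform supports; the sketch skips that step where it is unavailable.

```python
import errno
import os


def symlink_reply(root: str, mtime: int, path: str, target: str) -> str:
    """Make path (relative to root) a symlink to target with the given mtime."""
    full = os.path.join(root, path)
    try:
        if not (os.path.islink(full) and os.readlink(full) == target):
            if os.path.lexists(full):
                if os.path.isdir(full) and not os.path.islink(full):
                    # the text leaves this case open; this sketch refuses it
                    return f"? {errno.EISDIR + 500} {os.strerror(errno.EISDIR)}"
                os.unlink(full)  # remove the old file or wrong symlink
            os.symlink(target, full)
        if os.utime in os.supports_follow_symlinks:
            # set the time of the link itself, not of what it points to
            os.utime(full, (mtime, mtime), follow_symlinks=False)
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # system error: errno + 500
    return "OK"
```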

This command tells the server to return what the symlink path points to. It must return a line of the form

= string

where string is the contents of the symlink path. (It doesn't matter whether string points to an actual file or is a ‘dangling link’.) If path is not a symlink or another error occurs (e.g., path cannot be opened), the server must return an error.
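In Python the reply corresponds to os.readlink, which also works for dangling links and fails (typically with EINVAL) when the path is not a symlink. readlink_reply is a hypothetical name; failures are reported as errno + 500.

```python
import os


def readlink_reply(root: str, path: str) -> str:
    """Reply with '= string', the contents of the symlink path."""
    try:
        target = os.readlink(os.path.join(root, path))  # dangling links are fine
    except OSError as e:
        return f"? {e.errno + 500} {e.strerror}"  # e.g. EINVAL if not a symlink
    return "= " + target
```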

log mode time size checksum digest path

This command tells the server to update the log for path with the given metadata.

Error codes

Both the server and the client can report errors, by means of lines of the form

? code some text

The following values for code are defined:

Code   Meaning
200    Shortcut: update already done
300    Not enough data to compute a delta
301    Not enough data to patch a file
400    Syntax error
401    Command ‘remote’ was not yet given
402    Command ‘local’ was not yet given
403    Missing or incorrect block size
404    Unknown command
405    Unknown protocol version
406    Illegal value for file mode
407    Missing time value
408    Path is longer than FILENAME_MAX
409    No log available because of an earlier error
410    Tried to change mode of something other than a regular file
411    Invalid syntax for delta
412    Failed to get a unique system ID
500    Server error
>500   System error (see below)

Codes above 500 represent system errors, such as out of memory or no permission to read a file. The code corresponds to errno + 500. See the Unix manual page for errno(3) for an explanation of errno.
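The errno + 500 convention works in both directions, as this Python sketch shows. system_error_line and describe_system_error are hypothetical names. An OSError without an errno falls back to code 500, and the symbolic errno names come from Python's errno.errorcode table.

```python
import errno
import os


def system_error_line(exc: OSError) -> str:
    """Turn an OSError into an error line with code errno + 500."""
    if exc.errno is None:
        return "? 500 Server error"  # no errno available
    return f"? {exc.errno + 500} {exc.strerror or os.strerror(exc.errno)}"


def describe_system_error(code: int) -> str:
    """Explain a received code above 500 by its errno name and message."""
    if code <= 500:
        raise ValueError("not a system error code")
    number = code - 500
    name = errno.errorcode.get(number, "unknown errno")
    return f"{name}: {os.strerror(number)}"
```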


Bert Bos
Last modified: Sat Jan 26 12:23:31 CET 2019