Comparing and Merging Files


Overview of Comparing and Merging Files

Computer users often find occasion to ask how two files differ. Perhaps one file is a newer version of the other file. Or maybe the two files started out as identical copies but were changed by different people.

This documentation first concentrates on making diffs, and later shows how to use diffs to update files. You can use the diff command to show differences between two files, or between each pair of corresponding files in two directories. diff outputs differences between files line by line in any of several formats, selectable by command line options. This set of differences is often called a diff or patch. For files that are identical, diff normally produces no output; for binary (non-text) files, diff normally reports only that they are different.
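To get a feel for line-by-line output, Python's standard difflib module can produce differences in the unified format (the format that diff -u selects). This is an illustration using Python's library, not GNU diff itself, and the file names old.txt and new.txt are placeholders.

```python
import difflib
import sys

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# unified_diff yields the header, hunk marker, and changed/context lines
# of a unified-format diff, one string per line.
diff_lines = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
sys.stdout.writelines(diff_lines)

# Identical inputs yield nothing, mirroring diff's silence on identical files.
identical = list(difflib.unified_diff(old, old))
```

Lines starting with - come only from the old file, lines starting with + only from the new one, and lines starting with a space are unchanged context.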

See What Comparison Means for a general introduction; the sections that follow give more detailed documentation.

To search this documentation for specific keywords, see the Index.

You can use the cmp command to show the offsets and line numbers where two files differ. cmp can also show all the characters that differ between the two files, side by side. Another way to compare two files character by character is the Emacs command M-x compare-windows; see "Comparing Files" in The GNU Emacs Manual for more information on that command. See Invoking cmp for specific details.

You can use the diff3 command to show differences among three files. When two people have made independent changes to a common original, diff3 can report the differences between the original and the two changed versions, and can produce a merged file that contains both persons' changes together with warnings about conflicts. See Comparing Three Files, Merging From a Common Ancestor, and Invoking diff3.
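The byte-level comparison that cmp performs can be sketched in a few lines of Python. This is a minimal sketch, not cmp's implementation: the function names first_difference and all_differences are hypothetical, offsets and line numbers are 1-based, and where one input is a proper prefix of the other the sketch reports the position just past the shorter input, whereas real cmp reports reaching end of file.

```python
from typing import List, Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return the 1-based (byte offset, line number) of the first difference,
    or None if the inputs are identical.

    Simplification: if one input is a proper prefix of the other, report the
    position just past the shorter input (real cmp reports EOF instead).
    """
    line = 1
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1, line
        if x == 0x0A:  # a newline ends the current line
            line += 1
    if len(a) == len(b):
        return None
    return min(len(a), len(b)) + 1, line


def all_differences(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """List (1-based offset, byte in a, byte in b) for every differing byte
    within the common length; roughly what `cmp -l` reports, although cmp
    prints the byte values in octal."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(first_difference(b"abc\ndef\n", b"abc\ndxf\n"))  # (6, 2)
print(all_differences(b"abc", b"abd"))  # [(3, 99, 100)]
```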

You can use the sdiff command to merge two files interactively. For more, see Interactive Merging with sdiff and Invoking sdiff.

You can use the set of differences produced by diff to distribute updates to text files (such as program source code) to other people. This method is especially useful when the differences are small compared to the complete files. Given diff output, you can use the patch program to update, or patch, a copy of the file. If you think of diff as subtracting one file from another to produce their difference, you can think of patch as adding the difference to one file to reproduce the other. See Merging with patch and Tips for Making Patch Distributions for more documentation on patch.
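The subtract/add analogy can be sketched with Python's standard difflib module. This is a minimal sketch under stated assumptions: difflib's matcher is not the algorithm GNU diff uses, and the edit-list format and the names make_diff and apply_patch are hypothetical, not the format diff emits or patch reads. It does demonstrate the key property: adding the recorded difference to the old file reproduces the new one.

```python
import difflib
from typing import List, Tuple

# Hypothetical edit record: (tag, start, end, replacement_lines), where
# old[start:end] is replaced by replacement_lines.
Edit = Tuple[str, int, int, List[str]]


def make_diff(old: List[str], new: List[str]) -> List[Edit]:
    """'Subtract' old from new: keep only the regions that changed."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old: List[str], edits: List[Edit]) -> List[str]:
    """'Add' the difference back to old to reproduce new."""
    result: List[str] = []
    pos = 0
    for _tag, i1, i2, replacement in edits:
        result.extend(old[pos:i1])  # copy unchanged lines before this hunk
        result.extend(replacement)  # emit the new lines for this hunk
        pos = i2                    # skip the old lines this hunk replaces
    result.extend(old[pos:])        # copy unchanged lines after the last hunk
    return result


old = ["a\n", "b\n", "c\n"]
new = ["a\n", "B\n", "c\n", "d\n"]
edits = make_diff(old, new)
assert apply_patch(old, edits) == new
```

Because the edit list stores only changed regions, it stays small when the differences are small relative to the files, which is what makes distributing diffs practical.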

For a discussion of future and ongoing projects, see Future Projects.

GNU diff was written by Mike Haertel, David Hayes, Richard Stallman, Len Tower, and Paul Eggert. Wayne Davison designed and implemented the unified output format.

The basic algorithm is described in "An O(ND) Difference Algorithm and its Variations" by Eugene W. Myers, in Algorithmica; Vol. 1, No. 2, 1986; pp. 251–266; and in "A File Comparison Program" by Webb Miller and Eugene W. Myers, in Software—Practice and Experience; Vol. 15, No. 11, 1985; pp. 1025–1040.

The algorithm was discovered independently by E. Ukkonen, as described in "Algorithms for Approximate String Matching", in Information and Control; Vol. 64, 1985, pp. 100–118.
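For readers curious about the algorithm, the greedy forward search from Myers's paper can be sketched in Python. This sketch computes only D, the length of a shortest edit script counting insertions and deletions; it does not recover the script itself or include the refinements (such as the linear-space variation) a production diff needs, and the function name myers_edit_distance is hypothetical.

```python
from typing import Sequence


def myers_edit_distance(a: Sequence, b: Sequence) -> int:
    """Length D of a shortest edit script (insertions + deletions) turning a into b,
    using Myers's greedy search over diagonals k = x - y."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d                  # shift k into a non-negative list index
    v = [0] * (2 * max_d + 2)       # v[offset + k] = furthest x reached on diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]      # step down: insert an element of b
            else:
                x = v[offset + k - 1] + 1  # step right: delete an element of a
            y = x - k
            while x < n and y < m and a[x] == b[y]:  # follow the "snake" of matches
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:   # reached the bottom-right corner
                return d
    return max_d


print(myers_edit_distance("ABCABBA", "CBABAC"))  # 5
```

Since only insertions and deletions count, D equals len(a) + len(b) minus twice the length of a longest common subsequence; the paper's example strings ABCABBA and CBABAC share a subsequence of length 4, giving 7 + 6 - 8 = 5.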

GNU diff3 was written by Randy Smith.

GNU sdiff was written by Thomas Lord.

GNU cmp was written by Torbjorn Granlund and David MacKenzie.

patch was written mainly by Larry Wall; the GNU enhancements were written mainly by Wayne Davison and David MacKenzie.

Parts of this documentation are adapted from writings by Larry Wall with his permission.
