What is Text Diff?
A text diff tool compares two pieces of text and identifies exactly what changed between them. The output -- called a diff or a patch -- shows which lines were added, removed, or modified. This is the same fundamental operation that powers version control systems like Git, code review platforms, and document comparison features in word processors. The tool takes two inputs (often called the "original" and the "modified" or "left" and "right"), runs a comparison algorithm, and produces a structured representation of the differences. You see the changes highlighted visually: removed text in one color, added text in another, and unchanged context lines in between to help you orient yourself within the document.
The most widely used algorithm for computing diffs is the Myers diff algorithm, published by Eugene Myers in 1986. Myers' algorithm finds the shortest edit script -- the minimum number of insertions and deletions needed to transform the original text into the modified text. It works by modeling the problem as finding the shortest path through an edit graph, where moving right represents deleting a character from the original and moving down represents inserting a character from the modified text. Diagonal moves represent matching characters that require no edit. The algorithm runs in O(ND) time, where N is the total length of both inputs and D is the size of the minimum edit script. For inputs that are mostly similar -- which is the typical case in version control -- D is small and the algorithm is fast.
Under the hood, most diff algorithms rely on finding the Longest Common Subsequence -- or LCS -- between two sequences. The LCS is the longest sequence of elements that appears in both inputs in the same order, though not necessarily contiguously. Once you have the LCS, the diff is everything that is not part of it: elements in the original but not in the LCS are deletions, and elements in the modified text but not in the LCS are insertions. The classic dynamic programming solution for LCS runs in O(mn) time and space where m and n are the lengths of the two inputs. Myers' algorithm improves on this for the common case where the differences are small relative to the total size. Git uses a variant of Myers' algorithm by default, with an option to switch to a patience diff or histogram diff for cases where Myers produces confusing output.