Diff Algorithms Overview Diffs are essential for software development to represent changes between files, visualize failed tests, or apply patches automatically. Existing free diff libraries often fail to meet all needs, especially for non-text data and flexible output formats. This article presents the author's journey creating a new diff library for Go, highlighting key learnings about diff algorithms, performance, and readability. --- Existing Diff Libraries Key desired attributes for a diff library include: Support for arbitrary sequences, not just text. Output in unified diff format and as a structured result. Readability and minimal diffs. Simple API. Good performance (runtime and memory) even in worst cases. Comparison of Popular Go Libraries | Name | Input Support | Unified Output | API | Performance | Readability | Minimality | |--------------|-----------------------|----------------|----------|-------------|-------------|------------| | diffmatchpatch | No (only text) | No | Difficult | Moderate | Medium | Medium | | go-internal | No (only text) | No | Good | Good | Good | Good | | godebug | No (only text) | Yes | Good | Poor (high memory) | Good | Good | | mb0 | Yes (arbitrary slices) | No | Moderate | Moderate | Good | Good | | udiff | No (only text) | Yes | Good | Good | Poor | Poor | Note: The ratings are based on limited benchmarks and subjective assessments. --- Challenges Complexity Most Go libraries use Myers' Algorithm, a classic diff method with runtime complexity O(ND), where N is input size and D is edit distance. For similar inputs, the algorithm is fast; for very different inputs, it can approach O(N²) runtime and quadratic space complexity. Some libraries approximate the solution with heuristics to improve performance but sacrifice minimality. Alternatives like Patience Diff offer O(N log N) performance and improved readability but require hash maps (restricting applicability) and do not guarantee optimal diffs. Readability Multiple minimal diffs exist for the same inputs, but they differ in human readability. Implementations of the same algorithm can yield very different results due to post-processing and heuristics. For example, Michael Haggerty's indentation heuristic improves diff readability significantly by sliding changes in the diff for better human comprehension. --- A New Diffing Library for Go: znkr.io/diff Goals Support for arbitrary sequences and text. Output both unified diff and structured formats (edits and hunks). Simple, extensible API. Near-minimal or minimal diffs. Excellent runtime and memory performance. Modes of Operation | Mode | Input Support | Unified Output | API | Performance | Readability | Minimality | |---------|---------------|----------------|-------|-------------|-------------|------------| | Default | Yes | Yes | Good | Good | Good | Good | | Fast | Yes | Yes | Good | Excellent | Good | Moderate | | Optimal | Yes | Yes | Good | Moderate | Good | Optimal | Note: These ratings are for text inputs; non-text cases may differ. --- API Design Two main diff structures: Edit: Flat list representing each operation (Match, Insert, Delete) with elements from both sides. Hunk: Nested sequences of consecutive changes with surrounding context, useful for testing and visualization. Generic types support arbitrary slice elements with comparable constraint or with equality function for any type (EditsFunc and HunksFunc). Specialized support for text (string or []byte) provides both structured results and unified diff output