Auto-remind developers to edit another block of code

Dec 23, 2023

Have you seen bugs caused by someone forgetting to change a related file in their pull request (PR)? It happens far too often, especially in large repositories with many developers.

In almost every code repo, there will be files that tend to be modified together. This phenomenon is called “change correlation”. Often, you want to ensure it happens — if a pull request (PR) attempts to modify file A, then it should also edit file B.


No, you can’t lean on compilers for this one. While compilers may catch necessary correlated changes across code files, they tend to ignore non-code files. Cases include:

Localization. When you edit the translation file for American English, you almost always want to mirror that change in the one for British English.
Documentation-as-code. You may have README files in the same directories as your source code. These documents may explain design decisions that may get reversed one day. When someone reverses one, you want to ensure that they also remove the related paragraphs from the README files.
Code ownership changes. When you have multiple teams collaborating on one repo, each team may own multiple directories in the same codebase. The ownership may be declared in CODEOWNERS files (GitHub, GitLab). When teams reorganize or rebrand, you may want to make sure all CODEOWNERS files reflect the changes together.

Even between code snippets, people sometimes have to rely on comments to remind future developers to keep files in sync. Don’t take my word for it; look at this comment in the Go runtime:

// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/reflectdata/reflect.go
// and reflect/value.go to match the layout of this structure.

Bottom line: There is a real need to have a good “change reminder”, asking the developer to modify other files based on what code has been modified.

I didn’t find one online, so I decided to build one myself.

Design

This mechanism looks like a good candidate for a pre-commit hook.

The idea is simple. In the staged changes, we scan each hunk. We check it against all rules of the form “if you modify X, also modify Y”. If X overlaps with the hunk, we check whether Y also overlaps with any hunk. If Y does not have any changes staged, we complain.
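
The check described above can be sketched in a few lines. This is only an illustration of the idea, not the script built later in this post: the names `overlaps` and `rule_is_violated` are hypothetical, and X, Y, and the changed regions are assumed to be inclusive, 1-indexed line ranges.

```python
from typing import List, Tuple

# An inclusive, 1-indexed (first_line, last_line) pair. Hypothetical type alias.
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """Return True when two inclusive line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]


def rule_is_violated(
    x: Range, y: Range, changes_in_x_file: List[Range], changes_in_y_file: List[Range]
) -> bool:
    """A rule is violated when X was touched but Y was not."""
    x_touched = any(overlaps(x, change) for change in changes_in_x_file)
    y_touched = any(overlaps(y, change) for change in changes_in_y_file)
    return x_touched and not y_touched


# X covers lines 10-12 of one file, Y covers lines 3-6 of another.
print(rule_is_violated((10, 12), (3, 6), [(11, 11)], []))        # X changed, Y untouched
print(rule_is_violated((10, 12), (3, 6), [(11, 11)], [(4, 4)]))  # both changed
```

The first call reports a violation because only X was changed; the second does not, because a change also landed inside Y.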

Where should we store the rules? I love the Regular Expression Linter (relint), which uses a separate YAML file to define all the rules. If we follow suit in our change reminder, we may have:

- when-modified: X
  also-modify: Y
- when-modified: A
  also-modify: B

but I would prefer to keep rules closer to where they’d take effect, so let’s just follow what’s most intuitive for developers; that is, writing rules in code comments in the same file.

What should X look like? Intuitively, X should be the name of a symbol. (In the Go example above, it would be hiter.) However, if we chose to reference symbols, we would have to parse the code to find the lines where X is defined before we could compare them with git diff. Using parsers implies that our script would be language-aware, which complicates things.

To keep our script language-agnostic, we ask code authors to surround the code location with a pair of comments:

If you modify something from here...
[X]
... to here, then also modify `[Y]`.

This fenced design also forces rule creators to keep rules close to snippets where they should watch, which helps with readability.

What should Y look like? It should definitely contain a path to the target file. We also don't want irrelevant changes in the target file to accidentally "satisfy" a rule, so we need a way to bookmark a certain block of code in the target file and reference it in a rule.

Example

In this section, we describe the desired behavior of our “related change reminder”.

Let’s say we have a src/java/main/Lorem.java that contains these lines:

enum Lorem {
    /* If you modify something from here... */
    A,
    B,
    // ... to here, then also modify block `names` in `src/java/main/Ipsum.java`.
}

and a src/java/main/Ipsum.java with the following snippet:

enum Ipsum {
    // Code block `names` starts here.
    Alice,
    Bob,
    // Code block `names` ends here.
}

When we modify a line between those two lines of comments:

enum Lorem {
    /* If you modify something from here... */
    A,
    B,
+    C,
    // ... to here, then also modify block `names` in `src/java/main/Ipsum.java`.
}

the reminder should complain:

names in src/java/main/Ipsum.java should also be updated. If you decided otherwise, explain within names why no change is needed this time or modify the rule in src/java/main/Lorem.java in the same commit.

unless we have also modified Ipsum.java like this:

enum Ipsum {
    // Code block `names` starts here.
    Alice,
    Bob,
+    // `C` is not needed in `Ipsum` due to JIRA-12345.
    // Code block `names` ends here.
}

or deleted the rule altogether from Lorem.java:

enum Lorem {
-    /* If you modify something from here... */
    A,
    B,
+    C,
-    // ... to here, then also modify block `names` in `src/java/main/Ipsum.java`.
}

Implementation

The two fences we described above can be translated into the following regular expressions:

import re

FENCE_PATTERN = re.compile(
    r"If you modify something from here\.\.\..*?\.\.\. to here, then also modify block `(?P<block_name>.+?)` in `(?P<target_location>.+?)`",
    re.DOTALL,
)

BOOKMARKS_PATTERN = re.compile(
    r"Code block `(?P<block_name>.+?)` starts here.*?Code block `(?P=block_name)` ends here",
    re.DOTALL,
)

Notes:

These patterns use re.DOTALL, because a code block often contains multiple lines.
Regular expressions do not support nesting, so one fenced block cannot be placed inside another.
These patterns do not attempt to capture comment markers (such as //, /*, and */), which helps the script stay language-agnostic. To avoid accidental matches, the patterns contain adequately long phrases in English instead.
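
As a quick sanity check, the two patterns can be run against the Lorem.java and Ipsum.java snippets from the Example section. The sketch below repeats the patterns so that it runs on its own; the constants `LOREM` and `IPSUM` are names made up for this demonstration.

```python
import re

FENCE_PATTERN = re.compile(
    r"If you modify something from here\.\.\..*?\.\.\. to here, then also modify block `(?P<block_name>.+?)` in `(?P<target_location>.+?)`",
    re.DOTALL,
)

BOOKMARKS_PATTERN = re.compile(
    r"Code block `(?P<block_name>.+?)` starts here.*?Code block `(?P=block_name)` ends here",
    re.DOTALL,
)

# File contents copied from the Example section (hypothetical constant names).
LOREM = """enum Lorem {
    /* If you modify something from here... */
    A,
    B,
    // ... to here, then also modify block `names` in `src/java/main/Ipsum.java`.
}
"""

IPSUM = """enum Ipsum {
    // Code block `names` starts here.
    Alice,
    Bob,
    // Code block `names` ends here.
}
"""

rule = FENCE_PATTERN.search(LOREM)
bookmark = BOOKMARKS_PATTERN.search(IPSUM)

# The named groups carry everything the script needs from each match.
print(rule.groupdict())
print(bookmark.groupdict())
```

The rule match exposes both the block name and the target path, while the bookmark match exposes only the block name; the comment markers around the phrases never enter the picture.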

Next, we want to parse the staged changes from git.

from git import Repo
from unidiff import PatchSet, PatchedFile

if __name__ == "__main__":
    # Interpret the current directory as a git repository.
    repo = Repo(".")
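    # Diff against HEAD. pre-commit stashes unstaged changes before running hooks,
    # so under pre-commit this covers exactly the staged changes.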
    diff = repo.git.diff("HEAD")
    # Parse the diff text with unidiff.
    patch = PatchSet(diff)
    # TODO

A patch set contains all files modified in the given git diff. We extract blocks of code from each file. For each block extracted, if any line in it has been modified, we yield the block itself.

from typing import Dict, Iterator

def get_blocks_modified(patch: PatchSet, pattern: re.Pattern) -> Iterator[Dict[str, str]]:
    """Yield the named groups of every block matching `pattern` that the patch touches."""
    for patched_file in patch:
        try:
            with open(patched_file.path, "r") as f:
                content = f.read()
        except (FileNotFoundError, UnicodeDecodeError):
            # Skip files that were deleted or cannot be read as text.
            continue
        # finditer yields nothing when the file has no matching block.
        for block in pattern.finditer(content):
            # Convert character offsets to 1-indexed line numbers.
            start_line = content[: block.start()].count("\n") + 1
            end_line = content[: block.end()].count("\n") + 1
            if is_block_modified(patched_file, start_line, end_line):
                # The dict union operator requires Python 3.9 or newer.
                yield block.groupdict() | {"file_path": patched_file.path}

Notice that I’ve declared the pattern as a parameter of the get_blocks_modified function. This is because both patterns are used in the same way: regardless of whether we are extracting code surrounded by rules or by bookmarks, we find their intersections with git diff.
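
The offset-to-line conversion inside get_blocks_modified is easy to try in isolation. In the sketch below, `line_span` and `DEMO_PATTERN` are hypothetical names, and the BEGIN/END markers merely stand in for the real fence phrases.

```python
import re
from typing import Tuple


def line_span(content: str, match: re.Match) -> Tuple[int, int]:
    """Return the 1-indexed lines on which a match starts and ends."""
    # Each newline before an offset pushes that offset one line further down.
    start_line = content[: match.start()].count("\n") + 1
    end_line = content[: match.end()].count("\n") + 1
    return start_line, end_line


# Hypothetical marker words, standing in for the real patterns.
DEMO_PATTERN = re.compile(r"BEGIN.*?END", re.DOTALL)
content = "line one\nBEGIN\nbody\nEND\nline five\n"

print(line_span(content, DEMO_PATTERN.search(content)))  # (2, 4)
```

Here the match starts on line 2 and ends on line 4, which are the two fence lines; anything strictly between them is the body of the block.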

The is_block_modified helper called from get_blocks_modified is defined as:

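def is_block_modified(patched_file: PatchedFile, start_line: int, end_line: int) -> bool:
    for hunk in patched_file:
        for line in hunk:
            # Context lines are unchanged; they only surround the real edits.
            if line.is_context:
                continue
            # Removed lines have no line number in the new file, so a pure
            # deletion inside a block is not detected by this check.
            if not line.target_line_no:
                continue
            # Strict comparison: edits to the fence lines themselves do not count.
            if start_line < line.target_line_no < end_line:
                return True
    return False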
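
To see where the target line numbers come from without installing unidiff, here is a simplified, standard-library-only sketch that derives them from hunk headers. It is not how unidiff is implemented; `added_target_lines` and `SAMPLE_DIFF` are hypothetical names, and the function assumes a single-file unified diff.

```python
import re
from typing import List

# Matches a hunk header such as "@@ -1,6 +1,7 @@" and captures the line at
# which the hunk starts in the new (target) file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(?P<target_start>\d+)(?:,\d+)? @@")


def added_target_lines(diff_text: str) -> List[int]:
    """Return new-file line numbers of added lines in a single-file unified diff."""
    added = []
    target_line = None  # stays None until the first hunk header is seen
    for raw in diff_text.splitlines():
        header = HUNK_HEADER.match(raw)
        if header:
            target_line = int(header.group("target_start"))
        elif target_line is None:
            continue  # file headers such as "--- a/..." and "+++ b/..."
        elif raw.startswith("+"):
            added.append(target_line)
            target_line += 1
        elif raw.startswith(" "):
            target_line += 1
        # Lines starting with "-" exist only in the old file, so the
        # position in the new file does not advance.
    return added


# The Lorem.java change from the Example section, written as a unified diff.
SAMPLE_DIFF = "\n".join([
    "--- a/src/java/main/Lorem.java",
    "+++ b/src/java/main/Lorem.java",
    "@@ -1,6 +1,7 @@",
    " enum Lorem {",
    "     /* If you modify something from here... */",
    "     A,",
    "     B,",
    "+    C,",
    "     // ... to here, then also modify block `names` in `src/java/main/Ipsum.java`.",
    " }",
])

added = added_target_lines(SAMPLE_DIFF)
# The fence comments sit on lines 2 and 6 of the new file.
print(added, any(2 < n < 6 for n in added))
```

The added line `C,` lands on line 5 of the new file, strictly between the two fence lines, so the rule in Lorem.java counts as triggered.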

Finally, the main loop of the script can be written as follows (it also needs from collections import defaultdict at the top of the file):

    patch = PatchSet(diff)
    # Build a dictionary from file paths to names of blocks in them where code has been modified.
    blocks_modified = defaultdict(set)
    for i in get_blocks_modified(patch, BOOKMARKS_PATTERN):
        blocks_modified[i["file_path"]].add(i["block_name"])
    # Look for rules.
    is_any_modification_needed = False
    for i in get_blocks_modified(patch, FENCE_PATTERN):
        if i["block_name"] in blocks_modified[i["target_location"]]:
            continue
        is_any_modification_needed = True
        print(f"""`{i["block_name"]}` in `{i["target_location"]}` should also be updated.
        If not needed, explain why in `{i["block_name"]}`.
        If this rule is outdated, remove it from `{i["file_path"]}`.""")
    exit(1 if is_any_modification_needed else 0)

Save this file as related_change_reminder.py in the root folder of your repo.
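
If you want to try the matching logic of the main loop without a git repository, it can be pulled into a pure function and fed plain dictionaries. This is a sketch under that assumption; `find_unmet_rules` is a hypothetical name and is not part of the script above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_unmet_rules(
    modified_rules: Iterable[Dict[str, str]],
    modified_bookmarks: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return the triggered rules whose target block was not modified."""
    # Map each file path to the names of the bookmarked blocks changed in it.
    blocks_modified = defaultdict(set)
    for bookmark in modified_bookmarks:
        blocks_modified[bookmark["file_path"]].add(bookmark["block_name"])
    return [
        rule
        for rule in modified_rules
        if rule["block_name"] not in blocks_modified[rule["target_location"]]
    ]


# Dictionaries shaped like the ones get_blocks_modified yields.
rule = {
    "block_name": "names",
    "target_location": "src/java/main/Ipsum.java",
    "file_path": "src/java/main/Lorem.java",
}
bookmark = {"block_name": "names", "file_path": "src/java/main/Ipsum.java"}

print(find_unmet_rules([rule], []))          # the rule is reported
print(find_unmet_rules([rule], [bookmark]))  # nothing is reported
```

Keeping this step free of git and file access makes it straightforward to unit-test new rule formats before wiring them into the hook.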

Usage

Assuming you have pre-commit installed, you can add the following entry to .pre-commit-config.yaml:

- repo: local
  hooks:
    - id: related-change-reminder
      name: "Related change reminder"
      entry: "python related_change_reminder.py"
      language: python
      pass_filenames: false
      additional_dependencies: [ "gitpython", "unidiff" ]

Now, let’s say I have updated the version number of a dependency in one build system but forgotten to do so in the other (having two build tools side by side is, I believe, a rare setup). In that case, related_change_reminder.py will prevent me from committing my changes:

$ git commit -m "Update log4j in WORKSPACE."
Related change reminder................................Failed
- hook id: related-change-reminder
- exit code: 1
`dependencies` in `java/pom.xml` should also be updated.
        If not needed, explain why in `dependencies`.
        If this rule is outdated, remove it from `java/WORKSPACE`.

Summary

By ensuring related files are also edited in each commit, we can prevent various issues: flat-out software bugs, misleading documentation, inconsistent L10N strings, and even accounting errors. Such checks also give engineers peace of mind, as long as sufficient rules are defined across the codebase.

In the wild (read: on GitHub.com), people have already been noting down related files in comments. Why not make those reminders automatically enforceable? In fact, I’m sure many companies have such checks in-house, but I wasn’t able to find one in the open source world. (Perhaps because the words “reminders”, “related changes”, and “correlations” are too ambiguous to take me anywhere.)

This is why I decided to implement one myself. If you enjoyed this one, you may also like AutoJavadoc, a script that writes Javadocs with GPT when the PR author doesn’t.

I hope the script described in this post helps developers on your team save significant effort in keeping related code in sync. This post also serves as a tutorial to get you started with writing your own pre-commit hooks and with dev-cycle automation in general. Happy coding!


Written by Ming
