December 18th, 2024

Is there a way to split the git history of a file or combine the histories of two files without a merge commit?

Some time ago, I showed how to combine two files in git while preserving their line history and how to split a file into two while preserving git line history. Both of these techniques rely on merge commits. But what if your team’s policy is to rebase or squash all commits? Can you accomplish these tasks without merge commits?

Git’s line attribution algorithm follows file history, so let’s look at how git tracks file history.

To determine the file history connections for a file between a commit and its parent or parents, git looks for the file in each parent commit at the same path. If it’s found there, then git considers the file to have been modified in place with respect to that parent. If it’s not present in the parent commit at the same path, then git looks to see if the file is similar¹ to a file that is present in the parent commit but missing in the child. If it finds one, then it considers the file to have moved from that similar file. Otherwise, the file is considered to have been newly created.

Note that git finds at most one match per parent commit. If it finds the file in a parent commit at the same path, it declares success for that parent commit and doesn’t keep looking for close matches.
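Here’s a toy model of that rule for a single parent. This is a sketch, not git’s code: commits are plain dictionaries from path to contents, the name `classify` is made up for illustration, and Python’s `difflib` stands in for git’s own similarity measure. The 0.5 cutoff is an assumption loosely modeled on git’s default 50% rename threshold.

```python
from difflib import SequenceMatcher

def classify(path, child, parent, threshold=0.5):
    """Toy model: how does `path` in `child` connect to one `parent`?

    `child` and `parent` map path -> file contents (hypothetical stand-ins
    for real commit trees).
    """
    # Same path in the parent: success is declared immediately,
    # and no other file is considered for this parent.
    if path in parent:
        return ("modified", path)

    # Otherwise consider only files that disappeared between parent and child.
    deleted = [p for p in parent if p not in child]
    best, best_score = None, 0.0
    for p in deleted:
        score = SequenceMatcher(None, parent[p], child[path]).ratio()
        if score > best_score:
            best, best_score = p, score

    if best is not None and best_score >= threshold:
        return ("renamed", best)
    return ("created", None)

parent = {"notes.txt": "alpha\nbeta\n", "old.txt": "gamma\ndelta\n"}
child = {
    "notes.txt": "gamma\ndelta\n",          # identical to the parent's old.txt
    "moved.txt": "gamma\ndelta\nepsilon\n",
    "fresh.txt": "12345",
}

print(classify("notes.txt", child, parent))  # ('modified', 'notes.txt')
print(classify("moved.txt", child, parent))  # ('renamed', 'old.txt')
print(classify("fresh.txt", child, parent))  # ('created', None)
```

Notice that `notes.txt` is reported as modified in place even though its new contents are identical to the parent’s `old.txt`. The same-path match wins, and the search stops there.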

Our tricks with either splitting or merging git line history are trying to create a Y-shaped history. Either two new files whose ancestors are a shared single file, or one new file with two distinct ancestors. But if each commit has only one parent, then your history diagram will just be a straight line. No Y-shaped history is possible given these constraints.

This means that if you do a squash or traditional rebase², you lose the ability to create nonlinear history. If you want to do history merging or history splitting, you need to use merge commits.
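To see why the number of parents is the limiting factor, here’s a small simulation that walks a file’s ancestry back to its roots. Everything in it is hypothetical: the `Commit` class, the `link` and `origins` helpers, and the file names are made up for illustration, and `difflib` is only a stand-in for git’s real similarity check. The point is that each parent contributes at most one link, so a single-parent commit can contribute at most one ancestor.

```python
from difflib import SequenceMatcher

class Commit:
    """Toy commit: a tree (path -> contents) plus zero or more parents."""
    def __init__(self, tree, parents=()):
        self.tree = dict(tree)
        self.parents = list(parents)

def link(path, child, parent, threshold=0.5):
    """Return the one path in `parent` that `path` descends from, or None."""
    if path in parent.tree:
        return path  # modified in place
    best, best_score = None, 0.0
    for p in parent.tree:
        if p in child.tree:
            continue  # still present in the child, so not a deletion candidate
        score = SequenceMatcher(None, parent.tree[p], child.tree[path]).ratio()
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= threshold else None

def origins(path, commit):
    """Follow links through every parent and collect the root file names."""
    found = set()
    for parent in commit.parents:
        src = link(path, commit, parent)
        if src is not None:
            found |= origins(src, parent)
    return found or {path}  # no link anywhere: the file starts here

fruits = "apple\nbanana\n"
veggies = "carrot\ndaikon\n"
base = Commit({"fruits": fruits, "veggies": veggies})

# With a merge commit: each branch renames one file, and the merge combines them.
left = Commit({"produce": fruits, "veggies": veggies}, [base])
right = Commit({"fruits": fruits, "produce": veggies}, [base])
merged = Commit({"produce": fruits + veggies}, [left, right])

# Without a merge commit: the same end result squashed into one commit.
squashed = Commit({"produce": fruits + veggies}, [base])

print(sorted(origins("produce", merged)))  # ['fruits', 'veggies']
print(len(origins("produce", squashed)))   # 1
```

In the squashed version, both `fruits` and `veggies` are deletion candidates, but the single parent can supply only one of them as the ancestor, and the other file’s history is cut off.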

¹ Git identifies all the files which are present in the parent but which are missing in the child at the same path. These are the deletion candidates. It then looks for a deletion candidate that is identical to the file in the child commit. If there is no perfect match, then it looks for near matches among the deletion candidates according to options you specify like -M and -B.
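The two-phase search in this footnote might be sketched like this. It’s an illustration under assumptions rather than git’s implementation: the function names are invented, file contents are plain strings, `difflib` stands in for git’s similarity measure, and the `threshold` parameter only loosely plays the role of the percentage you can pass to -M.

```python
from difflib import SequenceMatcher

def deletion_candidates(parent, child):
    """Files present in the parent but missing from the child at the same path."""
    return {path: text for path, text in parent.items() if path not in child}

def pick_source(content, candidates, threshold=0.5):
    """Choose at most one deletion candidate as the source of `content`."""
    # Phase 1: a perfect match wins outright.
    for path in sorted(candidates):
        if candidates[path] == content:
            return path
    # Phase 2: otherwise take the closest near match, if it is close enough.
    scored = [
        (SequenceMatcher(None, old, content).ratio(), path)
        for path, old in candidates.items()
    ]
    score, path = max(scored, default=(0.0, None))
    return path if score >= threshold else None

parent = {"a": "x\ny\nz\n", "b": "x\ny\n", "keep": "k\n"}
child = {"keep": "k\n", "new": "x\ny\n"}

candidates = deletion_candidates(parent, child)
print(sorted(candidates))                      # ['a', 'b']
print(pick_source(child["new"], candidates))   # b
```

Here `new` is similar to both `a` and `b`, but `b` is an exact match, so the near-match phase never runs.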

² Traditional rebase creates a linear history, but you can use the --rebase-merges option to (try to) preserve the original merge history.

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.
