git multirepo to monorepo

2022-08-16 · 9 min read

the setup #

  • We currently have a private repo and a public repo, mono and pub.
  • The pub repo already exists as a published, open-source repo.
  • We'd like to merge pub into mono as a subdirectory mono/pub-dir.
  • The conversion must not lose the original pub subrepo history.
  • The migrated pub subrepo will still be published as a public, mostly read-only open source repo.
  • The conversion should not require a force-push rewrite of the pub subrepo history.
  • For privacy, new commits in mono that don't touch the mono/pub-dir subdirectory must not appear in the published pub subrepo.
  • git blame must continue to work as expected.
  • Filtering and publishing changes from the monorepo to one of the subrepos should easily run in CI.
  • It should work with the built-in git tools.

(0) current state #

First, let's orient ourselves with the two repos:

~/pub$ exa --tree
└── client
   └── ..

~/pub$ git lg
* 5de880b - pub: add client (HEAD -> master)
* ff12305 - pub: initial commit

~/mono$ exa --tree
└── backend
   └── ..

~/mono$ git lg
* e89ca6f - mono: add backend (HEAD -> master)
* 5017487 - mono: initial commit

Ultimately we'd like our mono repo to look like:

~/mono$ exa --tree
├── backend
│  └── ..
└── pub-dir # `pub-dir` will contain the `pub` subrepo
   └── client
      └── ..

The published pub subrepo will look the same immediately after conversion, but will exist as a "projected" view of the mono repo.
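A note on tooling: git lg in these listings is not a built-in command, so it is presumably an alias. Its definition isn't shown, so the format string below is only a guess that produces similar-looking output. The sketch sets the alias in a throwaway repo (the scratch path and demo identity are made up); use git config --global instead to have it everywhere.

```shell
#!/bin/sh
# Sketch: one possible `git lg` alias. The format string is an assumption,
# not the definition used for the listings in this post.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
git -c init.defaultBranch=master init -q "$scratch/lg-demo"
cd "$scratch/lg-demo"

# %h = short hash, %s = subject, %d = ref names such as (HEAD -> master).
git config alias.lg "log --graph --format='%h - %s%d'"

git commit -q --allow-empty -m "pub: initial commit"
lg_output=$(git lg)
echo "$lg_output"
```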

(1) move all files in pub into the subdirectory #

The first step is to prepare the pub subrepo by creating a commit that moves all files into the desired future subdirectory of the monorepo.

For this example, we'll move all the files in pub into a new pub-dir subdirectory:

# prepare public repo by moving all files into new `pub-dir` subdir.
~/pub$ git switch -c migration
~/pub$ mkdir ./pub-dir
~/pub$ git ls-tree -z --name-only HEAD \
	| xargs -0 -I{} git mv "{}" ./pub-dir/
~/pub$ git commit -m "pub: prepare for monorepo migration"

~/pub$ git lg
* 33708aa - pub: prepare for monorepo migration (HEAD -> migration)
* 5de880b - pub: add client (master)
* ff12305 - pub: initial commit

This approach preserves the histories in both repos and doesn't require rewriting the public repo's existing history. When we push new changes from the mono repo to the pub subrepo, the existing public repo will continue to have its original history, just with new commits added on top!
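If you'd like to rehearse this step before touching the real repo, here is a minimal sketch that replays it in a temporary directory. The file names, contents, and demo identity are invented; only the branch, mkdir, git mv, and commit sequence mirrors the step above.

```shell
#!/bin/sh
# Sketch: rehearse step (1) on a throwaway repo. All paths and file
# contents here are made up for the demo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
git -c init.defaultBranch=master init -q "$scratch/pub"
cd "$scratch/pub"

# Recreate a tiny `pub` history: a root commit plus one file in client/.
git commit -q --allow-empty -m "pub: initial commit"
mkdir client
echo 'console.log("hi")' > client/app.js
git add client
git commit -q -m "pub: add client"

# Step (1): do the move on a new branch so `master` stays untouched.
git switch -q -c migration
mkdir pub-dir
git ls-tree -z --name-only HEAD \
	| xargs -0 -I{} git mv "{}" ./pub-dir/
git commit -q -m "pub: prepare for monorepo migration"

# Top-level entries on each branch.
master_top=$(git ls-tree --name-only master)
migration_top=$(git ls-tree --name-only migration)
# Number of commits the move added on top of master.
ahead=$(git rev-list --count master..migration)
echo "master: $master_top | migration: $migration_top | ahead by: $ahead"
```

The point to check is that master still has its files at the root, while migration is exactly one commit ahead and has everything under pub-dir/.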

(2) git merge --allow-unrelated-histories #

Once the pub repo is prepared, we move into the mono repo and merge the two independent histories into one.

The key trick here is to use git merge --allow-unrelated-histories, which lets us have two "root" initial commits in the repo. In other words, our git history is no longer a single tree, but a "forest", which eventually grows together into one linear history.

In diagram form:

* - mono+pub: merged linear history
* - mono+pub: merge commit
| *
*  \
|   *
⁞   ⁞
⁞   * - pub: initial commit
* - mono: initial commit

Let's do that now. We import every commit on the subrepo's migration branch (all of master plus the move commit) into the main mono repo. Best part: this works flawlessly with git blame!

# Fetch the public repo from a local directory or remote url.
# In our case `pub` is just in the parent directory:
~/mono$ git remote add pub ../pub
~/mono$ git fetch --no-tags pub migration
~/mono$ git merge --allow-unrelated-histories pub/migration

# See the two independent histories merged into one linear
# history, without losing either history.
~/mono$ git lg
*   3888a45 - mono: merge pub repo into subdir (HEAD -> master)
| * 33708aa - pub: prepare for monorepo migration (pub/migration)
| * 5de880b - pub: add client (pub/master)
| * ff12305 - pub: initial commit
* e89ca6f - mono: add backend
* 5017487 - mono: initial commit

# Push the migrated `mono` history.
~/mono$ git push origin master

Note: you can do this without git push'ing until the end, but it's somewhat tedious to describe in tutorial format.
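To check the two claims in this step (two root commits, and git blame reaching back into the pub history), here is a small sketch on throwaway repos. The repo contents and demo identity are invented; the remote add, fetch, and merge commands are the ones from this step, with a -m message added so the script never opens an editor.

```shell
#!/bin/sh
# Sketch: merge a prepared `pub` repo into `mono` and inspect the result.
# Repo contents are invented for the demo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"

# Stand-in `pub` repo, already prepared as in step (1).
git -c init.defaultBranch=master init -q pub
(
	cd pub
	mkdir client
	echo "hello from pub" > client/app.txt
	git add client
	git commit -q -m "pub: add client"
	git switch -q -c migration
	mkdir pub-dir
	git mv client pub-dir/
	git commit -q -m "pub: prepare for monorepo migration"
)

# Stand-in `mono` repo.
git -c init.defaultBranch=master init -q mono
cd mono
mkdir backend
echo "hello from mono" > backend/server.txt
git add backend
git commit -q -m "mono: add backend"

# Step (2): fetch the migration branch and merge the unrelated history.
git remote add pub ../pub
git fetch -q --no-tags pub migration
git merge -q --allow-unrelated-histories \
	-m "mono: merge pub repo into subdir" pub/migration

# Parentless commits reachable from HEAD: one per original repo.
roots=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
# The commit that introduced the line, according to the pub repo...
original=$(git -C ../pub log -1 --format=%H master)
# ...and according to `git blame` inside the merged monorepo.
blamed=$(git blame --porcelain pub-dir/client/app.txt | head -n 1 | cut -d' ' -f1)
echo "roots: $roots"
echo "original: $original"
echo "blamed:   $blamed"
```

In this toy setup blame follows the file through the move commit back to the original pub commit, which is the behaviour the merge approach is chosen for.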

(3) git push new subrepo history upstream #

Now that we have all the subrepos merged into mono, we only need to make changes in one place--no more PRs split across repos!

Of course we still want changes to the subrepos inside mono to get pushed out to the public repos. We can accomplish this with git subtree split. We'll do this in a fresh clone, since it mangles the current repo state...

# We project the monorepo history into its public-only form.
# This approach completely preserves the original public commits!
# NOTE: the command substitution:
# ```
# git rev-list --max-parents=0 pub/migration | tail -n 1
# ```
# simply picks out the root initial commit from the `pub`
# subrepo history.

# Get a fresh clone of the `mono` repo.
~/$ git clone <origin-mono-repo-git-url> mono && cd mono
~/mono$ git remote add pub <origin-pub-repo-git-url>
~/mono$ git fetch pub
~/mono$ git switch master

# Project the `pub` subrepo history from the `mono` repo history.
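~/mono$ git subtree split \
	--prefix=pub-dir/ \
	--onto=$(git rev-list --max-parents=0 pub/migration | tail -n 1)
# `git subtree split` prints the head commit of the projected history;
# point `master` at it (your hash will differ).
~/mono$ git reset --hard 198017b02ffa2d45f301f6b9ba6424c4e849251b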

# This is the "projected" `pub` repo history with new commits from
# the `mono` repo.
# Note: commits that don't touch the `pub-dir/` subrepo will be
# safely ignored.
~/mono$ git lg
* 198017b - mono+pub: this commit touches files on / and pub-dir/ (HEAD -> master)
* 3269fbc - pub: this commit only touches files in pub-dir/
* c7b8b55 - pub: prepare for monorepo migration
* 5de880b - pub: add client (pub/master)
* ff12305 - pub: initial commit

# Push the projected `pub` history to the upstream `pub` repo.
~/mono$ git push pub master

Note: If we use git subtree split without --onto=<pub-initial-commit>, it will fail to determine the original subrepo root commit. Instead it will just squash all changes pre-merge, which is not what we want!
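Here is a sketch of the whole pipeline on throwaway repos, handy as a starting point for a CI job. It captures the hash printed by git subtree split in a variable instead of pasting it by hand, then checks that the projected history can fast-forward the old pub master and that private commits stay out. Repo contents, commit messages, and variable names are invented, and the script simply skips if git subtree isn't installed.

```shell
#!/bin/sh
# Sketch: steps (1)-(3) end to end on throwaway repos. Repo contents and
# commit messages are invented for the demo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# git-subtree ships in git's contrib/ directory and is not installed everywhere.
if [ ! -x "$(git --exec-path)/git-subtree" ]; then
	echo "git subtree is not installed; skipping demo"
	exit 0
fi

scratch=$(mktemp -d)
cd "$scratch"

# Stand-in `pub` repo, including the step (1) `migration` branch.
git -c init.defaultBranch=master init -q pub
(
	cd pub
	git commit -q --allow-empty -m "pub: initial commit"
	mkdir client
	echo v1 > client/app.txt
	git add client
	git commit -q -m "pub: add client"
	git switch -q -c migration
	mkdir pub-dir
	git mv client pub-dir/
	git commit -q -m "pub: prepare for monorepo migration"
)

# Stand-in `mono` repo with the step (2) merge.
git -c init.defaultBranch=master init -q mono
cd mono
mkdir backend
echo secret > backend/server.txt
git add backend
git commit -q -m "mono: add backend"
git remote add pub ../pub
git fetch -q --no-tags pub
git merge -q --allow-unrelated-histories \
	-m "mono: merge pub repo into subdir" pub/migration

# Two post-merge commits: one private, one inside pub-dir/.
echo more-secret >> backend/server.txt
git commit -q -am "mono: private backend change"
echo v2 >> pub-dir/client/app.txt
git commit -q -am "pub: update client"

# Step (3): project the pub-dir/ history onto the original pub root.
pub_root=$(git rev-list --max-parents=0 pub/migration | tail -n 1)
split=$(git subtree split --prefix=pub-dir/ --onto="$pub_root")

# Can the old pub master fast-forward to the projected head?
pub_master=$(git rev-parse pub/master)
if git merge-base --is-ancestor "$pub_master" "$split"; then
	fast_forward=yes
else
	fast_forward=no
fi
# Files in the projected head, and how many private subjects leaked.
split_files=$(git ls-tree -r --name-only "$split")
private_leaks=$(git log --format=%s "$split" | grep -c "private" || true)
echo "fast-forward: $fast_forward | files: $split_files | leaks: $private_leaks"
```

In CI you could push the captured hash directly, e.g. git push pub "$split":master, rather than resetting master first; that variation is a suggestion, not something tested in this post.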

appendix #

This section contains alternative approaches that didn't work well for one reason or another. Feel free to skip everything below.

(1) (alt) a jankier way to git mv all files into the subdir #

~/pub$ git ls-files \
	| sed -n -E "s/^([^\/]*)\/?.*$/\1/p" \
	| uniq \
	| grep -v "^pub-dir" \
	| xargs -I{} git mv "{}" ./pub-dir/
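To see what the filtering half of that pipeline does, here is a sketch that feeds it a made-up file list instead of real git ls-files output. The sed expression keeps only the first path component, uniq collapses adjacent repeats (git ls-files output is sorted, so repeats are adjacent), and grep drops the destination directory itself.

```shell
#!/bin/sh
# Sketch: run the filtering half of the pipeline on a made-up file list
# instead of real `git ls-files` output.
set -eu

top_level=$(
	printf '%s\n' \
		'README.md' \
		'client/app.js' \
		'client/lib/util.js' \
		'pub-dir/already-moved.txt' \
	| sed -n -E "s/^([^\/]*)\/?.*$/\1/p" \
	| uniq \
	| grep -v "^pub-dir"
)
echo "$top_level"
```

For this input the result is the two top-level entries README.md and client, which are what git mv would then receive one at a time.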

(1) (alt) rewrite history into subdirectory #

You can also use git filter-repo to rewrite the whole commit history, as if it had been written inside a subdirectory from the beginning:

# install the `git filter-repo` tool, which git's own docs recommend
# over the built-in (slow and pitfall-prone) filter-branch
(deb)$ sudo apt install git-filter-repo
(macOS)$ brew install git-filter-repo

# After this command, all files in the current commit and all past
# commits will be moved into the pub-dir/ subdirectory.
# You'll need to do this in a fresh repo clone, since this command
# rewrites every commit in the repo.
~/pub$ git filter-repo --to-subdirectory-filter pub-dir

This approach requires a force-push that overwrites the original pub repo history: the later git subtree split doesn't appear to be fully lossless in this case, so the commit hashes change. Note: only the commit hashes of the pub repo history are affected after the push. The changes in each commit will still appear (almost) exactly the same, i.e., they're still in the root directory.

(2) (alt) git subrepo #

Skip step (1) if you use this approach.

The git subrepo tool is a user-friendly alternative to the git submodule and git subtree commands.

(macOS)$ brew install git-subrepo

# git subrepo clone <remote-url> <subdir>
~/mono$ git subrepo clone ../pub ./pub-dir

Unfortunately, git subrepo clone squashes all the commit history before the merge. If you're just looking to vendor a dependency or don't care about the original history, this approach might be for you.

(2) (alt) git subtree add #

Skip step (1) if you use this approach.

git subtree is a built-in tool that lets you keep other repos physically inside your main repo. If you don't care about git blame working, then this is an easier method.

# git subtree add --prefix=<subdir> <remote-url> <remote-ref>
~/mono$ git subtree add --prefix=pub-dir/ ../pub master

git subtree add at least preserves the original history, but git blame only works down to the merge commit for subtree files. :'(

(2) (alt) git read/write/commit-tree #

Skip step (1) if you use this approach.

Using some low-level git plumbing, we can merge the subrepo into a subdirectory without touching the subrepo. This method effectively creates a new tree object in the git db that adds a single subdirectory containing the pub/master root tree. We then manually create a merge commit that wraps this tree and points to two parents: (1) the existing mono repo master commit and (2) the pub/master commit.

Unfortunately, while the end result looks almost the same as the approach in step (1), it appears to break git blame. :'(

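~/mono$ git remote add public ../public
~/mono$ git fetch --no-tags public master
# Stage the `public/master` root tree under `public/` in the index.
~/mono$ git read-tree --prefix=public/ public/master
# Prints the hash of the new tree object (f1b1ee1... here).
~/mono$ git write-tree

# Prints the hash of the new merge commit (046d329... here).
~/mono$ git commit-tree f1b1ee12e1d679ec84c282049102e2654efc4b67 \
	-p master -p public/master -m "monorepo merge"

# Point `master` at the merge commit and check out the new files.
~/mono$ git reset 046d3298b9976d50fa379fa071f57bc1158aca5b && git restore .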

~/mono$ git lg
*   854b553 - monorepo merge (HEAD -> master)
* | 5b02f51 - mono: add backend (origin/master)
| * 7709d49 - pub: add client (public/master)
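The hard-coded hashes above are just whatever write-tree and commit-tree printed. Here is a sketch of the same sequence on throwaway repos that captures each hash in a variable instead, which makes it scriptable. Repo contents, variable names, and the demo identity are invented; it only checks the shape of the result and doesn't try to settle the git blame question.

```shell
#!/bin/sh
# Sketch: the read-tree/write-tree/commit-tree merge on throwaway repos,
# capturing each printed hash in a variable. Repo contents are invented.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
cd "$scratch"

# Stand-in public repo.
git -c init.defaultBranch=master init -q public
(
	cd public
	mkdir client
	echo v1 > client/app.txt
	git add client
	git commit -q -m "pub: add client"
)

# Stand-in mono repo.
git -c init.defaultBranch=master init -q mono
cd mono
mkdir backend
echo v1 > backend/server.txt
git add backend
git commit -q -m "mono: add backend"

git remote add public ../public
git fetch -q --no-tags public master

# Add the public/master root tree to the index under public/.
git read-tree --prefix=public/ public/master
# Write the index (mono files + public/) out as a tree object.
tree=$(git write-tree)
# Wrap that tree in a commit with two parents.
merge=$(git commit-tree "$tree" -p master -p public/master -m "monorepo merge")
# Move master to the new commit and materialize the new files.
git reset -q "$merge"
git restore .

parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
top_level=$(git ls-tree --name-only HEAD | tr '\n' ' ')
worktree_status=$(git status --porcelain)
echo "parents: $parent_count | top level: $top_level"
```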

other monorepo tools #

  1. (java) google/copybara
  2. (hack) facebook/fbshipit - copy commits between repos
  3. (php) symplify/monorepo-split-gh-action
  4. (sh) ingydotnet/git-subrepo
  5. (py) newren/git-filter-repo
  6. (go) splitsh/lite
  7. (sh) SimonBaeumer/sync-monorepo