git multirepo to monorepo
2022-08-16 · 9 min read
the setup

- We currently have a private repo and a public repo, `mono` and `pub`.
- The `pub` repo already exists as a published, open-source repo.
- We'd like to merge `pub` into `mono` as a subdirectory, `mono/pub-dir`.
- The conversion must not lose the original `pub` subrepo history.
- The migrated `pub` subrepo will still be published as a public, mostly read-only open-source repo.
- The conversion should not require a force-push rewrite of the `pub` subrepo history.
- For privacy, new commits in `mono` that don't touch the `mono/pub-dir` subdirectory must not appear in the published `pub` subrepo.
- `git blame` must continue to work as expected.
- Filtering and publishing changes from the monorepo to one of the subrepos should be easy to run in CI.
- It should work with the built-in `git` tools.
(0) current state

First, let's orient ourselves with the two repos:

```
~/pub$ exa --tree
.
├── README.md
└── client
    └── ..
~/pub$ git lg
* 5de880b - pub: add client (HEAD -> master)
* ff12305 - pub: initial commit
```

```
~/mono$ exa --tree
.
├── README.md
└── backend
    └── ..
~/mono$ git lg
* e89ca6f - mono: add backend (HEAD -> master)
* 5017487 - mono: initial commit
```
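The `git lg` command used throughout isn't built into git; it's presumably a personal alias. The sketch below is one guess at an alias that prints the same shape of output (graph, short hash, subject, then ref names); the exact format string is an assumption, not necessarily the author's.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: `git lg` is a user-defined alias; this is one possible definition.
# --graph draws the ASCII history; %h = short hash, %s = subject,
# %d = decorations such as " (HEAD -> master)".
git config --global alias.lg "log --graph --format='%h - %s%d'"
```

Afterwards, `git lg` in any repo prints lines like `* 5de880b - pub: add client (HEAD -> master)`.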
Ultimately, we'd like our `mono` repo to look like this:

```
~/mono$ exa --tree
.
├── README.md
├── backend
│   └── ..
└── pub-dir  # `pub-dir` will contain the `pub` subrepo
    ├── README.md
    └── client
        └── ..
```

The published `pub` subrepo will look the same immediately after conversion, but will exist as a "projected" view of the `mono` repo.
(1) move all files in `pub` into the subdirectory

The first step is to prepare the `pub` subrepo by creating a commit that moves all files into the desired future subdirectory of the monorepo. For this example, we'll move all the files in `pub` into a new `pub-dir` subdirectory:
```
# Prepare the public repo by moving all files into a new `pub-dir` subdir,
# on a new `migration` branch so `master` stays untouched.
# (`git switch -c` needs git 2.23+; `git checkout -b migration` also works.)
~/pub$ git switch -c migration
~/pub$ mkdir ./pub-dir
~/pub$ git ls-tree -z --name-only HEAD \
    | xargs -0 -I{} git mv "{}" ./pub-dir/
~/pub$ git commit -m "pub: prepare for monorepo migration"
~/pub$ git lg
* 33708aa - pub: prepare for monorepo migration (HEAD -> migration)
* 5de880b - pub: add client (master)
* ff12305 - pub: initial commit
```
This approach preserves the histories of both repos and doesn't require rewriting the public repo's existing history. When we push new changes from the `mono` repo to the `pub` subrepo, the existing public repo will keep its original history, just with new commits added on top!
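Before merging, it can be worth checking that the prepare commit is a pure move, i.e. every file under `pub-dir/` is identical to its old root-level path and no accidental edits slipped in. A minimal sketch, assuming bash; `is_pure_move` is a hypothetical helper name, not a git command:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if every change between FROM and TO is an
# exact (100% similar) rename, as produced by moving files into a subdir.
is_pure_move() {
  local from="$1" to="$2" changes
  changes=$(git diff --name-status -M "$from" "$to")
  # An empty diff means nothing moved at all, which is not what we want either.
  [ -n "$changes" ] || return 1
  # Any line not starting with R100 is an add, delete, edit, or fuzzy rename.
  ! grep -qv '^R100' <<<"$changes"
}
```

Usage: `~/pub$ is_pure_move master migration && echo "pure move"`.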
(2) `git merge --allow-unrelated-histories`

Once the `pub` repo is prepared, we move into the `mono` repo and merge the two independent histories into one.

The key trick here is `git merge --allow-unrelated-histories`, which lets us have two "root" initial commits in the repo. In other words, our git history is no longer a single tree but a "forest", whose trees grow together at the merge commit into one shared history.
In diagram form:

```
* - mono+pub: merged linear history
|
⁞
|
* - mono+pub: merge commit
|\
| *
* \
| *
⁞ ⁞
⁞ * - pub: initial commit
|
* - mono: initial commit
```
Let's do that now. We import all commits on the `migration` branch of the `pub` subrepo into the main `mono` repo. Best part: this works flawlessly with `git blame`!
```
# Fetch the public repo from a local directory or remote URL.
# In our case, `pub` is just in the parent directory:
~/mono$ git remote add pub ../pub
~/mono$ git fetch --no-tags pub migration
~/mono$ git merge --allow-unrelated-histories \
    -m "mono: merge pub repo into subdir" pub/migration
# See the two independent histories merged into one,
# without losing either history.
~/mono$ git lg
* 3888a45 - mono: merge pub repo into subdir (HEAD -> master)
|\
| * 33708aa - pub: prepare for monorepo migration (pub/migration)
| * 5de880b - pub: add client (pub/master)
| * ff12305 - pub: initial commit
* e89ca6f - mono: add backend
* 5017487 - mono: initial commit
# Push the migrated `mono` history.
~/mono$ git push origin master
```
Note: you can hold off on `git push`ing until the end, but that's somewhat tedious to describe in tutorial format.
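To try steps (1) and (2) without touching real repos, the whole flow can be replayed on throwaway repos. A sketch, assuming bash and git 2.28 or newer (for `git init -b`); the file names, contents, and the `SANDBOX` variable are made up for illustration. It ends by listing the two root commits that `--allow-unrelated-histories` leaves behind.

```shell
#!/usr/bin/env bash
# Replays steps (1) and (2) on throwaway repos under a temp directory.
# Assumptions: bash, git >= 2.28; file names and contents are illustrative.
set -euo pipefail

SANDBOX=$(mktemp -d)
# Fixed identity so commits work even without any git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny `pub` repo: README.md plus a client/ directory.
git init -q -b master "$SANDBOX/pub"
cd "$SANDBOX/pub"
echo "# pub" > README.md
mkdir client && echo "console.log('hi');" > client/index.js
git add . && git commit -q -m "pub: initial commit"

# A tiny `mono` repo: README.md plus a backend/ directory.
git init -q -b master "$SANDBOX/mono"
cd "$SANDBOX/mono"
echo "# mono" > README.md
mkdir backend && echo "fn main() {}" > backend/main.rs
git add . && git commit -q -m "mono: initial commit"

# Step (1): move everything in `pub` into pub-dir/ on a `migration` branch.
cd "$SANDBOX/pub"
git switch -q -c migration
mkdir pub-dir
git ls-tree -z --name-only HEAD | xargs -0 -I{} git mv "{}" pub-dir/
git commit -q -m "pub: prepare for monorepo migration"

# Step (2): merge the unrelated `pub` history into `mono`.
cd "$SANDBOX/mono"
git remote add pub ../pub
git fetch -q --no-tags pub migration
git merge -q --no-edit --allow-unrelated-histories \
  -m "mono: merge pub repo into subdir" pub/migration

# Two root commits: one from `mono`, one from `pub`.
git rev-list --max-parents=0 HEAD
```

Running `git blame pub-dir/README.md` inside `$SANDBOX/mono` afterwards should attribute the line to the original `pub: initial commit`, since blame follows the whole-file rename made by the prepare commit.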
(3) `git push` new subrepo history upstream

Now that we have all the subrepos merged into `mono`, we only need to make changes in one place: no more PRs split across repos!

Of course, we still want changes to the subrepos inside `mono` to get pushed out to the public repos. We can accomplish this with `git subtree split`. We'll do this in a fresh clone, since the process below mangles the local repo state (note the `git reset --hard`).
```
# We project the monorepo history into its public-only form.
# This approach completely preserves the original public commits!
#
# NOTE: the command substitution:
#
# ```
# git rev-list --max-parents=0 pub/migration | tail -n 1
# ```
#
# simply picks out the root initial commit from the `pub`
# subrepo history. (`pub/migration` only exists here if that branch
# was pushed to the `pub` remote; `pub/master` shares the same root.)

# Get a fresh clone of the `mono` repo.
~/$ git clone <origin-mono-repo-git-url> mono && cd mono
~/mono$ git remote add pub <origin-pub-repo-git-url>
~/mono$ git fetch pub
~/mono$ git switch master

# Project the `pub` subrepo history from the `mono` repo history.
~/mono$ git subtree split \
    --prefix=pub-dir/ \
    --onto=$(git rev-list --max-parents=0 pub/migration | tail -n 1)
198017b02ffa2d45f301f6b9ba6424c4e849251b
~/mono$ git reset --hard 198017b02ffa2d45f301f6b9ba6424c4e849251b

# This is the "projected" `pub` repo history with new commits from
# the `mono` repo.
#
# Note: commits that don't touch the `pub-dir/` subrepo will be
# safely ignored.
~/mono$ git lg
* 198017b - mono+pub: this commit touches files on / and pub-dir/ (HEAD -> master)
* 3269fbc - pub: this commit only touches files in pub-dir/
* c7b8b55 - pub: prepare for monorepo migration
* 5de880b - pub: add client (pub/master)
* ff12305 - pub: initial commit

# Push the projected `pub` history to the upstream `pub` repo.
~/mono$ git push pub master
```
Note: if we use `git subtree split` without `--onto=<pub-initial-commit>`, it fails to determine the original subrepo root commit. Instead, it just squashes all changes from before the merge, which is not what we want!
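Since the goal is to publish from CI, the projection can be wrapped into two small functions. A sketch under stated assumptions: bash, `git subtree` installed (it lives in git's contrib directory and some distros package it separately), and a fetched `pub` remote. The function names and arguments are hypothetical. Instead of `git reset --hard`, it pushes the split commit directly with a `<commit>:<ref>` refspec, so the CI checkout itself is left alone.

```shell
#!/usr/bin/env bash
# Hypothetical CI helpers for step (3); names and arguments are illustrative.
# Assumes bash, an installed `git subtree`, and an already-fetched `pub` remote.
set -euo pipefail

# Print the commit `git subtree split` builds for PREFIX, grafted onto the
# root commit reachable from ROOT_REF (the original `pub` initial commit).
project_subrepo() {
  local prefix="$1" root_ref="$2" root
  root=$(git rev-list --max-parents=0 "$root_ref" | tail -n 1) || return 1
  [ -n "$root" ] || return 1
  git subtree split --prefix="$prefix" --onto="$root"
}

# Push the projected history straight to REMOTE's master branch.
# Pushing a commit id via a refspec avoids resetting the local checkout.
publish_subrepo() {
  local remote="$1" prefix="$2" root_ref="$3" split
  split=$(project_subrepo "$prefix" "$root_ref") || return 1
  git push "$remote" "$split:refs/heads/master"
}
```

In a CI job running inside a clone of `mono`, that might look like `git remote add pub <origin-pub-repo-git-url> && git fetch pub && publish_subrepo pub pub-dir/ pub/master`; any `pub` branch that still contains the original root commit works as the last argument.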
appendix
This section contains other alternative approaches that didn't work well for one reason or another. Feel free to skip everything below.
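(1) more janky `git mv` of all files into subdir

```
# List tracked paths, keep only each top-level name, dedupe,
# skip pub-dir itself, then move each one into pub-dir/.
~/pub$ git ls-files \
    | sed -n -E "s/^([^\/]*)\/?.*$/\1/p" \
    | uniq \
    | grep -v "^pub-dir" \
    | xargs -I{} git mv "{}" ./pub-dir/
```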
(1) (alt) rewrite history into subdirectory

You can also use `git filter-repo` to rewrite the whole commit history as if it had been written inside a subdirectory from the beginning:
```
# Install the `git filter-repo` tool, which the git project itself
# recommends over the built-in `git filter-branch`.
(deb)$ sudo apt install git-filter-repo
(macOS)$ brew install git-filter-repo

# After this command, all files in the current commit and all past
# commits will be moved into the `pub-dir/` subdirectory.
#
# You'll need to do this in a fresh repo clone, since this command
# rewrites every commit in the repo.
~/pub$ git filter-repo --to-subdirectory-filter pub-dir
```
This approach requires a force-push that overwrites the original `pub` repo history, because the later `git subtree split` doesn't appear to be fully lossless in this case, causing the commit hashes to change. Note: this only affects the commit hashes of the `pub` repo history after the push. The changes in each commit will still appear (almost) exactly the same, i.e., they're still in the root directory.
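After a `git filter-repo` rewrite like the one above, one way to confirm that every commit, not just `HEAD`, now keeps its files under `pub-dir/` is to walk the history and inspect each commit's top-level tree. A minimal sketch, assuming bash; `history_is_under_prefix` is a hypothetical helper, not part of git or filter-repo:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if every commit reachable from REV
# (default HEAD) has exactly one top-level entry, named PREFIX.
history_is_under_prefix() {
  local prefix="$1" rev="${2:-HEAD}" commits commit
  commits=$(git rev-list "$rev") || return 1
  for commit in $commits; do
    # Without -r, `ls-tree --name-only` lists only the top-level names.
    [ "$(git ls-tree --name-only "$commit")" = "$prefix" ] || return 1
  done
}
```

Usage: `~/pub$ history_is_under_prefix pub-dir && echo "all commits live under pub-dir/"`.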
(2) (alt) `git subrepo`

Skip step (1) if you use this approach.

The `git subrepo` tool is a user-friendly alternative to the `git submodule` and `git subtree` commands.
```
(macOS)$ brew install git-subrepo
# git subrepo clone <remote-url> <subdir>
~/mono$ git subrepo clone ../pub ./pub-dir
```
Unfortunately, `git subrepo clone` squashes all the commit history before the merge. If you're just looking to vendor a dependency or don't care about the original history, this approach might be for you.
(2) (alt) `git subtree add`

Skip step (1) if you use this approach.

`git subtree` is a built-in tool that lets you keep other repos physically inside your main repo. If you don't care about `git blame` working, this is an easier method.
```
# git subtree add --prefix=<subdir> <remote-url> <remote-ref>
~/mono$ git subtree add --prefix=pub-dir/ ../pub master
```
`git subtree add` at least preserves the original history, but `git blame` only works down to the merge commit for subtree files. :'(
(2) (alt) `git read-tree`/`write-tree`/`commit-tree`

Skip step (1) if you use this approach.

Using some low-level git plumbing commands, we can merge the subrepo into a subdirectory without touching the subrepo. This method effectively creates a new `tree` object in the git db containing the existing `mono` files plus a single subdirectory that holds the `pub` repo's `master` root `tree`. We then create a merge commit with that tree whose parents are (1) the existing `mono` `master` commit and (2) the `pub` repo's `master` commit. (The commands below name the remote and the subdirectory `public` rather than `pub`.)

Unfortunately, while the end result looks almost the same as step (1) (moving all files in `pub` into the subdirectory), it appears to break `git blame`. :'(
```
~/mono$ git remote add public ../public
~/mono$ git fetch --no-tags public master
# Stage the fetched root tree under the `public/` prefix in the index.
~/mono$ git read-tree --prefix=public/ public/master
# Write the combined index (mono files + public/) out as a new tree object.
~/mono$ git write-tree
f1b1ee12e1d679ec84c282049102e2654efc4b67
# Create a merge commit with that tree and both histories as parents.
~/mono$ git commit-tree f1b1ee12e1d679ec84c282049102e2654efc4b67 \
    -p master -p public/master -m "monorepo merge"
046d3298b9976d50fa379fa071f57bc1158aca5b
# Point master at the new commit, then check out the new files.
~/mono$ git reset 046d3298b9976d50fa379fa071f57bc1158aca5b && git restore .
~/mono$ git lg
* 854b553 - monorepo merge (HEAD -> master)
|\
* | 5b02f51 - mono: add backend (origin/master)
| * 7709d49 - pub: add client (public/master)
```