How do git submodules even work?
I’ve been writing software for ~9 years, and for ~8 of them, I’ve been using git as my main VCS. I’m just about used to git. But submodules are a part of git that I hadn’t used at all, so when I discovered that my current team makes some use of submodules, I thought I’d better write a “how do these actually work” post, mostly so that I can stop feeling like a newb around this weird feature of git.
Why Submodules
Git submodules solve a very specific problem: I have a project tracked in git, and one of its subdirectories is also a project tracked in git. Let’s pretend we didn’t have submodules, and we had a project where one subdirectory had to contain another git project.
Attempt to not learn submodules: Just clone both and symlink
What if, instead of having submodules, we do the following:
- Clone repo1, the parent, at, say, ~/work/parent
- Clone repo2, the submodule, at, say, ~/work/subproj
- Run something like ln -s ~/work/subproj ~/work/parent/subproj
Why can’t we just do this instead of having submodules?
Let’s try it with Hugo themes, since those are often installed as submodules:
% hugo new site no-submodules
% ls no-submodules/themes # no themes :(
% git clone git@github.com:CaiJimmy/hugo-theme-stack.git # next to the site
% cd no-submodules/themes
% ln -s ../../hugo-theme-stack ./hugo-theme-stack
% file ./hugo-theme-stack
hugo-theme-stack: directory
This looks great, right? Now the theme is where it needs to go, and I didn’t have to learn how to use git submodules at all.
But then suppose I’m setting up the project somewhere else and clone it into a different directory:
% git clone ~/Desktop/examples/hugo-without-submodules/no-submodules
Cloning into 'no-submodules'...
done.
% file no-submodules/themes/hugo-theme-stack
no-submodules/themes/hugo-theme-stack: broken symbolic link to ../../hugo-theme-stack
Oops! Now the symlink is broken, because the symlink just stores the relative path as a string. So now everyone who works on my repository has to clone the theme I’m using to where the symlink expects it to be.
Also, nothing keeps it up to date! Let’s say I make some changes that are coupled to changes in the theme. How can I communicate to my teammates that they need to pull in the new change to the theme? It’s almost like this subdirectory should be under version control!
So the symlink approach is fine except:
- Nothing automatically puts the target of the symlink where it needs to be
- Nothing automatically tracks changes to or updates the repo that’s the target of the symlink
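The first problem in that list can be reproduced without Hugo at all. The following is a minimal sketch, assuming a throwaway temp directory and a made-up "theme" directory; the names theme, site, and elsewhere are placeholders, not anything from the real setup above. It shows that git commits the link itself (mode 120000) and that a clone placed anywhere else ends up with a dangling link.

```shell
set -eu
# Throwaway identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A stand-in "theme" checkout sitting next to the site (hypothetical names).
mkdir -p "$work/theme"
echo "name = 'demo theme'" > "$work/theme/theme.toml"

# The site repo, with a relative symlink pointing at the theme.
git init -q "$work/site"
mkdir -p "$work/site/themes"
ln -s ../../theme "$work/site/themes/theme"
git -C "$work/site" add themes/theme
git -C "$work/site" commit -q -m "link theme into site"

# Git stores the link itself (mode 120000), not the files behind it.
git -C "$work/site" ls-tree -r HEAD

# Clone the site to a place where the theme is not sitting next to it.
mkdir -p "$work/elsewhere"
git clone -q "$work/site" "$work/elsewhere/site"

link_target=$(readlink "$work/elsewhere/site/themes/theme")
if [ -e "$work/elsewhere/site/themes/theme" ]; then
  link_state="resolves"
else
  link_state="broken"
fi
echo "target: $link_target ($link_state)"
```

The clone faithfully reproduces the string ../../theme, but nothing reproduces the directory that string is supposed to point at.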
Someone should make a system that puts files where they go and tracks and applies changes to them! Since git already does this, I think I’m convinced that git should offer some support here.
How they actually work
So now I’m convinced that I need submodules, or at least that making my own equivalent system out of symlinks will be just as much work as learning how git already does this. So let’s learn how git already does this!
We’ll start with an empty Hugo site, and then add a theme via git submodules, and try to see what changes.
% hugo new site with-submodules && cd with-submodules && git init
Let’s add the boilerplate this made to our initial commit:
% git add .
% git status -v
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: archetypes/default.md
new file: hugo.toml
diff --git a/archetypes/default.md b/archetypes/default.md
new file mode 100644
index 0000000..c6f3fce
--- /dev/null
+++ b/archetypes/default.md
@@ -0,0 +1,5 @@
++++
+title = '{{ replace .File.ContentBaseName "-" " " | title }}'
+date = {{ .Date }}
+draft = true
++++
diff --git a/hugo.toml b/hugo.toml
new file mode 100644
index 0000000..7e568b8
--- /dev/null
+++ b/hugo.toml
@@ -0,0 +1,3 @@
+baseURL = 'https://example.org/'
+languageCode = 'en-us'
+title = 'My New Hugo Site'
% git commit -m "initial commit"
I want the output of git status -v to show what git looks like when we add a normal file versus when we add a submodule. We have no theme, so let’s add one via git submodules! Following the instructions on the theme’s repo, option 2:
% git submodule add --depth=1 https://github.com/adityatelange/hugo-PaperMod.git themes/PaperMod
Cloning into '/Users/will/Desktop/examples/with-submodules/themes/PaperMod'...
... snip
% git status -v
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: .gitmodules
new file: themes/PaperMod
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..89af1b0
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "themes/PaperMod"]
+ path = themes/PaperMod
+ url = https://github.com/adityatelange/hugo-PaperMod.git
diff --git a/themes/PaperMod b/themes/PaperMod
new file mode 160000
index 0000000..24f3096
--- /dev/null
+++ b/themes/PaperMod
@@ -0,0 +1 @@
+Subproject commit 24f3096e33bca0b3a83c90ea59f9fffe2df08333
That’s interesting! Adding a submodule to the project made two changes that show up in the git status -v output: We created a file called .gitmodules in the root of the repo; that file records the local path and remote URL of the submodule. And we created a “new file” at themes/PaperMod with file mode 160000. Note that in the earlier output, when I added normal files, they had mode 100644.
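Since .gitmodules uses the same syntax as git’s other config files, it can be read with git config. Here is a small sketch, assuming we recreate the file from the diff above in a temp directory rather than touching a real repo; the variable names are my own.

```shell
set -eu
work=$(mktemp -d)

# Recreate the .gitmodules content shown in the diff above.
cat > "$work/.gitmodules" <<'EOF'
[submodule "themes/PaperMod"]
	path = themes/PaperMod
	url = https://github.com/adityatelange/hugo-PaperMod.git
EOF

# Read individual keys with git's own config parser.
sub_path=$(git config --file "$work/.gitmodules" --get submodule.themes/PaperMod.path)
sub_url=$(git config --file "$work/.gitmodules" --get submodule.themes/PaperMod.url)
echo "$sub_path -> $sub_url"
```

In a real checkout, the same command works from the repo root with --file .gitmodules.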
According to the git book, 160000 is a special file mode that tells git that this entry is a submodule. In other words, git records the submodule in its tree as a single entry, much like an ordinary file, but uses a magic number in the file mode field to mark it as a submodule, and stores the submodule’s commit hash where a file’s blob hash would normally go. Let’s commit this and keep exploring: git commit -m "track theme submodule".
Using git ls-tree -r main . we can see the same information again:
% git ls-tree -r main .
100644 blob 89af1b0cdcdba0b5457d5bb4375ca4ca5f6116f9 .gitmodules
100644 blob c6f3fcef6e3aac0c52c7ac1af2f71b58f8572fec archetypes/default.md
100644 blob 7e568b837c4698c6292b05c681d4b42148c8c98f hugo.toml
160000 commit 24f3096e33bca0b3a83c90ea59f9fffe2df08333 themes/PaperMod
We can see three regular files and one entry with type commit instead of blob, and the magic mode 160000.
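To check that the hash on the 160000 line really is a commit in the submodule’s own history, here is a sketch that builds two throwaway repos locally. The names lib, parent, and vendor/lib are hypothetical. The protocol.file.allow=always setting is there only because this demo adds a submodule from a local filesystem path, which recent git versions refuse by default; it is not needed for an https URL like the one above.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A tiny repo that will play the role of the submodule.
git init -q "$work/lib"
echo "hello" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

# The parent repo tracks it at vendor/lib.
git init -q "$work/parent"
git -C "$work/parent" -c protocol.file.allow=always \
  submodule --quiet add "$work/lib" vendor/lib
git -C "$work/parent" commit -q -m "track lib as a submodule"

# Inspect the tree entry for the submodule path.
entry=$(git -C "$work/parent" ls-tree HEAD vendor/lib)
echo "$entry"
mode=$(echo "$entry" | awk '{print $1}')
objtype=$(echo "$entry" | awk '{print $2}')
pinned=$(echo "$entry" | awk '{print $3}')

# Compare with the commit the submodule repo itself is on.
lib_head=$(git -C "$work/lib" rev-parse HEAD)
echo "pinned:   $pinned"
echo "lib HEAD: $lib_head"
```

The two hashes match: the parent stores no theme files at all, only a pointer to one commit in the other repository.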
So let’s recap:
- We need submodules because automatically pulling down and tracking changes in a subdirectory that’s also a git repo would be cumbersome without some support from a feature in git itself
- Git creates a .gitmodules file that lists what repos are cloned down to what paths
- Git tracks the submodule as if there were a file at that path, with a magic mode so that git knows it’s not really a normal file.
How to actually use git submodules
Now that we understand why we need submodules, and how submodules work, we’re ready to try to learn to use submodules. This section will try to be a bit of a cheat sheet.
- Clone a repo with its submodules already pulled down: git clone --recurse-submodules ...
- Initialize all the submodules if you forgot to pass --recurse-submodules on a clone: git submodule update --init --recursive
- Make git show a snippet from the log of the submodule if there’s a diff: git config --global diff.submodule log
- Pull in changes that might have submodule updates: git pull --recurse-submodules
- Push, but fail if any submodule has commits that haven’t been pushed: git push --recurse-submodules=check
- Make git always update submodules when you do a pull: git config --global submodule.recurse true (unfortunately this doesn’t work for git clone)
- Lots more at https://git-scm.com/book/en/v2/Git-Tools-Submodules
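The first two items are the ones I’d expect to hit most often, so here is a sketch of the “forgot --recurse-submodules” case using throwaway local repos. The names lib, parent, fresh, and vendor/lib are placeholders, and protocol.file.allow=always is only needed because the submodule URL in this demo is a local path.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Build a parent repo with one submodule at vendor/lib.
git init -q "$work/lib"
echo "hello from lib" > "$work/lib/README"
git -C "$work/lib" add README
git -C "$work/lib" commit -q -m "lib: first commit"

git init -q "$work/parent"
git -C "$work/parent" -c protocol.file.allow=always \
  submodule --quiet add "$work/lib" vendor/lib
git -C "$work/parent" commit -q -m "track lib as a submodule"

# A plain clone creates the submodule directory but leaves it empty.
git clone -q "$work/parent" "$work/fresh"
before=$(ls -A "$work/fresh/vendor/lib" | wc -l | tr -d ' ')
echo "entries before init: $before"

# The cheat-sheet fix: fetch and check out every submodule.
git -C "$work/fresh" -c protocol.file.allow=always \
  submodule --quiet update --init --recursive
after=$(cat "$work/fresh/vendor/lib/README")
echo "README after init: $after"
```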
One thing that always confused me when working with git submodules is: doing something like git stash or git checkout . doesn’t update the submodule! To do that, cd into the submodule directory, and tell the git repo there to stash or discard changes or whatever. This strikes me as extra annoying, because what I often want to tell git is: “go back to the state of the submodule that was committed to the parent repo”. The best way I’ve found to do this is: Run git diff in the parent repo, and see that the submodule changed commits. Then cd into the submodule and git checkout <old commit>.
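Another option worth trying for the “go back to what the parent recorded” case is git submodule update, run from the parent repo: it checks out the recorded commit in each initialized submodule. The sketch below uses throwaway repos with hypothetical names (lib, parent, vendor/lib), and it assumes the submodule has no uncommitted changes you care about. The submodule ends up on a detached HEAD, and a commit made in it is not deleted, it is just no longer checked out.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Build a parent repo with one submodule at vendor/lib
# (protocol.file.allow is only needed for this local-path demo).
git init -q "$work/lib"
echo "one" > "$work/lib/file"
git -C "$work/lib" add file
git -C "$work/lib" commit -q -m "lib: first commit"

git init -q "$work/parent"
git -C "$work/parent" -c protocol.file.allow=always \
  submodule --quiet add "$work/lib" vendor/lib
git -C "$work/parent" commit -q -m "track lib as a submodule"

# The commit the parent has recorded for the submodule.
recorded=$(git -C "$work/parent" ls-tree HEAD vendor/lib | awk '{print $3}')

# Let the submodule drift: make a new commit inside it.
echo "two" >> "$work/parent/vendor/lib/file"
git -C "$work/parent/vendor/lib" commit -q -am "local experiment"
drifted=$(git -C "$work/parent/vendor/lib" rev-parse HEAD)

# The parent now reports the submodule as modified.
git -C "$work/parent" status --short

# Ask git to put the submodule back on the recorded commit.
git -C "$work/parent" submodule --quiet update
restored=$(git -C "$work/parent/vendor/lib" rev-parse HEAD)

echo "recorded: $recorded"
echo "drifted:  $drifted"
echo "restored: $restored"
```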
Anyway, hopefully this helps you use submodules! Till next time, happy learning!
– Will