Will Murphy's personal home page

How do git submodules even work?

I’ve been writing software for ~9 years, and for ~8 of them, I’ve been using git as my main VCS. I’m just about used to git. But submodules are a part of git that I hadn’t used at all, so when I discovered that my current team makes some use of submodules, I thought I’d better write a “how do these actually work” post, mostly so that I can stop feeling like a newb around this weird feature of git.

Why Submodules

Git submodules solve a very specific problem: I have a project tracked in git, and one of its subdirectories is also a project tracked in git. Let’s pretend we didn’t have submodules, and we had a project where one subdirectory had to contain another git project.

What if, instead of having submodules, we do the following:

  1. Clone repo1, the parent, at, say ~/work/parent
  2. Clone repo2, the submodule, at say ~/work/subproj
  3. Run something like ln -s ~/work/subproj ~/work/parent/subproj

Why can’t we just do this instead of having submodules?

Let’s try it with hugo themes, since those are often persisted via submodules:

% hugo new site no-submodules
% ls no-submodules/themes # no themes :(
% git clone git@github.com:CaiJimmy/hugo-theme-stack.git # next to the site
% cd no-submodules/themes
% ln -s ../../hugo-theme-stack ./hugo-theme-stack
% file ./hugo-theme-stack
hugo-theme-stack: directory

This looks great, right? Now the theme is where it needs to go, and I didn’t have to learn how to use git submodules at all.

But then I go to set up the project somewhere else and clone it into a different directory:

% git clone ~/Desktop/examples/hugo-without-submodules/no-submodules
Cloning into 'no-submodules'...
done.
% file no-submodules/themes/hugo-theme-stack
no-submodules/themes/hugo-theme-stack: broken symbolic link to ../../hugo-theme-stack

Oops! Now the symlink is broken, because the symlink just stores the relative path as a string. So now everyone who works on my repository has to clone the theme I’m using to where the symlink expects it to be.
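To see that the link really is nothing more than a stored string, here is a small sketch that runs in a throwaway directory. The directory names (site, theme, elsewhere) are made up for illustration and are not the paths from the example above.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

mkdir -p theme site/themes

# The link stores the literal string "../../theme" and nothing else.
ln -s ../../theme site/themes/theme
readlink site/themes/theme

# Copy the site (keeping the link as a link, -P) to a place where
# "../../theme" no longer points at anything.
mkdir -p elsewhere/deep
cp -RP site elsewhere/deep/site

if [ -e elsewhere/deep/site/themes/theme ]; then
    echo "link still resolves"
else
    echo "link is broken"
fi
```

The copied link still contains the same string; it is only the surroundings that changed, which is the same thing that happens to a fresh clone.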

Also, nothing keeps it up to date! Let’s say I make some changes that are coupled to changes in the theme. How can I communicate to my teammates that they need to pull in the new change to the theme? It’s almost like this subdirectory should be under version control!

So the symlink approach is fine except:

  1. Nothing automatically puts the target of the symlink where it needs to be
  2. Nothing automatically tracks changes to or updates the repo that’s the target of the symlink

Someone should make a system that puts files where they go and tracks and applies changes to them! Since git already does this, I think I’m convinced that git should offer some support here.

How they actually work

So now I’m convinced that I need submodules, or at least that making my own equivalent system out of symlinks will be just as much work as learning how git already does this. So let’s learn how git already does this!

We’ll start with an empty Hugo site, and then add a theme via git submodules, and try to see what changes.

% hugo new site with-submodules && cd with-submodules && git init

Let’s add the boilerplate this made to our initial commit:

% git add .
% git status -v
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   archetypes/default.md
	new file:   hugo.toml

diff --git a/archetypes/default.md b/archetypes/default.md
new file mode 100644
index 0000000..c6f3fce
--- /dev/null
+++ b/archetypes/default.md
@@ -0,0 +1,5 @@
++++
+title = '{{ replace .File.ContentBaseName "-" " " | title }}'
+date = {{ .Date }}
+draft = true
++++
diff --git a/hugo.toml b/hugo.toml
new file mode 100644
index 0000000..7e568b8
--- /dev/null
+++ b/hugo.toml
@@ -0,0 +1,3 @@
+baseURL = 'https://example.org/'
+languageCode = 'en-us'
+title = 'My New Hugo Site'
% git commit -m "initial commit"

I want the output of git status -v to show what git looks like when we add a normal file versus when we add a submodule. We have no theme, so let’s add one via git submodules! Following the instructions in the theme’s repo, option 2:

% git submodule add --depth=1 https://github.com/adityatelange/hugo-PaperMod.git themes/PaperMod
Cloning into '/Users/will/Desktop/examples/with-submodules/themes/PaperMod'...
... snip
% git status -v
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   .gitmodules
	new file:   themes/PaperMod

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..89af1b0
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "themes/PaperMod"]
+	path = themes/PaperMod
+	url = https://github.com/adityatelange/hugo-PaperMod.git
diff --git a/themes/PaperMod b/themes/PaperMod
new file mode 160000
index 0000000..24f3096
--- /dev/null
+++ b/themes/PaperMod
@@ -0,0 +1 @@
+Subproject commit 24f3096e33bca0b3a83c90ea59f9fffe2df08333

That’s interesting! Adding a submodule to the project made two changes that show up in the staged diff. First, git created a file called .gitmodules in the root of the repo; that file records the local path and remote URL of the submodule. Second, git created a “new file” at themes/PaperMod with file mode 160000. Note that in the earlier output, when I added normal files, they had mode 100644.

According to the git book, 160000 is a magic file mode that tells git that this is a submodule. In other words, the parent repo records the submodule as a single entry at that path, much as it would a file, but instead of pointing at a blob of file contents, the entry holds the hash of the commit the submodule is checked out at. The magic mode is how git tells the two apart.

Let’s commit this and keep exploring: git commit -m "track theme submodule".

Using git ls-tree -r main . we can see the same information again:

% git ls-tree -r main .
100644 blob 89af1b0cdcdba0b5457d5bb4375ca4ca5f6116f9	.gitmodules
100644 blob c6f3fcef6e3aac0c52c7ac1af2f71b58f8572fec	archetypes/default.md
100644 blob 7e568b837c4698c6292b05c681d4b42148c8c98f	hugo.toml
160000 commit 24f3096e33bca0b3a83c90ea59f9fffe2df08333	themes/PaperMod

We can see three regular files, plus one entry with type commit instead of blob and the magic mode 160000.
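If you want to poke at this without a network connection, the following is a minimal sketch that builds two throwaway local repos and adds one to the other as a submodule. Everything named here (theme, site, themes/demo-theme, the demo identity) is hypothetical. The protocol.file.allow override is an assumption about your git version: recent releases refuse to clone submodules from local paths by default, so the sketch allows it for that one command.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the theme repository.
git init -q theme
echo "hello" > theme/README
git -C theme add README
git -C theme commit -q -m "theme: initial commit"

# The parent site, with one ordinary file.
git init -q site
echo "title = 'demo'" > site/hugo.toml
git -C site add hugo.toml
git -C site commit -q -m "initial commit"

# Add the local theme repo as a submodule of the site.
git -C site -c protocol.file.allow=always \
    submodule --quiet add "$work/theme" themes/demo-theme
git -C site commit -q -m "track theme submodule"

# .gitmodules uses git-config syntax, so git config can read it.
git -C site config -f .gitmodules --get submodule.themes/demo-theme.url

# The index entry for the submodule: mode 160000 plus a commit hash.
git -C site ls-files --stage themes/demo-theme

# That hash is the commit currently checked out inside the submodule.
git -C site/themes/demo-theme rev-parse HEAD
```

The last two commands should print the same hash, which is the whole trick: the parent stores only which commit of the theme it expects.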

So let’s recap:

  1. We need submodules because automatically pulling down and tracking changes in a subdirectory that’s also a git repo would be cumbersome without some support from a feature in git itself
  2. Git creates a .gitmodules file that lists what repos are cloned down to what paths
  3. Git tracks the submodule as if there were a file at that path, with a magic mode so that git knows it’s not really a normal file.
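To tie the recap back to the broken-symlink problem, here is a sketch of what a teammate's fresh clone looks like. It is not taken from the steps above: the repo names are hypothetical, both repos are local throwaways, and the protocol.file.allow override is only there because recent git versions block local-path submodule clones by default.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in theme repo and parent site, as a teammate would find them.
git init -q theme
echo "hello" > theme/README
git -C theme add README
git -C theme commit -q -m "theme: initial commit"

git init -q site
echo "title = 'demo'" > site/hugo.toml
git -C site add hugo.toml
git -C site commit -q -m "initial commit"
git -C site -c protocol.file.allow=always \
    submodule --quiet add "$work/theme" themes/demo-theme
git -C site commit -q -m "track theme submodule"

# A plain clone knows about the submodule but leaves its directory empty.
git clone -q site plain-clone
ls -A plain-clone/themes/demo-theme | wc -l

# This reads .gitmodules, clones the theme, and checks out the commit
# that the parent repo recorded.
git -C plain-clone -c protocol.file.allow=always \
    submodule --quiet update --init
ls plain-clone/themes/demo-theme
```

Passing --recurse-submodules to git clone does the clone and the submodule update in one step, so the URL in .gitmodules plus the recorded commit cover both gaps the symlink approach left open.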

How to actually use git submodules

Now that we understand why we need submodules, and how submodules work, we’re ready to try to learn to use submodules. This section will try to be a bit of a cheat sheet.

One thing that always confused me when working with git submodules: running something like git stash or git checkout . in the parent repo doesn’t touch the submodule! To stash or discard changes there, cd into the submodule directory and run the command in that repo. This strikes me as extra annoying, because what I often want to tell git is: “go back to the state of the submodule that was committed to the parent repo”. The best way I’ve found to do this is to run git diff in the parent repo, see which commit the submodule moved away from, then cd into the submodule and git checkout <old commit>.
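For the case where the submodule has merely moved to a different commit, git submodule update run from the parent repo may save the manual lookup: it checks out whatever commit the parent has recorded. This is an alternative to the approach described above, not a replacement I have tested on every setup. The sketch below uses hypothetical local repos, and again assumes protocol.file.allow needs overriding for the local submodule clone.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Placeholder identity for the throwaway commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q theme
echo "v1" > theme/README
git -C theme add README
git -C theme commit -q -m "theme: v1"

git init -q site
echo "title = 'demo'" > site/hugo.toml
git -C site add hugo.toml
git -C site commit -q -m "initial commit"
git -C site -c protocol.file.allow=always \
    submodule --quiet add "$work/theme" themes/demo-theme
git -C site commit -q -m "track theme submodule"

# The commit the parent repo expects the submodule to be at.
recorded=$(git -C site ls-tree HEAD themes/demo-theme | awk '{print $3}')

# Move the submodule to a new commit, as if we had been experimenting.
echo "v2" > site/themes/demo-theme/README
git -C site/themes/demo-theme commit -q -am "theme: local experiment"

# The parent now reports that the submodule changed commits.
git -C site diff

# Check out the recorded commit again, without looking up the hash by hand.
git -C site submodule --quiet update

git -C site/themes/demo-theme rev-parse HEAD
echo "$recorded"
```

The experimental commit is not deleted; it is just no longer checked out. Uncommitted edits inside the submodule are a separate case, and for those you still need to cd in and stash or discard them there.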

Anyway, hopefully this helps you use submodules! Till next time, happy learning!
– Will
