Will Murphy's personal home page

Adding Comments Part 3: Existing Posts

This is part of a series on building my own comment engine for Hugo. You can start with part 1.

Last time, we changed the file at archetypes/posts.md so that Hugo generates a unique ID for each new post. But I already have a number of posts, and I want to make sure that they get IDs as well. So this post is about writing a one-off shell script or similar to add IDs to these existing posts.

I found a post about using sed to insert a line after a match, which is just about what I need to do. So let’s take a look at this shell script. (For those of you following along at home, I use zsh, not bash, so if there’s something weird, that might be why.) One good thing is that everything is under version control, so if I screw this up, I can always go back. So before I start, I’m going to make sure everything is committed :)

All my posts have a line like draft: true or draft: false in the front matter, so we should be able to use that.
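Before leaning on that assumption, it may be worth checking it. Here is a minimal sketch, assuming the posts live under content/posts; check_draft_lines is a made-up helper name, not something from Hugo. It lists any post where the number of lines starting with draft: is not exactly 1.

```shell
# Hypothetical helper: report posts that do not have exactly one line
# starting with "draft:". The directory defaults to content/posts (an assumption).
check_draft_lines() {
  dir=${1:-content/posts}
  # grep -H always prints the filename, -c prints the match count per file,
  # so each output line looks like "path:count".
  # awk splits on ":" and keeps lines whose last field (the count) is not 1.
  find "$dir" -name '*.md' -exec grep -Hc '^draft:' {} + | awk -F: '$NF != 1'
}

# Usage: check_draft_lines content/posts
```

If this prints nothing, every post has exactly one draft: line. Anything it does print probably needs a look by hand before running a bulk edit.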

Let’s start with a find command to list all the posts:

$ find ./content/posts -name '*.md'
./content/posts/hugo-workflow.md
./content/posts/primeversary-state-machine.md
./content/posts/adding-comments-3-existing-posts.md
./content/posts/science-journalism-and-cherry-picking-cases.md
... # snip

Now, let’s try to get the sed command right on one post before we try running it on all the posts.

It seems that macOS sed is different from GNU sed, so we’ll start with a brew install to get the better-documented GNU sed:

$ brew install gnu-sed
$ gsed -i '/^draft:/a some-string: foo' content/posts/hugo-workflow.md
$ git diff content/posts/hugo-workflow.md 

produces:

diff --git a/content/posts/hugo-workflow.md b/content/posts/hugo-workflow.md
index 4f2c23a..5675862 100644
--- a/content/posts/hugo-workflow.md
+++ b/content/posts/hugo-workflow.md
@@ -2,6 +2,7 @@
 title: "Hugo Workflow"
 date: 2022-01-09T05:16:30-05:00
 draft: false
+some-string: foo
 layout: single
 ---

Great, so that adds some-string: foo to the front matter.

Now let’s try generating an md5 of the path as we go:

# first, get the hash command right:
$ md5 -q -s content/posts/hugo-workflow.md # -q: quiet, -s: input is string, not path
f6e23ced7418aa38d4f19e73fea380d1
$ gsed -i "/^draft:/a post_id: $(md5 -q -s content/posts/hugo-workflow.md)" content/posts/hugo-workflow.md
$ git diff content/posts/hugo-workflow.md
...
+post_id: f6e23ced7418aa38d4f19e73fea380d1

This uses a different format than Hugo does (it looks like this one prints as a hex number instead of as a base-ten number), but these are just IDs. As long as they’re different from each other, I really don’t care. To the application, they’re just opaque strings; the only requirements are that they’re unique and stable.

Now let’s try to glue this together with find -exec which, to be honest, I always screw up at least a bit.

$ find content/posts -name '*.md' -exec bash -c 'POST={}; gsed -i "/^draft:/a post_id: $(md5 -q -s $POST)" $POST' \;

I have to use -exec bash -c to make find able to execute the more complicated command.
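Two things to watch with a one-liner like this: running it a second time would add a second post_id line, and sed's a command fires on every line that starts with draft:, not only the one in the front matter. Below is a rough sketch of a re-runnable variant, not the command from above. The helper name add_post_id is made up, it swaps gsed for awk so it doesn't depend on GNU sed, and it assumes either md5 (macOS) or md5sum (most Linux systems) is installed.

```shell
# Hypothetical helper (not the original one-liner): add a post_id line to one file.
add_post_id() {
  file=$1

  # Skip files that already have an ID, so re-running changes nothing.
  if grep -q '^post_id:' "$file"; then
    return 0
  fi

  # Hash the path string itself (not the file contents), with no trailing newline.
  # Assumption: macOS provides `md5`; elsewhere fall back to `md5sum`.
  if command -v md5 >/dev/null 2>&1; then
    id=$(md5 -q -s "$file")
  else
    id=$(printf '%s' "$file" | md5sum | cut -d ' ' -f 1)
  fi

  # Print every line; after the first line starting with "draft:", also print the ID.
  awk -v id="$id" '
    { print }
    /^draft:/ && !added { print "post_id: " id; added = 1 }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage (the path is passed as an argument instead of being pasted into the script):
#   find content/posts -name '*.md' -exec bash -c 'add_post_id "$1"' _ {} \;
# For that to work under bash -c, the function has to be exported first: export -f add_post_id
```

Because the ID is a hash of the path string, it should match what md5 -q -s prints for the same path, as long as the path is written the same way (content/posts/... versus ./content/posts/... would hash differently).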

And do we have victory?

diff --git a/content/posts/adding-comment-capability.md b/content/posts/adding-comment-capability.md
index c5f8776..dca7fa9 100644
--- a/content/posts/adding-comment-capability.md
+++ b/content/posts/adding-comment-capability.md
@@ -2,6 +2,7 @@
 title: "Adding Comment Capability"
 date: 2022-01-18T06:09:24-05:00
 draft: false
+post_id: 40c2e7cef071f8c3fe397b1b0706121d
 layout: single
 ---
 
diff --git a/content/posts/adding-comments-2-post-ids.md b/content/posts/adding-comments-2-post-ids.md
index 2b72edf..080c986 100644
--- a/content/posts/adding-comments-2-post-ids.md
+++ b/content/posts/adding-comments-2-post-ids.md
@@ -2,6 +2,7 @@
 title: "Adding Comments Part 2: Post IDs"
 date: 2022-01-19T03:30:48-05:00
 draft: false
+post_id: 474009447dc18cbf7c97173d628a2409
 layout: single
 ---

Let’s make sure that everything has a unique post ID, since on my first version I messed that up, and everything ended up with the same MD5.

$ grep -r 'post_id:' content/posts | rev | cut -d ' ' -f 1 | rev | sort | uniq -c
   1 0421c363b141633403a253ea6d9c30a0
   1 22d8fe126df6bcf225c28ab3b7417156
   1 2e9bd3a6d0ede8423cd3e556d36a70c1

This prints a bunch of lines starting with 1. The only lines that don’t start with 1 are the two IDs that appear in the git diff output pasted above.
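Scanning the counts by eye works for a handful of posts. As a sketch of a stricter check, the helper below (duplicate_post_ids is a made-up name) prints only the IDs that occur more than once, so empty output means everything is unique. It assumes each ID sits on its own line in the form post_id: value. Anchoring the pattern with ^ should also skip lines like +post_id: that only appear inside a pasted diff.

```shell
# Hypothetical helper: print each post_id value that occurs more than once.
# The directory defaults to content/posts (an assumption).
duplicate_post_ids() {
  dir=${1:-content/posts}
  # -r searches the directory recursively; -h drops the filename prefix,
  # leaving only "post_id: <value>" lines.
  # awk keeps the value, sort groups equal values together,
  # and uniq -d prints one copy of each value that is repeated.
  grep -rh '^post_id:' "$dir" | awk '{ print $2 }' | sort | uniq -d
}

# Usage: duplicate_post_ids content/posts
```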

Now that I’ve run my one-time script, I can just commit those changes, plus the changes to archetypes/posts.md, and I’ll have unique IDs on all the posts. Next time, we’ll figure out how to ship these IDs down to the client side.

Till then, happy learning!
– Will
