Let’s fix some mistakes in the previous post where I made an Atom feed for my blog. I haven’t found many (yet), but I still want to rectify them here.
Soooo, where do I start? How about checking what RSS readers actually parse out of my “definitely good” feed? That should be a decent start. So I opened my blog in my browser, which happens to have a built-in feed reader that I don’t use, and would you look at that! It doesn’t show the feed button!
Which makes sense, I’ve never modified my homepage (or the template for post pages, for that matter) to link to my feed, and I wouldn’t expect my browser to just randomly try to load arbitrary URLs just because there might be a feed at `/rss.xml` (it’s annoying enough that it’s automatically trying to load `/favicon.ico`… which reminds me I should probably make a favicon for my blog at some point). Anyway, the fix is easy - just spit out a `<link>` with `type` set to `application/atom+xml` while generating the header.
print(' <link rel="alternate" type="application/atom+xml" href="https://markaos.cz/rss.xml">')
Yeah, my “shell script static website generator” uses Python for generating the home page. I wasn’t quite comfortable with shell scripts back in 2018 when I wrote it. Maybe I also wanted to practice my Python skills, I really don’t remember. Wait, maybe I just didn’t know about `tac`? This line of code might be the whole reason for using Python: `entries.reverse()`
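In case `tac` is new to you too: it’s the coreutils tool that prints a file’s lines in reverse order - effectively the shell counterpart of that `entries.reverse()` call. A quick illustration (not taken from the actual generator):

```shell
# A fake entries.list with the newest post last, like the one
# my generator reads.
printf 'oldest-post\nmiddle-post\nnewest-post\n' > entries.list

# tac prints the lines in reverse order, newest first -- the same
# effect as reading the lines into a Python list and calling
# entries.reverse() on it.
tac entries.list
```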
Here, it shows up now:
Nice, now people visiting my blog can actually discover the feed. Next step is checking what the feed readers might get, let’s use the built-in feed renderer:
Okay, at least the W3C validator (which I didn’t mention in the previous post, but I did use it) didn’t lie to me and the feed got parsed successfully. But there are two things I don’t like. First, the posts are ordered from oldest to newest, which is technically fine (Atom doesn’t dictate the order, and feed readers should sort the posts by their own criteria, most likely by date), but I’d still like the newest posts to come first. I don’t want somebody checking out the feed before adding it to their reader to be confused by seeing years-old posts first. From what I’ve seen, other websites also put their entries in reverse chronological order, so there’s even a precedent for doing it this way.
The second problem comes from an implementation detail of my blog - if you look at the newest post, you’ll notice that it doesn’t have a date. See, my idea with the Git repo is that each new post gets exactly one commit, and every commit represents a valid and consistent state of the whole blog. So the commit includes the new homepage AND the new feed. Which poses a problem for my feed generator script, because it wants to get the date of the last commit for a given Markdown file from the Git repository, but that file hasn’t been committed yet.
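For reference, the date lookup boils down to asking Git for the last commit that touched the file. Here’s a self-contained sketch of the situation (the exact command and the fallback are my assumptions, not the script’s actual code):

```shell
# Self-contained demo: a throwaway repo with one committed post
# and one brand-new, uncommitted post.
cd "$(mktemp -d)"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo 'old post' > old.md
git add old.md
git commit -qm 'add old post'
echo 'new post' > new.md   # not committed yet

for post in old.md new.md; do
    # %aI = author date in ISO 8601; git log prints nothing for a
    # file with no commits yet -- exactly the missing-date problem.
    date=$(git log -1 --format=%aI -- "$post")
    if [ -z "$date" ]; then
        # Hypothetical fallback (my assumption): use the current
        # time for a post that isn't in Git yet.
        date=$(date -Iseconds)
    fi
    echo "$post: $date"
done
```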
I didn’t catch this while writing the script because all the posts were in Git at the time. And you can’t really expect me to think too hard about where the data comes from while I’m finally starting to fall asleep (I find it kind of funny how you can tell the exact point where I got tired and my goal switched from “let’s do something productive while I can’t sleep” to “let’s get this done ASAP and go to bed” in that post).
Fixing the first issue is very easy. This line
cat entries.list | BLOG_DIR=$(pwd) blog-atom > "html/rss.xml"
becomes
tac entries.list | BLOG_DIR=$(pwd) blog-atom > "html/rss.xml"
That’s it. Now for the second problem. It actually exposes a design issue with my decision to store the generated files along with the sources, practically duplicating the information. I don’t think I thought of the generated files as being easily regenerable at any time when I decided to do it this way (also, I probably didn’t know how to make the VPS that hosted my blog at that point run scripts upon getting commits pushed to the Git repo; maybe I didn’t even realize it was possible - that would definitely have pushed me towards the current way of doing things).
Nowadays I much prefer a “single source of truth, no data duplication” architecture, which would dictate that the generated files don’t get pushed to the repository and simply get regenerated whenever I want to push updates to the actual website. And that would be after each commit, which solves the problem. I could even bring back some automation for my blog - currently it’s served by my managed web host, and I just manually copy the generated `html` directory to the FTP server after each commit (it’s not like doing this once a year, or once in five years, takes a significant toll on me). But I could have my home server generate the blog after each Git push and then copy the result to the web host.
I mean, there’s nothing stopping me from doing that with the current model where the generated files are in the repository; I was just too lazy to implement it. It’s just that if I’m making changes to the setup anyway, I might as well make some bigger ones.
Right, that solves everything - I’ll just split the publishing step in my `blog` script into `publish` and `generate`, remove the `html` directory from the Git repo and add it to `.gitignore`. And then I’ll add some hooks to the server to push the updates to the website. Should be easy.
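For what it’s worth, the server side could be as small as a `post-receive` hook in the bare repo. A rough sketch - the paths, the `./blog generate` invocation, and the `lftp` upload line are all assumptions about the eventual setup, not a tested config:

```shell
# Install a post-receive hook into a (hypothetical) bare repo.
# Git runs this script after every push it receives.
mkdir -p blog.git/hooks
cat > blog.git/hooks/post-receive <<'EOF'
#!/bin/sh
# Check out the pushed commit into a working tree...
git --work-tree=/srv/blog --git-dir=/srv/blog.git checkout -f
# ...regenerate the site (the new, split-out generate step)...
cd /srv/blog && ./blog generate
# ...and mirror the generated html/ directory to the web host.
lftp -e 'mirror -R html/ /; quit' -u user,pass ftp.example.com
EOF
chmod +x blog.git/hooks/post-receive
```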
There isn’t really much to write about that, so I don’t think I will
write anything - maybe if I learn something interesting about Git hooks
that doesn’t seem to be comprehensively put together elsewhere on the
web, but that seems pretty unlikely.
Oh, and here’s the preview of the fixed feed:
Buh bye