Atom From Scratch

Wednesday. Soon Thursday. Can’t sleep. Let’s make an Atom feed generator for my blog - that should be fun, right?

First, let’s set today’s goals: I want to write a script that takes the list of blog entries currently used for generating the main page and turns it into an Atom feed. I don’t expect it to be challenging in any way and wouldn’t write a post about it if it weren’t for the fact that I want to get into the habit of writing more - not necessarily blog posts, just non-technical text in general.

Getting Started

RFC 4287 should be a good starting point for today’s adventure. And would you look at that, there’s even a helpful example of a tiny valid Atom feed right on the third page, so we can get an idea of what’s required without studying the RFC in depth (or reading it at all, really).

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>

</feed>

Well, most of it looks pretty self-describing. Which, by the way, means it’s a great example. However, there’s one thing that caught my eye which I’d rather not deal with: the entry and feed IDs are UUIDs. I have nothing against UUIDs as such, but using them for my posts would require me to store them somewhere (most likely together with creation dates in the entry list). And I don’t want to do that if I can easily avoid it - permalinks would be unique and stable enough for my use case, so let’s check what kinds of IDs can actually be used.

IRIs, URIs and URLs

The <id> element is described in Section 4.2.6 as an IRI (Internationalized Resource Identifier), and specifically an absolute one, since the RFC notes that relative references are excluded. An IRI is basically a URI (Uniform Resource Identifier) with the allowed character set extended to a large part of UCS (Unicode). And any URL (Uniform Resource Locator) is also a URI - URIs just additionally allow for schemes that only name an object without describing its location, like the urn:uuid: IDs in the example above. It seems to me like the difference is purely in semantics.

All my links consist purely of ASCII characters with no need for percent-encoding, so the URLs of my posts are also IRIs, and therefore can be used for the <id>s (as long as I can guarantee they’re unique, which I can).
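If I ever wanted the script to enforce that instead of just trusting myself, a check could look roughly like the sketch below. It assumes the same "name date" entry list that the feed script further down reads from stdin, and `check_ids` is a made-up helper name. It only accepts names built from unreserved URI characters (so nothing ever needs percent-encoding) and rejects duplicates, which would otherwise turn into duplicate <id>s.

```shell
# Hypothetical helper: check that permalink-based <id>s are safe to use as-is.
# Reads the entry list ("name date" per line) on stdin, like the feed script.
check_ids() {
  local names
  names=$(cut -d' ' -f1)

  # Unreserved URI characters only (RFC 3986), so nothing needs percent-encoding.
  # LC_ALL=C keeps the ranges byte-based, so any non-ASCII byte is rejected.
  if LC_ALL=C grep -qvE '^[A-Za-z0-9._~-]+$' <<< "$names"; then
    echo 'some entry name would need percent-encoding' >&2
    return 1
  fi

  # A repeated name would produce two entries with the same <id>
  if [[ -n $(sort <<< "$names" | uniq -d) ]]; then
    echo 'duplicate entry name' >&2
    return 1
  fi
}
```

Returning non-zero rather than exiting keeps it usable both interactively and as a guard at the top of the feed script.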

The Wrap-up

Yeah, that’s pretty much all there is to say, and the script is trivial - look:

#!/usr/bin/env bash
# bash rather than sh: [[ ]], <<< and pipefail aren't POSIX sh
set -ueo pipefail

SITE_ROOT="https://markaos.cz"

# ${BLOG_DIR:-} keeps set -u from aborting before the error message below
if [[ ! -d "${BLOG_DIR:-}" ]]; then
  echo 'BLOG_DIR must be pointing to a directory' >&2
  exit 2
fi

META_DIR="$BLOG_DIR/meta"
cd "$BLOG_DIR"

echo '<?xml version="1.0" encoding="utf-8"?>'
echo '<feed xmlns="http://www.w3.org/2005/Atom">'
echo "  <title>Markaos's Blog</title>"
echo "  <link href=\"$SITE_ROOT/\" />"
echo "  <link rel=\"self\" href=\"$SITE_ROOT/rss.xml\" />"
echo "  <updated>$(date -Iseconds)</updated>"
echo '  <author>'
echo '    <name>Marek Černoch</name>'
echo '  </author>'
echo "  <id>$SITE_ROOT/</id>"

# The entry list comes in on stdin, one "name date" line per entry
while IFS= read -r line; do
  # Yeah, this could be done with read. Fight me
  entry_name=$( cut -d' ' -f1 <<< "$line" )
  entry_date=$( cut -d' ' -f2 <<< "$line" )

  entry_meta="$META_DIR/${entry_name}.meta"
  if [[ ! -f "$entry_meta" ]]; then
    echo "$entry_meta doesn't exist" >&2
    exit 3
  fi

  # First line of the .meta file is the title, the rest is the summary
  entry_title=$( head -n1 "$entry_meta" )
  entry_summary=$( tail -n+2 "$entry_meta" )
  entry_link="$SITE_ROOT/${entry_name}.html"

  echo
  echo '  <entry>'
  echo "    <title>$entry_title</title>" # Not escaped: no <, > or & allowed in titles and summaries
  echo "    <link href=\"${entry_link}\" />"
  echo "    <id>${entry_link}</id>"
  # -1: only the newest commit's date, not one line per commit
  echo "    <updated>$( git log -1 --format=%aI "md/${entry_name}.md" )</updated>"
  echo "    <summary>${entry_summary}</summary>"
  echo '  </entry>'
done

echo '</feed>'
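Titles and summaries go into the XML unescaped, which is why the script relies on them never containing markup characters; a stray `&` would break the feed just as badly as a `<`. If that convention ever gets annoying, a small escaping filter could be applied to both fields. The `xml_escape` below is my own hypothetical sketch using plain `sed`, not something the script currently does.

```shell
# Hypothetical helper, not used by the script above: escape text for XML
# element content. Reads stdin, writes stdout.
xml_escape() {
  # & has to go first, otherwise the &lt; and &gt; produced below would get
  # their & escaped again. In a sed replacement, \& is a literal &.
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

Inside the loop it would be used as `echo "    <title>$(xml_escape <<< "$entry_title")</title>"`, and the same for the summary.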

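One new thing I actually learned from writing this script is that you can override Git’s repository detection by setting the GIT_DIR environment variable, which happens to be the name I originally intended to use instead of BLOG_DIR. It took me a while to realize why Git suddenly complained that ~/git/blog wasn’t a Git repository: GIT_DIR is expected to point at the repository itself (the .git directory), not at the working tree containing it.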
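To make the mix-up concrete, here is a throwaway demo that creates a repository in a temporary directory and points GIT_DIR first at the working tree, then at the .git directory inside it. `demo_git_dir` is just a name I made up. For running Git against some other directory, `git -C <path>` is usually the simpler tool, since it behaves as if Git had been started in that directory.

```shell
# Throwaway demo of what GIT_DIR expects. Creates and deletes a temp repo.
demo_git_dir() {
  local work
  work=$(mktemp -d)
  git init -q "$work" 2>/dev/null

  # The working tree is not a repository directory (no HEAD, objects/, refs/)
  if GIT_DIR="$work" git rev-parse --git-dir >/dev/null 2>&1; then
    echo 'working tree accepted as GIT_DIR'
  else
    echo 'working tree rejected as GIT_DIR'
  fi

  # The .git directory inside it is what GIT_DIR should point to
  if GIT_DIR="$work/.git" git rev-parse --git-dir >/dev/null 2>&1; then
    echo '.git directory accepted as GIT_DIR'
  fi

  rm -rf "$work"
}
```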

Here’s also the generated feed, just for future reference:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Markaos's Blog</title>
  <link href="https://markaos.cz/" />
  <link rel="self" href="https://markaos.cz/rss.xml" />
  <updated>2024-09-12T01:11:51+02:00</updated>
  <author>
    <name>Marek Černoch</name>
  </author>
  <id>https://markaos.cz/</id>

  <entry>
    <title>Pet Language</title>
    <link href="https://markaos.cz/pet-language.html" />
    <id>https://markaos.cz/pet-language.html</id>
    <updated>2018-09-30T11:33:03+02:00</updated>
    <summary>It is probably more than a year since I started development (or designing  phase) of my last pet language, Nothing 5 , so I decided it's time to start a new one - this time with a "purpose" (more on that later). </summary>
  </entry>

  <entry>
    <title>Purpose of This Blog</title>
    <link href="https://markaos.cz/purpose-of-this-blog.html" />
    <id>https://markaos.cz/purpose-of-this-blog.html</id>
    <updated>2018-09-30T11:33:03+02:00</updated>
    <summary>There are a few questions that have been bugging me for a while now: why am I writing this blog? Who will ever read it? Why would anyone ever read it? </summary>
  </entry>

  <entry>
    <title>The 2019 Blog Update</title>
    <link href="https://markaos.cz/2019-update.html" />
    <id>https://markaos.cz/2019-update.html</id>
    <updated>2019-12-22T00:32:41+01:00</updated>
    <summary>Over a year has passed since the last blog post and I've just remembered this blog exists. What a time to make a new meta entry... </summary>
  </entry>

  <entry>
    <title>Fun With a Chinese Bluepill Clone</title>
    <link href="https://markaos.cz/bluepill-fun.html" />
    <id>https://markaos.cz/bluepill-fun.html</id>
    <updated>2020-01-07T19:56:02+01:00</updated>
    <summary>When I saw a Bluepill board for under two dollars, I simply couldn't resist. I had to buy it even though I knew it was a clone so there were bound to be some issues eventually. </summary>
  </entry>

  <entry>
    <title>CPLD Adventures</title>
    <link href="https://markaos.cz/cpld-adventure.html" />
    <id>https://markaos.cz/cpld-adventure.html</id>
    <updated>2024-08-03T14:57:20+02:00</updated>
    <summary>I bought a cheap minimal board with EPM240 CPLD for some experiments. Fun ensued.</summary>
  </entry>
</feed>
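One thing this output doesn’t exercise is an entry whose Markdown file has more than one commit: plain `git log` prints one date per commit (newest first), so <updated> would end up holding several timestamps unless the call is limited with `-1`. A crude sanity check for that could look like the hypothetical `check_updated` below. It assumes the feed keeps the one-element-per-line layout shown above and only accepts the no-fraction timestamp form that `date -Iseconds` and `%aI` produce, so it is a grep heuristic, not a real Atom validator.

```shell
# Hypothetical sanity check, not a real Atom validator: every <updated> line
# must hold exactly one RFC 3339 timestamp followed by the closing tag.
check_updated() {
  local feed=$1 bad
  [[ -f $feed ]] || { echo "no such file: $feed" >&2; return 1; }

  # Count <updated> lines that don't contain a complete, single timestamp
  bad=$(grep '<updated>' "$feed" |
    grep -cvE '<updated>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})</updated>')

  if (( bad > 0 )); then
    echo "$bad malformed <updated> element(s) in $feed" >&2
    return 1
  fi
}
```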

Alright, time to try falling asleep again.

Bye