This is my third time at Monki Gras, and my second time speaking – I first went in 2018, and I gave a talk about the curb cut effect in 2019. I bought a ticket as soon as they went on sale – I enjoyed myself so much at previous events, going again was a no-brainer. (My ticket was reimbursed because I was speaking, but I’d have happily paid to go anyway.)
Monki Gras is a rare event that manages to have both good talks and a good hallway track. The first day’s talks were full of interesting ideas and well presented, and I had some thoughtful and friendly conversations in between. I know I’ll be thinking about the event for weeks to come, and there’s still another day to go!
I’m pleased with the talk I wrote, and people seemed to enjoy it. The talk wasn’t recorded, but I’ve put my slides and notes below. (I wanted to get these up quickly, so there may be silly typos or mistakes. Please let me know if you see any!)
This is the key message: being a good user of AI is about both technical skills and managing your trust in the tool. You need to know the mechanics of prompt engineering, what text you type in the box, yes. But you also need to know how much you trust the tool, and whether you can rely on its results – if you don’t, the output is useless.
There are some personal reasons why Monki Gras feels a bit special.
I went to the last Monki Gras in 2019. It was cancelled in 2020 thanks to COVID, and this is the first year it’s been back – five years later! A lot of stuff has changed in that time.
In 2019, I was starting to explore what being genderfluid might mean for me, how that might affect my professional career, and I had several meaningful conversations with now-friends at Monki Gras. In 2024, I have a better understanding of what my gender looks like, I’m much happier, and I’m more comfortable presenting as my full self.
At both events, my gender and my appearance have been a complete non-issue. People accept that I am who I say I am; there were no awkward stares or questions; I was never misgendered. I got to relax, and focus on the event rather than worrying if somebody was about to be weird.
It’s nice.
This should be the norm at professional events, but it isn’t, and it’s a bit sad that I can’t take it for granted. But it’s nice when it happens.
There are lots of books about the history of swing dancing and jazz music. I read Swing Dance by Scott Cupit as I was writing the talk because it’s what I had to hand, but there are plenty of others. It’s a fun subject!
If you’re getting started, I’d particularly recommend looking for information about Frankie Manning and Norma Miller, two of the early pioneers of this style of dancing.
Most of the photos in the talk come from the Flickr Commons, a collection of historical photographs from over 100 international cultural heritage organisations.
You can learn more about the Commons, browse the photos, and see who’s involved using the Commons Explorer https://commons.flickr.org/. (Which I helped to build!)
Introductory slide.
It’s lovely to be back at Monki Gras.
My name is Alex Chan; my pronouns are they/she; I’m a software developer.
I work at the Flickr Foundation, where we’re trying to keep Flickr’s pictures visible for 100 years – including many of the photos in the talk. I do a bunch of fun stuff around digital preservation, cultural heritage, museums and libraries, all that jazz. If you want to learn more about my work and all the other stuff I do, you can read more at alexwlchan.net.
But I’m not talking about my work today. Instead, I want to talk about dancing. A few years ago I started learning to dance; last year I started learning to use AI, and I’ve spotted a lot of parallels between the two. I want to tell you what learning to dance taught me about learning to use AI.
This all started about five years ago. I was in a theatre at Bristol, watching a musical, and as musicals are wont to do, they had some big song and dance numbers.
I was sitting in the audience, and as I watched the actors dancing, I thought “that looks fun, I want to do that”. So in the interval I got out my phone, and I started reading about dancing and what they were doing on stage.
The particular style of dancing they were doing was called swing dancing.
Swing dancing is an umbrella term for a wide variety of energetic and rhythmic dance styles, usually danced to jazz music. Probably the most well-known is lindy hop, but it also includes balboa, jitterbug, charleston, collegiate shag…
Swing dance came out of the African American jazz scene in the early 20th century, and a lot of the moves practiced today have their roots in traditional African folk dances.
Swing dancing is very popular today, and there’s a thriving swing dance scene in London, so I was able to find a beginner class just five minutes from where I was working at the time. It was closer than the nearest Starbucks!
This was a proper beginner class, no experience needed – which was good, because I didn’t have any! You could walk in having never danced before and they’d teach you from scratch.
This is great for students, but it poses a tricky challenge for the teachers. Many people are quite nervous in a dance class, unsure if they can do it, especially if it’s their first time. The teachers have to get you on board quickly, and I noticed a pattern – after a gentle warmup, they’d always start with a really simple step.
At this point I demonstrated with a simple step on stage. I don’t have any video of this, so you’ll have to make do with crude MS Paint drawings.
This is a kick, kick, step.
Stand on my left leg, and raise my right leg in the air. Bent at the knee, foot pointing back.
Swing the right leg forward into a kick.
Swing the right leg back up, completing the kick.
Swing the right leg forward again for a second kick.
Step down onto the right leg, and lift my left leg into the air.
Kick, kick, step.
This was a smart way to start the class. If you can balance on one foot, you can do this move. You get that sense of achievement, and there’s something satisfying about a whole room of people stepping in unison.
From there, the class would gradually build to more complicated things – moves, turns, routines. But we’d always build something new in small steps, not doing too much at once.
Starting small is a great way to learn a new skill, and this approach can apply in a lot of areas.
If you’re trying to learn a skill or embed a new habit, make the bar for success extremely low. This is a one-two punch: you get an early sense of achievement that keeps you coming back, and you avoid setting your expectations so high that you’re bound to fall short.
The latter is a common mistake when we learn new skills as adults. We’re used to being good at things; we have high standards for ourselves. Then we try to learn something new, and we set our goal far too high, then we fall short, we bounce off.
How many people have done this? Picked up a new skill, done it once, you weren’t instantly good at it, so you never did it again. [A lot of guilty faces in the audience for this one.]
We don’t like to be bad at things – but we can only learn if we push through the period of not being very good. When we’re a beginner and we don’t know very much, we need to set small goals and step forward gradually.
This is where we come back to AI, to prompt engineering – which have been new skills for many of us in the last year or two.
When I first tried these new generative AI tools, I was lured in by the big and flashy stuff. That’s what grabbed headlines; that’s what grabbed my attention. I was reading Twitter, I thought “that looks fun, I want to do that”.
I set a very high bar for what I wanted to achieve, and I wanted to replicate those cool results – but I didn’t know what I was doing, so I couldn’t do any of that cool stuff. So I bounced off for months. I ignored these tools, because of the bitter taste of those initial failures.
I only started doing useful stuff with these tools when I lowered my expectations. I wanted a small first step. What’s a small step for AI? It’s a simple prompt; a single question; a single sentence.
I went back and found the first ChatGPT sessions where I got something useful out of it. I was building a URL checker, I have a bunch of websites, I want to check they’re up. I wanted to write a script to help me. So I asked ChatGPT “how do I fetch a single URL”.
This is a simple task, something I could easily Google, and many of you probably already know how to do this. But that’s not the point – the point is that it gave me that initial feeling of success. It was the first time I had that sense of “ooh, this could be useful”.
I was able to build on that, and by asking more small questions I eventually got a non-trivial URL checker. A more experienced AI user could probably have written the entire program in a single prompt, and I couldn’t, but that’s okay – I still got something useful by asking a series of small, simple questions.
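To give a sense of scale, the heart of what I ended up with was nothing more exotic than this sort of loop – a rough sketch with placeholder URLs, not my actual script:

import requests

URLS = ["https://example.com", "https://example.net"]

for url in URLS:
    try:
        response = requests.get(url, timeout=10)
        status = "OK" if response.ok else f"HTTP {response.status_code}"
    except requests.RequestException as exc:
        status = f"failed ({exc})"

    print(f"{url}: {status}")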
So start small! This is a really useful idea, and applies to so many things, not just dance or AI.
But I forget it so often, because it’s easy to be lured in by the hype, the impressive, the shiny. I have to keep reminding myself: start small, don’t overreach yourself.
But what happens when you want to move past the basics?
In dancing, the next step is finding a partner. A lot of swing dances are partnered dances, there’s a leader and a follower. The leader initiates the moves, the follower follows, together they make awesome dance magic.
If you go back to the same classes and events, you dance with the same people. You get to know them, what they can do, what they like and what they don’t. You develop a sense of rapport, and this is so important when dancing. You have to find the right level of trust, what you both feel comfortable doing. Maybe you’re happy to dance a move with one person, but not with another.
Let’s look at a few examples.
In the photo above, the couple are dancing some sort of hand-to-hand jitterbug routine. This is the sort of thing that you can teach a beginner in a couple of hours; nothing weird is going on here; it’s the sort of move most dancers would happily dance with a complete stranger.
Next up, we have tandem Charleston.
In this dance, the follower stands in front of the leader, they both face forward, and they’re connected by their hands at their sides. It’s hard for them to talk to each other, and the follower can’t see their leader.
This is a fun dance that’s easy to teach to beginners, but you can see it’s a bit more of an awkward position. A lot of followers (especially women) aren’t super comfortable dancing this with strangers, and would walk away if you tried it on a social dance floor.
And finally we have aerials. This is when your feet leave the ground, you’re lifted up in the air. In this case, the leader has lifted their follow and flipped her entirely upside down – and hopefully a few seconds later, she returned to the ground safely!
This is obviously a much riskier move, and requires a lot of trust between the two partners. You would never do this with a stranger, and I know a lot of experienced dancers who don’t go near moves like this.
The point is that trust isn’t binary. You have to find the right level of trust with your partner.
What do you feel comfortable doing?
And that question applies to generative AI as much as it does to dancing. If you’re using these tools for fun stuff, to make images or videos, that’s one thing. If you’re relying on their output for knowledge work, that’s quite another. If you want to use these tools, you have to know how much you trust them.
The right level of trust isn’t absolute faith or complete scepticism; it’s somewhere in the middle. Maybe you trust it for one thing, but not another.
I’ve seen a lot of discussions of prompt engineering that focus on the mechanical skills, without thinking about trust. “Type in this text to get these results.” That’s important, but it’s no good having those skills if you can’t trust and use the results you get.
How do we learn to trust AI? I think this will be a key question as we use more and more of these tools. How do we build mental models of what we can trust? How do we help everyone find the right level of trust for them? How do we work out when we do and don’t trust them? The same techniques we’ve already discussed can help – start small, and work your way up.
When you dance with a stranger, you don’t jump straight to the most complex move in your repertoire – you start with simpler moves, and you get a sense of each other’s comfort. Are you both dancing energetically? Confidently? Does it feel safe to do something more complicated? Or is your partner nervous? Wary? Perhaps at the edge of their comfort zone?
We can do something similar with AI. One thing I’ve found useful when testing new tools is to ask it something simple, something I already know how to do. If I see it doing a good job, I can start to trust it for similar tasks. If it completely messes up, I know I can’t trust it for this.
So what do I trust AI for? I want to give you a few practical examples.
I think it’s important to work in areas where you already have a decent understanding. We know these tools can hallucinate. We know they can make stuff up. We know they can go off the rails.
The safeguard is us, the human, and we need to be able to spot when they’ve gone off the rails and need guiding back to the straight and narrow.
We all have different areas of competence and expertise – the areas where we trust AI are going to be different for each of us. So I might trust an AI to tell me about digital preservation or dance styles, but I wouldn’t ask it questions about farming or firefighting or frogs.
Let’s look at a few examples.
One thing I use AI tools for is to generate a whole bunch of ideas, a whole bunch of questions, a sort of brainstorming tool. I give it a discussion topic, and rather than asking it for answers, I ask it to tell me what sort of things I should be considering.
I used ChatGPT to help me write this talk. I described the broad premise of the talk, and I asked it to tell me what aspects I should consider. What should I discuss? What could I say? What might my audience want to hear? I didn’t use any of its output directly, but it gave me some stuff to think about. Some of it was bad, some of it I already had, some of its suggestions were useful additions to the talk.
This is an inversion of the prompter/promptee relationship: I’m not giving the computer a topic to think about; it’s giving me topics to think about.
What if I want more than ideas, what if I want some facts?
AI tools are unreliable sources of facts, and you always have to be careful. They can make up nonsense, and repeat it as fact with complete confidence and authority. It’s mansplaining as a service.
But it’s not like they don’t know any facts, and sometimes they do get them right. The right level of trust isn’t absolute faith or complete scepticism, it’s somewhere in between. How do we know when to trust them?
I’ve settled on the idea that AI is like a friend who’s read a Wikipedia article – maybe after having a few beers. There’s definitely something behind what they’re remembering, but it may not always be right. I wouldn’t rely on it for anything important, but it often contains a clue to something which is true – a name, an idea, some terminology that leads me towards more trustworthy reference material.
And finally, I use AI for writing code.
But again, I stick to areas where I already have some expertise. I use these tools to write code in languages I use, frameworks I’m familiar with, problems that I can understand.
When I’m working with a team of human developers, I sometimes have to pass on doing a code review because I don’t know enough to do a proper review. That’s my threshold for using AI tools – if I wouldn’t be comfortable reviewing the code, I don’t trust the two of us to write it together. There’s too much of a risk that I’ll miss a subtle mistake or major bug.
But that still leaves a lot of use cases!
I use it for a lot of boilerplate code. It’s a good way to get certain repetitive utility functions. And it’s particularly useful when there are tools that have a complicated interface, and I have to get the list of fiddly options correct (ffmpeg springs to mind). It’s quite tricky to get the right set of incantations, but once I’ve got them it’s easy to see if they’re behaving correctly.
So those are a few of the things I trust generative AI for. These are more evolutionary than revolutionary – these AI tools have become another thing in my toolbox, but they haven’t fundamentally changed the way I work. (Yet.)
Of course, you’ll trust them for different things. Don’t take this as a prescriptive list; take it as some ideas for how you might use AI.
So like a slow jazz number at the end of the evening, let’s wrap things up.
Being a great dancer: yes, it requires the technical skills. You need to know the footwork, the moves, the rhythm. But it’s also about trust, knowing your partner, working out what they’re comfortable with.
The same thing is true of using AI. You need to know how to write prompts, how to get information, how to get the results you want. But you also need to know if you trust those results, when you can rely on the output.
We need both of those skills to be great users of AI.
I had two accounts as a way to keep two separate watch histories. I was watching videos about gender and trans stuff before I came out, and I didn’t want them appearing in my main account – say, when I was listening to music at work. That’s less of a concern now than it was five or six years ago, and the lines between them have become blurry. I don’t need two accounts any more.
Because I only use YouTube for watching videos, and not posting, there were only three lists I really wanted to keep: my subscriptions, my Watch Later queue, and my Likes. My subs and watch later were both small enough to copy by hand; the likes were the hard bit – I had about 1500 or so.
There’s no built-in way to move Likes between YouTube accounts, so it was time to break out the YouTube API.
The first step was getting some API credentials. This uses the Google Cloud console, which I’m not super familiar with, but YouTube has a lot of quickstart guides and code samples which made the process much easier.
I used the Python quickstart guide, and went through the following steps: creating a new project in the Google Cloud console, enabling the YouTube Data API, creating OAuth client credentials and downloading them as a JSON file, and installing the Python client libraries.
At some point during this process, I had to create an OAuth consent screen. If I was publishing this app for the world to use, you’d see this as signing into the app, and it would have to be reviewed by Google. Because I was only writing scripts for me, I was able to mostly skip this step – I left the app with a “testing” status, and just listed my two YouTube accounts as “test users”:
After this, I tried to run the sample Python script from Google’s documentation.
It didn’t work – it was written for an older version of the Python libraries.
In particular, it used flow.run_console(), which uses an authentication method which has been deprecated for over a year. A Stack Overflow answer suggested I use flow.run_local_server(), and that was more successful.
Here’s the first script I got working, which is a modified version of the sample code:
import googleapiclient.discovery  # pip install google-api-python-client==1.7.2
import google_auth_oauthlib.flow  # pip install google-auth-oauthlib==0.4.1


def create_youtube_client(client_secrets_file):
    """
    Given the path to a JSON file with OAuth credentials from the
    Google Cloud console, create an authenticated client.
    """
    api_service_name = "youtube"
    api_version = "v3"
    scopes = ["https://www.googleapis.com/auth/youtube.readonly"]

    flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes
    )
    credentials = flow.run_local_server()

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, credentials=credentials
    )

    return youtube


if __name__ == "__main__":
    youtube = create_youtube_client(
        client_secrets_file="client_secret_12345.apps.googleusercontent.com.json"
    )

    request = youtube.channels().list(
        part="snippet,contentDetails,statistics",
        mine=True
    )
    response = request.execute()

    from pprint import pprint; pprint(response)
When I run this, the script kicks me out into a web browser, where I have to go through the usual Google login screen, and confirm I want to use this app. After I clicked through a few confirmation screens, my browser eventually got to a page that said:
The authentication flow has completed. You may close this window.
and back in my terminal window, the script was running and printing a list of my playlists.
Already this was further than I’d got in the past – I had an authenticated API client, and it was retrieving real data from my YouTube account. Good progress!
The authentication code above works, but it has two major issues:
It’s reading my OAuth client config from a JSON file on disk. Credentials should never be stored in plain text, so I want to put that somewhere more secure.
It doesn’t remember the credentials from flow.run_local_server() – every time I run the script, I have to go through the in-browser authentication flow. I was running the script many times as I gradually built up the code, and this quickly got annoying.
Both of these issues can be solved using the keyring module, which provides a platform-agnostic interface to the system password store (in my case, the login keychain on macOS).
I changed the function to fetch the OAuth client config from the keychain, and to store retrieved credentials in the keychain. When I run it repeatedly, it retrieves the stored credentials rather than sending me back through the in-browser flow.
After running these scripts for a while, I discovered that Google’s OAuth credentials expire after about a week. I wrote some rudimentary code to handle credential expiry – it deletes the stored credentials, and sends me back through the in-browser flow. There are almost certainly better ways to do this, but my simplistic approach worked well enough for my one-off script.
Here’s my updated function:
import datetime
import json

import google.oauth2.credentials
import googleapiclient.discovery  # pip install google-api-python-client==1.7.2
import google_auth_oauthlib.flow  # pip install google-auth-oauthlib==0.4.1
import keyring


def create_youtube_client(label: str):
    """
    Get an authenticated OAuth client for YouTube.

    It gets the OAuth config from the system keychain, and caches
    per-user credentials in the keychain under ("youtube", label).
    """
    api_service_name = "youtube"
    api_version = "v3"
    scopes = ["https://www.googleapis.com/auth/youtube.readonly"]

    # Try to retrieve a stored OAuth access token from the keychain.
    #
    # This saves me going through the in-browser authentication flow
    # if I've already run the script.
    stored_credentials = keyring.get_password("youtube", label)

    if stored_credentials is not None:
        json_credentials = json.loads(stored_credentials)

        if "expiry" in json_credentials:
            expiry = datetime.datetime.fromisoformat(json_credentials["expiry"])
            expiry = expiry.replace(tzinfo=None)
            json_credentials["expiry"] = expiry

        credentials = google.oauth2.credentials.Credentials(**json_credentials)

    # If there are no stored credentials, fetch new ones.
    else:
        # Retrieve the OAuth client credentials from the keychain.
        #
        # This contains the contents of the JSON file that I downloaded
        # from the Google Cloud console, but now those credentials aren't
        # just saved as a plaintext file on disk.
        stored_client_secrets = keyring.get_password("youtube", "client_secrets")

        if stored_client_secrets is None:
            raise ValueError("Could not find OAuth client secrets in keychain!")

        flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_config(
            client_config=json.loads(stored_client_secrets), scopes=scopes
        )
        credentials = flow.run_local_server()

        # Save these credentials in the system keychain, so they can be
        # retrieved later.
        keyring.set_password("youtube", label, credentials.to_json())

    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, credentials=credentials
    )

    # The OAuth credentials don't last forever -- they seem to expire after
    # a week. This is a slightly ropey attempt to work around that.
    #
    # If we call the API and the saved token is expired, just delete
    # it and get new creds -- sending me back through the in-browser flow.
    #
    # Notes:
    #
    #  - There are ways to refresh OAuth tokens that don't involve
    #    sending me back through the in-browser flow, but I didn't
    #    look at them as part of this project.
    #  - Catching all exceptions is a bit broad. This code should really
    #    retry only if it gets a "credentials expired" exception, and
    #    throw any other exceptions immediately.
    #
    try:
        request = youtube.channels().list(part="snippet", mine=True)
        request.execute()
    except Exception:
        keyring.delete_password("youtube", label)
        return create_youtube_client(label)
    else:
        return youtube
This function is more complicated than Google’s sample code, and there are more ways that it could be improved. Authentication is hard!
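Getting the OAuth client config into the keychain in the first place is a one-off call to keyring.set_password – something like this sketch, which reuses the placeholder filename from the first script:

import keyring

# One-off setup: copy the downloaded OAuth client secrets JSON into the
# system keychain, where create_youtube_client() can find it later.
with open("client_secret_12345.apps.googleusercontent.com.json") as f:
    keyring.set_password("youtube", "client_secrets", f.read())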
With an authenticated client, it was relatively straightforward to write code that interacts with YouTube’s APIs. I’ve lost the links, but I found snippets of sample code in Google’s documentation that I was able to adapt.
I started by wrapping the create_youtube_client in a class, and writing a function to list all the videos I’d liked:
class YouTubeClient:
    def __init__(self, label: str):
        self.youtube = self.create_youtube_client(label)

    def create_youtube_client(self, label: str):
        …

    def get_liked_videos(self):
        """
        Generate a list of videos that this YouTube account has liked.
        """
        kwargs = {"part": "snippet", "playlistId": "LL", "maxResults": "50"}

        while True:
            request = self.youtube.playlistItems().list(**kwargs)
            response = request.execute()

            yield from response["items"]

            try:
                kwargs["pageToken"] = response["nextPageToken"]
            except KeyError:
                break
[Edit, 15 February 2024: the original version of this code called the videos() endpoint and filtered for my likes, but that was only able to see the first 1000 likes. That was fine for this project, where I was gradually deleting the list, but not in general. I’ve changed it to use the playlistItems() API, which seems to return the full set.]
This generates videos in reverse order of liking them – the most recently liked video comes first. The items are large dicts which include various metadata fields about each video, of which the most interesting one to me is the ID:
{'id': 'J-u2aW7T2bw', …}
{'id': 'XPaKAh2zxgk', …}
{'id': '-q7ZVXOU3kM', …}
Then I wrote a couple of methods which like/unlike a video.
Because these are modifying data in YouTube, I had to change the scopes to https://www.googleapis.com/auth/youtube, replacing the youtube.readonly scope I’d been using previously.
class YouTubeClient:
    …

    def like_video(self, *, video_id):
        """
        Mark a video as "liked" on YouTube.
        """
        request = self.youtube.videos().rate(id=video_id, rating="like")
        response = request.execute()

    def unlike_video(self, *, video_id):
        """
        Remove the "liked" rating from a video on YouTube.
        """
        request = self.youtube.videos().rate(id=video_id, rating="none")
        response = request.execute()
Putting these functions together, I was then able to write a short script which moved my likes from one account to the other:
old_youtube = YouTubeClient(label="old_account")
new_youtube = YouTubeClient(label="new_account")

for video in old_youtube.get_liked_videos():
    video_id = video["id"]
    print(f"https://www.youtube.com/watch?v={video_id}")

    new_youtube.like_video(video_id=video_id)
    old_youtube.unlike_video(video_id=video_id)
Removing the likes from the old account wasn’t strictly necessary – I was planning to close the account when I was done – but it was an easy way to track the progress, and turned out to be helpful towards the end of the process (more on that below).
Incidentally, around the time I wrote this code, David published a post about writing good programming abstractions, and I think this is a nice example of one. Wrapping these API calls in a couple of named functions doesn’t do anything to help de-duplication, but it does make the intent of the final script much clearer.
By and large this code worked extremely well. Almost all of the videos moved across seamlessly, and I could watch it in two side-by-side browser windows – likes appeared in one account as they disappeared from the other. It was substantially quicker and easier than if I’d tried to do it by hand.
I did run into a couple of non-obvious issues:
The YouTube API has a quota, and I burnt through it pretty quickly. You get 10,000 units per day, and rating a video (aka like/unlike) costs 50 units. I had to make two calls to move each video (one like, one unlike), so I could only move about 100 videos a day.
The quota resets at midnight Pacific Time, or about 8am in London. I got into the habit of running the script once a day, every day, until I’d moved my entire list of Liked videos. It took a while, but still less than doing it by hand!
You can apply for a quota increase, but I didn’t bother – I knew I’d only run into the quota a handful of times, and it was easier to spread my runs over multiple days than fill in an application for more quota. The docs say it can take a week or so to approve quota increases, by which time I’d probably be done.
Sometimes I’d get a 403 error with the message “The owner of the video that you are trying to rate has disabled ratings for that video”.
I’m not sure what this means – if I opened the video in my web browser, I could still use the like/unlike buttons. This only affected a handful of videos in my entire list, so I just used my web browser to move them across.
The API couldn’t see the last dozen or so videos.
On the last day of running the script, the get_liked_videos() function returned an empty list, but I could still see some liked videos in the old account in my web browser.
I’m not sure why they were invisible to the API.
Again, because it was only a handful of videos, I moved them across by hand.
[Edit, 15 February 2024: I think this was caused by my use of the videos() API instead of playlistItems(); see above.]
These were relatively minor issues, and easy to work around. And once I’d finished running this script, I was able to close the old account and throw away this code – but maybe I’ll come back to these notes if I have another interesting idea for using the YouTube API.
Maximum size of a PDF, version 7: 381 km × 381 km.
https://commons.m.wikimedia.org/wiki/File:Seit…
Some version of this has been floating around the Internet since 2007, probably earlier. This tweet is pretty emblematic of posts about this claim: it’s stated as pure fact, with no supporting evidence or explanation. We’re meant to just accept that a single PDF can only cover about half the area of Germany, and we’re not given any reason why 381 kilometres is the magic limit.
I started wondering: has anybody made a PDF this big? How hard would it be? Can you make a PDF that’s even bigger?
A few years ago I did some silly noodling into PostScript, the precursor to PDF, and it was a lot of fun. I’ve never actually dived into the internals of PDF, and this seems like a good opportunity.
Let’s dig in.
These posts are often accompanied by a “well, actually” where people in the replies explain this is a limitation of a particular PDF reader app, not a limitation of PDF itself. They usually link to something like the Wikipedia article for PDF, which explains:
Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²).[2]
If you follow the reference link, you find the specification for PDF 1.7, where an appendix item explains in more detail (emphasis mine):
In PDF versions earlier than PDF 1.6, the size of the default user space unit is fixed at 1/72 inch. In Acrobat viewers earlier than version 4.0, the minimum allowed page size is 72 by 72 units in default user space (1 by 1 inch); the maximum is 3240 by 3240 units (45 by 45 inches). In Acrobat versions 5.0 and later, the minimum allowed page size is 3 by 3 units (approximately 0.04 by 0.04 inch); the maximum is 14,400 by 14,400 units (200 by 200 inches).
Beginning with PDF 1.6, the size of the default user space unit may be set with the UserUnit entry of the page dictionary. Acrobat 7.0 supports a maximum UserUnit value of 75,000, which gives a maximum page dimension of 15,000,000 inches (14,400 * 75,000 * 1 ⁄ 72). The minimum UserUnit value is 1.0 (the default).
15 million inches is exactly 381 kilometres, matching the number in the original tweet. And although this limit first appeared in PDF 1.6, it’s “version 7” of Adobe Acrobat. This is probably where the original claim comes from.
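(For the conversion: 15,000,000 inches × 2.54 cm per inch = 38,100,000 cm, which is exactly 381 km.)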
What if we make a PDF that exceeds these “maximum” values?
I’ve never dived into the internals of a PDF document – I’ve occasionally glimpsed some bits in a hex editor, but I’ve never really understood how they work. If I’m going to be futzing around for fun, this is a good opportunity to learn how to edit the PDF directly, rather than going through a library.
I found a good article which explains the internal structure of a PDF, and combined with asking ChatGPT a few questions, I was able to get enough to write some simple files by hand.
I know that PDFs support a huge number of features, so this is probably a gross oversimplification, but this is the mental picture I created:
The start and end of a PDF file are always the same: a version number (%PDF-1.6) and an end-of-file marker (%%EOF).
After the version number comes a long list of objects. There are lots of types of objects, for all the various things you can find in a PDF, including the pages, the text, and the graphics.
After that list comes the xref or cross-reference table, which is a lookup table for the objects. It points to all the objects in the file: it tells you that object 1 is 10 bytes after the start, object 2 is after 20 bytes, object 3 is after 30 bytes, and so on. By looking at this table, a PDF reading app knows how many objects there are in the file, and where to find them.
The trailer contains some metadata about the overall document, like the number of pages and whether it’s encrypted.
Finally, the startxref value is a pointer to the start of the xref table. This is where a PDF reading app starts: it works from the end of the file until it finds the startxref value, then it can go and read the xref table and learn about all the objects.
With this knowledge, I was able to write my first PDF by hand.
If you save this code into a file named myexample.pdf, it should open and show a page with a red square in a PDF reading app:
%PDF-1.6
% The first object. The start of every object is marked by:
%
% <object number> <generation number> obj
%
% (The generation number is used for versioning, and is usually 0.)
%
% This is object 1, so it starts as `1 0 obj`. The second object will
% start with `2 0 obj`, then `3 0 obj`, and so on. The end of each object
% is marked by `endobj`.
%
% This is a "stream" object that draws a shape. First I specify the
% length of the stream (54 bytes). Then I select a colour as an
% RGB value (`1 0 0 RG` = red), then I set a line width (`5 w`) and
% finally I give it a series of coordinates for drawing the square:
%
%     (100, 100) ----> (200, 100)
%                           |
%   [s = start]             |
%        ^                  |
%        |                  |
%        |                  v
%     (100, 200) <---- (200, 200)
%
1 0 obj
<<
/Length 54
>>
stream
1 0 0 RG
5 w
100 100 m
200 100 l
200 200 l
100 200 l
s
endstream
endobj
% The second object.
%
% This is a "Page" object that defines a single page. It contains a
% single object: object 1, the red square. This is the line `1 0 R`.
%
% The "R" means "Reference", and `1 0 R` is saying "look at object number 1
% with generation number 0" -- and object 1 is the red square.
%
% It also points to a "Pages" object that contains the information about
% all the pages in the PDF -- this is the reference `3 0 R`.
2 0 obj
<<
/Type /Page
/Parent 3 0 R
/MediaBox [0 0 300 300]
/Contents 1 0 R
>>
endobj
% The third object.
%
% This is a "Pages" object that contains information about the different
% pages. The `2 0 R` is reference to the "Page" object, defined above.
3 0 obj
<<
/Type /Pages
/Kids [2 0 R ]
/Count 1
>>
endobj
% The fourth object.
%
% This is a "Catalog" object that provides the main structure of the PDF.
% It points to a "Pages" object that contains information about the
% different pages -- this is the reference `3 0 R`.
4 0 obj
<<
/Type /Catalog
/Pages 3 0 R
>>
endobj
% The xref table. This is a lookup table for all the objects.
%
% I'm not entirely sure what the first entry is for, but it seems to be
% important. The remaining entries correspond to the objects I created.
xref
0 4
0000000000 65535 f
0000000851 00000 n
0000001396 00000 n
0000001655 00000 n
0000001934 00000 n
% The trailer. This contains some metadata about the PDF. Here there
% are two entries, which tell us that:
%
% - There are 4 entries in the `xref` table.
% - The root of the document is object 4 (the "Catalog" object)
%
trailer
<<
/Size 4
/Root 4 0 R
>>
% The startxref marker tells us that we can find the xref table 2196 bytes
% after the start of the file.
startxref
2196
% The end-of-file marker.
%%EOF
I played with this file for a while, just doing simple things like adding extra shapes, changing how the shapes appeared, and putting different shapes on different pages. I tried for a while to get text working, but that was a bit beyond me.
It quickly became apparent why nobody writes PDFs by hand – it got very fiddly to redo all the lookup tables! But I’m glad I did it; manipulating all the PDF objects and their references really helped me feel like I understand the basic model of PDFs. I opened some “real” PDFs created by other apps, and they have many more objects and types of object – but now I could at least follow some of what’s going on.
With this newfound ability to edit PDFs by hand, how can I create monstrously big ones?
Within a PDF, the size of each page is set on the individual “Page” objects – this allows different pages to be different sizes. We’ve already seen this once:
<<
/Type /Page
/Parent 3 0 R
/MediaBox [0 0 300 300]
/Contents 1 0 R
>>
Here, the MediaBox is setting the width and height of the page – in this case, a square of 300 × 300 units. The default unit size is 1/72 inch, so the page is 300 ÷ 72 = 4.17 inches on each side. And indeed, if I open this PDF in Adobe Acrobat, that’s what it reports:
By changing the MediaBox value, we can make the page bigger. For example, if we change the value to 600 600, Acrobat says it’s now 8.33 x 8.33 in. Nice!
We can increase it all the way to 14400 14400, the max allowed by Acrobat, and then it says the page is now 200.00 x 200.00 in. (You get a warning if you try to push past that limit.)
But 200 inches is far short of 381 kilometres – and that’s because we’re using the default unit of 1/72 inch.
We can increase the unit size by adding a /UserUnit value. For example, setting the value to 2 will double the page in both dimensions:
<<
/Type /Page
/Parent 3 0 R
/MediaBox [0 0 14400 14400]
/UserUnit 2
/Contents 1 0 R
>>
And now Acrobat reports the size of the page as 400.00 x 400.00 in.
If we crank it all the way up to the maximum of UserUnit 75000, Acrobat now reports the size of our page as 15,000,000.00 x 15,000,000.00 in – 381 km along both sides, matching the original claim.
If you’re curious, you can download the PDF.
If you try to create a page with a larger size, either by increasing the MediaBox or UserUnit values, Acrobat just ignores it. It keeps saying that the size of a page is 15 million inches, even if the page metadata says it’s higher.
(And if you increase the UserUnit past 75000, this happens silently – there’s no warning or error to suggest the size of the page is being capped.)
[Edit, 1 February 2024: some extra zeroes slipped into the original version of this post – it’s a million inches, not a billion. Thanks to mrb on Hacker News for spotting the mistake!]
This probably isn’t an issue – I don’t think the UserUnit value is widely used in practice. I found one Stack Overflow answer saying as much, and I couldn’t find any examples of it online. The built-in macOS Preview.app doesn’t even support it – it completely ignores the value, and treats all PDFs as if the unit size is 1/72 inch.
But unlike Acrobat, the Preview app doesn’t have an upper limit on what we can put in MediaBox. It’s perfectly happy for me to write a width which is a 1 followed by twelve 0s:
If you’re curious, that width is approximately the distance between the Earth and the Moon. I’d have to get my ruler to check, but I’m pretty sure that’s larger than Germany.
I could keep going. And I did. Eventually I ended up with a PDF that Preview claimed is larger than the entire universe – approximately 37 trillion light years square. Admittedly it’s mostly empty space, but so is the universe. If you’d like to play with that PDF, you can get it here.
Please don’t try to print it.
Accompanying me was Flemingo, a hand-puppet flamingo who’s become a constant companion when I go to the show. (Why a flamingo? Because of one verse in a single song that has turned the humble flamingo into an icon in the fandom.)
Both of us were wearing bow ties to cosplay as Ian Fleming, who’s both a character in the musical and the co-author of a memo that laid the seeds for the real-life operation. And Ian Fleming was a part-time novelist, part-time spy, so it made sense for us to be carrying a stack of his books.
I did look in a couple of charity shops for old Ian Fleming novels, but I couldn’t find any. Lacking real books, I decided that the only alternative was to make my own.
This led to a series of props that are, if anything, over-arsed: the Collected Works of Ian Flemingo. These are a collection of postcards with covers based on the real James Bond books… sort of:
I got the idea over Christmas, and annoyed my family by coming up with increasingly tenuous ideas for James Bond Bird puns.
This was lots of fun, and I came up with so many ideas – for every one you see, there are two more I didn’t use.
The design of the covers is loosely based on a real set of hardback editions, without the fancy art. For the images, I turned to PhyloPic, a site I bookmarked years ago and have been waiting for a good reason to use. It provides free silhouette images of animals and plants, and I was able to pick a set of bird images with CC0 licenses. (Flamingo, gull, goldfinch, dodo, loon, stork, crow, magpie)
I’d like to say the rainbow spread was intentional, but really I just kept picking colours I hadn’t used yet – I didn’t arrange them in this order until I had them printed.
On the other side of the cards, I wrote pun-filled blurbs that are based loosely on the original books. I got the original blurbs from Amazon listings, then stuffed them with puns and references to the show. At one point I did consider including testimonials from characters, but I didn’t have space.
The books are published by “El Otro Editora”, a reference to “El Otro Periodico”, a fan-produced newspaper, which is in turn named after the line “el otro telefono”. The price is “6d”, which is the one mistake that really irks me – I meant the cost to be a ha’penny, but instead it’s half a shilling. Oops. (Why a ha’penny? Because of a line in the show.)
My favourite touch is the barcodes in the corner. This is what makes it feel like a “real” book to me – and every book has a unique barcode, and they can be scanned!
Initially I picked random numbers as the ISBNs, but I got worried about where they might be pointing – maybe I’d accidentally pick a real ISBN that pointed at an offensive book?
Then I looked for ISBNs that could be used safely. For example, I know there are telephone numbers that are reserved for fictional purposes, so a phone number that appears in a TV show won’t accidentally lead to a deluge of calls for the person on the other end. But although lots of ISBNs are used for fiction books, there aren’t any ranges reserved for fictional books. (And yes, I did look up the ISBN specifications to see if there were any ranges reserved for this purpose. This project is nothing if not ridiculous.)
Finally, I did the obvious thing – I just dropped in the ISBNs for the original Ian Fleming books. So if you scan the barcodes, you’ll find a James Bond book.
I printed the cards themselves using InstantPrint, who did a great job – the cards feel nice in the hand, and they were delivered quickly on a tight deadline.
Most of my work is purely digital, so I always enjoy when I get to actually hold something I’ve made. In the week leading up to the event, I found myself taking them out of my bag and just turning them over in my hands as I admired the shiny, shiny thing. I don’t expect to print more of these particular books, but I can see myself doing more custom postcards in future – it’s nice to have physical souvenirs in an increasingly online world.
I gave out a bunch of the postcards as OHT souvenirs on Monday, and people liked them! I also got to give some sets to the cast at stage door today, which got rid of the rest of the prints. I had fun making them, and I’m so glad that people are enjoying them.
Small flashes of joy, indeed.
However, there is one point of contention: how should websites ask for your age?
I’ve done some thinking, and I’ve come up with a proposal. We all know the best way to tell somebody’s age is to count the candles on their birthday cake, so I’ve built a cake-based interface.
(If these animations are distracting, you can toggle them off/on.)
Let me answer some FAQs, and then I’ll explain how it works:
Why can’t we just use <input type="number">? As many people are fond of saying, age is a state of mind, not a number.
Will this input UI work on all devices? This is definitely on the wide side, but I tried it on the 52′ DiamondVision Ultra Mega Display where I do all my web development, and it fits just fine on there. I’m sure that’s all the testing we need, and nobody would ever have a smaller display where this design doesn’t work.
Can I license this UI to use in my apps? You certainly can! Just send your mail-order form to me at the Institute of Good Ideas, Potassium Plaza, Rainy England.
What’s on your roadmap for V2? Adding a tinny MP3 of “Happy birthday” that autoplays at maximum volume whenever this UI is on screen.
I think we can all agree that this is a brilliant idea, and I’m sure all the major browsers will implement it within weeks. I look forward to getting my cheques in the post.
The cake is drawn entirely using SVG animations, which I haven’t used before.
I’m quite pleased with how well it works, and how close I was able to get to my original idea.
I know there are quite a few ways to do animation on the web; I wanted to experiment with the SVG <animate> tag.
The basic idea of the <animate> tag is that you can tell it different values that an attribute of an element can take over time. For example, here I’m animating a rectangle by increasing its width from 0 to 100, and then decreasing it back to 0 again:
<rect width="10" height="10" fill="black">
  <animate
    attributeName="width"
    values="0;100;0"
    dur="20s"
    repeatCount="5"
  />
</rect>
which looks like:
It’s pretty flexible – you can animate multiple properties on the same element; you can change non-numeric attributes like fill or stroke; you have a lot of control over how the animation behaves.
I did some brief experiments with simple shapes, enough to get a sense of how I could use it.
Now that I knew how to animate attributes in SVG, I made a small icon of a static birthday cake.
There are plenty of existing icons like this on the web, but I made my own so I could keep the shapes simple – most icon sets are just a giant <path> exported from a drawing app, which I’d have to unpick. Animating that would be harder than just creating my own icon.
I started with a little pencil-drawn sketch to work out the rough geometry, then I wrote the SVG by hand. I still find it vaguely relaxing to create pictures from code. This is what I came up with:
Most of this is fairly vanilla SVG, using stuff I’ve written about before. The candle flames and the curving line are both using SVG masks, and the curves are drawn as a collection of circular arcs.
The one interesting bit is the rounded corners on the two layers of cake, where only the top two corners are rounded.
You can set the corner radius of an SVG <rect> with the rx attribute, and you get the same curve on all four corners – unlike the CSS border-radius property, which allows you to pick different radii for each corner. To get curves on just two corners, I overlapped two rectangles – one with rounded corners, and one without. Because I’m only doing a solid fill, I’m rendering the two rectangles directly in the image – but if I wanted a more complex fill, I could use this approach to create a mask that I applied to another shape.
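As a rough sketch of the trick – the sizes and fill here are placeholders, not the values from the cake icon:

<!-- a rectangle with all four corners rounded... -->
<rect x="10" y="10" width="120" height="40" rx="10" fill="black"/>

<!-- ...and a square-cornered rectangle covering its bottom half, so the
     combined shape only has rounded corners at the top -->
<rect x="10" y="30" width="120" height="20" fill="black"/>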
Once I had my basic icon, I created an extended version that has several hundred candles on it. This is what the cake looks like when it’s fully complete:
This has something like 200 candles on it; in hindsight I was way off my estimate of how old the oldest humans are. According to Wikipedia, the oldest humans are closer to 120 years old.
I then sprinkled <animate> elements everywhere to make different parts of the cake appear at different times. For the plate and the two cake layers, I’m animating the width attributes, so they gradually get bigger. For the candles, I’m applying a mask which has an animated width attribute, so it gradually allows more and more of the candles to be seen.
That animation uses calcMode="discrete", which causes it to do a distinct step at each tick, rather than a smooth animation between the two. This means that you only ever see whole candles, rather than half-candles in the middle of the animation.
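As an illustration, a discrete animation looks something like this – the values and duration here are made up, not the ones from the cake:

<animate
  attributeName="width"
  values="0;15;30;45;60"
  dur="4s"
  calcMode="discrete"
  fill="freeze"
/>

Rather than interpolating smoothly, the width jumps straight from one value to the next.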
Finally, I added an animation to the viewBox attribute of the overall SVG – this means the width of the SVG increases as more candles become visible. This allows me to get the current state of the animation in JavaScript:
getComputedStyle(document.querySelector('svg'))['width']
// 158px
I know how far apart the candles are spaced, so I can use this to work out how many are visible at any given time. There are other ways to inspect the state of an in-progress SVG animation; tying it to the geometry was the easiest in this case.
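The calculation itself is only a couple of lines – here’s a sketch, where the margin and spacing numbers are placeholders rather than the real values from my SVG:

// How many candles are visible, based on the rendered width of the SVG?
const leftMargin = 20;     // space before the first candle, in pixels
const candleSpacing = 14;  // horizontal gap between candles, in pixels

const width = parseFloat(
  getComputedStyle(document.querySelector('svg')).width
);
const candlesVisible = Math.max(0, Math.floor((width - leftMargin) / candleSpacing));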
If you’d like to learn more, I encourage you to read the SVG file. It’s a bit repetitive in parts, but overall I think it’s fairly readable.
I learn a lot from doing mini-projects like this, and more than I would by just reading the documentation. I didn’t plan to work on this, but this particular idea – “animate a birthday cake” – sunk its teeth into my hyperfocus a few days ago, and I’ve been thinking about it ever since. Posting this article will let me call the project “done” and move on to other things.
Animation is one of those topics that’s always been just beyond what I can do – I knew that SVG animation is a thing, but I’d never actually tried it. Now I have!
For giggles, I decided to use the animal detection feature to tell me what sort of dog Ziva is. We’re not actually sure of her breed because she’s a rescue dog of unknown parentage, so it’s always fun to try the animal detection and see what it suggests.
When I tried, I was shown a button I hadn’t seen before: Look up Mammal.
And with the best will in the world, I couldn’t have guessed what it would suggest:
After we’d all had a good laugh at this suggestion, I opened the Wikipedia page and went down a rabbit hole. When you see some photos, it starts to make more sense – several of the photos featured really do show a sizable snout. Given the distortion in my photo, you can see why the algorithm thought there could be a match (and where the “hammer-headed” name comes from):
And it turns out it’s a pretty cool creature – it’s a “megabat” (great name) and its wingspan can be close to a metre wide. I think of bats as small and cuddly, but if something flew past that was that size, I’d be pretty nervous. Imagine if something that looked like this flew towards you with outstretched wings on a dark night:
And for extra scary points, it makes a pretty loud honking sound, so loud that it’s often considered a pest. This noise is how males attract females, and it’s so important that their internal organs are actually shaped around their ability to honk:
The most noticeable anatomical features of the male involve sound production. The larynx is one-half the length of the vertebral column and fills most of the thoracic cavity, pushing the heart, lungs and alimentary canal backward and sideways.
— Hypsignathus monstrosus, by Paul Langevin and Robert M. R. Barclay, Mammalian Species Issue 357, 26 April 1990.
That same paper also features a delightful description of the male nose, which sounds like somebody getting revenge for the way male authors describe women in novels:
Males have a large, square, truncated head (Tate, 1942) with enormous pendulous lips, ruffles around a warty snout and a hairless, split chin (Lang and Chapin, 1917).
I find myself down this sort of rabbit hole surprisingly often, and I’ve begun to think of it as “serendipitous search results”. Whatever Ziva is, she’s definitely not an African species of large bat, but it appeared in my search results anyway, and that was the start of some fun reading. At other times, I’ll look for a specific book at my local library, and they don’t have it, but I’ll end up reading half a dozen books with similar titles because that’s what the search could find.
If you want more pictures of cool bats, there are a bunch of them on iNaturalist. If not, I’ll see you next time I stumble upon something fun and unexpected while searching.
I did return to a couple of favourite authors, who all released new books that I enjoyed – Alexandria Bellefleur, Toshikazu Kawaguchi, Maureen Johnson – but that was more familiar feelings than new. I got more of my novelty from theatre than books this year.
About half of what I read came from my local library, including four of the books on the list below. The library was particularly helpful for the “good, but not great” books that I know I won’t read again. I was glad to return the books, and not have them taking up space in my house. I expect to continue leaning on my library next year.
I continue to write short reviews of books I’ve read at https://books.alexwlchan.net, and keep notes in Obsidian. The latter have been useful for jogging my memory, including as I write this post.
Below are the best books I read in 2023, in the order I read them. Note that it’s quite a heavy selection, several dealing with death and oppression and other unpleasant things. I do recommend them all, but you should have a cheery book to read afterwards.
The Origins of Iris, by Beth Lewis (2021)
This is a fascinating take on the “what if” concept.
Iris has run away to the forest to escape her abusive wife, and there she meets an alternate version of herself who made different choices – an Iris who married someone different, and now carries an unwanted pregnancy. They both have regrets about how their life turned out, and envy some aspects of the other’s life – but they also learn that the grass isn’t always greener.
The book switches back and forth between “Before” (the events leading up to her running away) and “After” (what she does in the woods). You know that something bad has happened to Iris, but you only find out the details slowly. I enjoy this sort of non-linear story.
It gave me a lot of thoughts and feelings, and I felt sympathy for Iris. Both Irises have flawed and imperfect lives, and there’s no implication that one is obviously better or more correct. I haven’t read it again, but I want to, because I suspect there’s a lot I’d pick up on a second time round.
Content warnings for abuse, sexual assault, rape, and suicide.
(Yes, I technically read this in 2022, but it missed the cutoff for last year’s post.)
Seven Fallen Feathers, by Tanya Talaga (2017)
This is an account of abuse, neglect, and deaths of Indigenous students in Canada’s school system. It’s not an easy book to read, but it’s well-written and worth the time.
It focuses on seven deaths in Thunder Bay, and includes the history of the residential schools which form the backdrop for those events. As I don’t know much about Canadian history or education, I found the context useful.
The titular “Seven Fallen Feathers” are seven children who died in unexplained circumstances. The book devotes a chapter to each of them, describing their history, the days leading up to their death, and the hole it left in the lives of their friends and family. It’s difficult to read but important, and the author has obviously done a lot of research and interviews.
There are certain themes that keep coming up – racism towards Indigenous people, indifference from the Thunder Bay police, the effect of moving kids to big cities – but the author is subtle about them. She doesn’t need to tell you what patterns you should be looking for, because it’s so obvious from the stories. It’s the embodiment of “show, don’t tell”.
Content warnings for death, suicide, racism and colonialism.
by Darcie Little Badger (2020)
A fun story in an America where fantasy and magic are real and commonplace, including vampires, ghosts, and evil scarecrows.
Elatsoe (“Ellie”) is a member of the Lipan Apache Tribe, and women in her family have the ability to speak to ghosts – usually animal ghosts, because human ghosts are more violent and angry. She’s investigating the death of her cousin Trevor, which she believes to be a murder.
The plot is a little slow, but the worldbuilding is gorgeous – it feels rich and real, without being heavy-handed. We get little glimpses of the magic, but it’s never treated as spectacular or obsessed over. I didn’t need to have every detail to enjoy it. I’d happily read more stories in this setting, but it also works nicely as a standalone.
This was Darcie Little Badger’s debut novel, and I read her second novel, A Snake Falls to Earth, in August. It was another good fantasy, and I’ll definitely be reading more of her work in future.
by Laura Imai Messina (2020)
This is a gorgeous story about grief and heartbreak, and two people learning to find love again after great tragedy.
The phone box itself is mundane, with no special magic or power. When I first saw the title, I wondered if this was a fantasy or sci-fi book – I was getting TARDIS vibes – but it’s not, and that’s a good thing. It’s just an ordinary phone box in a garden called Bell Gardia, a few hours from Tokyo.
People go to the phone box to have conversations with their loved ones – often they’re talking to people who have died, but not always. Some talk to estranged family, others talk to friends who are alive but mentally incapacitated or traumatised. These conversations are largely private, and we don’t get to hear many of them.
Instead, the book focuses on a handful of characters – Sui, Takeshi, and Hana – and how they interact with the phone box, and its other visitors. They’ve all suffered losses, and their visits to the phone box are what help them to start reconnecting with people.
It reminded me a lot of Toshikazu Kawaguchi’s books, which I adore – grief and sadness but with an ultimately positive message.
by Fern Brady (2023)
This is a great memoir about autism and sexuality.
I first saw Fern Brady when she appeared on Taskmaster, where she quickly became one of my favourite contestants. I loved how unapologetic she was about being herself, and how much fun she seemed to be having. When I learnt she was writing a memoir, I knew I had to read it.
It’s the story of her growing up, being autistic, and the ways her behaviours have affected her life. There’s also a lot of discussion of how being a woman meant her autism was overlooked or ignored for a long time. It felt genuine and raw, and it wasn’t unrealistically hopeful or optimistic – it was a statement about what being an autistic woman is like.
I definitely saw parallels with my own life, and it’s given me plenty to think about.
I was engrossed and read it in a single day; I was enjoying it so much I actually missed my train stop on the way home.
by Jihyun Park and Seh-Lynn Chai (2022)
This is a gripping and horrifying story of growing up in North Korea, then escaping as an adult.
North Korea is a country I was only vaguely aware of, and most of my knowledge comes from pop culture stereotypes, so I learnt a lot from this book. It’s primarily a story about Jihyun’s experience rather than North Korean politics, but even so it covers a lot of Korean history that was new to me.
It starts when Jihyun was a child, in the early 1970s, when North Korea was a less uncomfortable place to live, if not exactly prosperous. She grows up to be a school teacher as the economy declines, and then escapes through China when things get much worse – saving herself, but leaving her entire family behind. One of the most moving chapters is a farewell letter to her father.
The writing is clear and simple, with plenty of small details and individual stories. As with Seven Fallen Feathers, this is a good example of “show, don’t tell”.
$ python clean_up_text.py /path/to/text/file.md
This is mostly fine, but finding that path is a bit annoying when I want to run them on a note I have open in Obsidian. It’s not hard, it just takes a few steps – open the “More options” menu, click “Reveal in Finder”, drag the file from Finder into terminal. I wanted a way to make it a bit quicker.
I’ve written a little script which gives me a path to the note I currently have open in Obsidian, so now I can run something more like:
$ python clean_up_text.py $(path_to_frontmost_obsidian_note)
I’ve managed to do this with a bit of AppleScript and Python, even though Obsidian doesn’t have any AppleScript support.
The inspiration for this script was another script I have for getting the frontmost URL from my web browser. The crux of that script is a single line of AppleScript that controls Safari:
tell application "Safari" to get URL of document 1
Unfortunately this isn’t quite as simple in Obsidian – it doesn’t have any AppleScript support, so you can’t do anything with tell application "Obsidian".
(The lack of AppleScript is annoying, but understandable. It’s a niche technology on a marginal platform, and Apple seems to have completely forgotten it exists. Much as I find AppleScript useful, it’s hard to justify the time/effort to add support for it in a new app today.)
But even if Obsidian doesn’t have its own AppleScript dictionary, it is still visible from the AppleScript universe – as a process in System Events. We can’t see much, but we can see its windows, for example:
tell application "System Events"
tell process "Obsidian" to get title of front window
end tell
The window title has three parts, separated by hyphens: the name of the note, the name of the vault, and the Obsidian version:
Short story ideas - textfiles - Obsidian v1.4.16
This is the same title that shows up in the “Window” menu – it’s a bit of Obsidian poking into macOS where AppleScript can see it.
Because Obsidian always uses the title as the filename (e.g. this file is called Short story ideas.md), we can use this to find the path to the Markdown file.
To find the Markdown file, you match the vault name to a folder on disk, then you search for files that match the note title. There are a bunch of ways you could do this; I picked Python because that’s what I’m familiar with, but you could use another language just as easily.
This is the script I wrote, which I named obnote.
Hopefully the comments are enough to explain what’s going on:
#!/usr/bin/env python3
"""
Print the path to the Markdown file which is currently open
in Obsidian (if any).
This relies on knowing the on-disk locations of my Obsidian vaults,
so you won't be able to use this without changing it for your own setup.
Note: this will print the *first* file with the same name as your
open note, which may cause issues if you have multiple notes with
the same title.
"""
import os
import subprocess
def get_file_paths_under(root=".", *, suffix=""):
"""
Generates the absolute paths to every matching file under ``root``.
See https://alexwlchan.net/2023/snake-walker/
"""
if not os.path.isdir(root):
raise ValueError(f"Cannot find files under non-existent directory: {root!r}")
for dirpath, _, filenames in os.walk(root):
for f in filenames:
p = os.path.join(dirpath, f)
if os.path.isfile(p) and f.lower().endswith(suffix):
yield p
def get_applescript_output(script):
"""
Run an AppleScript command and return the output.
"""
cmd = ["osascript", "-e", script]
return subprocess.check_output(cmd).strip().decode("utf8")
if __name__ == "__main__":
window_title = get_applescript_output("""
tell application "System Events"
tell process "Obsidian" to get title of front window
end tell
""")
# The window title will be something of the form:
#
# Short story ideas - textfiles - Obsidian v1.4.16
#
note_title, vault_name, _ = window_title.rsplit(" - ", 2)
# Match the vault name to a path on disk.
#
# This is very specific to my setup, so if you want to use it on
# your computer, you'll need to customise this bit.
if vault_name == "textfiles":
vault_root = os.path.join(os.environ["HOME"], "textfiles")
else:
raise ValueError(f"Unrecognised vault name: {vault_name}")
# Find Markdown files that match the name of this note.
for path in get_file_paths_under(vault_root, suffix=".md"):
if os.path.basename(path) == f"{note_title}.md":
print(path, end="")
break
else: # no break
raise RuntimeError(f"Could not find note with title {note_title}")
This does assume that notes have unique titles – that I won’t, for example, have two notes in different folders both called Short story ideas.md.
That’s true in my vault, but you might want to be careful using it if you reuse note titles.
Now I can invoke my text cleanup scripts like so:
$ python clean_up_text.py $(obnote)
This is especially useful when I want to run the same cleanup script on multiple notes in quick succession.
I can run this command once, switch to Obsidian and select a new note, then return to my terminal and press up-arrow and enter to run the cleanup on my new note.
There are lots of other ways you could solve this problem – for example, I realised as I wrote this post that you could look at the .obsidian/workspace.json file – but this works for me, and I had a bit of fun while writing it.
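For what it’s worth, here’s a rough sketch of what that workspace.json approach might look like. This is untested, and it assumes the file has a lastOpenFiles list of vault-relative paths with the current note first – an undocumented detail which may change between Obsidian versions:

import json
import os

# Assumption: this is where my vault lives, matching the obnote script above.
VAULT_ROOT = os.path.join(os.environ["HOME"], "textfiles")

with open(os.path.join(VAULT_ROOT, ".obsidian", "workspace.json")) as f:
    workspace = json.load(f)

# Assumption: the first entry in "lastOpenFiles" is the most recently
# opened note, as a path relative to the vault root.
print(os.path.join(VAULT_ROOT, workspace["lastOpenFiles"][0]), end="")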
(I don’t remember much of my old approach, but I know it was messy. At one point I used Homebrew and virtual environments, but I got burnt by Homebrew unexpectedly breaking Python, so I scrapped it and started installing everything in my global Python installation. Don’t do that.)
In August I read Glyph’s post Get Your Mac Python From Python.org and it all seemed like sensible advice, so I decided to use that as my starting point. I downloaded Python on my new work laptop from Python.org, and I started using virtual environments for everything.
This worked well enough, but there were some rough edges in my new workflow. I’ve been tweaking my Fish shell config to make it a bit smoother.
One recommendation in Glyph’s post is that you always use virtual environments, and they suggest a way to enforce that:
Once you have installed Python from python.org, never pip install anything globally into that Python, even using the --user flag. Always, always use a virtual environment of some kind. In fact, I recommend configuring it so that it is not even possible to do so, by putting this in your ~/.pip/pip.conf:

[global]
require-virtualenv = true
I like the idea of always using virtualenvs, but I’m not a fan of putting config files in my home directory. I struggle to keep them up-to-date, and after a while I lose track of what’s what – is this config still in use, or is it cruft from a tool I no longer use? Plus, each config file becomes one more thing to remember when I set up a new computer.
Fortunately, this config file isn’t the only way to ensure you always use a virtual environment.
You can also set PIP_REQUIRE_VIRTUALENV, so I have the following lines in my fish shell config:
# This prevents me from installing packages with pip without being
# in a virtualenv first.
#
# This allows me to keep my system Python clean, and install all my
# packages inside virtualenvs.
#
# See https://docs.python-guide.org/dev/pip-virtualenv/#requiring-an-active-virtual-environment-for-pip
# See https://blog.glyph.im/2023/08/get-your-mac-python-from-python-dot-org.html#and-always-use-virtual-environments
#
set -g -x PIP_REQUIRE_VIRTUALENV true
Because I keep my shell config in Git, it’s easier to see when I added this variable, and when I get a new computer I’ll get the right behaviour “for free”.
The process of creating new virtual environments is ostensibly simple – just two commands.
$ python3 -m venv .venv
$ source .venv/bin/activate.fish
In practice I only ever remembered to run the first – I’d create my new virtual environment, go to pip install something, and then it would complain I hadn’t enabled a virtual environment.
I’d mutter and grumble, activate the virtualenv, and try again.
If I’m creating a virtual environment, I want to use it immediately, so I wrapped this process in a Fish function called venv:
function venv --description "Create and activate a new virtual environment"
echo "Creating virtual environment in "(pwd)"/.venv"
python3 -m venv .venv --upgrade-deps
source .venv/bin/activate.fish
# Append .venv to the Git exclude file, but only if it's not
# already there.
if test -e .git
set line_to_append ".venv"
set target_file ".git/info/exclude"
if not grep --quiet --fixed-strings --line-regexp "$line_to_append" "$target_file" 2>/dev/null
echo "$line_to_append" >> "$target_file"
end
end
    # Tell Time Machine that it doesn't need to bother backing up the
# virtualenv directory. (macOS-only)
# See https://ss64.com/mac/tmutil.html
tmutil addexclusion .venv
end
I typically run this in the root of a project directory, usually a Git repo.
When I run it, it creates a new virtual environment with an up-to-date version of pip (thanks to --upgrade-deps), then it activates it immediately. This means my next command can be a pip install, and it’ll run inside the new virtualenv.
It also adds the .venv directory to .git/info/exclude, which is a local-only gitignore file. This means that Git will ignore my virtual environment, and not try to save it. The grep command is checking that I haven’t already gitignore-d .venv, so I don’t add repeated ignore rules.
It also tells Time Machine not to bother backing up the virtual environment directory. I’d never restore a virtualenv from a backup; I’d just create a new one fresh, so backing it up is a waste of space and CPU cycles.
I often combine this with another function I have for creating temporary directories:
function tmpdir --description "Create and switch into a temporary directory"
cd (mktemp -d)
end
like so:
$ tmpdir; venv
And with two short commands, I’m in an empty directory with a fresh virtual environment. This is great for quick prototyping, experiments, and one-off projects.
Once I’ve created my virtual environments, I need to remember to activate them.
I could do this manually, or I could have the computer look for virtualenvs and (de)activate them automatically for me. There are various plugins for doing this (I used virtualfish a few years ago), but this time round I realised my needs were simple enough that I could just write my own function.
My venv function ensures a standard approach to virtualenv naming: I always call them .venv, and I put them in the root of my project directories, which are always Git repos. This means I can find if there’s a virtualenv I want to auto-activate by looking to see if I’m in a Git repo, then looking for a folder called .venv.
This is the function:
function auto_activate_venv --on-variable PWD --description "Auto activate/deactivate virtualenv when I change directories"
# Get the top-level directory of the current Git repo (if any)
set REPO_ROOT (git rev-parse --show-toplevel 2>/dev/null)
# Case #1: cd'd from a Git repo to a non-Git folder
#
# There's no virtualenv to activate, and we want to deactivate any
# virtualenv which is already active.
if test -z "$REPO_ROOT"; and test -n "$VIRTUAL_ENV"
deactivate
end
# Case #2: cd'd folders within the same Git repo
#
# The virtualenv for this Git repo is already activated, so there's
# nothing more to do.
if [ "$VIRTUAL_ENV" = "$REPO_ROOT/.venv" ]
return
end
# Case #3: cd'd from a non-Git folder into a Git repo
#
# If there's a virtualenv in the root of this repo, we should
# activate it now.
if [ -d "$REPO_ROOT/.venv" ]
source "$REPO_ROOT/.venv/bin/activate.fish" &>/dev/null
end
end
This function runs as an event handler in Fish – it runs whenever the PWD variable changes. That variable is the current working directory, so in practice this runs whenever I change directories. I find the top-level directory of the current Git repo by running git rev-parse --show-toplevel, which is a super handy command I use in lots of scripts. If I’m not in a Git repo, it returns an empty string. Then I compare that to the path of the currently-enabled virtualenv in VIRTUAL_ENV, and decide whether I need to activate or deactivate a virtualenv.
If you want the complete code, my Fish shell config is in a public repo, although the virtualenv stuff is a bit spread out.
This was the first project where I used ChatGPT to help write the code. I was initially quite sceptical of LLMs, but watching what Simon Willison has been doing persuaded me to try it. This felt like a safe project for a first attempt – it’s a minimal project with clearly defined “is the code working” criteria, and limited impact if I do something daft.
Overall I was quite impressed.
All the code seemed to work, and it was helpful for the bits of shell syntax I only half-remember – things like test -z and combining multiple conditions in a boolean.
I didn’t use any of its output directly, but it was a good starting point that I could adapt into my actual code.
I’m sure this won’t be my last project where ChatGPT lends a helping hand.
If I tried a simple example:
import mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
browser.open("https://www.example.net/").read()
it would fail with an error:
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate
verify failed: unable to get local issuer certificate (_ssl.c:1000)
My first instinct was to check Google and GitHub; I couldn’t find any other instances of people finding and fixing this issue. The most I could find was a quickstart guide that starts with an HTTP example, then suggests disabling SSL verification to access HTTPS sites. I found a few instances of people following this suggestion on GitHub – but I wasn’t keen on that. SSL verification exists for a reason; I don’t want to get rid of it!
A bit later I found a page about changing the certificates used by your mechanize browser with browser.set_ca_data().
I knew from my work on HTTP libraries that certifi is a bundle of SSL certificates often used in Python libraries, so I decided to try pointing mechanize at certifi:
import mechanize
import certifi
browser = mechanize.Browser()
browser.set_handle_robots(False)
browser.set_ca_data(cafile=certifi.where())
browser.open("https://www.example.net/").read()
That seemed to work, and my mechanize browser was once again able to browse the HTTPS web.
If you use popular HTTP libraries like httpx or requests, they install and load SSL certificates from certifi automatically. I don’t know why mechanize doesn’t do the same, but it was just a one-line change to get it working correctly.
One of the cool things it does is autosuggestions from my shell history. As I’m typing, it suggests (in light grey) a command I’ve run before. I can press the right arrow to accept the suggestion, or I can keep typing to ignore it.
This feature is pretty smart and very useful, and it’s probably saved me thousands of keystrokes. But a few times recently it’s gone wrong, and suggested something I didn’t want – so I’ve been tweaking my fish config to make it easier for fish to “forget” certain commands.
There are a few categories of command that I don’t want fish to remember or suggest:
Typos and mistakes. If I ever mistype a command and run it, that mistake will be used for future autosuggestions. Getting a command wrong once is annoying; having my mistake be continually suggested is worse.
A while back I mistyped hdiutil attach as hdiutil atach, and I kept getting the wrong version as an autosuggestion.
(That annoyance is what led to the code in this post!)
Sensitive information. Sometimes I end up with sensitive information in my shell commands – either by accident, or because it’s the fastest way to fix a problem. I know I’m not meant to do that, but nobody’s perfect.
My fish shell history lives unencrypted in a file on disk, where anything could find it (~/.local/share/fish/fish_history).
I’d rather not have passwords, keys, and other credentials living there forever.
Potentially dangerous commands. My example of this is doing a Git force push, which can delete data if I’m not careful. It’s the right thing to do sometimes, but I never want to start typing a regular Git push and get a force push as an autosuggestion.
I never want to do a force push accidentally, and I’m willing to give up the benefits of autosuggestion for a bit of extra safety.
To help me avoid autosuggestion in these three cases, I’ve added two functions to my shell config.
This function removes the last-typed command from my history, which prevents it from being suggested again. I run this manually, whenever I mistype a command or some other one-off thing I don’t want to remember.
function forget_last_command
    set last_typed_command (history --max 1)

    history delete --exact --case-sensitive "$last_typed_command"
    history save
end
You can test this with the following steps:

1. Run echo "my password is hunter2"
2. Type echo into your shell, and see that the previous command is suggested
3. Run forget_last_command
4. Type echo again, and notice that the first command is no longer suggested
5. Type echo once more, and check that the first command isn’t suggested

The heavy lifting is done by fish’s history command – first it looks up the last command I typed, then it removes that from the history, and finally it persists that change to disk.
(I’m not entirely sure the history save should be necessary, but with fish 3.6.1 – the latest version – it is required for this to work. I think there’s something slightly funky about history delete --exact, which made this maddening to debug until I started following my fish_history file.)
Forgetting a command on a one-off basis is good for typos and accidental passwords, but what about commands I use on a semi-regular basis?
It’d be annoying if I had to type forget_last_command every time I ran git push --force.
This function looks at my last command, and if it’s dangerous, it removes it from my history. Crucially, this runs as part of my shell prompt, so it runs as soon as a command completes – I don’t need to remember to forget:
function forget_dangerous_history_commands
set last_typed_command (history --max 1)
if [ "$last_typed_command" = "git push origin (gcb) --force" ]
history delete --exact --case-sensitive "$last_typed_command"
history save
end
end
You can test this with the following steps:

1. Run git push origin (gcb) --force (here gcb is an alias for git rev-parse --abbrev-ref HEAD, which prints the name of the currently checked-out branch)
2. Type git push again, and notice that the force push isn’t suggested

A force push is the only example of a dangerous command that I use regularly, but there could be others – anything involving rm -rf, for example. If I ever find myself doing something dangerous that I never want suggested, it should be pretty easy to extend this function.
Writing for Stories was one of my “bucket list” items while working at Wellcome, and I actually submitted the pitch on the same night I decided to start looking for new jobs. It’s among the more personal things I’ve written, but I’m really pleased with the result. I’m incredibly grateful to Alice White (my editor) and Steven Pocock (who took the photographs) who helped turn my rough idea into something great, and a nice capstone for my time at Wellcome.
There are photos of several of my finished or in-progress pieces in the article, which come from a variety of artists.
You can read the story on the Wellcome Collection website.
What I’d rather do is move some big items out of my library, and get some space back. I’ve got a pretty good workflow for reviewing new photos, but what about ones from before I had my reviewing tool?
I wrote a short Swift script which prints a list of all the largest files in my Photos Library. The key part is two methods in PhotoKit: PHAsset.fetchAssets to enumerate all the files, and PHAssetResource.assetResources to retrieve the original filename and file size. The rest of the script takes the data and does some sorting and pretty-printing.
#!/usr/bin/env swift
import Photos
struct AssetData: Codable {
var localIdentifier: String
var originalFilename: String
var fileSize: Int64
}
/// Returns a list of assets in the Photos Library.
///
/// The list is sorted by file size, from largest to smallest.
func getAssetsBySize() -> [AssetData] {
var allAssets: [AssetData] = []
let options: PHFetchOptions? = nil
PHAsset.fetchAssets(with: options)
.enumerateObjects({ (asset, _, _) in
let resource = PHAssetResource.assetResources(for: asset)[0]
let data = AssetData(
localIdentifier: asset.localIdentifier,
originalFilename: resource.originalFilename,
fileSize: resource.value(forKey: "fileSize") as! Int64
)
allAssets.append(data)
})
allAssets.sort { $0.fileSize > $1.fileSize }
return allAssets
}
/// Quick extension to allow left-padding a string in Swift
///
/// By user2878850 on Stack Overflow:
/// https://stackoverflow.com/a/69859859/1558022
extension String {
func leftPadding(toLength: Int, withPad: String) -> String {
String(
String(reversed())
.padding(toLength: toLength, withPad: withPad, startingAt: 0)
.reversed()
)
}
}
let bcf = ByteCountFormatter()
for photo in getAssetsBySize() {
let size =
bcf
.string(fromByteCount: photo.fileSize)
.leftPadding(toLength: 8, withPad: " ")
print("\(size) \(photo.originalFilename)")
}
When I run the script, I combine it with head to get a list of the top N files:
$ swift get_photo_sizes.swift | head -n 5
578 MB IMG_3607.MOV
518.5 MB IMG_0794.MOV
494.1 MB IMG_9858.MOV
373.6 MB IMG_1933.MOV
372.5 MB IMG_3751.MOV
In my library of 26k items, the script takes about a minute or so to run.
I went through the first 50 or so items, one-by-one. I moved about 30 videos out of my photos library and on to an external disk, and I deleted a few more – in total I recovered about 7GB of space. It’s not a lot, but it gives me some more breathing room.
Pretty much all these files were video messages I’d made for friends and family, and sent as soon as they were recorded. Honestly, I think it’s unlikely I’ll ever watch these again – I’m keeping them just-in-case, but I definitely don’t need them in my synced-everywhere photo library.
I don’t know if I’ll use this exact script again, but it was a good opportunity to practice using Swift and PhotoKit. I’m gradually building a little collection of scripts and tools I can use to do stuff with photos, and this is another pebble on that pile.
There’s a lot of spam in the catalogue search. Somebody types in a search query which can’t possibly return any results – instead it’s a message (often not in English) promoting sketchy-sounding services and domains. Call me sceptical, but I don’t think somebody who types in:
escort girls in your area play free casino games ✔️ with chatgpt ⏩ whatsapp scamalot.xyz
is actually looking for catalogue results in a library/museum website.
I don’t know why people set up bots to do this – but whatever the reason, dealing with this sort of spam is an inevitable part of running a website on the public Internet.
Before I started this work, we were sending all these spam queries to our back-end search API and Elasticsearch cluster. Over time, the load from the spam was starting to add up, and starting to crowd out real queries on our cluster.
We wanted to find a way to identify the spam, so we could return a “no results page” ASAP, without actually sending the query to our Elasticsearch cluster. It was usually “obvious” if you read the queries as a human, but how could we teach the computer to make the same distinction?
I started by using the code from my last post to get all the CloudFront logs for our catalogue search:
import datetime
import json
class DatetimeEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime.datetime):
return obj.isoformat()
log_entries = get_cloudfront_logs_from_s3(
sess,
Bucket="wellcomecollection-experience-cloudfront-logs",
Prefix="wellcomecollection.org/prod/"
)
with open("search_log_entries.json", "w") as out_file:
for entry in log_entries:
if entry["cs-uri-stem"].startswith("/search"):
out_file.write(json.dumps(entry, cls=DatetimeEncoder) + "\n")
That gave me about 7 million log entries that I could analyse. Then I started developing my spam heuristic, which had a single Python function as its interface:
def is_spam(log_entry) -> bool:
return False
To develop the heuristic, I wrote a bunch of versions of this function, trying various techniques to look at different fields in the log entry and decide if a particular request was spam. To help me evaluate the different versions, I wrote a test script I could run repeatedly as I tweaked the function:
import random
import time
import humanize
import termcolor
spam = []
legit = []
# Get all the log entries
#
# This will load them into a single list, which uses a lot of memory.
# It is possible to do this in a more memory-efficient way using a
# generator, but I had memory to spare and I didn't want the extra
# complexity here.
log_entries = [json.loads(line) for line in open("search_log_entries.json")]
# Go through all the log entries and classify them as spam/not spam
time_start = time.time()
for log_entry in log_entries:
if is_spam(log_entry):
spam.append(log_entry)
else:
legit.append(log_entry)
time_end = time.time()
# Print a brief summary of the results
print(f"Rejected = {humanize.intcomma(len(spam))}")
print(f"Allowed = {humanize.intcomma(len(legit))}")
print(f" = {len(spam) / (len(spam) + len(legit)) * 100:.2f}% marked as spam")
elapsed = time_end - time_start
print(f"Per req = ~{elapsed / len(queries)}s")
# Print a sample of the log entries marked as spam/not spam, to give
# me something to evaluate.
print("---")
for log_entry in random.sample(spam, k=min(100, len(spam))):
print(termcolor.colored(log_entry["cs-uri-query"], "red"))
print("---")
for log_entry in random.sample(legit, k=100):
print(termcolor.colored(log_entry["cs-uri-query"], "green"))
The output gives me a summary with a few statistics:
Rejected = 5,136,696
Allowed = 1,805,073
= 74.00% marked as spam
Per req = ~1.4436019812105793e-05s
The proportion of rejected traffic is there so I can see whether my proposed heuristic is actually making a difference to the volume of requests. The per-request time is for measuring performance; I didn’t want to introduce noticeable latency for legitimate users.
It also prints a random sample of the queries marked as legitimate and spam. This gave me a spot check on the heuristic – I could see if legitimate queries were being rejected, or if I wanted to add another rule for matching spam.
Repeatedly re-running this test harness gave me a workflow for developing my spam detection heuristic: I’d tweak my function, re-run, and see how it affected the results. I kept iterating until I was catching a decent proportion of spam, without penalising real users.
Most of my analysis focused on the search query, and there were several patterns I spotted which seemed to be strong indicators of spam:
Certain keywords like chatgpt, casino and crypto.
This was my first idea, because it was pretty obvious in the queries I was reading, but it was dropped from the final heuristic for two reasons. It only dropped a fairly small amount of traffic (~2%) and it was hard to agree on a list of words that were definitely spam.
Emoji, of which ⏩, ✔️, ㊙️ were particularly common examples. There’s no emoji in our catalogue data so it’s unlikely a real person would search for it (and they won’t find anything if they do!).
Long, all-Chinese queries. There is some Chinese in the catalogue, but it’s a tiny proportion – the vast majority of our data is in English.
Mangled character encodings, aka Mojibake.
Not all of the non-English text was encoded properly, and there were a lot of queries like â\x8f©â\x9c\x94ï\x8fã\x8a. I don’t think a real person would ever type this in, but a poorly coded spam bot quite likely would.
As I was going, I did tally some other fields in the logs I’d marked as spam, to see if I could spot any other patterns I could use for spam detection – for example, I counted IP addresses to see if all the spam was coming from a single IP address that we could just block.
import collections
spam_ips = collections.Counter(log_entry["c-ip"] for log_entry in spam)
Unfortunately I didn’t find any good patterns this way, so I stuck to the query-based analysis.
For the first version of our spam detection, I settled on this heuristic:
Reject queries with more than 25 characters from character ranges which are rarely used in our catalogue data (Chinese, Korean, Mojibake, emoji)
Later we reduced that threshold to 20, and so far it seems to have worked well. You can see the implementation and the associated tests on GitHub.
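To give a flavour of what that sort of check might look like, here’s an illustrative sketch – this isn’t the production code linked above, and the exact character ranges and threshold here are mine, not Wellcome’s:

def count_unusual_characters(query):
    """Count characters from ranges that rarely appear in our catalogue data."""
    unusual = 0
    for char in query:
        codepoint = ord(char)
        if 0x4E00 <= codepoint <= 0x9FFF:        # CJK Unified Ideographs
            unusual += 1
        elif 0xAC00 <= codepoint <= 0xD7AF:      # Hangul syllables
            unusual += 1
        elif 0x1F300 <= codepoint <= 0x1FAFF:    # emoji and pictographs
            unusual += 1
        elif 0x0080 <= codepoint <= 0x009F:      # C1 controls, common in mojibake
            unusual += 1
    return unusual

def is_spam(log_entry):
    query = log_entry.get("cs-uri-query", "")
    return count_unusual_characters(query) > 20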
If your query is marked as spam, we now show the “no results” page immediately in our frontend web app, rather than sending your search to our backend Elasticsearch cluster. As part of this change, we tweaked the copy on our “no results” page, asking people to email us if their search unexpectedly returned no results:
This was a hedge against mistakes in the spam heuristic – if it somehow got a false positive and binned a query from a real user who should have seen results, we’d hear about it. In practice, I don’t think there’s been a single one. We’ve managed to cut the load on our Elasticsearch cluster, without impacting real users.
I’m almost certain our spam is automated bots rather than targeted spam, which is why I can safely publish this analysis. Nobody is going to read this and adapt their spam attack to counter it, but maybe it’ll be useful if you have to analyse a spam problem of your own.
I find location data quite useful on my photos, and I was wondering if I could add it after the fact. Although my camera doesn’t know where I was, I had a walking workout running on my Apple Watch, and that was tracking my location – could I combine the photos from my camera and the location data from my watch?
The first step was to get all the data from my walking workout. I was able to export the data from the Health app on my iPhone, following the instructions in an Apple Support document:
You can export all of your health and fitness data from Health in XML format, which is a common format for sharing data between apps.
Tap your picture or initials at the top right.
If you don’t see your picture or initials, tap Summary or Browse at the bottom of the screen, then scroll to the top of the screen.
Tap Export All Health Data, then choose a method for sharing your data.
When I tried this, my iPhone said it would take “a few moments”. It took much longer than that, and the lack of progress bar made me wonder if it was broken.
But it did eventually finish, and fifteen minutes later, I had a 174MB ZIP file full of my health data. When I unzipped it, this is what it looked like inside:
apple_health_export/
├─ export.xml
├─ export_cda.xml
├─ electrocardiograms/
│ ├─ ecg_2020-12-27.csv
│ └─ ...10 other files
└─ workout-routes/
├─ route_2020-12-26_1.47pm.gpx
├─ route_2020-12-27_1.04pm.gpx
└─ ...1556 other files
The GPX files are the interesting thing here – GPX is a standard format for passing around GPS data. If I preview one of those files in Quick Look, I can see my walking route shown as a thick green line on a map:
GPX files are XML, and the format of the Apple Health workout routes isn’t especially complicated. Here’s the first few lines of a file:
<?xml version="1.0" encoding="UTF-8"?>
<gpx
version="1.1"
creator="Apple Health Export"
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd"
>
<metadata>
<time>2023-09-19T21:04:33Z</time>
</metadata>
<trk>
<name>Route 2023-09-17 2:38pm</name>
<trkseg>
<trkpt lon="13.887391" lat="46.277433">
<ele>532.367857</ele>
<time>2023-09-17T07:05:52Z</time>
<extensions>
<speed>1.400002</speed>
<course>287.656214</course>
<hAcc>2.032849</hAcc>
<vAcc>1.793892</vAcc>
</extensions>
</trkpt>
<trkpt lon="13.887373" lat="46.277437">
<ele>532.469812</ele>
<time>2023-09-17T07:05:53Z</time>
<extensions>
<speed>1.398853</speed>
<course>283.005353</course>
<hAcc>1.821742</hAcc>
<vAcc>1.615372</vAcc>
</extensions>
</trkpt>
…
The file is a series of trkpt (“track points”), each of which has a longitude, a latitude, an elevation and a timestamp. The timestamps are in UTC – the first timestamp is just after 7am, but I didn’t arrive at Bohinj until just after 9am. Like the rest of Slovenia, Bohinj is currently on UTC+2.
There are also a couple of data points which I think are something related to direction and speed? I’m not looking into those, but it was interesting to see they’re in there. I don’t think I’ve worked with GPS data before, and there’s a bit more than I expected – I thought I’d just be getting longitude and latitude coordinates, but these extra values make sense, particularly when I’m walking.
I used lxml to write a Python function which extracts all these track points from the file. There are dedicated libraries for dealing with GPX files, but I already know how to use lxml and it was simple enough to write something for this one-off task.
import datetime
from lxml import etree
import pytz
utc = pytz.timezone("UTC")
def get_track_points(tree: etree._ElementTree):
"""
Generate a series of track points from an Apple Health workout route.
"""
namespaces = {"gpx": "http://www.topografix.com/GPX/1/1"}
for trkpt in tree.xpath("//gpx:trkpt", namespaces=namespaces):
# e.g. 2023-09-17T07:05:52Z
time_str = trkpt.xpath(".//gpx:time", namespaces=namespaces)[0].text
        time = utc.localize(datetime.datetime.strptime(time_str, "%Y-%m-%dT%H:%M:%SZ"))
elevation = float(trkpt.xpath(".//gpx:ele", namespaces=namespaces)[0].text)
latitude = float(trkpt.attrib["lat"])
longitude = float(trkpt.attrib["lon"])
yield {
"time": time,
"elevation": elevation,
"latitude": latitude,
"longitude": longitude
}
with open("route_2023-09-17_2.38pm.gpx") as infile:
tree = etree.parse(infile)
for track_point in get_track_points(tree):
print(track_point)
I pulled all these track points into a single Python dictionary, mapping time to location:
with open("route_2023-09-17_2.38pm.gpx") as infile:
tree = etree.parse(infile)
locations = {
track_point["time"]: track_point
for track_point in get_track_points(tree)
}
I discovered that there are some duplicate timestamps in the GPX file – although there’s second-level precision, occasionally it would record two locations for the same time. The two locations were pretty close, maybe a metre or so apart. For this sort of casual photo analysis that’s fine, but it might cause issues if you need more precision.
Pulling them all into a dictionary means picking the last location that appeared in the file. That’s somewhat arbitrary, but I didn’t want to spend too much time on this so I called it good. Because they’re so close together, either is fine for my purposes.
To tie this all together, I wrote a bit more Python which would find all the JPEG files from my camera, get the timestamp of that photo, and use exiftool to add location metadata if my workout had recorded a location at that precise timestamp:
import subprocess
def get_created_time(jpeg_path, *, camera_timezone):
"""
Returns the created time of a photo, according to ``exiftool``.
"""
created_time_str = subprocess.check_output([
"exiftool", "-s3", "-DateTimeOriginal", jpeg_path
]).decode("ascii").strip()
# e.g. 2023:09:17 10:40:49
created_time = datetime.datetime.strptime(created_time_str, "%Y:%m:%d %H:%M:%S")
# Assume the camera was set to match the timezone where the photo
# was taken; convert the timestamp to UTC first.
    return camera_timezone.localize(created_time).astimezone(utc)
def set_location(jpeg_path, *, location_info):
"""
Set the location information on a file using ``exiftool``.
"""
# The Apple Watch locations record latitude/longitude/elevation
# as a single value, whereas exiftool wants an absolute value
# and a direction.
#
# e.g. the Apple Watch might record a position as (37.3346, -122.0090),
# which exiftool wants to see as (37.3346, N, 122.0090, W).
subprocess.check_call([
"exiftool",
f"-GPSLatitude={abs(location_info['latitude'])}",
f"-GPSLatitudeRef={"N" if location_info['latitude'] > 0 else 'S'}",
f"-GPSLongitude={abs(location_info['longitude'])}",
f"-GPSLongitudeRef={"E" if location_info['longitude'] > 0 else 'W'}",
f"-GPSAltitude={abs(location_info['elevation'])}",
f"-GPSAltitudeRef={"0" if location_info['elevation'] > 0 else '1'}",
jpeg_path
])
# See https://alexwlchan.net/2023/snake-walker/ for get_file_paths_under()
for jpeg_path in get_file_paths_under("100_OLYMP", suffix=".jpg"):
slovenia = pytz.timezone("Europe/Ljubljana")
created_time = get_created_time(jpeg_path, camera_timezone=slovenia)
try:
location_info = locations[created_time]
except KeyError:
pass
else:
set_location(jpeg_path, location_info=location_info)
This code has a big assumption at its core: that my Watch will have recorded a location at the precise second I took each photo. In practice, that seems to work well enough – I don’t know if my Watch is doing second-by-second location, but I’d stand still to take my photos, and it would record at least one data point in that time. All my photos from Bohinj got tagged.
If this were an issue, you could write a looser heuristic for matching photos to location data in the workout – for example, using any location that was recorded within a few seconds of the photo being taken. But “same second” worked fine for me, so that’s all I’ve done.
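If I ever needed that, a looser lookup might look something like this – a sketch that reuses the locations dictionary from earlier, with an arbitrary five-second tolerance:

def find_nearest_location(created_time, locations, *, tolerance_seconds=5):
    """
    Return the location recorded closest to ``created_time``, as long as
    it was recorded within ``tolerance_seconds`` of it; otherwise return None.
    """
    if not locations:
        return None

    nearest_time = min(
        locations,
        key=lambda recorded_time: abs((recorded_time - created_time).total_seconds()),
    )

    if abs((nearest_time - created_time).total_seconds()) <= tolerance_seconds:
        return locations[nearest_time]

    return None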
After I ran this code, I did some spot-checking of individual photos – it took a few tries to get the timezone handling correct. I’d taken a photo of the “Welcome to Bohinj” sign right after I got off the bus, and that turned out to be super helpful – I knew exactly where it was, and I could keep tweaking my code until that photo got the right location.
I was once given a tip: when travelling between time zones, take a photo of a clock that’s correctly set to the local time. That way, you can easily correct the time offset later if your camera was configured incorrectly. If I plan to reuse this location tagging code, I’d use the same trick, but with a photo of something in a known location.
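Applying that correction is just a bit of datetime arithmetic. Here’s a sketch with made-up numbers: the difference between what the clock photo shows and what the camera recorded gives you an offset to add to every timestamp:

import datetime

# Hypothetical values: the clock in the photo reads 14:32:10 local time,
# but the camera recorded the photo at 13:32:05.
actual_time = datetime.datetime(2023, 9, 17, 14, 32, 10)
camera_time = datetime.datetime(2023, 9, 17, 13, 32, 5)

# The camera is running 1 hour and 5 seconds slow.
camera_offset = actual_time - camera_time

def corrected_time(recorded_time):
    """Apply the camera's clock offset to a timestamp from the camera."""
    return recorded_time + camera_offset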
Once this was done, I imported all the files into my Photos Library, and voila: I could see all my photos plotted on a map, even though I’d taken them on a camera without GPS support.
I’m pretty happy with this project – for half an hour’s work, I have a nicely-tagged set of photos and a better understanding of the location data recorded by my Apple Watch.
I’d already been thinking about getting a new job, so when I saw an ad for software developers at Wellcome a week later, it felt like the universe was telling me something. I applied, I was hired, and I started at the beginning of 2017.
After nearly seven years, it’s time for something new. I’m going to be leaving Wellcome at the end of September, and joining the Flickr Foundation as their new Tech Lead.
This was a bittersweet decision. I’ve loved my time at Wellcome. I got to work with a fantastic group of people, who’ve helped me grow and learn and fix my mistakes. I’m a much better person now than I was in 2017, and much of that comes from the people I met at Wellcome. (Of whom there are far, far too many to name.)
I’ve been able to work on a number of cool projects while I’ve been at Wellcome, including:
The unified collections search, which combines records from across Wellcome’s different catalogues into a single search box. This was my first exposure to AWS, Elasticsearch and large-scale data pipelines, and I learnt a lot doing this project.
The storage service, which is the permanent storage for Wellcome Collection’s digital collections. I’m proud of how robust and reliable it’s become, and working on this will be a highlight of my entire software engineering career.
Helping to write Wellcome’s first trans inclusion policy. This was a major milestone for the LGBTQ+ Staff Network, and helped me feel comfortable coming out at work in 2019.
Getting my first experience of line management. I’m immensely grateful that the team gave me feedback and patience in equal measure as I warmed up to this change – I made lots of mistakes in the first few months!
As I’ve stepped into a more managerial role, one of the most gratifying things has been to watch people take something that I worked on, and find ways to make it better. The architecture of the catalogue pipeline has been streamlined and improved, making it more reliable and efficient. The trans inclusion policy has been rewritten with more detail and clarity. People are finding new ways to use the data in the storage service.
I’m leaving behind a remarkable team of smart, thoughtful, and capable people. I hope Wellcome knows just how lucky they are to have them.
At the start of October, I’ll be joining the Flickr Foundation as their Tech Lead. The mission – keeping Flickr’s pictures visible for 100 years – is a daunting one, but I’m excited about the challenge. I start in just under a fortnight, and I’m really looking forward to it.
This is a role I couldn’t have done back in 2017 – I had no experience designing large systems, or managing a team, or doing software in a group that wasn’t just software engineers. Those are all things I learnt while working at Wellcome.
I’m immensely grateful for my time at Wellcome Collection, for all the people I got to meet there, and for everything I learnt that’s enabling me to take this next step.
I’ve grown to really like it, and I expect to keep using it for a while to come. Its approach to tagging and linked notes fits my mental model, and there’s a lot of flexibility in the plugin architecture. I can make it look nice, add a few basic features I want, and it syncs nicely across my Mac and my iPhone.
Inspired by Steph Ango’s example, I thought it might be useful to write a little about how I structure my Obsidian vaults. My setup is somewhat fluid, so consider this as more of a point-in-time snapshot than a definitive approach – I keep tweaking it as I find better ways to organise my notes.
I have two vaults: one for personal notes, and one for work.
The two vaults have the same structure, but different contents. Usually the distinction is pretty clear-cut, but occasionally there’s some overlap. For example, I learn about AWS in the course of my work. If I learn something which is generically useful and not specific to my employer’s setup, I’ll put the notes in my personal vault rather than the work vault.
I’m using the Minimal theme, and I use the Minimal Settings plugin to give each vault a distinct appearance. I’m a very visual person, and making the two vaults look different helps reinforce the distinction in my mind. I use a similar set of colour-based themes to help me distinguish between Slack workspaces.
I’m a big fan of keyword tagging, and I use it in all my notes. Every note has at least one tag; often multiple.
I tag liberally, adding all the keywords that I think I might use to search for something later – I think of my tags as a “search engine in reverse”. If I think I might look for a note in three different ways, I give it three different tags.
I create a lot of different tags – my primary vault has at least 800. The distribution is very skewed, with maybe 50 tags that I use a lot, and then a long tail of tags that are only used a handful of times. This might seem messy to some people, but it works for me – even if a tag is only used once or twice, it’s still useful for searching.
I use prefixes as a way to namespace some of my tags, like aws/amazon-s3 and python/pip. This helps keep my list of tags somewhat organised, but otherwise it’s a bit of an inconsistent mess – e.g. I don’t have any rules about singular vs plural.
I have different tags in each of my vaults, but I try to use the same tag in both places if it means the same thing.
I have a handful of top-level folders, and I put most notes in folders. Both of my vaults have the same set of top-level folders.
I try not to keep too many notes in my root – it’s mostly brand new notes, stuff I’m actively working on, or notes I refer to frequently. When I’m finished working on a note, I move it into a folder.
The folders I use:
Attachments for images, audio, PDFs, and so on. Anything that isn’t a text file. I use my image gallery plugin to browse the contents of this folder.
Ideas for anything I think of that I might like to do in the future, but don’t want to do right now. This includes ideas for projects, books I might like to read, half-finished blog posts, and more. I like being able to capture my ideas and then get them out of the way, without committing to finishing them.
Some of these entries are very long-lived, and I’ve built them up over multiple years. I’ll capture the initial spark of something, then go back and add more details as I think of them. This accumulation of thoughts can be useful if I ever go back and actually do the thing.
Journal is for all of my journal entries, or anything I’ve done that’s bound to a particular time (DIY, craft projects, holiday plans, and so on).
I have per-year folders to keep it manageable, but there’s not a lot of consistency. I have my journal entries going back as far as 2009, and I’ve had quite a few different approaches to journaling in that time!
People is for per-person notes. These files are pretty small, and usually exist for easy linking rather than for in-depth notes. For example, it’s much easier to search for all journal entries linked to “Jane Smith” than it is to search for all instances of the word “Jane”.
Occasionally I do put bits of info in somebody’s note that isn’t bound to a specific journal entry – food allergies, the names of their kids, gift preferences, and so on.
Reference is for detailed notes on anything outside my vault – books I’ve read, videos I’ve watched, podcasts I’ve listened to. I have subfolders for the different types of media.
Snippets is for little bits of information I want to save. A cool tweet, an interesting word, some trivia fact.
At least to me, it’s always obvious which of these folders a note belongs in. This has been a constant feature of all my folder setups – I want to be able to file notes immediately, without thinking. I don’t want to be wondering where a particular note should be stored on a day-to-day basis.
I found a couple of useful, new-to-me AWS APIs for doing this.
You can find the account ID using the GetAccessKeyInfo API, for example:
$ aws sts get-access-key-info --access-key-id AKIA3B6K4VLAVGRVTXJA
{
"Account": "760097843905"
}
This should work when you authenticate as any IAM entity that has the sts:GetAccessKeyInfo permission, even if it’s in a different account to the key.
This is useful because the AWS estate at work is split over a dozen accounts, and some of the accounts have overlapping use cases. Even if you know roughly what a key is used for, it may not be obvious which account it’s defined in.
Once you know the account, you can find the username with the GetAccessKeyLastUsed API.
You’ll need to authenticate as an IAM entity with the iam:GetAccessKeyLastUsed permission in that particular account. For example:
$ aws iam get-access-key-last-used --access-key-id "AKIA3B6K4VLAVGRVTXJA"
{
"UserName": "example-user-2023-08-26",
"AccessKeyLastUsed": {
"LastUsedDate": "2023-08-24T15:58:00Z",
"ServiceName": "s3",
"Region": "eu-west-1"
}
}
Note that this works even if the access key has never actually been used, for example:
$ aws iam get-access-key-last-used --access-key-id "AKIA3B6K4VLAVGRVTXJA"
{
"UserName": "example-user-2023-08-26",
"AccessKeyLastUsed": {
"ServiceName": "N/A",
"Region": "N/A"
}
}
I took these APIs and wrapped them in a Python script that takes an access key as input, and prints a bunch of information about the key and the associated user. This is what it looks like:
$ python3 describe_iam_access_key.py AKIA3B6K4VLAVGRVTXJA
access key: AKIA3B6K4VLAVGRVTXJA
account: platform (760097843905)
username: example-user-2023-08-26
key created: 26 August 2023
status: Active
IAM permissions: example-user-2023-08-26.iam_permissions.txt
console: https://us-east-1.console.aws.amazon.com/iamv2/home#/users/details/example-user-2023-08-26
terraform: https://github.com/wellcomecollection/platform-infrastructure/tree/main/terraform/users
This script won’t work for everyone – in particular, going from an AWS account ID to an authenticated IAM session is probably going to look different for every organisation, but a lot of the bigger pieces are reusable.
Because the IAM permissions can be quite long and verbose, it saves them to a separate text file. It also includes links to the IAM console and the Terraform configuration (and it can find the latter because we tag the user with that link).
This script only works with long-term credentials created for an IAM user. It doesn’t work for temporary credentials using AWS STS – if you want to find out who owns the latter, you have to review your CloudTrail logs – but for my purposes, that’s not an issue.
When writing this script, one of the things I was pleasantly surprised by was the presence of AWS APIs that feel tailor-made for this use case. I was expecting I’d have to loop through every account, every user, every access key, and look for one that matched, which could have been pretty slow. Using these APIs was much simpler and quicker!
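If you want to build something similar, the two calls at the heart of the script map onto boto3 like this. This is a sketch rather than my full script, and get_session_for_account is a stand-in for however your organisation turns an account ID into credentials:

import boto3

def describe_access_key(access_key_id):
    # Any principal with sts:GetAccessKeyInfo can look up the account ID,
    # even if the key belongs to a different account.
    sts = boto3.client("sts")
    account_id = sts.get_access_key_info(AccessKeyId=access_key_id)["Account"]

    # get_session_for_account is hypothetical – how you get credentials
    # for another account depends on your setup.
    iam = get_session_for_account(account_id).client("iam")
    last_used = iam.get_access_key_last_used(AccessKeyId=access_key_id)

    return {
        "account": account_id,
        "username": last_used["UserName"],
        "last_used": last_used["AccessKeyLastUsed"],
    }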
The show is based on the real-life wartime deception operation of the same name. British spies took a corpse, dressed him as a pilot with a suitcase full of fake documents, and let him wash up on the coast of Spain – feeding the Nazis false information about the planned invasion of Sicily.
Making a musical about Second World War espionage is a delicate balancing act. You want to write jokes so the audience have fun, but you don’t want to trivialise the horror and suffering of war. I think the show handles this pretty well – there’s lots of laughter, but every so often the theatre falls silent as we’re reminded that “these men’s lives are not a joke”. (This line was added in the West End transfer, and it’s one of my favourites.)
The emotional high point of the show is Dear Bill, a solo in which a quiet secretary (Jak Malone, Christian Andrews) writes a fake love letter for the fictional Allied pilot. All the jokes fade away, and we realise the letter isn’t as fake as we thought. It’s a moving and powerful number, and I still cry every time I hear it.
The secretary is Hester Leggatt, a real woman who really did work on Operation Mincemeat. She was the head of the MI5 secretaries, and she wrote the letter that went in the pilot’s briefcase. Several of the lines in the song are taken directly from her words. (“Darling, why did we go and meet in the middle of a war, such a silly thing for anybody to do.”)
Beautiful as it is, I don’t think Dear Bill can stand on its own. It shows us the pain in Hester’s heart; her grief for a lost lover; a tragedy she has kept from her coworkers – but that can’t be her whole arc. Nothing can ever bring Tom back, but we want to see her begin to heal.
That healing starts to come in the second act song Useful. Hester is talking to Jean Leslie, another MI5 secretary (Claire-Marie Hall, Holly Sumpton), about how they can both be useful even if their contributions aren’t recognised by the men in charge. The song calls back to Dear Bill, echoing its lyrics as Jean tells Hester how much she’s grown under Hester’s supervision. It’s a beautiful moment of realisation, as Hester sees how much Jean looks up to her.
The song has a sad ring of truth – Hester wasn’t considered important by history, and SpitLip (the writers of the show) struggled to find much information about her. Unlike the men whose histories were extensively documented and recorded, Hester died almost without trace.
But after falling in love with her, the show’s “Mincefluencer” fandom have been righting this wrong – there’s been an extensive investigation to learn more about her, and to uncover her life story. I haven’t been part of this, but I’ve been watching the #find-hester channel in the Discord, and the amount they’ve found is quite astounding.
One of the perennial topics is the idea of getting Hester a blue plaque to commemorate her life. It’s not clear whether she would qualify, and so this hasn’t happened. (Yet.)
But the other night I was sitting in the theatre, and I heard Jak sing the line “perhaps just a small plaque, something tasteful and small”. I thought of all the fan discussions, and I decided to take matters into my own hands.
I had some spare blue cross-stitch fabric from my Saturn V blueprint, and I have plenty of white thread. That’s basically a plaque, right? I started sketching out a design – blue plaques usually have the name of the person, their dates, and a line or two about why they’re worth remembering.
I already had her name, which is “Leggatt” with an “a” – at some point a transcription error had corrupted this to “Leggett” in the popular record, making her harder to track down, an error the fandom’s research uncovered. That research also gave me her dates (1905–1995), and I used one of Jean’s lines as her description (“a timeless inspiration”).
For the letters themselves, I used two fonts. Her name is written in Needlework Gazette’s Fancy Alphabet (three strands of cross-stitch), and the rest of the text is in StitchPoint’s Monaco (two strands of back-stitch). I planned out the rough shape in a spreadsheet, made a few adjustments to the spacing between letters, and then I stitched it up.
I mounted the piece in a 6″ hoop which I’d painted white, and I gave it to Jak on stage door a few weeks ago. I futzed a bit of the glue, but otherwise I’m pretty pleased with the result. I’m told that it now hangs backstage at the Fortune, among other pieces of fan art.
It’s not a garden or a grand royal park, but I hope Hester would like it all the same.
Creating the resources with infrastructure-as-code isn’t too bad; the tricky part is updating them later. If you have a large or thorny codebase, it may not be obvious where a particular resource is defined – when you want to make changes, which file should you update?
If you’re in a hurry, it’s tempting to make a manual change now, and tell yourself you’ll come back to update the code “later” – when you have more time to find the file – but “later” rarely comes.
To make this easier, I recommend tagging all your resources with a link to the file where the resource is defined.
At work, we’re managing AWS resources defined in Terraform. The Terraform AWS Provider supports setting default tags – you write them once, and then they get applied to every resource that can be tagged. This is what that looks like for us:
provider "aws" {
default_tags {
tags = {
TerraformConfigurationURL = "https://github.com/wellcomecollection/aws-account-infrastructure/tree/main/accounts/storage"
}
}
}
The TerraformConfigurationURL tag points to a specific subfolder of a GitHub repository, which is where this particular set of Terraform configuration files is stored. If we’re looking at a resource in the AWS console, we can look for the TerraformConfigurationURL tag. If it’s there, we can follow the URL to find the Terraform where the resource is defined.
This is particularly simple with Terraform and AWS, because of the support for default tags. It might be more cumbersome if you’re using a different tool or managing different types of resources, but I still think it’s worth the benefits.
I originally created these tags to solve the “where is this thing defined” problem. I’ve found something in the AWS console, I want to make a change to it, and I want to find the Terraform definition so I can manage the change using infrastructure-as-code. It has been useful for that, but it’s also been helpful in other, unexpected ways.
On one occasion, they highlighted some resources that were defined in multiple places. We could see two Terraform configurations fighting over the value of the TerraformConfigurationURL tag – one would set it to A, the other would set it to B, the first would set it back to A, and so on. This conflict helped us find and delete the duplicate definition.
It’s also been a good way to find resources that aren’t managed with infrastructure-as-code. Because this tag should be applied to everything that’s managed with Terraform, anything without this tag was probably created some other way.
Some of our AWS infrastructure predates our use of Terraform, and we’ve been trying to bring it into Terraform – looking for resources that don’t have this tag is one way to do that. I also check for this tag as part of our security audits, looking for untagged IAM users that might have been created quietly for malicious purposes.
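If you want to automate that check, one approach is the Resource Groups Tagging API. Here's a rough Python sketch, with the caveat that it only sees resources the tagging API knows about, so it's a signal rather than a complete inventory:

import boto3


def find_untagged_resources(sess):
    """
    Generate the ARNs of resources that don't have a
    TerraformConfigurationURL tag -- a rough signal that they
    weren't created by our Terraform.
    """
    tagging = sess.client("resourcegroupstaggingapi")

    for page in tagging.get_paginator("get_resources").paginate():
        for resource in page["ResourceTagMappingList"]:
            tag_keys = {t["Key"] for t in resource.get("Tags", [])}

            if "TerraformConfigurationURL" not in tag_keys:
                yield resource["ResourceARN"]


for arn in find_untagged_resources(boto3.Session()):
    print(arn)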
As with any tagging strategy, it’s not perfect. Not every resource supports tagging, and we haven’t always remembered to create these tags – but it’s good enough. We have enough resources using this tag for it to be useful, and it’s been handy on plenty of occasions.
The first is a parsing function, which gets the individual log entries from a single log file. This takes a file-like object in binary mode, so it works the same whether I’m reading the file from a local disk or directly from S3. This is what it looks like:
import datetime
import urllib.parse


def parse_cloudfront_logs(log_file):
    """
    Parse the individual log entries in a CloudFront access log file.

    Here ``log_file`` should be a file-like object opened in binary mode.

    The format of these log files is described in the CloudFront docs:
    https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#LogFileFormat
    """
    # The first line is a version header, e.g.
    #
    #     b'#Version: 1.0\n'
    #
    next(log_file)

    # The second line tells us what the fields are, e.g.
    #
    #     b'#Fields: date time x-edge-location …\n'
    #
    header = next(log_file)
    field_names = [
        name.decode("utf8")
        for name in header.replace(b"#Fields:", b"").split()
    ]

    # For each of the remaining lines in the file, the values will be
    # tab-separated, e.g.
    #
    #     b'2023-06-26 00:05:49 DUB2-C1 618 1.2.3.4 GET …'
    #
    # Split the line into individual values, then combine with the field
    # names to generate a series of dict objects, one per log entry.
    #
    # For an explanation of individual fields, see the CloudFront docs:
    # https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#LogFileFormat
    numeric_fields = {
        "cs-bytes": int,
        "sc-bytes": int,
        "sc-content-len": int,
        "sc-status": int,
        "time-taken": float,
        "time-to-first-byte": float,
    }

    url_encoded_fields = {
        "cs-uri-stem",
        "cs-uri-query",
    }

    nullable_fields = {
        "cs(Cookie)",
        "cs(Referer)",
        "cs-uri-query",
        "fle-encrypted-fields",
        "fle-status",
        "sc-range-end",
        "sc-range-start",
        "sc-status",
        "ssl-cipher",
        "ssl-protocol",
        "x-forwarded-for",
    }

    for line in log_file:
        values = line.decode("utf8").strip().split("\t")
        log_data = dict(zip(field_names, values))

        # Undo any URL-encoding in a couple of fields
        for name in url_encoded_fields:
            log_data[name] = urllib.parse.unquote(log_data[name])

        # Empty values in certain fields (e.g. ``sc-range-start``) are
        # represented by a dash; replace them with a proper empty type.
        for name, value in log_data.items():
            if name in nullable_fields and value == "-":
                log_data[name] = None

        # Convert a couple of numeric fields into proper numeric types,
        # rather than strings.  (A TypeError means the value was already
        # replaced with None above.)
        for name, converter_function in numeric_fields.items():
            try:
                log_data[name] = converter_function(log_data[name])
            except (TypeError, ValueError):
                pass

        # Convert the date/time from strings to a proper datetime value.
        log_data["date"] = datetime.datetime.strptime(
            log_data.pop("date") + log_data.pop("time"),
            "%Y-%m-%d%H:%M:%S"
        )

        yield log_data
It generates a dictionary per log line. The named values make it easy for me to inspect and use the log entries in my analysis code. A couple of the values are converted to more meaningful types than strings – for example, the cs-bytes field is counting bytes, so it makes sense for it to be an int rather than a str.
This is how it gets used:
for log_entry in parse_cloudfront_logs(log_file):
    print(log_entry)
    # {'c-ip': '1.2.3.4', 'c-port': '9962', 'cs-cookie': None, ...}
And then I can use my regular Python tools for analysing iterable data. For example, if I wanted to count the most commonly-requested URIs in a log file:
import collections

tally = collections.Counter(
    log_entry["cs-uri-stem"]
    for log_entry in parse_cloudfront_logs(log_file)
)

from pprint import pprint
pprint(tally.most_common(10))
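Because numeric fields like sc-bytes come back as real integers, this sort of analysis doesn't need any extra conversion. Here's a small sketch, skipping any entries where the value didn't parse cleanly:

total_bytes_sent = sum(
    log_entry["sc-bytes"]
    for log_entry in parse_cloudfront_logs(log_file)
    # sc-bytes is normally an int by this point; skip the rare entry
    # where it couldn't be converted.
    if isinstance(log_entry["sc-bytes"], int)
)

print(f"Total bytes sent to viewers: {total_bytes_sent}")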
CloudFront writes new log files a couple of times an hour. Sometimes I want to look at a single log file if I’m debugging an event which occurred at a particular time, but other times I want to look at multiple files. For that, I have a couple of additional functions which handle combining log entries from different files.
If I’m going to be working offline or I know I’m going to be running lots of different bits of analysis on the same set of log files, sometimes I download the log files directly to my local disk. Then I use my function for walking a file tree to get a single iterator for all the entries in a folder full of log files:
import gzip


def get_cloudfront_logs_from_dir(root):
    """
    Given a folder that contains CloudFront access logs, generate all
    the CloudFront log entries from all the log files.
    """
    for path in get_file_paths_under(root, suffix='.gz'):
        with gzip.open(path) as log_file:
            yield from parse_cloudfront_logs(log_file)


for log_entry in get_cloudfront_logs_from_dir("cf"):
    print(log_entry)
CloudFront logs are stored in S3, so if I’m running inside AWS, it can be faster and easier to read log files directly out of S3. For this I have a function that lists all the S3 keys within a given prefix, then opens the individual objects and parses their log entries. This gives me a single iterator for all the log entries in a given S3 prefix:
import boto3
import gzip


def list_s3_objects(sess, **kwargs):
    """
    Given an S3 prefix, generate all the objects it contains.
    """
    s3 = sess.client("s3")

    for page in s3.get_paginator("list_objects_v2").paginate(**kwargs):
        yield from page.get("Contents", [])


def get_cloudfront_logs_from_s3(sess, *, Bucket, **kwargs):
    """
    Given an S3 prefix that contains CloudFront access logs, generate
    all the CloudFront log entries from all the log files.
    """
    s3 = sess.client("s3")

    for s3_obj in list_s3_objects(sess, Bucket=Bucket, **kwargs):
        Key = s3_obj["Key"]
        body = s3.get_object(Bucket=Bucket, Key=Key)["Body"]

        with gzip.open(body) as log_file:
            yield from parse_cloudfront_logs(log_file)


sess = boto3.Session()

for log_entry in get_cloudfront_logs_from_s3(
    sess,
    Bucket="wellcomecollection-api-cloudfront-logs",
    Prefix="api.wellcomecollection.org/",
):
    print(log_entry)
A couple of years ago I watched Ned Batchelder’s talk Loop Like A Native, which is an amazing talk that I’d recommend to Python programmers of any skill level.
One of the key ideas I took from that is the idea of creating abstractions around iteration: rather than creating heavily nested for loops, use functions to work at higher levels of abstraction.
That’s what I’m trying to do with these functions (and the one in my previous post) – to abstract away the exact mechanics of finding and parsing the log files, and just get a stream of log events I can use like any other Python iterator.
I think the benefits of this abstraction will become apparent in another post I’m hoping to write soon, where I’ll go through some of the analysis I’m actually doing with these logs.
The post will jump straight into a for loop of CloudFront log events, and it won’t have to worry about exactly where those events come from.
There’s os.walk in the standard library, but it’s not quite what I want, so I have a wrapper function I use instead:
import os


def get_file_paths_under(root=".", *, suffix=""):
    """
    Generates the absolute paths to every matching file under ``root``.
    """
    if not os.path.isdir(root):
        raise ValueError(f"Cannot find files under non-existent directory: {root!r}")

    for dirpath, _, filenames in os.walk(root):
        for f in filenames:
            p = os.path.join(dirpath, f)

            if os.path.isfile(p) and f.lower().endswith(suffix):
                yield p


for path in get_file_paths_under():
    ...
This function gives me a couple of things over just using os.walk: it gives me a single iterator I can loop over, and it constructs the absolute path for me. The ability to filter by suffix is useful too; it gives me a quick way to narrow my search. I use this when I’m working in a folder tree with lots of different file types.
for path in get_file_paths_under("notes"):
...
for txt_path in get_file_paths_under("notes", suffix=".txt"):
...
The body of the function isn’t especially complicated; the only vaguely interesting bit is the ValueError. It’s there to help catch silly mistakes when I accidentally pass the name of a file as the input – if you try to os.walk over a file, you get an empty list of results, which can be a bit confusing. (I’m sure there’s a good reason for that behaviour, even if I don’t know what it is.)
An experienced Python programmer could probably write this from scratch in a few minutes, but I use it so often that I like to have it saved.
TextExpander inserts this snippet whenever I type py!pth, including both the function and the for loop. I save a few minutes, and I get a version of the function that I know doesn’t have any weird edge cases or silly mistakes.
For readers: I want images to load quickly and look good. That means looking sharp on high-resolution displays, but without forcing everyone to download massive images.
For me: I want images to be easy to manage. It should be easy for me to add images to a post, and to customise them if I want to do something special.
One way to achieve this is with vector images – SVGs. Those are great for simple diagrams and drawings, and I use them plenty, but they don’t work for photographs and screenshots.
For bitmap images, I wrote a custom Jekyll plugin.
Usually my original image is a JPEG or a PNG. I save it in _images, and then I include my custom {% picture %} tag in the Markdown source:
{%
  picture
  filename="IMG_9016.jpg"
  width="750"
  class="photo"
  alt="A collection of hot pink flowers, nestled among some dark green leaves in a greenhouse."
%}
This expands into a larger chunk of HTML, which refers to several different variants of the image:
<picture>
  <source
    srcset="/images/2023/IMG_9016_1x.avif 750w,
            /images/2023/IMG_9016_2x.avif 1500w,
            /images/2023/IMG_9016_3x.avif 2250w"
    sizes="(max-width: 750px) 100vw, 750px"
    type="image/avif"
  >
  <source
    srcset="/images/2023/IMG_9016_1x.webp 750w,
            /images/2023/IMG_9016_2x.webp 1500w,
            /images/2023/IMG_9016_3x.webp 2250w"
    sizes="(max-width: 750px) 100vw, 750px"
    type="image/webp"
  >
  <source
    srcset="/images/2023/IMG_9016_1x.jpg 750w,
            /images/2023/IMG_9016_2x.jpg 1500w,
            /images/2023/IMG_9016_3x.jpg 2250w"
    sizes="(max-width: 750px) 100vw, 750px"
    type="image/jpeg"
  >
  <img
    src="/images/2023/IMG_9016_1x.jpg"
    width="750"
    style="aspect-ratio: 3 / 4;"
    class="photo"
    alt="A collection of hot pink flowers, nestled among some dark green leaves in a greenhouse."
  >
</picture>
Let’s unpack what’s going on.
My _images directory is organised into per-year folders:
.
└─ _images/
   ├─ 2022/
   │  ├─ acme_corporation.jpg
   │  ├─ alarm_console.png
   │  ├─ alfred_search.png
   │  └─ ...164 other files
   └─ 2023/
      ├─ amazon-cheetah-listing.jpg
      ├─ avif_image_broken.png
      ├─ bedroom_layout.png
      └─ ...53 other files
Organising files per-year matches the URL structure of individual posts (/:year/:slug), and helps keep the folder just a bit more manageable. I have ~1300 images, and throwing them all in a single folder would get unwieldy. In this example, the original file is _images/2023/IMG_9016.jpg.
How does the plugin find an image in this directory structure?
I pass a filename attribute to the {% picture %} tag, which tells it the name of the image file, but notice that I don’t pass a year anywhere.
That’s because my plugin can work it out automatically – when Jekyll renders a custom liquid tag on a page, it passes the page as a context variable. That means each instance of my picture tag knows which article it’s in, and it can get the article’s publication date. Then it can construct the path to the original image.
module Jekyll
  class PictureTag < Liquid::Tag
    def render(context)
      article = context.registers[:page]

      date = article['date']
      year = date.year

      path = "_images/#{year}/#{filename}"

      …
I use this technique in a couple of plugins – it allows me to organise my files without too much hassle when using them.
I pass a width attribute to my {% picture %} tag – this tells the plugin how wide the image will appear on the page.
This mimics the HTML attribute of the same name.
I get the dimensions of the original image using the rszr gem:
require 'rszr'
image = Rszr::Image.load(source_path)
puts image.width
Then I use ImageMagick to create multiple derivative images, at different widths for different screen pixel densities – 1x, 2x, or 3x. I don’t create derivatives that are wider than the original image; that would be wasteful.
widths_to_create =
  (1..3)
    .map { |pixel_density| pixel_density * visible_width }
    .filter { |w| w <= image.width }
For example, if the original file is 250px wide, and I want to show the image at 100px wide, then the plugin would create a 1x image (100px) and a 2x image (200px) but not a 3x image (because 300px is wider than the original image).
This resizing happens as part of the Jekyll build process. An alternative would be to use a proper image CDN and create these derivative images at request time (e.g. imgix or Netlify Large Media), but I’m already doing custom steps in my Jekyll build and it was easier to extend that mechanism than add a new service. It also makes it easier to work with images in a local Jekyll server.
To tell the browser about these different sizes, I use the HTML picture and source tags, the latter with an srcset attribute:
<picture>
  …
  <source
    srcset="/images/2023/IMG_9016_1x.jpg 750w,
            /images/2023/IMG_9016_2x.jpg 1500w,
            /images/2023/IMG_9016_3x.jpg 2250w"
    sizes="(max-width: 750px) 100vw, 750px"
    type="image/jpeg"
  >
  <img
    src="/images/2023/IMG_9016_1x.jpg"
    width="750"
    …
  >
</picture>
In this example, the srcset attribute tells the browser that there are three different widths of image available, and where to find them. The sizes attribute tells it which size to use at different screen widths. If the screen is less than 750px wide, then the image fills the entire screen (100vw), otherwise the image is 750px wide.
That’s not always exactly right – sometimes margins mean it’s slightly wrong – but it’s close enough.
This is enough information for the browser to decide the best size to load. It knows your screen pixel density and the width of the window, so it can choose an image which (1) will look sharp and crisp on your display and (2) doesn’t include lots of unnecessary pixels.
If your browser doesn’t support <picture> and <source>, I include the 1x size in the <img> tag.
I figure that if your browser is that old, it’s unlikely you’re using a high pixel density display.
JPEG and PNG are fine, but they’re a bit long in the tooth – there are newer image formats that look the same but with smaller files. WebP and AVIF are modern image formats that are much smaller, which means faster loading images for you and a cheaper bandwidth bill for me.
Alongside the different sizes of image, I’m using ImageMagick to create variants in WebP and AVIF.
These get presented as alternative <source> entries in the <picture> tag, for example:
<picture>
  <source
    srcset="/images/2023/IMG_9016_1x.avif 750w,
            /images/2023/IMG_9016_2x.avif 1500w,
            /images/2023/IMG_9016_3x.avif 2250w"
    sizes="(max-width: 750px) 100vw, 750px"
    type="image/avif"
  >
  <source
    …
    type="image/webp"
  >
  <source
    …
    type="image/jpeg"
  >
  …
</picture>
Not every browser supports WebP and AVIF, which is why I’m providing all three variants. Your browser knows which formats it supports, and will choose appropriately.
The compression is pretty remarkable: the WebP images are about half the size of the originals, but the AVIF images are one sixth! When I first enabled AVIF support, I thought something was broken – the files were so small, it looked wrong to me.
(It turns out something was broken, but it was nothing to do with file sizes.)
Because I have the image dimensions from rszr, I can calculate the aspect ratio of the image and insert it as a property on the <img> tag:
<img
  src="/images/2023/IMG_9016_1x.jpg"
  width="750"
  style="aspect-ratio: 3 / 4;"
  …
>
Combined with the width, this allows a browser to completely calculate the area an image will take up on the page – before it loads the image.
This means it can lay out the page immediately, leave the right amount of space for the image, and it won’t have to rearrange the page later.
The fancy term for this is “Cumulative Layout Shift”, and too much of it can be distracting – setting these two attributes reduces it to zero.
Aside from the filename attribute, all the attributes on the {% picture %} tag get passed directly to the underlying <img> tag. I use this to include things like alt text, CSS classes and inline styles. It looks just like the equivalent HTML would.
This gives me a bunch of flexibility for tweaking the behaviour of images on a per-post basis. I get the benefits of the different sizes and image formats, and it all looks like familiar HTML.
The plugin is doing a bit of work to parse the attributes, and combine them with any attributes that it’s adding (for example, appending the aspect-ratio property to any inline styles), but this is largely invisible when I’m just writing a post.
One of the attributes I use most often is loading="lazy", which gets me browser-native lazy loading of images. This improves performance on pages with lots of images, and it’s easy for browsers to work out which images to load – they know exactly where each image will go thanks to the width and aspect-ratio properties.
When the web was young, images were much simpler. You’d upload your JPEG file to your web server, add an <IMG> tag to your HTML page, and you were done.
That still works (including the uppercase HTML tags), but there’s a lot more we can do now.
Building this plugin has been one of the more complex bits of front-end web development I’ve done for this site.
Creating the various images with ImageMagick was fairly straightforward, but setting up the srcset and sizes attributes so browsers would pick the right image was much harder.
I think it behaves correctly now, and adding images to new posts is pretty seamless – but it took a while to get there.
This was a great way for me to learn how images work in the modern web, but it’s hard to recommend my “write it from scratch” approach. There are lots of existing libraries and tools that make it easy for you to use images on your website, without all the work I had to do first.
I’m the only person who works on this website, and I’m doing it for fun. I can make very different choices than if I was working on a commercial site managed by a large team. I enjoyed writing this plugin, and I’m pleased with my snazzy new images, and for me that’s all that matters.
About three weeks ago, a new play premiered at Riverside Studios in London: Spy for Spy. I enjoyed it so much that I saw it four times – it was funny, clever, and it made my heart ache. It’s a romantic comedy with an unusual narrative twist – it’s told in the wrong order.
Unfortunately it only had a limited run, and it’s already closed. I don’t know if or when it will be staged again, but I wanted to capture a bit of why I enjoyed it so much.
“💞 Love 💞 Thank you for experiencing Molly & Sarah’s journey with us. Our limited run at @RiversideLondon ends tomorrow and we've had a blast. #SpyForSpyPlay”
The play is a two-hander starring two women – a bit of a rarity in theatre – and it’s a romance story. I’m not sure I’ve ever seen a sapphic romance on stage, and that itself was quite refreshing.
The women are Sarah, an anxious and uptight lawyer (Amy Lennox), and Molly, a free-thinking actor (Olive Gray). They’re quite different people, and not the most obvious match, but I was absolutely sold. The dialogue feels genuine and warm, and both of them did a great job of capturing their characters. There was great chemistry, and I wanted to see them be happy together.
The story hits the typical romcom beats: the meet-cute, a first date of sorts, some conflict and reconciliation. But unlike most romcoms, they don’t always come in the same order: the play is split into six scenes, and every night the audience were invited to pick a random order. There were six heart-shaped balloons: you’d pick a balloon, and the attached block would tell you the scene. Nobody knew how the play would go until half an hour before curtain up. One night it started at a wedding in a yurt; another night it opened with a breakup in their living room.
There are other plays that experiment with non-linear storytelling – Nick Payne’s Constellations springs to mind – but this is far more ambitious.
Kieron Barry has written a masterpiece of a script – it’s such a dexterous piece of writing that can be told in different orders, and still make sense. There are so many through lines, subtle callbacks, and meta self-references that fit together. I took a few notes, and a week later I’m still realising clever elements of the script. (The watch! The wines! The expressions of gratitude! Email me if you want to read my notes and see what you missed.)
And the script is good even if you ignore the random order. The dialogue feels genuine, like two people would actually talk, and it’s laugh-a-minute funny – although balanced with sombre moments. (I don’t want to give too much away, but “You’re the only person I know who can lie down bolt upright” was both hilarious and felt like a bit of a personal attack.)
One of my favourite scenes is in Carmel, when Molly and Sarah talk about having sex. So often women’s sexuality is treated as shameful or titillating, and this scene is neither of those. It’s a very matter-of-fact scene in which two women in a loving relationship talk about their sexual desires and needs. It felt totally normal, which is how it should be.
The script is backed up by a great production. Amy and Olive did a great job of bringing their characters to life, and I was hooked within seconds. One thing they did particularly well was switching demeanours – one scene they’re bickering partners, another they’re complete strangers. Subtle behaviours like the way they sit, stand, or look at each other all add up. Acting the scenes in the wrong order seems like quite a challenge, but I think they pulled it off.
There are other ways the show marks the different scenes. Small changes of costume, lighting, props. Several times, I had a sense of the new scene before a word was said. (Is this a sad scene? A happy one? Are they strangers or lovers?)
If I have a criticism, it’s that some of the transitions felt slow and unpolished. The scenes themselves are carefully directed and coordinated, but that polish dropped away in the moments in between. The lighting and sound gave the transitions a distinct look and feel, and it’s a shame that wasn’t matched by the on-stage movements.
But overall I really enjoyed it, and I’m glad I could see it as many times as I did. I’ve seen several plays that benefit from multiple viewings – once to experience it fresh, once to see how it builds to the ending. Spy for Spy definitely benefits from multiple viewings, and it can feel entirely different when the order changes. One show I went to felt quite light-hearted and happy; another ended on a dark scene that changed the whole tone of the play.
This isn’t a fluffy romance with perfect people. Sarah and Molly are messy characters with insecurities and flaws, and that feels like a key through line of the play.
Some aspects of their personality are constant, regardless of the order of events or the challenges they face. Maybe that’s what the play means – love is about finding somebody who sees those imperfections, who can see past your facade, and will make the time for you anyway. They both talk about wanting to change for one another, but maybe love is about finding somebody who doesn’t need you to change.
I went to the show based on a single line of description – “a romantic comedy told in the wrong order” – and it delivered on that promise. It’s a clever piece of writing that demands you pay attention, and if you do, that attention is richly rewarded.
On a more personal note, I got to meet several of the people involved with the production over my various trips, and they were all incredibly nice. Lucy, Kieron, Amy, Olive, Nell, Tim, and others – everyone was so willing to chat after the show. The play fell in the middle of several stressful weeks at work, and it was nice to have something to offset that.
I’m so glad I got a chance to see this play. I don’t know if or when it will run again, but if it does, I’d really recommend it.
I’d often thought about turning them off overnight, to save a bit of money, but I never quite got around to it. I always imagined it would involve a bunch of moving pieces, possibly some Lambda functions we’d have to deploy and manage, and it all felt a bit too much effort. Our bill isn’t in a precarious place, and premature cost optimisation takes away from better ways to use our time.
Then I read an article by Victor Ronin about using Terraform to create schedules in EventBridge, which is much simpler than what I was expecting. I tried rolling that pattern out to our ECS services, and it worked very well.
The core logic sits in a pair of EventBridge Schedules, created with the aws_scheduler_schedule resource. One schedule turns a service off in the evening; another turns it back on the next morning.
resource "aws_scheduler_schedule" "turn_off_in_the_evening" {
name = "${var.service_name}-turn_off_in_the_evening"
# This cron expression will run at 7pm UTC on weekdays.
schedule_expression = "cron(0 19 ? * MON,TUE,WED,THUR,FRI *)"
target {
arn = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
role_arn = aws_iam_role.scheduler.arn
input = jsonencode({
Cluster = var.cluster
Service = var.service_name
DesiredCount = 0
})
}
flexible_time_window {
mode = "OFF"
}
}
resource "aws_scheduler_schedule" "turn_on_in_the_morning" {
name = "${var.service_name}-turn_on_in_the_morning"
# This cron expression will run at 7am UTC on weekdays.
schedule_expression = "cron(0 7 ? * MON,TUE,WED,THUR,FRI *)"
target {
arn = "arn:aws:scheduler:::aws-sdk:ecs:updateService"
role_arn = aws_iam_role.scheduler.arn
input = jsonencode({
Cluster = var.cluster
Service = var.service_name
DesiredCount = var.desired_task_count
})
}
flexible_time_window {
mode = "OFF"
}
}
variable "cluster" { type = string }
variable "service_name" { type = string }
variable "desired_task_count" { type = number }
They’re triggered on a schedule, according to the cron expression. UK office hours are roughly 9 to 5, and the schedules are picked to include these hours plus a bit of “slop”. This is to account for people who work slightly earlier, slightly later, or when the UK timezone doesn’t match UTC.
I do a lot of this sort of “slop” in scheduling code. I’ll accept a bit of inefficiency or redundancy if it means I can get simpler code. I could tighten these schedules so they follow UK office hours more closely, but it would add a lot of complexity for marginal gains. It’s not worth it.
The most interesting bit to me is how the schedule updates the ECS service – it calls the UpdateService API with a payload that I provide. In this case I’m just changing the DesiredCount value, but it seems like this could be used to call other AWS APIs. That feels like it has a lot of potential elsewhere.
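To make that concrete, the schedule is effectively making this UpdateService call on my behalf. Here's a rough boto3 equivalent, with illustrative cluster and service names:

import boto3

ecs = boto3.client("ecs")

# The same call the EventBridge Schedule makes via its universal target,
# just invoked directly: scale the service down to zero tasks.
ecs.update_service(
    cluster="my-cluster",    # the Cluster value from the schedule payload
    service="staging-site",  # the Service value
    desiredCount=0,          # the DesiredCount value
)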
We’ve already got a variant of these schedules that turns an EC2 instance off/on outside our working hours, and I imagine this won’t be the last time I play with EventBridge Schedules.
Alongside the two schedules, you need an IAM role that allows EventBridge to modify your ECS services when it runs. This is how our IAM role is defined:
resource "aws_iam_role" "scheduler" {
name = "${var.service_name}-office-hours-scaling"
assume_role_policy = data.aws_iam_policy_document.assume_role.json
}
data "aws_iam_policy_document" "assume_role" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["scheduler.amazonaws.com"]
}
}
}
data "aws_iam_policy_document" "allow_update_service" {
statement {
actions = ["ecs:UpdateService"]
resources = [var.service_arn]
}
}
resource "aws_iam_role_policy" "allow_update_service" {
role = aws_iam_role.scheduler.name
policy = data.aws_iam_policy_document.allow_update_service.json
}
variable "service_arn" { type = string }
variable "service_name" { type = string }
This is pretty standard IAM – create the role, and allow the EventBridge Scheduler service to assume it. Then we create an IAM policy document that allows calling the UpdateService API for the service we’re turning off/on, and we attach that policy document to the role.
This isn’t a lot of Terraform, but it would be annoying to copy/paste it for every service we have. To save ourselves the hassle, we’ve included it in our standard ECS service module, and services can opt in to this behaviour with a single flag:
module "service" {
source = "git::github.com/wellcomecollection/terraform-aws-ecs-service.git//modules/service?ref=v3.15.3"
name = "staging-site"
…
turn_off_outside_office_hours = true
}
Partly this is for readability, but mostly it’s to make this behaviour quick and easy to enable – which means we’re more likely to actually do it.
We’ve already rolled this out to a dozen existing services, and there’s a nice dent in last month’s EC2 bill. As we build out new services, I expect this behaviour to spread ever further.