Digital preservation

Digital preservation is about protecting digital information so that it remains available long into the future. Libraries and archives hold manuscripts and papers that are centuries old; digital preservation tries to give digital media a similar lifespan.

I’ve always been a digital packrat, saving fanfiction as a teenager when it became clear I couldn’t rely on my favourite websites to stay up.

I formalised those ideas when I went to work for Wellcome Collection and the Flickr Foundation, where I helped to build services to store digital collections.

Ignore AI upscaled YouTube videos with yt-dlp · Preserving social media · 24 Dec 2025
Filter for formats that don’t include -sr (“super resolution”) in their format ID.
Creating a personal wrapper around yt-dlp · Preserving social media · 7 Oct 2025
I’ve written a new script which calls yt-dlp with my preferred options, so I don’t have to copy my configuration across different projects.
Get the avatar URL for an Instagram page · Preserving social media · 5 Oct 2025
Use gallery-dl --get-urls "https://www.instagram.com/{page_name}/avatar".
The “MCP” in Archivematica stands for “Master Control Program” · Digital preservation · 22 Sep 2025
It’s nothing to do with generative AI.
Get the avatar URL for a Bluesky user · Preserving social media · 11 Aug 2025
Make a request to the app.bsky.actor.getProfile endpoint, passing their handle as the actor parameter.
Looking up posts in the Bluesky API · Preserving social media · 10 Aug 2025
Install the atproto package, construct a client with your username/password, then call the get_post_thread method with your at:// URI.
My favourite websites from my bookmark collection · Web archiving · 2 Jun 2025
Websites that change randomly, that mirror the real world, or even follow the moon and the sun, plus my all-time favourite website design.
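
As a rough illustration of the Bluesky avatar lookup listed above, here is a minimal Python sketch of a request to the app.bsky.actor.getProfile endpoint. The host (public.api.bsky.app), the helper names, and the assumption that the avatar URL sits in a top-level "avatar" field of the response are mine, not taken from the linked post, so treat it as a starting point rather than a definitive implementation.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public, unauthenticated Bluesky API host.
PUBLIC_API = "https://public.api.bsky.app/xrpc"


def profile_url(handle: str) -> str:
    """Build the getProfile URL, passing the handle as the `actor` parameter."""
    query = urlencode({"actor": handle})
    return f"{PUBLIC_API}/app.bsky.actor.getProfile?{query}"


def avatar_from_profile(profile: dict) -> Optional[str]:
    """Return the avatar URL from a profile response, or None if there isn't one.

    Assumption: the avatar URL is in a top-level "avatar" key.
    """
    return profile.get("avatar")


def get_avatar_url(handle: str) -> Optional[str]:
    """Fetch a profile over the network and return its avatar URL."""
    with urlopen(profile_url(handle)) as resp:
        return avatar_from_profile(json.load(resp))
```

Calling get_avatar_url("example.bsky.social") (a hypothetical handle) makes one HTTP request. Splitting the URL building and the response parsing into separate functions means both can be checked without touching the network.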

Downloading avatars from Tumblr · Preserving social media · 9 Feb 2025
There’s an API endpoint that lets you download avatars in a variety of sizes.
Bitly will delete your account if you don’t use it for three years · Preserving social media · 6 Feb 2025
Google will delete your account if you don’t use it for two years · Preserving social media · 23 Jan 2025

The surprising utility of a Flickr URL parser · Preserving social media · 6 Jun 2024
I made a library that knows how to read lots of different forms of Flickr.com URL, and I used hyperlink to do it.
Preserving pixels in Paris · The world around us · 23 May 2024
I went to France for a conference about archiving the web, and I came back with thoughts and photos.
Open a Safari webarchive from Twitter/X without being redirected · Preserving social media · 19 May 2024
Disabling JavaScript when you open the webarchive file will prevent it from redirecting you to twitter.com.
Creating a Safari webarchive from the command line · Web archiving · 17 May 2024
We can use the createWebArchiveData method on WKWebView to write a Swift script that creates Safari webarchive files.
What’s inside a Safari webarchive? · Web archiving · 14 May 2024
The inside of a .webarchive file is a binary property list with the complete responses and some request metadata.
Taking regular screenshots of my website · Screenshots · 23 Apr 2024
A screenshot a day keeps the bit rot at bay.
How to get a list of captures from the Wayback Machine · Web archiving · 21 Apr 2024
Use the CDX Server API to get a list of captures for a particular URL.
How to take a screenshot of a page in the Wayback Machine · Web archiving · 17 Apr 2024
Using Playwright to take screenshots and adding some custom styles gets a screenshot of a page without the Wayback Machine overlay.
My config for running youtube-dl · Preserving social media · 23 Mar 2024
The flags and arguments I find useful when I’m using youtube-dl.
Going through my old school papers · Digital preservation · 15 Feb 2023
Digitising and pruning my boxes of paper from school. In which I have nostalgia, sadness, and the sense that everything old is new again.
Saving your alt text from Twitter · Preserving social media · 6 Nov 2022
Twitter’s archives don’t include the alt text you wrote on images, but you can save a copy with their API.
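
For the Wayback Machine entry above, this is a minimal Python sketch of asking the CDX Server API for the captures of a URL. The function names are hypothetical, and it assumes the output=json form of the API, where the first row of the response lists the field names and each later row is one capture. It is not necessarily how the linked post does it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(page_url, limit=None):
    """Build a CDX query URL for the given page, asking for JSON output."""
    params = {"url": page_url, "output": "json"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_captures(rows):
    """Turn the JSON rows into one dict per capture.

    Assumption: the first row is a header of field names.
    """
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def list_captures(page_url, limit=None):
    """Fetch the list of captures for a page over the network."""
    with urlopen(cdx_url(page_url, limit)) as resp:
        return rows_to_captures(json.load(resp))
```

Each capture's timestamp and original URL can then be combined into a snapshot address of the form https://web.archive.org/web/{timestamp}/{original}.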

Replicating Wellcome Collection’s digital archive to Azure Blob Storage · Digital preservation · 30 Sep 2020
How and why we keep copies of Wellcome’s digital collections in multiple cloud storage providers.
How to do parallel downloads with youtube-dl · Preserving social media · 12 Jul 2020
Archive monocultures considered harmful · Digital preservation · 15 Jun 2020
We are better off when the same topic is represented in multiple, different archives.
Downloading the AO3 fics that I’ve saved in Pinboard · Preserving social media · 15 May 2020
A script that downloads the nicely formatted AO3 downloads for everything I’ve saved in Pinboard.
Storing multiple, human-readable versions of BagIt bags · Digital preservation · 11 Feb 2020
How we use the fetch.txt file in a bag to track multiple copies of an object in our digital archive.

How I scan and organise my paperwork · Digital preservation · 27 Nov 2019
My procedure for scanning paper, and organising the scanned PDFs with keyword tagging.
Saving a copy of a tweet by typing ;twurl · Preserving social media · 17 Nov 2019
Digital preservation at Wellcome Collection · Wellcome Collection · 22 Oct 2019
Slides from a presentation about our processes, practices, and tools.
Reversing a t.co URL to the original tweet · Preserving social media · 28 Apr 2019
Twitter uses t.co to shorten links in tweets, so I wrote some Python to take a t.co URL and find the original tweet.
Getting a transcript of a talk from YouTube · Preserving social media · 11 Apr 2019
Using the auto-generated captions from a YouTube video as a starting point for a complete transcript.
Finding the latest screenshot in macOS Mojave · macOS · 11 Mar 2019
A script for backing up Tumblr posts and likes · Preserving social media · 5 Dec 2018
Since Tumblr users are going on a mass deletion spree (helped by the Tumblr staff), some scripts to save content before it’s too late.
Backing up full-page archives from Pinboard · Web archiving · 31 Jul 2017
A Rust utility for saving local copies of my full-page archives from Pinboard.
Backing up content from SoundCloud · Preserving social media · 18 Jul 2017
Automatic Pinboard backups · Web archiving · 31 Mar 2013
A script for automatically backing up bookmarks from Pinboard.
