Falsehoods programmers believe about Unix time

With apologies to Patrick McKenzie.

Danny was asking us about our favourite facts about Unix time in the Wellcome Slack yesterday, and I was reminded that it behaves in some completely counter-intuitive ways.

These three facts all seem eminently sensible and reasonable, right?

  1. Unix time is the number of seconds since 1 January 1970 00:00:00 UTC
  2. If I wait exactly one second, Unix time advances by exactly one second
  3. Unix time can never go backwards

False, false, false.

But it’s unsatisfying to say “this is false” without explaining why, so I’ll explain that below. If you’d like to think about it first and make your own guess, don’t scroll past the picture of the clock!

A bracket clock. Construction and assembly by John Leroux. Credit: Wellcome Collection. Used under CC BY.

All three of these falsehoods have the same underlying cause: leap seconds. If you’re unfamiliar with leap seconds, here’s a brief primer:

There are two factors that make up UTC:

  1. Universal Time, which is based on the rotation of the Earth – how long it takes the planet to spin once on its axis
  2. International Atomic Time (TAI), which is measured by atomic clocks

Problem is, these two numbers don’t always match. The Earth’s rotation isn’t consistent – it’s gradually slowing down, so days in Universal Time are getting longer. Atomic clocks, on the other hand, are fiendishly accurate, and consistent for millions of years.

When the two times drift apart, a leap second is added to or removed from UTC to bring them back together. Since 1972, the IERS (who manage this stuff) have inserted an extra 27 leap seconds. The result is a UTC day with 86,401 seconds (one extra) or 86,399 seconds (one missing) – both of which mess with a fundamental assumption of Unix time.

Unix time assumes that each day is exactly 86,400 seconds long (60 × 60 × 24 = 86,400), leap seconds be damned. If there’s a leap second in a day, Unix time either repeats or omits a second as appropriate to make them match. As of 2019, all 27 of those extra leap seconds are missing from Unix time.
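
You can see this with a bit of Python. The standard library’s datetime module shares Unix time’s blind spot – it can’t represent a leap second at all – and two instants separated by a real-world leap second end up just one Unix second apart. Here’s a quick illustration using the leap second inserted at the end of 2016:

from datetime import datetime, timezone

# The second before the leap second inserted at the end of 2016...
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

# ...and the second after it.  Two real seconds elapsed in between,
# because 23:59:60 happened in the middle.
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

print(int(before.timestamp()))                       # 1483228799
print(int(after.timestamp()))                        # 1483228800
print(int(after.timestamp() - before.timestamp()))   # 1

# The leap second itself can't be represented:
# datetime(2016, 12, 31, 23, 59, 60, ...) raises ValueError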

And so our falsehoods go as follows:

  1. Unix time isn’t the number of seconds since 1 January 1970 00:00:00 UTC, because it ignores leap seconds – as of 2019, it’s 27 seconds behind the true count.
  2. If I wait exactly one second, Unix time doesn’t necessarily advance by exactly one second – around an inserted leap second, two real seconds pass while Unix time only moves on by one.
  3. Unix time can go backwards – when a leap second is inserted, Unix time repeats a second, so a timestamp you’ve already seen comes around again.

And these probably aren’t even the only weirdnesses of Unix time – they’re just the ones I half-remembered yesterday, enough to check a few details and write a blog post about.

Time is straaaaaange.

Creating a locking service in a Scala type class

A few weeks ago, Robert (one of my colleagues at Wellcome) and I wrote some code to implement locking. I’m quite pleased with the code we wrote, and the way we do all the tricky logic in a type class. It uses functional programming, type classes, and the Cats library.

I’m going to walk through the code in the post, but please don’t be intimidated if it seems complicated. It took us both a week to write, and even longer to get right!

I’m not expecting many people to use this directly. You can copy/paste it into your project, but unless you have a similar use case to us, it won’t be useful to you. Instead, I hope you get a better understanding of how type classes work, how they can be useful, and the value of sans-IO implementations.

The problem

Robert and I are part of a team building a storage service, which will eventually be Wellcome’s permanent storage for digital records. That includes archives, images, photographs, and much more.

We’re saving files to an Amazon S3 bucket¹, but Amazon doesn’t have a way to lock around writes to S3. If more than one process writes to the same location at the same time, there’s no guarantee which will win!

Our pipeline features lots of parallel workers – Docker containers running in ECS, and each container running multiple threads. We want to lock around writes to S3, so that only a single process can write to a given S3 location at a time. We already verify files after they’ve been written, and locking gives an extra guarantee that a rogue process can’t corrupt the archive. Because S3 doesn’t provide those locks for us, we have to manage them ourselves.

This is one use case – there are several other places where we need our own locking. We wanted to build one locking implementation that we could use in lots of places.

The idea

We already had an existing locking service that used DynamoDB as a backend. It creates locks by writing a row for each lock, and doing a conditional update – “only store this row if there isn’t already a row with this lock ID”. If the conditional update failed, we’d know somebody else was holding the lock.

This code worked fine, but it was closely tied to DynamoDB, and that caused issues.

It was slow and fiddly to test – you needed to set up a dummy DynamoDB instance – and if you were calling the locking service, you needed that test setup as well. It was also closely tied to the DynamoDB APIs, so we couldn’t easily extend or modify it to work with a different backend (for example, MySQL).

We wanted to try writing a new locking service that wasn’t tied to DynamoDB. We’d separate out the locking logic and the database backend, and write something that was easy to extend or modify.

This is the API in the original service which we were trying to replicate:

lockingService.withLocks(Set("1", "2", "3")) {
  // do some stuff
}

The idea of doing it in a type class (so it wasn’t tied to a particular database implementation) isn’t new.

I first came across this idea when working on hyper-h2, an HTTP/2 protocol stack for Python that’s purely in-memory. It only operates on bytes, and doesn’t have opinions about I/O or networking, so it can be reused in a variety of contexts. hyper-h2 is part of a wider pattern of sans-IO network protocol libraries, and many of the same benefits apply here.

Managing individual locks

First we need to be able to manage a lock around a single resource. We assume the resource has some sort of identifier, which we can use to distinguish locks.

We might write something like this (here, a dao is a data access object):

trait LockDao[Ident] {
  def lock(id: Ident)
  def unlock(id: Ident)
}

This is a generic trait, which manages acquiring and releasing a single lock. It has to decide if/when we can perform each of those operations.

We can create implementations with different backends that all inherit this trait, and which have different rules for managing locks. A few ideas:

  1. an in-memory implementation, which we can use for testing
  2. a DynamoDB implementation, where locks expire after a fixed period
  3. an implementation backed by a relational database like MySQL

The type of the lock identifier is a type parameter, Ident. An identifier might be a string, or a number, or a UUID, or something else – we don’t have to decide here.

Sometimes we need to acquire more than one lock at once, which needs multiple calls to lock() – and then the caller has to remember which locks they’ve acquired to release them. To make it simpler for the caller, we’ve added a second parameter – a context ID – to track which process owns a given lock. A single call to unlock() releases all the locks owned by a process.

Here’s what that trait looks like:

trait LockDao[Ident, ContextId] {
  def lock(id: Ident, contextId: ContextId)
  def unlock(contextId: ContextId)
}

As before, the context ID could be any type, so we’ve made it a type parameter, ContextId.

Now let’s think about what these methods should return. We need to tell the caller whether the lock/unlock succeeded.

We probably want some context, especially if something goes wrong – so more than a simple boolean. We could use a Try or a Future, but that doesn’t feel quite right – we expect lock failures sometimes, and it’d be nice to type the errors beyond just Throwable.

Eventually we settled upon using an Either, with case classes for lock/unlock failures that include some context for the operation in question, and a Throwable that explains why the operation failed:

trait LockDao[Ident, ContextId] {
  type LockResult = Either[LockFailure[Ident], Lock[Ident, ContextId]]
  type UnlockResult = Either[UnlockFailure[ContextId], Unit]

  def lock(id: Ident, contextId: ContextId): LockResult
  def unlock(contextId: ContextId): UnlockResult
}

trait Lock[Ident, ContextId] {
  val id: Ident
  val contextId: ContextId
}

case class LockFailure[Ident](id: Ident, e: Throwable)

case class UnlockFailure[ContextId](contextId: ContextId, e: Throwable)

There’s also a generic Lock trait, which holds an Ident and a ContextId. Implementations can return just those two values, or extra data if it’s appropriate. (For example, we have an expiring lock that tells you when the lock is due to expire.)

Now we need to create implementations of this trait!

Creating an in-memory LockDao for testing

Somebody who uses the LockDao can ask for an instance of that trait, and it doesn’t matter whether it’s backed by a real database or it’s just in-memory. So when we’re testing code that uses the LockDao – but not testing a LockDao implementation specifically – we can use a simple, in-memory implementation. This makes our tests faster and easier to manage!

Let’s create one now. Here’s a skeleton to start with:

class InMemoryLockDao[Ident, ContextId] extends LockDao[Ident, ContextId] {
  def lock(id: Ident, contextId: ContextId): LockResult = ???
  def unlock(contextId: ContextId): UnlockResult = ???
}

Because this is just for testing, we can store the locks as a map. When somebody acquires a new lock, we store the context ID in the map. Here’s what that looks like:

case class PermanentLock[Ident, ContextId](
  id: Ident,
  contextId: ContextId
) extends Lock[Ident, ContextId]

class InMemoryLockDao[Ident, ContextId] extends LockDao[Ident, ContextId] {
  private var currentLocks: Map[Ident, ContextId] = Map.empty

  def lock(id: Ident, contextId: ContextId): LockResult =
    currentLocks.get(id) match {
      case Some(existingContextId) if contextId == existingContextId =>
        Right(
          PermanentLock(id = id, contextId = contextId)
        )
      case Some(existingContextId) =>
        Left(
          LockFailure(
            id,
            new Throwable(s"Failed to lock <$id> in context <$contextId>; already locked as <$existingContextId>")
          )
        )
      case None =>
        val newLock = PermanentLock(id = id, contextId = contextId)
        currentLocks = currentLocks ++ Map(id -> contextId)
        Right(newLock)
    }

  def unlock(contextId: ContextId): UnlockResult = ???
}

We have to remember to look for an existing lock, and compare it to the lock that’s requested. It’s fine to call lock() if you already have the lock, but you can’t lock an ID that somebody else owns.

Unlocking is much simpler: we just remove the entry from the map.

class InMemoryLockDao[Ident, ContextId] extends LockDao[Ident, ContextId] {
  def lock(id: Ident, contextId: ContextId): LockResult = ...

  def unlock(contextId: ContextId): UnlockResult = {
    currentLocks = currentLocks.filter { case (_, lockContextId) =>
      contextId != lockContextId
    }

    Right(())
  }
}

This gives us a LockDao implementation that’s pretty simple, and which we can use whenever we need a LockDao in tests.

Because it’s only for testing, it doesn’t need to be thread-safe or especially robust. This code is quite simple, so we’re more likely to get it right. When a caller uses this in tests, they can trust the LockDao is behaving correctly and focus on how they use it, and not worry about bugs in the locking code.

Here’s what it looks like in practice:

import java.util.UUID

val dao = new InMemoryLockDao[String, UUID]()

val u1 = UUID.randomUUID
println(dao.lock(id = "1", contextId = u1))               // succeeds
println(dao.lock(id = "1", contextId = UUID.randomUUID))  // succeeds
println(dao.lock(id = "2", contextId = UUID.randomUUID))  // fails
println(dao.unlock(contextId = u1))
println(dao.lock(id = "1", contextId = UUID.randomUUID))  // succeeds

We also have a small number of tests to check it behaves correctly:

  1. you can lock an ID that nobody else holds
  2. locking the same ID again in the same context succeeds
  3. locking an ID that’s held under a different context fails
  4. unlocking a context releases every lock held in that context

Because there’s no I/O involved, those tests take a fraction of a second to run.

Creating a concrete implementation of LockDao

Because we work primarily in AWS, we’ve created a LockDao implementation that uses DynamoDB as a backend. This is what we use when running in production.

It fulfills the same basic contract, but it has to be more complicated. It calls the DynamoDB APIs, makes conditional updates, and it expires a lock after a fixed period if it hasn’t been released. If a worker crashes before it can release its locks, we want the system to recover automatically – we don’t want to have to clean up those locks by hand.

I’m not going to walk through it, but you can see this code in our GitHub repo (link at the end of the post).

Creating the locking service

Now let’s build a locking service. You pass it a set of identifiers and a callback. It has to acquire a lock on each of those identifiers, get the result of the callback, then release the locks and return the result.

Here’s a stub to start us off:

trait LockingService[Ident] {
  def withLocks(ids: Set[Ident])(callback: => ???) = ???
}

For now, let’s put aside the return type of the callback, and acquire a lock. We’ll need a lock dao (which can be entirely generic), and a way to create context IDs:

trait LockingService[LockDaoImpl <: LockDao[_, _]] {
  implicit val lockDao: LockDaoImpl

  def withLocks(ids: Set[lockDao.Ident])(callback: => ???) = ???

  def createContextId: lockDao.ContextId
}

We’re asking implementations to tell us how to create a context ID, because the type of context ID will vary, as will the rules for creation. Maybe it’s a worker ID, or a thread ID, or a random ID used once and discarded immediately after.

Then we need to acquire the locks on all the identifiers we’ve received. If we get them all, we can call the callback – but if any of the locks fail, we should release anything we’ve already locked and return without invoking the callback.

Let’s write a method for acquiring the locks:

import grizzled.slf4j.Logging

trait FailedLockingServiceOp

case class FailedLock[ContextId, Ident](
  contextId: ContextId,
  lockFailures: Set[LockFailure[Ident]]) extends FailedLockingServiceOp

trait LockingService[LockDaoImpl <: LockDao[_, _]] extends Logging {
  ...

  type LockingServiceResult = Either[FailedLockingServiceOp, lockDao.ContextId]

  def getLocks(
    ids: Set[lockDao.Ident],
    contextId: lockDao.ContextId): LockingServiceResult = {
    val lockResults = ids.map { lockDao.lock(_, contextId) }
    val failedLocks = getFailedLocks(lockResults)

    if (failedLocks.isEmpty) {
      Right(contextId)
    } else {
      unlock(contextId)
      Left(FailedLock(contextId, failedLocks))
    }
  }

  private def getFailedLocks(
    lockResults: Set[lockDao.LockResult]): Set[LockFailure[lockDao.Ident]] =
    lockResults.foldLeft(Set.empty[LockFailure[lockDao.Ident]]) { (acc, o) =>
      o match {
        case Right(_)         => acc
        case Left(failedLock) => acc + failedLock
      }
    }

  private def unlock(contextId: lockDao.ContextId): Unit =
    lockDao
      .unlock(contextId)
      .leftMap { error =>
        warn(s"Unable to unlock context $contextId fully: $error")
      }
}

The main entry point is getLocks(), which gets both the IDs and the context ID we’ve created. As in the InMemoryLockDao, this returns an Either[…], so we get nice context about any locking failures.

First we call lockDao.lock(…) on every ID, which gives us a list of LockResults. We look for any failures with getFailedLocks() – if there are any, we try to release the locks we’ve already taken, and return a Left. If all the locks succeed, we get a Right.

The unlocking happens in unlock(). It attempts to unlock everything, but an unlock failure just gets a warning in the logs, not a full-blown error. We’re already bubbling up an error for the locking failure, and we didn’t think it worth exposing those extra errors. And if the callback succeeds but the unlocking fails, the operation as a whole is still a success and worth returning to the caller.

Then we have to actually invoke the callback, and this bit gets interesting. We want this service to be very generic, and handle different types of function. The callback might return a Future, or a Try, or an Either, or something else. We want to preserve that return type, and combine it with possible locking errors.

So we added another pair of type parameters:

trait LockingService[Out, OutMonad[_], ...] {
  ...

  type Process = Either[FailedLockingServiceOp, Out]

  def withLocks(
    ids: Set[lockDao.Ident])(
    callback: => OutMonad[Out]): OutMonad[Process] = ???
}

We’re starting to get into code that uses more advanced functional programming, and in particular the Cats library. Robert and I were reading the book Scala with Cats as we wrote this code. It’s a free ebook, and I’d recommend it if you want more detail.

Let’s go through this code carefully.

We’ve added two new type parameters: Out and OutMonad[_], so the return type of our callback is OutMonad[Out]. What’s a monad?

This is the definition that works for me: a type F is a monad if:

  1. it takes a single type parameter, so you can write F[A] for any type A
  2. it has a monadic unit, which wraps a plain value A to give you an F[A]
  3. it has a flatMap operation, so you can compose it with functions that return another F[B]

Some examples of monads in Scala include List[_], Option[_] and Future[_]. They all take a single type parameter, have a monadic unit, and you can compose them with flatMap.

So we expect our callback to return a monad wrapping another type. Inside the service, we’ll get an Either which contains the result of the callback or the locking service error, and then we’ll wrap that Either in the monad type. We’re preserving the monad return type of the callback.

For example, if our callback returns Future[Int], then OutMonad would be Future and Out would be Int. The withLocks(…) method then returns Future[Either[FailedLockingServiceOp, Int]].

But what if our callback doesn’t return a monad? What if it returns a type like Int or String? Here we’ll use a bit of Cats: we can imagine these types as being wrapped in the identity monad, Id[_]. This is the monad that maps any value to itself, i.e. id(a: A) = a.

So even if the callback code isn’t wrapped in an explicit monad, the compiler can still assign the type parameter OutMonad, by imagining it as Id[_].

So now we know what type our callback returns, let’s actually call it inside the locking service. For now, assume we’ve already successfully acquired the locks, and we want to run the callback.

import cats.MonadError

case class FailedProcess[ContextId](contextId: ContextId, e: Throwable)
  extends FailedLockingServiceOp

trait LockingService[Out, OutMonad[_], ...] {
  ...

  type Process = Either[FailedLockingServiceOp, Out]

  def unlock(contextId: lockDao.ContextId): Unit = ...

  type OutMonadError = MonadError[OutMonad, Throwable]

  import cats.implicits._

  def safeCallback(contextId: lockDao.ContextId)(
    callback: => OutMonad[Out]
  )(implicit monadError: OutMonadError): OutMonad[Process] = {
    val partialResult: OutMonad[Process] = callback.map { out =>
      unlock(contextId)
      Either.right[FailedLockingServiceOp, Out](out)
    }

    monadError.handleError(partialResult) { err =>
      unlock(contextId)
      Either.left[FailedLockingServiceOp, Out](FailedProcess(contextId, err))
    }
  }
}

We’re bringing in more stuff from Cats here. The type we’ve just imported, MonadError, gives us a way to handle errors that happen inside monads – for example, an exception thrown inside a Future.

We call the callback, and wait for it to complete (for example, a Future might not have finished yet). If it returns successfully, we map over the result, unlock the context ID, and wrap the result in a Right. We’ve imported cats.implicits._ so we can map over OutMonad and preserve its type. This is the happy path.

If something goes wrong, we use the MonadError to handle the error, unlock the context ID, and then wrap the result in a Left. Using this Cats helper ensures we handle the error correctly, and it gets wrapped in the appropriate monad type at the end. This is the sad path.

Either way, we’re waiting for the callback to return and then releasing the locks.

If we had a concrete type like Future or Try, we’d know how to wait for the result. Instead, we’re handing that off to Cats.

Now we have all the pieces we need to actually write our withLocks method, and here it is:

import cats.data.EitherT

trait LockingService[Out, OutMonad[_], ...] {
  ...

  type LockingServiceResult = Either[FailedLockingServiceOp, lockDao.ContextId]

  def getLocks(
    ids: Set[lockDao.Ident],
    contextId: lockDao.ContextId): LockingServiceResult = ...

  def withLocks(ids: Set[lockDao.Ident])(
    callback: => OutMonad[Out]
  )(implicit m: OutMonadError): OutMonad[Process] = {
    val contextId: lockDao.ContextId = createContextId

    val eitherT = for {
      contextId <- EitherT.fromEither[OutMonad](
        getLocks(ids = ids, contextId = contextId)
      )

      out <- EitherT(safeCallback(contextId)(callback))
    } yield out

    eitherT.value
  }
}

Hopefully you recognise all the arguments to the function – the IDs to lock over, the callback, and the implicit MonadError (which will be created by Cats).

That EitherT in the for comprehension is another Cats helper. It’s an Either transformer – if you have a monad type F[_] and types A and B, then EitherT[F[_], A, B] is a thin wrapper for F[Either[A, B]]. It lets us easily swap the Either and the F[_].

In the first case, it takes the result of getLocks() and wraps it in OutMonad.

If getting the locks succeeds, then it calls safeCallback() and wraps that in an EitherT as well. Once that returns, it extracts the value of the underlying OutMonad[Either[_, _]] and returns that result.

And that’s the end of the locking service! In barely a hundred lines of Scala, we’ve implemented all the logic for a locking service – and it’s completely independent of the underlying database implementation.

Putting the locking service to use

We can combine the generic locking service with the in-memory lock dao, and get an in-memory locking service. Because all the logic is in the type class, this is really short:

import java.util.UUID
import scala.util.Try

val lockingService = new LockingService[String, Try, LockDao[String, UUID]] {
  override implicit val lockDao: LockDao[String, UUID] =
    new InMemoryLockDao[String, UUID]()

  override def createContextId: UUID =
    UUID.randomUUID()
}

This is perfect for testing the locking service logic – because it’s in-memory, it runs really quickly, and we can write lots of tests to check it behaves correctly. Our test cases include checking that it:

  1. calls the callback and returns its result if it can acquire all the locks
  2. releases the locks once the callback has finished
  3. doesn’t call the callback if it can’t acquire all the locks
  4. releases any locks it did acquire if some of them fail
  5. wraps an exception thrown inside the callback as a failure, rather than letting it propagate

And those tests run in a fraction of a second! Because everything happens in memory, it’s incredibly fast.

And when we have code that uses the locking service, we can drop in the in-memory version for testing that, as well. It makes tests simpler and cleaner elsewhere in the codebase.

When we want an implementation to write in production, we can combine it with a LockDao implementation and get a new locking service implementation. This is the entirety of our DynamoDB locking service:

class DynamoLockingService[Out, OutMonad[_]](
  implicit val lockDao: DynamoLockDao)
    extends LockingService[Out, OutMonad, LockDao[String, UUID]] {

  override protected def createContextId(): lockDao.ContextId =
    UUID.randomUUID()
}

This is the beauty of doing it in a type class – we can swap out the implementation and not have to rewrite any of the tricky lock/unlock logic. It’s a really generic and reusable implementation.

Putting it all together

All the code this post was based on is in a public GitHub repository, wellcometrust/scala-storage, which is a collection of our shared storage utilities (mainly for working with DynamoDB and S3). These are the versions I worked from:

I’ve also put together a mini-project on GitHub with the code from this blog post alone. It has the type classes, the in-memory LockDao implementation, and a small example that exercises them both. All the code linked above (and in this post) is available under the MIT licence.

Writing this blog post was a useful exercise for me. If I want to explain this code, I have to really understand it. There’s no room to handwave something and say “this works, but I’m not sure why”.

And it makes the code better too! As I was writing this post, I spotted several places where the original code was unclear or inefficient. I’ll push those fixes back to the codebase – so not only is this blog post an explanation for future maintainers, but the code itself is clearer as well.

I can’t do this sort of breakdown for all the code I write, but I recommend it if you’re ever writing especially complex or tricky code.


  1. Eventually every file will be stored in multiple S3 buckets, all with versioning and Object Locks enabled. We’ll also be saving a copy in another geographic region and with another cloud provider, probably Azure. ↩︎

Finding unused variables in a Terraform module

At work, we use Terraform to manage our infrastructure in AWS. We use modules to reduce repetition in our Terraform definitions, and we publish them in a public GitHub repo. A while back, I wrote a script that scans our modules and looks for unused variables, so that I could clean them all up.

In this post, I’m going to walk through the script and explain how it works. If you just want the script, you can skip to the end.

What variables are defined by a single Terraform file?

There’s a Python module for parsing HCL (the configuration language that Terraform uses), so let’s use that – much easier and more accurate than trying to detect variables manually. Here’s what that looks like:

import hcl


def get_variables_in_file(path):
    try:
        with open(path) as tf:
            tf_definitions = hcl.load(tf)
    except ValueError as err:
        raise ValueError(f"Error loading Terraform from {path}: {err}") from None

    try:
        return set(tf_definitions["variable"].keys())
    except KeyError:
        return set()

The hcl.load method does the heavy lifting. It returns a dictionary, where the keys are the different elements of the Terraform language – resource, variable, provider, and so on. Within the dictionary for each element, you get every instance of that element in the file.

For example, the following Terraform definition:

variable "queue_name" {
  description = "Name of the SQS queue to create"
}

resource "aws_sqs_queue" "q" {
  name            = "${var.queue_name}"
  redrive_policy = "{\"deadLetterTargetArn\":\"${aws_sqs_queue.dlq.arn}\",\"maxReceiveCount\":${var.max_receive_count}}"
}

resource "aws_sqs_queue" "dlq" {
  name = "${var.queue_name}_dlq"
}

gets a dictionary a bit like this:

{
  "resource": {
    "aws_sqs_queue": {
      "dlq": ...,
      "q": ...
    }
  },
  "variable": {
    "queue_name": ...
  }
}

Getting the list of keys in the variable block (if it’s present) tells us the variables defined in this file.

Sometimes you’ll discover the Terraform inside a file is just malformed (or the file is empty!) – so we wrap the exception we receive to include the file path. The from None disables exception chaining in Python 3, and makes the traceback a little cleaner.

What variables are defined by a Terraform module?

Once we can get the variables defined by a single file, we can get all the variables defined in a module.

A module is a collection of Terraform files in the same directory, so we can find them by using os.listdir, like so:

import os


def tf_files_in_module(dirname):
    for f in os.listdir(dirname):
        if f.endswith(".tf"):
            yield f


def get_variables_in_module(dirname):
    all_variables = {}

    for f in tf_files_in_module(dirname):
        for varname in get_variables_in_file(os.path.join(dirname, f)):
            all_variables[varname] = f

    return all_variables

This returns a map from (variable name) to (file where the variable was defined). If a variable turns out to be redundant, knowing which file it was defined in will be helpful when we go back to delete it.

Does a module have any unused variables?

Once we have a list of variables defined in a module, we need to go back to see which of them are in use. I haven’t found such a good way to do this – right now the best I’ve come up with is to look for the string var.VARIABLE_NAME in all the files. It’s a bit crude, but seems to work.

Here’s the code:

def find_unused_variables_in_module(dirname):
    unused_variables = get_variables_in_module(dirname)

    for f in tf_files_in_module(dirname):
        if not unused_variables:
            return {}

        tf_src = open(os.path.join(dirname, f)).read()
        for varname in list(unused_variables):
            if f"var.{varname}" in tf_src:
                del unused_variables[varname]

    return unused_variables

We start by getting a list of all the variables defined in the module.

Then we go through the files in the module, one-by-one. If we don’t have any unused variables left, we can exit early – checking the rest of the files won’t tell us anything new. Otherwise, we open the file, read the Terraform source, and look for instances of the variables we haven’t seen used yet. If we see a variable in use, we delete it from the dict.

We have to iterate over list(unused_variables) rather than unused_variables itself, because we’re deleting elements from that dict as we go along. If you don’t make it a list first, you’ll get an error when you delete the first element: “dictionary changed size during iteration”.
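
Here’s a tiny, standalone illustration of that error (nothing to do with Terraform, just plain Python):

d = {"a": 1, "b": 2}

for key in d:
    # Deleting while iterating over the dict itself fails on the next step:
    # RuntimeError: dictionary changed size during iteration
    del d[key]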

If the module uses all of its variables, we get back an empty dict. If there are unused variables, we get a dict that tells us which variables aren’t being used, and which file they’re defined in.
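
For example, running it against a hypothetical module directory (the path and variable name here are made up for illustration) might look like this:

>>> find_unused_variables_in_module("modules/sqs_queue")
{'max_message_size': 'variables.tf'}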

Looking at all the modules in a repo

Our terraform-modules repo defines dozens of modules, and I wouldn’t want to check them all by hand. Instead, it’s easier (and faster!) to use os.walk to look through every directory in the repo. For a quick speedup, we can look for filenames ending with .tf to decide if a particular directory is a module.

Here’s some code:

def find_unused_variables_in_tree(root):
    for mod_root, _, filenames in os.walk(root):
        if not any(f.endswith(".tf") for f in filenames):
            continue

        unused_variables = find_unused_variables_in_module(mod_root)

        if unused_variables:
            print(f"Unused variables in {mod_root}:")
            for varname, filename in unused_variables.items():
                print(f"* {varname} ~> {os.path.join(mod_root, filename)}")
            print("")

And I wrap that in a little main block:

import sys


if __name__ == "__main__":
    try:
        root = sys.argv[1]
    except IndexError:
        root = "."

    find_unused_variables_in_tree(root)

This means I can pass a directory to the script, and it looks for unused variables under that directory – or if I don’t pass an argument, it looks in the current directory.

Putting it all together

Here’s the final version of the code:

import os
import sys

import hcl


def get_variables_in_file(path):
    try:
        with open(path) as tf:
            tf_definitions = hcl.load(tf)
    except ValueError as err:
        raise ValueError(f"Error loading Terraform from {path}: {err}")

    try:
        return set(tf_definitions["variable"].keys())
    except KeyError:
        return set()


def tf_files_in_module(dirname):
    for f in os.listdir(dirname):
        if f.endswith(".tf"):
            yield f


def get_variables_in_module(dirname):
    all_variables = {}

    for f in tf_files_in_module(dirname):
        for varname in get_variables_in_file(os.path.join(dirname, f)):
            all_variables[varname] = f

    return all_variables


def find_unused_variables_in_module(dirname):
    unused_variables = get_variables_in_module(dirname)

    for f in tf_files_in_module(dirname):
        if not unused_variables:
            return {}

        tf_src = open(os.path.join(dirname, f)).read()
        for varname in list(unused_variables):
            if f"var.{varname}" in tf_src:
                del unused_variables[varname]

    return unused_variables


def find_unused_variables_in_tree(root):
    for mod_root, _, filenames in os.walk(root):
        if not any(f.endswith(".tf") for f in filenames):
            continue

        unused_variables = find_unused_variables_in_module(mod_root)

        if unused_variables:
            print(f"Unused variables in {mod_root}:")
            for varname, filename in unused_variables.items():
                print(f"* {varname} ~> {os.path.join(mod_root, filename)}")
            print("")


if __name__ == "__main__":
    try:
        root = sys.argv[1]
    except IndexError:
        root = "."

    find_unused_variables_in_tree(root)

When I originally ran this script, it turned up a lot of unused variables, and I cleaned up the entire repo in one go. I don’t use it very often, because the modules don’t change as much as they used to, but it’s useful to have it. I run it once in a blue moon, and clean up anything it tells me about.

It even exposed a few bugs! It flagged a variable as being unused, even though it was one we expected the module to be using. When I went to look, I found a configuration error or a typo – and once that was fixed, the variable was in use and the script was happy.

I’ve also used this code to look for unused locals – but I’ll leave that as an exercise for the reader.

Reversing a t.co URL to the original tweet

If you post a link on Twitter, it goes through Twitter’s t.co link-shortening service. The link in the tweet text is replaced with a t.co URL, and that URL redirects to the original destination.

A flow chart: A tweet contains a t.co URL, and a t.co URL redirects to the destination.

If you’re just reading Twitter, the presence of t.co is mostly invisible – it’s not shown in the interface, and if you click on a URL you get to the original destination.

A t.co URL returns an HTTP 301 Redirect to the destination, which any browser or HTTP client can follow (as long as Twitter keeps running the service). For example:

>>> import requests
>>> resp = requests.head("https://t.co/mtXLLfYOYE")
>>> resp.status_code
301
>>> resp.headers["Location"]
'https://www.bbc.co.uk/news/blogs-trending-47975564'

But what if you only have the t.co URL, and you want to find the original tweet? For example, I see t.co URLs in my referrer logs – people linking to my blog – and I want to know what they’re saying about me!

Twitter don’t provide a public API for doing this, so there’s no perfect way to reverse a t.co URL back to its source. I have found a couple of ways to do it, and in this post I’ll explain how.

The manual approach

If you search for a t.co URL in Twitter, you can see tweets which include it. If the tweet is recent and visible to you, it shows up in the results:

Searching for a t.co URL with a single result.

Sometimes you might find multiple tweets that include the same URL. I’ve seen this happen when somebody posts the same link several times:

Searching for a t.co URL with multiple results.

If you only need to search for a couple of URLs, this is probably fine.

The Python approach

Because I need to do this a lot, I wanted to automate the process. Twitter have a search API which provides similar data to the Twitter website, so by calling this API we can mimic the search interface. I wrote a Python script to do it for me, which I’ll walk through below.

First we need to authenticate with the Twitter API. You’ll need some Twitter API credentials, which you can get through Twitter’s developer site.

In the past I used tweepy to connect to the Twitter APIs, but these days I prefer to use the requests-oauthlib library and make direct requests. We create an OAuth session:

from requests_oauthlib import OAuth1Session

sess = OAuth1Session(
    client_key=TWITTER_CONSUMER_KEY,
    client_secret=TWITTER_CONSUMER_SECRET,
    resource_owner_key=TWITTER_ACCESS_TOKEN,
    resource_owner_secret=TWITTER_ACCESS_TOKEN_SECRET
)

Then we can call the search API like so:

resp = sess.get(
    "https://api.twitter.com/1.1/search/tweets.json",
    params={
        "q": TCO_URL,
        "count": 100,
    }
)

The q parameter is the search query, which in this case is the t.co URL. We get as many tweets as possible (you’re allowed up to 100 tweets in a single request).

We extract the tweets like so:

statuses = resp.json()["statuses"]

The API represents every retweet as an individual status, so a tweet with three retweets would have four entries in this response – one for the original tweet, and three more for each of the retweets. The Twitter web UI handles that for us, and consolidates them into a single result. We have to do that manually.

If a tweet from the API is a retweet, it has a retweeted_status key that contains the original tweet. Let’s look for that, and build tweet URLs accordingly:

tweet_urls = set()

for status in statuses:
    try:
        tweet = status["retweeted_status"]
    except KeyError:
        tweet = status

    url = "https://twitter.com/%s/status/%s" % (
        tweet["user"]["screen_name"], tweet["id_str"]
    )

    tweet_urls.add(url)

This gives us the URLs for tweets that use or mention the t.co URL we were looking for.

If we want to be stricter, we could check that these tweets include the t.co short URL in their URL entities. (In the Twitter API, an “entity” is metadata or extra context for the tweet – images, videos, URLs, that sort of thing.) We add "include_entities": True to the parameters in our API call, then modify our for loop slightly:

for status in statuses:
    ...

    if not any(u["url"] == TCO_URL for u in tweet["entities"]["urls"]):
        continue

    url = "..."

Putting this all together gives us the following function:

from requests_oauthlib import OAuth1Session


sess = OAuth1Session(
    client_key=TWITTER_CONSUMER_KEY,
    client_secret=TWITTER_CONSUMER_SECRET,
    resource_owner_key=TWITTER_ACCESS_TOKEN,
    resource_owner_secret=TWITTER_ACCESS_TOKEN_SECRET
)


def find_tweets_using_tco(tco_url):
    """
    Given a shortened t.co URL, return a set of URLs for tweets that use this URL.
    """
    # See https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    resp = sess.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        params={
            "q": tco_url,
            "count": 100,
            "include_entities": True
        }
    )

    statuses = resp.json()["statuses"]

    tweet_urls = set()

    for status in statuses:
        # A retweet shows up as a new status in the Twitter API, but we're only
        # interested in the original tweet.  If this is a retweet, look through
        # to the original.
        try:
            tweet = status["retweeted_status"]
        except KeyError:
            tweet = status

        # If this tweet shows up in the search results for a reason other than
        # "it has this t.co URL as a short link", it's not interesting.
        if not any(u["url"] == tco_url for u in tweet["entities"]["urls"]):
            continue

        url = "https://twitter.com/%s/status/%s" % (
            tweet["user"]["screen_name"], tweet["id_str"]
        )

        tweet_urls.add(url)

    return tweet_urls

I’ve been using this code to reverse t.co URLs that appear in my web analytics for a while now. It works about as well as the website, but I find it quicker to use.

Limitations

Not all t.co URLs come from a tweet.

If you post a link in your profile, that gets shortened as well. But as far as I can tell, there’s no way to go from a shortened profile link back to the original profile page. If you search for the shortened URL, you don’t find anything.

Also, if the original tweet is from an account you can’t see (maybe they’re private or they’ve blocked you), their tweet won’t show up in your searches.

Some tips for conferences

My first tech conference was PyCon UK, back in September 2016. Since then, I’ve been to a dozen or so tech conferences – most recently ACCU 2019 – and I’m enjoying them more now than when I started. This post is a list of some of the things I’ve learnt that make conferences more enjoyable.

The short version: when to go to sessions and when to have conversations, pace yourself for socialising, and pack carefully.

Distinguish between “must see” and “nice to see” sessions

When I was first going to conferences, I tried to go to a talk or workshop in every slot. That’s fine, but sessions aren’t the only important thing at a conference – the conversations between sessions are important too! I had several conversations that I cut off to go to a session where, in hindsight, I might have been better off skipping the session and continuing the conversation. Most conferences video their sessions, so I could have caught up later.

These days, I split sessions into “must see” and “nice to see”. It helps me decide if I really want to end a conversation and go to a session, or if I’d rather stay and chat.

Know how to end a conversation respectfully

Conversations are important, but sometimes they aren’t going anywhere. That’s okay too!

When I think I’ve hit a dead end, I say something like “It was lovely to chat to you, and now I’m going to talk to some other people”, and offer a handshake. It’s polite, respectful, and nobody has ever been upset when I say that. It leaves a good final impression.

Don’t flub it with a feeble excuse about going to the toilet or fetching a drink, then not coming back. You’re leaving the conversation, so own it.

Follow the Pac-Man rule

The Pac-Man rule is an idea from Eric Holscher, which at its core is this: When standing as a group of people, always leave room for 1 person to join your group.

That physical gap helps people feel like they can join the group. It’s a nice way to help newcomers feel included, and for you to meet new people. For more explanation, I recommend Eric’s original blog post.

When somebody joins the conversation, give them some context

This is a tip I got from Samathy Barratt at ACCU.

When somebody joins your conversation, give them some quick context so they know what you were just talking about. It doesn’t have to be much; just a sentence or two will do. For example, “We’re talking about exception handling in C++.” It implicitly welcomes them to the conversation, and means they can take part more quickly – they don’t have to try to guess the context.

Expect to crash after (or during) the conference

Conferences can be very intense – you’re meeting lots of people, learning new information, having conversations – and that can be tiring.

During a conference, I always put aside time to rest, away from the bustle of the conference. Whether that’s in the quiet room, a nearby green space, or just in the corridor while everyone else is in a session, it helps me recharge and enjoy the next part of the conference.

After the conference ends, I usually have an emotional crash. I’ve spent a few days meeting people and spending time with friends I don’t usually see, and coming down from that is hard. I always plan a quiet day at home (and some annual leave at work) after a long trip.

Plan to visit the location beforehand, not after

For the last two years I’ve stayed in Cardiff for a few days after PyCon UK ends. I wanted to rest and see a bit of the city, but it was tinged with melancholy. It was weird to wake up and walk through Cardiff with nobody from the conference around. All my friends had gone home; it was just me left.

Next time I’ll try to visit before the conference starts, and go home at the same time as everyone else.

Stuff to pack

I have a fairly long checklist of things to pack for away-from-home travel. These are a few items that I find especially useful for conferences:

Getting a transcript of a talk from YouTube

When I give conference talks, they’re often videoed and shared on YouTube. Along with the video, I like to post the slides afterwards, and include an inline transcript. A written transcript is easier to skim, to search, and for Google to index. Plus, it makes the talk more accessible for people with hearing difficulties. Here’s an example from PyCon UK last year: Assume Worst Intent.

I share a transcript rather than pre-prepared notes because I often ad lib the content of my talks. I might add or remove something at the last minute, make subtle changes based on the mood of the audience, or make a reference to a previous session that wasn’t in my original notes. A transcript is a more accurate reflection of what I said on the day.

Some conferences have live captioning (a human speech-to-text reporter transcribing everything I say, as I say it), which does the hard work for me! That’s great, and those transcripts are very high quality – but not every event does this.

If I have to do it myself, writing a new transcript is a lot of work, and slows down posting the slides. So what I do instead is lean on YouTube to get a first draft of a transcript, and then I tidy it up by hand.

YouTube uses speech-to-text technology to automatically generate captions for any video that doesn’t already have them (in a handful of languages, at least). It’s not fantastically accurate, but it’s close enough to be a useful starting point. I can edit and polish the automatically generated transcript much faster than I could create my own from scratch.

How I do it

I start by using youtube-dl to download the automatically generated captions to a file.

$ youtube-dl --write-auto-sub --skip-download "https://www.youtube.com/watch?v=XyGVRlRyT-E"

This saves a .vtt subtitle file in the current directory.

The .vtt file is a format meant for video players – it describes what words should appear on the screen, when. Here’s a little snippet:

00:00:00.030 --> 00:00:03.500 align:start position:0%

again<c.colorE5E5E5><00:00:01.669><c> since</c><00:00:02.669><c> you've</c><00:00:02.790><c> already</c><00:00:02.970><c> heard</c></c><c.colorCCCCCC><00:00:03.300><c> from</c><00:00:03.449><c> me</c></c>

00:00:03.500 --> 00:00:03.510 align:start position:0%
again<c.colorE5E5E5> since you've already heard</c><c.colorCCCCCC> from me
 </c>

It’s a mixture of timestamps, colour information, and the text to display. To turn this into something more usable, I have a Python script that goes through and extracts just the text. It’s a mess of regular expressions, not a proper VTT parser, but it does the trick. You can download the script from GitHub.
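
If you’re curious what that kind of cleanup involves, here’s a simplified sketch of the idea – strip the header, timestamp and tag noise, then de-duplicate the repeated lines. It’s an approximation, not the exact script linked above:

import re
import sys


def caption_lines(path):
    """Pull the plain caption text out of an auto-generated .vtt file."""
    previous = None
    with open(path) as infile:
        for line in infile:
            line = line.strip()

            # Skip the header lines, blank lines and the timestamp lines
            if (
                not line
                or line.startswith(("WEBVTT", "Kind:", "Language:"))
                or "-->" in line
            ):
                continue

            # Remove inline tags like <00:00:01.669> and <c.colorE5E5E5>,
            # then collapse any leftover runs of whitespace
            text = re.sub(r"<[^>]+>", "", line)
            text = re.sub(r"\s+", " ", text).strip()

            # The auto-generated captions repeat each line several times;
            # only print text we haven't just seen
            if text and text != previous:
                print(text)
                previous = text


if __name__ == "__main__":
    caption_lines(sys.argv[1])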

This gives me just the content of the captions:

again since you've already heard from me
before I'll skip the introduction and
gets right into the talk we're talking

I save that to a file, then I go through that text to add punctuation and tidy up mistakes. If it’s not clear from the transcript what I was saying, I’ll go back and rewatch the video, but I only need a few seconds at a time.

Observations

The YouTube auto-captioning software is good, but far from perfect. Here are a couple of changes I’m especially used to making:

Overall it’s a lot faster than writing a transcript from scratch, and a lot kinder to my hands. I spend most of my time reading, not typing, and it takes much less time from start to finish.

If you need some captions and you don’t have the time or money for a complete human transcript, the YouTube auto-generated captions are a good place to start.

How I back up my computing devices, 2019 edition

About a fortnight ago, there was lots of news coverage about Myspace losing 12 years of uploaded music. I never had a Myspace account, so I didn’t lose anything on this occasion, but it was a prompt to think about how I back up my computing devices.

A lot of my work and documents only exist on a computer. That includes most of my personal photographs, all my code and prose, and many of the letters I receive (physical copies of which get scanned and shredded). It’s scary to imagine losing any of that data, so I have a number of systems to keep it backed up and secure.

These are the notes I made on my backup system.

Requirements

These are the things I think make a good backup system:

My devices

I have three devices that have important data:

  1. an iMac, which sits at home and is always running
  2. a MacBook, which is the laptop I carry around
  3. an iPhone

I also have a work laptop, but I let IT manage its backups. It has less data that I personally care about, and corporate IT policies tend to frown upon people making unauthorised copies of company data.

I also have a lot of data tied up in online accounts (Twitter, Dreamwidth, Tumblr, and so on), and I try to keep separate copies of that. How I back up that data is a subject for a separate post.

My setup

Because my iPhone and my laptop are both portable devices, and I take them out of the house regularly, I assume I could lose or break them at any time. (Many years ago, I lost my first two phones in quick succession.) I try not to keep important files on them for long, and instead copy the files to my iMac – where they get backed up in multiple ways.

Here’s what I do to secure my files:

Full-disk encryption

My scanned documents have a lot of personal information, including my bank details, home address, and healthcare records. I don’t want that to be readily available if my phone or laptop get stolen, so I do full-disk encryption on both of them.

On my iMac and MacBook, I’ve turned on FileVault encryption. On my iPhone, I’m using the encryption provided by iOS.

iCloud Photo Stream and iCloud Backups

Any photos I take on my iPhone are automatically uploaded to iCloud Photo Stream, and I have an iCloud backup of the entire phone. My iMac downloads the original, full-resolution file for every photo I store in Photo Stream, so I’m not relying on Apple’s servers. Because the iMac is always running, it usually downloads an extra copy very quickly.

When I’m using a camera with an SD card, I transfer photos off the SD card to my phone at the end of the day, and I upload those to iCloud Photo Stream as well.

I’m paying for a 200GB iCloud storage plan (£2.49/month), which is easily enough for my needs.

File sync with Dropbox and GitHub

When I’m actively working on something, I keep the relevant files on GitHub (if it’s code) or Dropbox (if it’s not). That’s a useful short-term copy of all those files, and keeps them in sync between devices.

Two full disk clones of my iMac, kept at home

I have a pair of Western Digital hard drives plugged into my iMac, and I use SuperDuper to create bootable clones of its internal drive every 24 hours. One backup runs in the early morning before I start work, one in the late evening when I’m in bed.

I space out the clones to reduce the average time since the last backup, and to give me more time to spot if SuperDuper is having issues before it affects both drives.

The drives are permanently mounted; ideally I’d only mount them when SuperDuper is creating a clone.

Both these drives are encrypted with FileVault. They never leave my desk, but it means I don’t have to worry about a burglar getting my personal data.

A full disk clone of my iMac, kept at the office

I have a portable bus-powered Seagate hard drive, and SuperDuper creates a bootable clone of my iMac whenever it’s plugged in. This disk usually lives in a drawer at work, thirty miles from home, so if my home and the local drives are destroyed (say, by fire or flood), I still have an easy-to-hand backup.

Once a fortnight, I bring the drive home, plug it into the iMac, and update the clone.

I encrypt this drive so it’s not a disaster if I lose it somewhere between home and the office.

Both this and the permanently plugged-in drives are labelled with their initial date of purchase. Conventional wisdom is that hard drives are reliable for about 3–4 years; the label gives me an idea of whether it’s time to replace a particular drive.

Remote backups with Backblaze

I run Backblaze to continuously make backups of both my iMac and my MacBook.

This is a last resort. Restoring my entire disk from Backblaze would be slow and expensive, but it means that even if all my physical drives are destroyed, I have an extra copy of my data.

But it’s handy at other times, even if I’m not doing a complete restore – if I’m on my laptop and I realise I need a file that’s only on my iMac, I can restore a single file from Backblaze. It’s a good way to shuffle files around in a pinch.

Keeping it up-to-date

The most recent addition to this setup is the portable iMac clone.

When I was moving house last year, I had my iMac and all my backup drives in the same car. If I’d had an accident, all my backups would disappear at once, and I’d be stuck downloading 600GB of files from Backblaze. The extra drive was a small cost, but should make it much easier to restore if that worst-case scenario ever happens.

Soon I need to replace the other drives plugged into my iMac – they’re both three years old, and approaching the end of their reliable lives. The current pair are both desktop hard drives, with dedicated power supplies. I’ll probably replace them with bus-powered, portable drives, to tidy up my desk.

I don’t have any local backups of my laptop, and I’m not planning to change that. The only files I keep on the laptop are things I’m actively working on, which also go in Dropbox and GitHub.

So that’s my backup system.

It’s not perfect, but I’m happy with it. My last drive failure was three years ago, and I didn’t lose a single file. I don’t lose sleep wondering if a disk is about to fail and lose all my data.

If you already have a backup system in place, use the Myspace disaster as a prompt to review it. Are there gaps? Single points of failure? Could it be improved or made more resilient?

And if you don’t have a backup system, please get one! Data loss is miserable, and your disk is going to fail – it’s only a matter of when, not if.

Creating a GitHub Action to auto-merge pull requests

GitHub Actions is a new service for “workflow automation” – a sort-of scriptable GitHub. When something happens in GitHub (you open an issue, close a pull request, leave a comment, and so on), you can kick off a script to take further action. The scripts run in Docker containers inside GitHub’s infrastructure, so there’s a lot of flexibility in what you can do.

If you’ve not looked at Actions yet, the awesome-actions repo can give you an idea of the sort of things it can do.

I love playing with build systems, so I wanted to try it out – but I had a lot of problems getting started. At the start of March, I tweeted in frustration:

I really like the idea of GitHub Actions, but every time I try to use it I feel like an idiot.

I don’t know if it’s me or the UI, but I cannot understand what it’s doing, and things seem to be happening at random.

If something makes me feel stupid, I’m not going to use it.

A few days later, I got a DM from Angie Rivera, the Product Manager for GitHub Actions. We arranged a three-way call with Phani Rajuyn, one of GitHub’s software engineers, and together we spent an hour talking about Actions. I was able to show them the rough edges I’d been hitting, and they were able to fill in the gaps in my understanding.

After our call, I got an Action working, and I’ve had it running successfully for the last couple of weeks.

In this post, I’ll explain how I wrote an Action to auto-merge my pull requests. When a pull request passes tests, GitHub Actions automatically merges the PR and then deletes the branch:

A screenshot of the GitHub pull request UI, showing the github-actions bot merging and deleting a branch.

If you just want the code, skip to the end or check out the GitHub repo.

The problem

I have lots of “single-author” repos on GitHub, where I’m the only person who ever writes code. The source code for this blog is one example; my junk drawer repo is another.

I have CI set up on some of those repos to run tests and linting (usually with Travis CI or Azure Pipelines). I open pull requests when I’m making big changes, so I get the benefit of the tests – but I’m not waiting for code review or approval from anybody else. What used to happen is that I’d go back later and merge those PRs manually – but I’d rather they were automatically merged if/when they pass tests.

Here’s what I want to happen:

  1. I open a pull request on one of my repos
  2. The CI checks run against the pull request
  3. If all the checks pass, the pull request is merged automatically and the branch is deleted

This means my code is merged immediately, and I don’t have lingering pull requests I’ve forgotten to merge.

I’ve experimented with a couple of tools for this (most recently Mergify), but I wasn’t happy with any of them. It felt like GitHub Actions could be a good fit, and give me lots of flexibility in deciding whether a particular pull request should be merged.

Creating a “Hello World” Action

Let’s start by creating a tiny action that just prints “hello world”. Working from the example in the GitHub Actions docs, create three files:

# .github/main.workflow
workflow "on pull request pass, merge the branch" {
  resolves = ["Auto-merge pull requests"]
  on       = "check_run"
}

action "Auto-merge pull requests" {
  uses = "./auto_merge_pull_requests"
}

# auto_merge_pull_requests/Dockerfile
FROM python:3-alpine

MAINTAINER Alex Chan <alex@alexwlchan.net>

LABEL "com.github.actions.name"="Auto-merge pull requests"
LABEL "com.github.actions.description"="Merge the pull request after the checks pass"
LABEL "com.github.actions.icon"="activity"
LABEL "com.github.actions.color"="green"

COPY merge_pr.py /
ENTRYPOINT ["python3", "/merge_pr.py"]
# auto_merge_pull_requests/merge_pr.py
#!/usr/bin/env python
# -*- encoding: utf-8

if __name__ == "__main__":
    print("Hello world!")

The Dockerfile and Python script define a fairly standard Docker image, which prints "Hello world!" when you run it. This is where we’ll be adding the interesting logic. I’m using Python instead of a shell script because I find it easier to write safe, complex programs in Python than in shell.

Then the main.workflow file defines the following series of steps:

  1. Listen for check_run events on the repository
  2. When one fires, run the “Auto-merge pull requests” action
  3. That action builds the Docker image in the auto_merge_pull_requests directory, and runs the container

I had a lot of difficulty understanding how the check_run event works, and Phani and I spent a lot of time discussing it on our call.

A check run is an individual check created by a third-party CI integration, like Travis or Circle CI. A check run event is fired whenever the state of a check changes. That includes:

  * a new check run being created
  * somebody re-requesting a check run
  * a check run completing

That last event is what’s interesting to me – if the tests completed and they’ve passed, I want to take further action.

What confused me is that not all CI integrations use the Checks API – in particular, a lot of my Travis setups were using a legacy integration that doesn’t involve checks. Travis started using the Checks API nine months ago, but I missed the memo, and hadn’t migrated my repos. Until I moved to the Checks integration, it looked as if GitHub was just ignoring my builds.

Adding the logic

We start by loading the event data. When GitHub Actions runs a container, it includes a JSON file with data from the event that triggered it. It passes the path to this file as the GITHUB_EVENT_PATH environment variable. So let’s open and load that file:

import json
import os


if __name__ == "__main__":
    event_path = os.environ["GITHUB_EVENT_PATH"]
    event_data = json.load(open(event_path))

We only want to do something if the check run is completed, otherwise we don’t have enough information to determine if we’re ready to merge. The GitHub developer docs explain what the fields on a check_run event look like, and the “status” field tells us the current state of the check:

import sys


if __name__ == "__main__":
    ...
    check_run = event_data["check_run"]
    name = check_run["name"]

    if check_run["status"] != "completed":
        print(f"*** Check run {name} has not completed")
        sys.exit(78)

Calling sys.exit means we bail out of the script, and don’t do anything else. In a GitHub Action, exit code 78 is a neutral status. It’s a way to say “we didn’t do any work”. This is what it looks like in the UI, compared to a successful run:

Two rows of text, both saying “on pull request pass, merge the branch”, one with a grey square, one with a green tick.

If we know the check has completed, we can look at how it completed. Anything except a success means something has gone wrong, and we shouldn’t merge the PR – it needs manual inspection.

    if check_run["conclusion"] != "success":
        print(f"*** Check run {name} has not succeeded")
        sys.exit(1)

Here I’m returning an explicit failure. The difference between a failure and a neutral status is that a failure blocks any further steps in the workflow, whereas a neutral result lets them carry on. Here, something has definitely gone wrong – the tests haven’t passed – so we shouldn’t continue to subsequent steps.
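
If you find the bare exit codes hard to remember, you could wrap the convention in a pair of tiny helpers. This is just a sketch of the idea, not something my script actually uses (the names are mine, and it assumes sys is already imported):

def neutral(message):
    # Exit code 78 tells GitHub Actions this step finished with a neutral status
    print(f"*** {message}")
    sys.exit(78)


def fail(message):
    # Any other non-zero exit code marks the step as a failure, blocking later steps
    print(f"*** {message}")
    sys.exit(1)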

If the script is still running, then we know the tests have passed, so let’s put in the conditions for merging the pull request. For me, that means:

  * the pull request isn’t a work-in-progress (the title doesn’t start with “[WIP]”)
  * the pull request was opened by me

The check_run event includes a bit of data about the pull request, including the PR number and the branches. I use this for a bit of logging:

    assert len(check_run["pull_requests"]) == 1
    pull_request = check_run["pull_requests"][0]
    pr_number = pull_request["number"]
    pr_src = pull_request["head"]["ref"]
    pr_dst = pull_request["base"]["ref"]

    print(f"*** Checking pull request #{pr_number}: {pr_src} ~> {pr_dst}")

But for detailed information, like the title and the pull request author, I need to query the pull requests API. Let’s start by creating an HTTP session for working with the GitHub API:

import requests


def create_session(github_token):
    sess = requests.Session()
    sess.headers = {
        "Accept": "; ".join([
            "application/vnd.github.v3+json",
            "application/vnd.github.antiope-preview+json",
        ]),
        "Authorization": f"token {github_token}",
        "User-Agent": f"GitHub Actions script in {__file__}"
    }

    def raise_for_status(resp, *args, **kwargs):
        try:
            resp.raise_for_status()
        except Exception:
            print(resp.text)
            sys.exit("Error: Invalid repo, token or network issue!")

    sess.hooks["response"].append(raise_for_status)
    return sess

This helper method creates an HTTP session that, on every request:

  * sends the Accept headers for the v3 API and the Checks API preview
  * sends the GitHub API token in the Authorization header
  * checks the response was successful, and exits the script with an error if it wasn’t

I have to add pip3 install requests to the Dockerfile so I can use the requests library.

Then I modify the action in my main.workflow to expose an API token to my running code:

action "Auto-merge pull requests" {
  uses    = "./auto_merge_pull_requests"
  secrets = ["GITHUB_TOKEN"]
}

This is one of the convenient parts of GitHub Actions – it creates this API token for us at runtime, and passes it into the container. We don’t need to muck around with creating and rotating API tokens by hand.

We can read this environment variable to create a session:

    github_token = os.environ["GITHUB_TOKEN"]

    sess = create_session(github_token)

Now let’s read some data from the pull requests API, and run the checks:

    pr_data = sess.get(pull_request["url"]).json()

    pr_title = pr_data["title"]
    print(f"*** Title of PR is {pr_title!r}")
    if pr_title.startswith("[WIP] "):
        print("*** This is a WIP pull request, will not merge")
        sys.exit(78)

    pr_user = pr_data["user"]["login"]
    print(f"*** This PR was opened by {pr_user}")
    if pr_user != "alexwlchan":
        print("*** This pull request was opened by somebody who isn't me")
        sys.exit(78)

If the PR isn’t ready to be merged, I use another neutral status – a failing build and a red X would look more severe than the situation really is.

If it’s ready and we haven’t bailed out yet, we can merge the pull request!

    print("*** This pull request is ready to be merged.")
    merge_url = pull_request["url"] + "/merge"
    sess.put(merge_url)

Then to keep things tidy, I delete the PR branch when I’m done:

    print("*** Cleaning up pull request branch")
    pr_ref = pr_data["head"]["ref"]
    api_base_url = pr_data["base"]["repo"]["url"]
    ref_url = f"{api_base_url}/git/refs/heads/{pr_ref}"
    sess.delete(ref_url)

This last step was partially inspired by Jessie Frazelle’s branch cleanup action, which is one of the first actions I used, and was a useful example when writing this code.

Putting it all together

Here’s the final version of the code:

# .github/main.workflow
workflow "on pull request pass, merge the branch" {
  resolves = ["Auto-merge pull requests"]
  on       = "check_run"
}

action "Auto-merge pull requests" {
  uses    = "./auto_merge_pull_requests"
  secrets = ["GITHUB_TOKEN"]
}
# auto_merge_pull_requests/Dockerfile
FROM python:3-alpine

MAINTAINER Alex Chan <alex@alexwlchan.net>

LABEL "com.github.actions.name"="Auto-merge pull requests"
LABEL "com.github.actions.description"="Merge the pull request after the checks pass"
LABEL "com.github.actions.icon"="activity"
LABEL "com.github.actions.color"="green"

RUN pip3 install requests

COPY merge_pr.py /
ENTRYPOINT ["python3", "/merge_pr.py"]
# auto_merge_pull_requests/merge_pr.py
#!/usr/bin/env python
# -*- encoding: utf-8

import json
import os
import sys

import requests


def create_session(github_token):
    sess = requests.Session()
    sess.headers = {
        "Accept": "; ".join([
            "application/vnd.github.v3+json",
            "application/vnd.github.antiope-preview+json",
        ]),
        "Authorization": f"token {github_token}",
        "User-Agent": f"GitHub Actions script in {__file__}"
    }

    def raise_for_status(resp, *args, **kwargs):
        try:
            resp.raise_for_status()
        except Exception:
            print(resp.text)
            sys.exit("Error: Invalid repo, token or network issue!")

    sess.hooks["response"].append(raise_for_status)
    return sess


if __name__ == "__main__":
    event_path = os.environ["GITHUB_EVENT_PATH"]
    event_data = json.load(open(event_path))

    check_run = event_data["check_run"]
    name = check_run["name"]

    if check_run["status"] != "completed":
        print(f"*** Check run {name} has not completed")
        sys.exit(78)

    if check_run["conclusion"] != "success":
        print(f"*** Check run {name} has not succeeded")
        sys.exit(1)

    assert len(check_run["pull_requests"]) == 1
    pull_request = check_run["pull_requests"][0]
    pr_number = pull_request["number"]
    pr_src = pull_request["head"]["ref"]
    pr_dst = pull_request["base"]["ref"]

    print(f"*** Checking pull request #{pr_number}: {pr_src} ~> {pr_dst}")

    github_token = os.environ["GITHUB_TOKEN"]

    sess = create_session(github_token)

    pr_data = sess.get(pull_request["url"]).json()

    pr_title = pr_data["title"]
    print(f"*** Title of PR is {pr_title!r}")
    if pr_title.startswith("[WIP] "):
        print("*** This is a WIP PR, will not merge")
        sys.exit(78)

    pr_user = pr_data["user"]["login"]
    print(f"*** This pull request was opened by {pr_user}")
    if pr_user != "alexwlchan":
        print("*** This pull request was opened by somebody who isn't me")
        sys.exit(78)

    print("*** This pull request is ready to be merged.")
    merge_url = pull_request["url"] + "/merge"
    sess.put(merge_url)

    print("*** Cleaning up pull request branch")
    pr_ref = pr_data["head"]["ref"]
    api_base_url = pr_data["base"]["repo"]["url"]
    ref_url = f"{api_base_url}/git/refs/heads/{pr_ref}"
    sess.delete(ref_url)

I keep this in a separate repo (which doesn’t have auto-merging enabled), so nobody can maliciously modify the workflow rules and get their own code merged. I’m not entirely sure what safety checks are in place to prevent workflows modifying themselves, and having an extra layer of separation makes me feel more comfortable.

Putting it to use

If you want to use this code, you’ll need to modify the code for your own rules. Please don’t give me magic merge rights to your GitHub repos!

With this basic skeleton, there are lots of ways you could extend it. You could post comments on failing pull requests explaining how to diagnose a failure. You could request reviews if you get a pull request from an external contributor, and post a comment thanking them for their work. You could measure how long it took to run the check, to see if it’s slowed down your build times. And so on.
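
For example, posting a comment when a check fails is only one extra API call. Here’s a rough sketch of how that might look – it isn’t part of my script, and it assumes the sess, name and pr_data variables from the code above (the pull requests API response includes a comments_url field that points at the issue comments endpoint):

    # Sketch: leave a comment explaining that the checks failed
    comment = {
        "body": f"The check run {name} didn't succeed – this pull request needs a human to take a look.",
    }
    sess.post(pr_data["comments_url"], json=comment)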

GitHub Actions feels like it could be really flexible and powerful, and I’m glad to have created something useful with it. I’ve had this code running in the repo for this blog for nearly a month, and it’s working fine – saving me a bit of work every time. It’ll even merge the pull request where I’ve written this blog post.

A day out to the Forth Bridge

While clearing out some boxes recently, I found a leaflet from an old holiday. It was a fun day out, and I’d always meant to share the pictures – so here’s the story of a day trip from 2016.

I was spending a week in Edinburgh, relaxing and using up some holiday allowance at the end of my last job. My grandparents had suggested I might enjoy seeing the Forth Bridge, because I tend to like railways and railway-related things. I’d heard of the bridge, but I didn’t know that much about it – so while I was nearby, I decided to go take a look.

So on a cold December morning, I caught a train from Edinburgh station, up to a village on the north end of the Forth Bridge. I’d never heard of North Queensferry, but what little Googling I’d done suggested it was the best place to go if I wanted to see the bridge up close.

Here’s a map that shows the train line from Edinburgh to the village:

A map showing the railway line between Edinburgh and North Queensferry.
Map data from OpenStreetMap.

The train takes about 20 minutes, and it crosses the Forth Bridge right at the end of the journey. I wasn’t really aware of the bridge as we went across – not until I got out at the station, wandered into the village, and looked back towards the track.

A silhouette of the bridge and some trees against a grey sky.

The name of North Queensferry hints at its former life. It’s on the north side of the narrowest point of the Firth of Forth, which makes it a natural choice if you want to cross the water by boat.

It’s said that in 1068, Saint Margaret of Scotland (wife of King Malcolm III) established the village to ensure a safe crossing point for pilgrims heading to St Andrews. Whether or not she actually founded the village, she was a regular user of the ferry service to travel between Dunfermline (the then-capital of Scotland) and Edinburgh Castle.

For centuries, there were regular crossings of boats and ferries. You can still see a handful of small boats in the harbour, but I’m sure it used to be a lot busier.

Photo from the water's edge, with two bridges in the background and a couple of boats in the water.

Update, 21 March 2019: It turns out the lighthouse isn’t the only reminder of the ferry service! Chris, an ex-Wellcome colleague and archivist extraordinaire, found some pictures from the Britten-Pears foundation (whose archive and library he runs), including a ticket from the ferry service:

Another lost transport world in @BrittenOfficial’s papers: the ferry over the Forth at Queensferry, which operated until the Forth Road Bridge opened in 1964. Here are 3 passenger tickets plus what I think is a counterfoil from a ticket for a car, from 1950. pic.twitter.com/mpSWZYeEiu

Although most of the boats are gone, one part of the ferry service survives – the lighthouse! This tiny hexagonal tower is the smallest working light tower in the world. It was built in 1817 by Robert Stevenson, a Scottish civil engineer who was famous for building lighthouses. (It was a name I recognised; I loved the story of the Bell Rock Lighthouse when I was younger.)

The light tower sits on the pier, where the ferries used to dock.

A yellowish-stone hexagonal tower, with a domed roof and windows around the top.

Unlike at many lighthouses of the time, the keeper didn’t live in the lighthouse itself – but they were still responsible for keeping the flame lit, the oil topped up, and the lighthouse maintained. At night, it would have been an invaluable guide for boats crossing the Firth.

Today, the lighthouse is open to the public. (I think this is where I picked up my leaflet.) You can climb the 24 steps, see the lamp mechanism, and look out over the water. When lit, it gave a fixed white light from a paraffin-burning lamp – and the large half-dome was the parabolic reflector that turned the lamp’s light into a focused beam.

The back of a copper-coloured, parabolic lens looking out through a lighthouse window.

I wish I’d got a few more photos of the inside of the lighthouse, but it was a pretty small space, and I was struggling to find decent angles. Either way, the lighthouse was an unexpected treat – not something I was expecting at all!

But these days, North Queensferry isn’t known for its ferry service – it’s known for the famous bridge.

In the 1850s, the Edinburgh, Leith and Granton Railway ran a “train ferry” – a boat that carried railway carriages between Granton and Burntisland. There was a desire to build a continuous railway service, and the natural next step was a bridge. After a failed attempt to build a suspension bridge in the 1870s (axed after the Tay Bridge disaster), there was a second attempt in the 1880s. It was opened in March 1890, and it’s still standing today.

Here’s a photo of its original construction, taken from the North Queensferry hills:

A sepia-toned photograph of a partially constructed bridge, with three cantilevers visible above the water.
Image of the construction of the Forth Bridge, from the National Library of Scotland.

The Forth Bridge is a cantilever bridge. Each structure in the photo above is one of the cantilevers – a support structure fixed at one end – and in the finished bridge the load is spread between them. Spreading the load between multiple cantilevers allows you to build longer bridges, and the Forth Bridge uses this to great effect.

One of the advantages of cantilever bridges is that they don’t require any temporary support while they’re being built – once the initial structures are built, you can expand outwards and they’ll take the weight. Here’s another photo from the construction which shows off this idea:

Black-and-white photo from the construction of the Forth Bridge.
Another photo of the bridge under construction, by George Washington Wilson.

You can see the shape of the bridge starting to expand out from the initial structure.

The Forth Bridge is famous for a couple of reasons. When it was built, it was the longest cantilever span in the world (not bested for another twenty-nine years, and still the second longest). It was also one of the first major structures in Britain to use steel – 55,000 tonnes in the girders, along with a mixture of granite and other materials in the masonry.

You get a great view of the finished bridge from inside the lighthouse:

Looking from inside the lighthouse window, with a red bridge visible outside and part of a copper-coloured lamp housing inside.

As I wandered around the village, I got lots of other pictures of the rail bridge. These are a few of my favourites:

The lighthouse in the foreground on the left, with the bridge set against a blue sky in the background.
Another photo with the lighthouse in the foreground, and the bridge running parallel to the horizon in the background.
The bridge dominating the background, with the jetty and a few boats in the foreground.
The bridge crossing the frame, set against a blue sky.

The last one was my favourite photo of the entire holiday, and I have a print of it on the wall of my flat. Blue skies galore, which made for a lovely day and some wonderful pictures – even if it’s not what you expect from a Scottish winter!

What’s great about wandering around the village is that you can see the bridge from all sorts of angles, not just from afar – you can get up close and personal. You can see the approach viaduct towering over the houses as it runs into the village:

The bridge running across the image, with a few houses visible along the ground.

And you can get even closer, and walk right underneath the bridge itself. Here’s what part of the viaduct holding up the bridge looks like:

A yellowish-coloured stone viaduct, with the red girders of the bridge atop it.

The pillars are enormous – judging by the stairs, it’s quite a climb up!

A side-on view of one of the pillars and the red bridge, with some stairs going up the side of the pillar.

And you can look up through the girders, and see the thousands of beams that hold the bridge together:

A silhouette of the girders in the bridge, looking up from underneath.

It’s been standing for over a century, so I’m sure it’s quite safe – but it was still a bit disconcerting to hear a rattle as trains passed overhead!

I spent quite a while just wandering around under the bridge, looking up in awe at the structure. As you wander around the village, you never really get away from it – it always stands tall in the background. (Well, except when I popped into a café for some soup and a scone.)

After the rail bridge was built, the ferry crossings continued for many years – in part buoyed by the rise of personal cars, which couldn’t use the railway tracks. But it didn’t last – in 1964, a second bridge was built, the Forth Road Bridge – and it replaced the ferries. The day the bridge opened, the ferry service closed after eight centuries of continuous crossings.

When I visited in 2016, the Road Bridge was still open to cars. At the time, there was a third bridge under construction, but not yet open to the public – the Queensferry Crossing opened about half a year after my visit. The original Road Bridge is now a public transport link (buses, cyclists, taxis and pedestrians), and the new bridge carries everything else.

Three bridges in a single day! The other bridges were on the other side of the bay, so I had to walk along the water’s edge to see them. Here’s a photo from midway along, with old and new both visible:

Two bridges spanning across a body of water.

The Road Bridge is a suspension bridge and the Queensferry Crossing is a cable-stayed bridge, whereas the rail bridge is a cantilever.

Like the rail bridge, you can get up and close with the base of the road bridge. Here’s an attempt at an “artsy” shot of the bridge receding into the distance, with sun poking through the base:

The silhouette of a bridge on the right, with green grass and water visible.

And another “artsy” shot with more lens flare, and both the road bridges in the shot. I love the detail of the underside on the nearer bridge.

Looking at the underside of two bridges in silhouette, with lens flare in the middle of the photo.

Here’s the start of the bridge on the north side, starting to rise up over the houses:

A series of arched bridge supports rising up above some houses.

And one more close-up shot of one of the supports:

A single concrete support, with the ridged underside of the bridge visible.

Eventually it started getting dark, so I decided to head home. I considered walking back through North Queensferry to the station, but I decided to have a crack at crossing the road bridge instead, and catching the train from the other side. You can walk across it, although it’s nearly 2.5km long!

Looking onto a bridge, with a path directly ahead, fences and roadworks to the right, and water below on the left.

As I climbed up to the bridge, I got some wonderful views back over the village, and in particular towards the rail bridge I’d originally come to see:

A village of houses in shadow in the foreground, with the bridge clearly visible in the background.

I didn’t take many photos from the bridge itself, although it’s a stunning view! It was extremely cold and windy, and I didn’t want to risk losing my camera while trying to take a photo. Here’s one of the few photos I did take, which I rather like. I took it near the midpoint, with the rail bridge set against a cloudy sky. (I’d forgotten about it until I came to write this post!)

Looking straight on to the side of the rail bridge, with blue sky and grey clouds behind it.

Safely across the bridge, I weaved my way through South Queensferry, found the station, and caught a train back to Edinburgh.

I didn’t plan this trip when I decided to visit Scotland, but over two years later, I still have fond memories of the day out. If you’re ever nearby and you like looking at impressive structures, it’s worth a trip. I’m glad I have pictures, but it’s hard to capture the sheer size and scale of a bridge this large – so if you have a chance, do visit in person.

Finding the latest screenshot in macOS Mojave

One of the things that changed in macOS Mojave was the format of screenshot filenames. On older versions of macOS, the filename would be something like:

Screen Shot 2016-10-10 at 18.34.18.png

On Mojave, the first two words got collapsed into one:

Screenshot 2019-03-08 at 18.38.41.png

I have a handful of scripts for doing something with screenshots – and in particular, a shortcut that grabs the newest screenshot. When I started upgrading to Mojave, I had to update the shell snippet that powers that shortcut. Because I couldn’t upgrade every machine to Mojave immediately, it had to work with both naming schemes.

This is what I’ve been using for the last few months (bound to last_screenshot in my shell config):

find ~/Desktop -name 'Screen Shot*' -print0 -o -name 'Screenshot*' -print0 \
  | xargs -0 stat -f '%m %N' \
  | sort --numeric-sort --reverse \
  | head -1 \
  | cut -f "2-" -d " "

Let’s break it down:

  * find looks on my Desktop for anything matching either naming scheme, and prints the paths separated by null bytes
  * xargs -0 passes each path to stat, which prints the modification time followed by the path
  * sort --numeric-sort --reverse puts the most recently modified file at the top
  * head -1 takes the first line – the newest screenshot
  * cut strips off the modification time, leaving just the path

It’s certainly possible to do this with a higher-level language like Python or Ruby, but I like the elegance of chaining together tiny utilities like this. For non-critical code, I enjoy the brevity.
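
That said, if you did want the higher-level version, here’s a rough Python equivalent – an untested sketch, which assumes (like my setup) that screenshots get saved to the Desktop:

#!/usr/bin/env python
# Print the path of the newest screenshot on the Desktop,
# matching both the pre-Mojave and Mojave naming schemes.
import glob
import os

candidates = (
    glob.glob(os.path.expanduser("~/Desktop/Screen Shot*")) +
    glob.glob(os.path.expanduser("~/Desktop/Screenshot*"))
)

if candidates:
    print(max(candidates, key=os.path.getmtime))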