Getting every message in an SQS queue

At work, we make heavy use of Amazon SQS message queues. We have a series of small applications which communicate via SQS. Each application reads a message from a queue, does a bit of processing, then pushes it to the next queue. This is a classic microservices pattern.

Three applications, communicating via two message queues.

Sometimes an application fails to process a message correctly, in which case SQS can send the message to a separate dead-letter queue (DLQ). (Our Terraform module for SQS queues automatically creates and configures a DLQ for all our queues.) Sending faulty messages to a DLQ allows you to see them all in one go, rather than trying to spot the failures in your logs.

Unfortunately, the AWS Console doesn’t make it very easy to go through the contents of a queue. You can see one message at a time, but this makes it hard to spot patterns or debug a large number of failures. It would be easier to have the entire queue in a local file, so we can analyse it or process every message at once. I’ve written a Python function to do just that, and in this post, I’ll walk through how it works.
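
As a rough idea of the approach, here’s a minimal sketch rather than the function from the full post. It assumes sqs_client behaves like boto3.client("sqs"); the function name, the delete flag and the five-second wait are choices made up for this illustration.

```python
def get_messages_from_queue(sqs_client, queue_url, delete=False):
    """Yield every message currently available on an SQS queue.

    sqs_client is assumed to behave like boto3.client("sqs").
    The name and the `delete` flag are illustrative, not an AWS API.
    """
    while True:
        resp = sqs_client.receive_message(
            QueueUrl=queue_url,
            AttributeNames=["All"],
            MaxNumberOfMessages=10,  # SQS returns at most 10 per call
            WaitTimeSeconds=5,       # long polling, to avoid spurious empty replies
        )

        # When nothing is available, the response has no "Messages" key.
        messages = resp.get("Messages", [])
        if not messages:
            return

        yield from messages

        # Without deleting, messages become visible again once their
        # visibility timeout expires, so a slow consumer may see repeats.
        if delete:
            entries = [
                {"Id": msg["MessageId"], "ReceiptHandle": msg["ReceiptHandle"]}
                for msg in messages
            ]
            sqs_client.delete_message_batch(QueueUrl=queue_url, Entries=entries)


# Example use (needs boto3 and AWS credentials; the queue URL is a placeholder):
#
#   import boto3, json
#   sqs = boto3.client("sqs")
#   with open("messages.json", "w") as f:
#       for msg in get_messages_from_queue(sqs, "https://sqs.example.com/my-dlq"):
#           f.write(json.dumps(msg) + "\n")
```

Passing the client in as an argument keeps the function easy to try out with a stub, and writing one JSON object per line makes the resulting file friendly to tools like grep.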

Read more →

Listing keys in an S3 bucket with Python, redux

A few months ago, I wrote about some code for listing keys in an S3 bucket. I’ve been running variants of that code in production since then, and found a pair of mistakes in the original version.


Since that post has been fairly popular, I thought it was worth writing a short update. In this post, I’ll walk through the changes I’ve made in the newer versions of the code.
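
For context, the general shape of this kind of code looks something like the sketch below. It isn’t the exact code from either post, and it doesn’t show the two mistakes; it assumes s3_client behaves like boto3.client("s3"), and the function name and the prefix/suffix arguments are illustrative.

```python
def get_matching_s3_keys(s3_client, bucket, prefix="", suffix=""):
    """Yield the keys in an S3 bucket, following pagination.

    s3_client is assumed to behave like boto3.client("s3").
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}

    while True:
        resp = s3_client.list_objects_v2(**kwargs)

        # "Contents" is absent when a page has no matching objects.
        for obj in resp.get("Contents", []):
            key = obj["Key"]
            if key.endswith(suffix):
                yield key

        # Each response holds up to 1000 keys; keep going while
        # S3 says the listing is truncated.
        if not resp.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
```

Filtering by prefix happens on the S3 side, while the suffix check is done locally on each key.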

Read more →

IP and DNS addresses for documentation

If you’re writing documentation that includes IP addresses, you may want to check out RFC 5737 and RFC 3849, which specify IPv4 and IPv6 addresses for use in documentation.

These addresses are “reserved”, meaning they should never be used for anything else – not on the public Internet, nor within internal networks. That means you can use them in examples, and they should never conflict or be confused with real systems.

Here’s RFC 5737 for IPv4:

The blocks 192.0.2.0/24 (TEST-NET-1), 198.51.100.0/24 (TEST-NET-2), and 203.0.113.0/24 (TEST-NET-3) are provided for use in documentation.

and RFC 3849 for IPv6:

The prefix allocated for documentation purposes is 2001:DB8::/32.

In a similar vein, RFC 2606 provides a number of TLDs and domain names for use in documentation – for example, .test and example.com. Again, the idea is that these are reserved for documentation, and will never start resolving to a real system at an unknown point in the future.

Of course, you can use any IP address or DNS name in your docs, but if the exact values are unimportant, you may want to consider using these reserved blocks. They’re good placeholder values, because they can’t be mixed up with anything else.
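
If you want to check your examples automatically, something like this sketch would do it. It uses Python’s standard ipaddress module with the three IPv4 blocks from RFC 5737 and the IPv6 prefix from RFC 3849; the function name is just something I’ve made up for the illustration.

```python
import ipaddress

# The reserved documentation ranges.
DOCUMENTATION_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),     # TEST-NET-1 (RFC 5737)
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2 (RFC 5737)
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3 (RFC 5737)
    ipaddress.ip_network("2001:db8::/32"),    # IPv6 documentation prefix (RFC 3849)
]


def is_documentation_address(value):
    """Return True if `value` falls inside a reserved documentation range."""
    addr = ipaddress.ip_address(value)
    return any(
        # Only compare like with like: IPv4 against IPv4, IPv6 against IPv6.
        addr.version == net.version and addr in net
        for net in DOCUMENTATION_NETWORKS
    )


if __name__ == "__main__":
    for example in ["192.0.2.10", "2001:db8::1", "8.8.8.8"]:
        print(example, is_documentation_address(example))
```

You could run a check like this over the addresses in your docs to spot any that accidentally point at a real system.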

These RFCs have come up several times in the Write The Docs Slack, which is why I decided to create a more permanent signpost. If you care about technical writing, you may want to join the Slack, where this sort of thing is often discussed – sign up through the WTD website.

Your repo should be easy to build, and how

Whenever I look at a new repository, I have a simple smell test: how long does it take me to clone, build and get the code running?

Here, I’m usually counting the steps I have to do, the commands I have to run. The clock time is less important (although fast builds are still nice!). Ideally, there’s a single command which takes me from a fresh checkout to a complete build — and without me having to fiddle with too many dependencies first.

Once I have a working build, I can start fiddling with code and find my own way around. Getting that first build is key.

Making it easy to do a clean build has many benefits.

The obvious one is time saved — I run one command, then I can walk away while the computer does all the slow bits. Downloading dependencies, compiling code, setting up the local environment, that sort of thing. It might take a while to finish, but I don’t need to supervise it while that happens. I can spend that time doing something more useful.

It’s also more reliable. Remembering “make build” is easy. Remembering eight calls to different shell scripts, and their associated arguments, is much harder. If the build is simple, there’s less to get wrong, and it’s more likely I’ll get it right first time. Automating the build process makes it faster and more reliable.

And finally, first impressions count! Being able to start working quickly is a pleasant experience. If writing and testing my first patch is easy, I’m more likely to do it a second time. And a third. And so on. This is particularly important in open source repos where patches often come from people giving up their time for free.

In the last year, I’ve spent a lot of time simplifying my build processes, both in my work and my personal repos. Most of my current repos now have a single-step build. It’s not perfect, but I’m very pleased with the results.

In this post, I’ll explain my typical setup, and how I use Make and Docker to get fast and reliable builds.

Read more →

Armed police officers don’t make me feel safer

Content warning: discussion of guns, police violence, and images of armed police.

I live in the UK. We have fairly strict laws around owning guns, and you’re unlikely to encounter a gun except on an army base or in a work of fiction.

As a result, I’m fairly sceptical of guns, and I find the sight of them unnerving. I was on holiday in Berlin recently, where most of the police officers on the street were carrying handguns, and I was jumpy whenever I passed them. The same applies when I’m in airports. I feel like this is a healthy reaction.

On occasion, I do encounter armed police officers in the UK, and it’s never pleasant.

I have no idea whether armed police actually make me safer — I’m not a crime expert, and I don’t know what gun crime statistics are in the UK. But seeing a gun is so unusual, that when I see armed police I feel less safe. Being shot at is not something I worry about in day-to-day life, except when I see an armed police officer. (Although I do know that Police Federation surveys continually show resistance to routine arming by police officers, who I’d expect to know.)

And I’m a cis white man, a group that isn’t usually profiled or targeted by police. Other people probably find it much scarier.

A month or so ago, I was waiting for a train at King’s Cross, when two officers carrying large semiautomatic guns suddenly appeared behind me. They walked straight past me, but I was briefly terrified. Another officer came up to me to explain that this was an “awareness campaign” to “make me feel safer”, even though it had the opposite effect. I politely explained this to the officer, who didn’t want to see my point of view.

On that occasion, I got on my train to Cambridge, and left the guns behind. Unfortunately, it’s followed me home.

Read more →

Pruning old Git branches

Here’s a quick tip for Git users: if you want to delete every local branch that’s already been merged into master, you can run this command:

$ git branch --merged master | egrep -v "(^\*|master|dev)" | xargs git branch --delete

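A quick breakdown: git branch --merged master lists every local branch whose tip has already been merged into master. The egrep -v step then drops the lines I want to keep: the current branch (the line starting with an asterisk) and any branch whose name contains master or dev. Finally, xargs passes whatever is left to git branch --delete.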

I originally got the command from a Stack Overflow answer, although I tweaked it when I read the documentation, to more closely match my use case.

If you want to see what branches this will delete without committing to it, run everything before the second pipe — not the xargs bit at the end.
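
If the regex feels opaque, here’s the same filter as a small Python sketch. It’s only meant to show what the egrep step keeps and drops; the function names are made up for this example, and the second function assumes git is on your PATH and that the repo has a master branch.

```python
import re
import subprocess

# Same pattern as the egrep call: skip the current branch (the line
# starting with "*") and anything with "master" or "dev" in its name.
KEEP_PATTERN = re.compile(r"(^\*|master|dev)")


def branches_to_delete(branch_listing):
    """Filter the output of `git branch --merged master`."""
    branches = []
    for line in branch_listing.splitlines():
        if not line.strip():
            continue
        if KEEP_PATTERN.search(line):
            continue  # one of the branches we want to keep
        branches.append(line.strip())
    return branches


def merged_branches(repo_dir="."):
    """Dry run: list the branches the one-liner would delete."""
    result = subprocess.run(
        ["git", "branch", "--merged", "master"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return branches_to_delete(result.stdout)
```

Note that the pattern matches anywhere in the line, so a branch called develop-thing is kept too, not just one called dev.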

The other command I often use is this one:

$ git fetch origin --prune

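If a branch has been deleted in the origin remote, this removes the matching remote-tracking branch (for example, origin/new-feature) from your local repo. It doesn’t touch local branches; those are what the first command is for.

For example: suppose you had a branch called new-feature. You push the branch to GitHub, open a pull request, and later the branch gets merged and deleted through the GitHub web interface. When you do your next fetch with --prune, it’ll clean up the stale reference origin/new-feature. Your local new-feature branch stays until you delete it, for instance with the first command once the merge has reached your local master.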

Git branches are very cheap — usually a single file that references a commit hash — so deleting branches won’t save disk space or improve performance. I like to keep my repos neat and tidy, and not have a long branch list to scroll through, which is why I do this. If a long branch list doesn’t bother you, then you can ignore these commands.

Downloading logs from Amazon CloudWatch

At work, we use Amazon CloudWatch for logging in our applications. All our logs are sent to CloudWatch, and you can browse them in the AWS Console. The web console is fine for one-off use, but if I want to do in-depth analysis of the log, nothing beats a massive log file. I’m very used to tools like grep, awk and tr, and I’m more productive using those than trying to wrangle a web interface.

So I set out to write a Python script to download all of my CloudWatch logs into a single file. The AWS SDKs give you access to CloudWatch logs, so this seems like it should be possible. There are other tools for doing this (for example, I found awslogs after I was done) — but sometimes it can be instructive to reinvent something from scratch.

In this post, I’ll explain how I wrote this script, starting from nothing and showing how I build it up. It’s also a nice chance to illustrate several libraries I use a lot (boto3, docopt and maya). If you just want the code, skip to the end of the post.
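
To give a flavour of the core loop, here’s a minimal sketch. It isn’t the finished script from the post (there’s no docopt or maya here). It assumes logs_client behaves like boto3.client("logs"), and the function names are illustrative.

```python
def get_log_events(logs_client, log_group):
    """Yield every event in a CloudWatch log group, following pagination.

    logs_client is assumed to behave like boto3.client("logs").
    """
    kwargs = {"logGroupName": log_group}

    while True:
        resp = logs_client.filter_log_events(**kwargs)
        yield from resp.get("events", [])

        # A response without "nextToken" means we've reached the end.
        try:
            kwargs["nextToken"] = resp["nextToken"]
        except KeyError:
            return


def write_events(events, out_file):
    """Write one log message per line to an open text file."""
    for event in events:
        out_file.write(event["message"].rstrip("\n") + "\n")


# Example use (needs boto3 and AWS credentials; the group name is a placeholder):
#
#   import boto3
#   logs = boto3.client("logs")
#   with open("all_logs.txt", "w") as f:
#       write_events(get_log_events(logs, "my-log-group"), f)
```

Because get_log_events is a generator, events are written to the file as each page arrives, rather than being held in memory all at once.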

Read more →

My favourite WITCH story

Today, the National Museum of Computing (TNMoC) is celebrating the five-year anniversary of their reboot of the Harwell-Dekatron computer, also known as WITCH.

A photo of the Harwell-Dekatron under reconstruction at TNMoC in 2010. Taken from Wikimedia Commons.

The Harwell-Dekatron was originally built in Harwell in the 1950s, as part of the British nuclear program. It passed through a number of hands, before finally being decommissioned in 1973. Then it went into storage, until it was recovered by TNMoC in 2009. It moved to the museum, was restored by volunteers, rebooted in 2012, and it continues to run there today. The original news story about the reboot has more detail about the machine’s history, and how it ended up at the museum.

This computer isn’t just a static exhibit, but a working display. If you visit the museum, you’ll often see (and hear!) it running. The WITCH is powered by 828 Dekatron tubes. A Dekatron is a gas-filled counting tube that can hold a digit from 0 to 9. It looks like a small tube, with an orange light that rotates as it cycles from 0 to 9, so you can see exactly what value it’s holding, and literally “read” the computer’s inner workings. Dekatrons also make a distinctive clackety clackety noise, and together with the visuals, the running machine is quite an experience.

A bank of Dekatrons on the WITCH, taken by Alan Levine. From Flickr. The labels on the top row indicate the current value stored on each Dekatron. Here we can read “+0998”.

The WITCH wasn’t a fast machine, even by 1950s standards. Rather than doing quick calculations, it was designed to work slowly, but run very reliably for long periods of time. Jack Howlett, Director of the Computer Laboratory at Harwell, once wrote in a report:

It took little power and could be left unattended for long periods; I think the record was over one Christmas-New Year holiday when it was all by itself, with miles of input data on punched tape to keep it happy, for at least ten days and was still ticking away when we came back.

I was once told a fun story about this Christmas run. The operators wanted to check the machine kept running, but without someone having to be in the room. So they left the phone off the hook, hanging next to the WITCH, and they’d dial in to check how it was doing. If they heard the characteristic clackety-clack, they’d know the machine was still running, and they’d rest easy. Silence, they’d know it had stopped.

I can’t remember where I first heard this story, and I have nothing to back it up. But I find the idea delightful — a machine left to run over Christmas, tracked by an analogue phone and a mechanical clack. Such an ingenious way to do remote monitoring.

Happy birthday, WITCH!

Don’t tap the mic, and other tips for speakers

When I was in college, I did a bit of work in the college theatre as a backstage technician. Among other things, this meant dealing with sound systems, where I was taught an important rule: don’t tap on the microphone. It’s a common cliche, but rarely a good idea.

Tapping creates a sudden, loud noise in the microphone, which can cause damage to the microphone and/or the speaker that plays it back.[1] If you want to do a sound check, speak or sing as you’ll be using the mic live. It’s a more realistic test, gives you an opportunity to hear what you’ll really sound like, and is more pleasant for anybody listening.

I was reminded of this tonight when reading the speaker guidelines for Nine Worlds, which gives an entirely different reason not to tap the mic:

Please don’t tap the microphone, as the amplified sudden noise can cause pain to D/deaf[2] people present since it will be transmitted directly into their ears.

(In the same vein, you should always use a microphone if one is provided, even if you think you don’t need it. It makes a big difference for anybody with a hearing aid, and for the quality of sound on the recording.)

If you speak at or run events, their guidelines have lots of good advice. As well as how not to abuse your sound equipment, there are suggestions for things like handling your tech and A/V (multiple layers of backup, arrive well in advance); referring to audience members in a gender-neutral way; and providing appropriate content warnings on your talks. I recommend giving them a read.

  1. It’s only some types of mic/speaker that are susceptible to this damage, but I can never remember the difference, and equipment is expensive enough that I don’t want to risk it.

  2. Something else I learnt tonight: there are “small d” and “big D” identities in deaf culture. Based on a quick search, it’s a distinction between having hearing loss and being part of the Deaf community, but deaf people have written about it in more detail, and can explain it better than I can.

A plumber’s guide to Git

Git is a very common tool in modern development workflows. It’s incredibly powerful, and I use it all the time — I can’t remember the last time I used a version control tool that wasn’t Git — but it’s a bit of a black box. How does it actually work?

For a long time, I’ve only had a vague understanding of Git’s inner workings. I think it’s important to understand my tools, because it makes me more confident and effective, so I wanted to learn how Git works under the hood. To that end, I gave a workshop at PyCon UK 2017 about Git internals. Writing the workshop forced me to really understand what was going on.

The session wasn’t videoed, but I do have my notes and exercises. There were four sections, each focusing on a different Git concept. It was a fairly standard format: I did a bit of live demo to show the new ideas, then people would work through the exercises on their own laptops. I wandered around the room, helping people who were stuck, or answering questions, then we’d come together to discuss the exercise. Repeat. On the day, we took about 2½ hours to cover all the material.

If you’re trying to follow along at home, the Git book has a great section on the low-level commands of Git. I made heavy reference to this when I wrote the notes and exercises.
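
As a small taste of what’s under the hood, here’s a sketch of how Git names a file’s contents. This isn’t taken from the workshop notes; it’s an illustration based on Git’s documented object format, where a blob’s ID is the SHA-1 of a short header followed by the raw bytes. The function name is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the ID Git gives a blob with these contents.

    This mirrors what `git hash-object` reports for a file, assuming
    a repository that uses the default SHA-1 object format.
    """
    # Git stores "blob <size in bytes>", a NUL byte, then the contents.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_hash(b"hello world\n"))
```

You can compare the result against git hash-object run on a file with the same contents; the two should match.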

If you’re interested, you can download the notes and exercises.

(There are a few amendments and corrections compared to the workshop, because we discovered several mistakes as we worked through it!)

Read more →