alexwlchan - page 2

A visit to the Crossness pumping station

In the early nineteenth century, the River Thames was heavily polluted. It was treated as an open sewer, with human excrement and industrial waste dumped directly into the river and left to rot. The uncleaned river led to multiple outbreaks of cholera, and made central London thoroughly unpleasant. In summer 1858, the hot weather made the smell so bad that it was dubbed “the Great Stink”. At this time, many people believed that bad smells (called miasma) were responsible for the spread of disease, so the state of the river was seen as a public health hazard.

After 1858, Parliament decided to commission a new, modern sewerage system that would carry the smell away from the centre of the city. The Metropolitan Board of Works – led by engineer Joseph Bazalgette – were tasked with building the new sewers. I first came across the story in a BBC docudrama series, which has quite a nice overview.

The design of this new system was rather elegant: a series of six main tunnels (three either side of the river) would carry the sewage east, away from the city. Smaller sewers would carry sewage from individual properties into the main tunnels. The whole system was built on a gradient, so everything is carried entirely by gravity. When it’s sufficiently far east, the sewage is pumped back up to ground level, dumped in the Thames and washed out to sea.

A map of London’s sewers, drawn in 1882. The main interceptor tunnels are highlighted in red. Image from Wikipedia.

The endpoint of the southern tunnel was at Crossness. There was a pumping station with four steam-driven pumps that pulled the waste up to ground level, and dumped it into the river on the outgoing tide. Both Crossness and the wider sewerage system were seen as major feats of Victorian engineering, and the opening of Crossness itself was a particularly prestigious event.

An invitation to the opening of Crossness in 1865. Image from the Science Museum, Wellcome Images.

Today we’re (slightly) more enlightened, and don’t just dump raw sewage into the sea. Instead, sewage is sent to treatment plants for processing, and disposed of elsewhere – which led to these old pumping stations being decommissioned. By the end of the 1950s, these stations were all but abandoned.

Since then, the other southern pumping station (Deptford) has essentially vanished, and the northern station (Abbey Mills) is a shell of its former self. But Crossness survived fairly well: the large chimney in the invitation above was demolished, but otherwise the site was left in reasonable shape. In 1985, the Crossness Engines Trust was established to preserve the site, and restore the engines to a working state. Today, the pumping station is open to the public.

Last weekend, Crossness were running an open day – the pumping station was open to the public, and they were running the restored engine. Given my interest, I decided to head down, have a look round, and take a few photos.

Read more →

Accessibility at AlterConf

On Saturday, I was at AlterConf London, a conference about diversity in the tech and gaming industries. If you follow me on Twitter, you’ll have seen that I was tweeting pretty effusively about it throughout the day. It was one of the friendliest, nicest conferences I’ve ever been to, with a cracking set of speakers to boot.

I was really impressed by how much the AlterConf organisers had done to make the conference accessible and inclusive. Most tech conferences are dominated by cis, white men – this was very different. Both the speaker lineup and the audience were remarkably diverse.

In this post, I want to talk about a few of the things that really stood out to me, which helped to make the conference feel more inclusive. Many of these are ideas that could be replicated elsewhere, and I’d love to see them spread. I’ll write about the talks in a separate post.

A disclaimer: I’m a cis white male, so I don’t tend to have problems at other tech conferences. Take my praise with a pinch of salt, because I’m not really the person this is aimed at helping.

Read more →

A few examples of extensions in Python-Markdown

I write a lot of content in Markdown (including all the posts on this site), and I use Python-Markdown to render it as HTML. One of Python-Markdown’s features is an Extensions API. The package provides some extensions for common tasks – abbreviations, footnotes, tables and so on – but you can also write your own extensions if you need something more specialised.

After years of just using the builtin extensions, I’ve finally started to dip my toe into custom extensions. In this post, I’m going to show you a few of my tweaked or custom extensions.

Read more →

A script for backing up your Instapaper bookmarks

About three days ago, there was an extended outage at Instapaper. Luckily, it seems like there wasn’t any permanent data loss – everybody’s bookmarks are still safe – but this sort of incident can make you worry.

I have a Python script that backs up my Instapaper bookmarks on a regular basis, so I was never worried about data loss. At worst, I’d have lost an hour or so of changes – fairly minor, in the grand scheme of things. I’ve been meaning to tidy it up and share it for a while, and this outage prompted me to get on and finish that. You can find the script and the installation instructions on GitHub.

A script for backing up your Goodreads reviews

Last year, I started using Goodreads to track my reading. (I’m alexwlchan if you want to follow me.) In the past, I’ve had a couple of hand-rolled systems for recording my books, but maintaining them often became a distraction from actually reading!

Using Goodreads is quite a bit simpler, but it means my book data is stored on somebody else’s servers. What if Goodreads goes away? I don’t want to lose that data, particularly because I’m trying to be better about writing some notes after I finish a book.

There is an export function on Goodreads, but it has to be invoked by hand. I prefer to have backup tools that can be run automatically: I can set them to run on a schedule, and I know my data is safe. This tends to be a script or a cron job.

That’s exactly what I’ve done for Goodreads: I’ve written a Python script that uses the Goodreads API to grab the same information as provided by the builtin export. I have this configured to run once a day, and now I have daily backups of my Goodreads data. You can find the script and installation instructions on GitHub.

This was a fun opportunity to play with the ElementTree module (normally I work with JSON), and also a reminder that the lack of yield from has become my most disliked feature in Python 2.
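If you haven’t used either of those, here’s a rough sketch of both in Python 3. The XML is a made-up stand-in (the real Goodreads API response has a different shape), and iter_reviews and iter_all_reviews are hypothetical names for illustration, not functions from the backup script.

```python
import xml.etree.ElementTree as ET

# Made-up XML for illustration; not the real Goodreads response format.
PAGE_1 = """
<reviews>
  <review><title>Book A</title><rating>4</rating></review>
  <review><title>Book B</title><rating>5</rating></review>
</reviews>
""".strip()

PAGE_2 = """
<reviews>
  <review><title>Book C</title><rating>3</rating></review>
</reviews>
""".strip()


def iter_reviews(xml_text):
    """Yield one dict per <review> element in a single page of XML."""
    root = ET.fromstring(xml_text)
    for review in root.iter("review"):
        yield {
            "title": review.findtext("title"),
            "rating": int(review.findtext("rating")),
        }


def iter_all_reviews(pages):
    """Chain the reviews from several pages into one stream."""
    for page in pages:
        # `yield from` needs Python 3.3+. In Python 2 you have to spell
        # this out: `for r in iter_reviews(page): yield r`.
        yield from iter_reviews(page)


reviews = list(iter_all_reviews([PAGE_1, PAGE_2]))
print([r["title"] for r in reviews])
```

The point of yield from is that the outer generator can hand off to the inner one in a single line, which is handy when an API returns results a page at a time.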

A Python interface to AO3

In my last post, I talked about some work I’d been doing to scrape data from AO3 using Python. I haven’t made any more progress, but I’ve tidied up what I had and posted it to GitHub.

Currently this gives you a way to get metadata about works (word count, title, author, that sort of thing), along with your complete reading history. This latter is particularly interesting because it allows you to get a complete list of works where you’ve left kudos.

Instructions are in the README, and you can install it from PyPI (pip install ao3).

I’m not actively working on this (I have what I need for now), but this code might be useful for somebody else. Enjoy!

Experiments with AO3 and Python

Recently, I’ve been writing some scripts that need to get data from AO3[1]. Unfortunately, AO3 doesn’t have an API (although it’s apparently on the roadmap), so you have to do everything by scraping pages and parsing HTML. A bit yucky, but it can be made to work.

You can get to a lot of pages without having an AO3 account – which includes most of the fic. If you want to get data from those pages, you can use any HTTP client to download the HTML, then parse or munge it as much as you like. For example, in Python:

import requests

req = requests.get('')
print(req.text)  # Prints the page's HTML

I have a script that takes this HTML, and which can extract metadata like word count and pairings. (I use that to auto-tag my bookmarks on Pinboard, because I’m lazy that way.)
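To give a flavour of what that extraction might look like, here’s a minimal sketch using only the standard library. The HTML snippet and the WordCountParser class are assumptions for illustration: AO3’s real markup may differ, and this isn’t the script described above.

```python
from html.parser import HTMLParser

# Made-up HTML for illustration; AO3's real markup may differ.
SAMPLE_HTML = """
<dl class="stats">
  <dt class="words">Words:</dt><dd class="words">12,345</dd>
  <dt class="chapters">Chapters:</dt><dd class="chapters">3/3</dd>
</dl>
"""


class WordCountParser(HTMLParser):
    """Collect the number inside the first <dd class="words"> element."""

    def __init__(self):
        super().__init__()
        self._in_words = False
        self.word_count = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "dd" and ("class", "words") in attrs:
            self._in_words = True

    def handle_endtag(self, tag):
        if tag == "dd":
            self._in_words = False

    def handle_data(self, data):
        if self._in_words and self.word_count is None:
            # Strip thousands separators before converting to an int
            self.word_count = int(data.replace(",", ""))


parser = WordCountParser()
parser.feed(SAMPLE_HTML)
print(parser.word_count)
```

For anything more involved than one or two fields, a proper HTML parsing library is probably less painful than hand-rolled state like this.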

But there are some pages that require you to be logged in to an account. For example, AO3 can track your reading history across the site. If you try to access a private page with the approach above, you’ll just get an error message:

Sorry, you don’t have permission to access the page you were trying to reach. Please log in.

Wouldn’t it be nice if you could access those pages in a script as well?

I’ve struggled with this for a while, and I had some hacky workarounds, but nothing very good. Tonight, I found quite a neat solution that seems much more reliable.

For this to work, you need an HTTP client that doesn’t just do one-shot requests. You really want to make two requests: one to log you in, another for the page you actually want. You need to persist some login state from the first request to the second, so that AO3 remembers us on the second request. Normally, this state is managed by your browser: in Python, we can do the same thing with sessions.

After a bit of poking at the AO3 login form, I’ve got the following code that seems to work:

import requests

sess = requests.Session()

# Log in to AO3
sess.post('', params={
    'user_session[login]': USERNAME,
    'user_session[password]': PASSWORD,
})

# Fetch my private reading history
req = sess.get('' % USERNAME)

Where previously this would return an error page, now I get my reading history. There’s more work to parse this into usable data, but we’re past my previous stumbling block.
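If you want to see why the session matters without touching AO3 at all, here’s a self-contained sketch using only the standard library. Everything in it is an assumption made for the demo: the toy server, the /login and /private paths and the cookie value are invented, and it uses urllib with a cookie jar rather than requests. The idea is the same, though: a client that remembers cookies gets the private page, and a one-shot client doesn’t.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """A toy site: /login sets a cookie, every other path requires it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        elif "session=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"private page")
        else:
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"please log in")

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Return (status, body) for a GET request, without raising on a 403."""
    try:
        with opener.open(url) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# An empty ProxyHandler stops any system proxy settings getting in the way.
no_proxy = urllib.request.ProxyHandler({})

# One-shot client: no cookie jar, so nothing is remembered between requests.
one_shot = urllib.request.build_opener(no_proxy)
fetch(one_shot, base + "/login")
one_shot_result = fetch(one_shot, base + "/private")

# Session-like client: the cookie jar carries the login cookie forward.
with_cookies = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(CookieJar())
)
fetch(with_cookies, base + "/login")
session_result = fetch(with_cookies, base + "/private")

server.shutdown()
server.server_close()
print(one_shot_result)
print(session_result)
```

A requests Session does this cookie bookkeeping for you, which is why the two-request version above gets past the login wall.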

I think this is a useful milestone, and could form the basis for a Python-based AO3 API. I’ve thought about writing such a library in the past, but it’s a bit limited if you can’t log in. With that restriction lifted, there’s a lot more you can potentially do.

I have a few ideas about what to do next, but I don’t have much free time coming up. I’m not promising anything – but you might want to watch this space.

  1. Non-fannish types: AO3 is the Archive of Our Own, a popular website for sharing fanfiction. 

A tool for backing up your message history from Slack

I’ve just pushed a small tool to PyPI for backing up message history from Slack. It downloads your message history as a collection of JSON files, including public/private channels and DM threads.

This is mainly scratching my own itch: I don’t like having my data tied up in somebody’s proprietary system. Luckily, Slack provides an API that lets you get this data out into a plaintext form. This allows me to correct what I see as two deficiencies in the data exports provided by Slack.

Installation is pip install slack_history, then run slack_history --help for usage instructions.


Another example of why strings are terrible

Here’s a programming assumption I used to make, that until today I’d never really thought about: changing the case of a string won’t change its length.

Now, thanks to Hypothesis, I know better:

>>> x = u'İ'
>>> len(x)
1
>>> len(x.lower())
2

I’m not going to pretend I understand enough about Unicode or Python’s string handling to say what’s going on here.

I discovered this while testing a moderately fiddly normalisation routine – this routine would normalise the string to lowercase, unexpectedly tripping a check that it was the right length. If you’d like to see this for yourself, here’s a minimal example:

from hypothesis import given, strategies as st

@given(st.text())
def test_changing_case_preserves_length(xs):
    assert len(xs) == len(xs.lower())

Update, 2 December 2016: David MacIver asked whether this affects Python 2, 3, or both, which I forgot to mention. The behaviour is different: Python 2 lowercases İ to an ASCII i, whereas Python 3 adds a double dot: i̇ (an i followed by U+0307, the combining dot above).

This means that only Python 3 has the bug where the length changes under case folding (whereas Python 2 commits a different sin of throwing away information).

Cory Benfield pointed out that the Unicode standard has explicit character mappings that add or remove characters when changing case, and highlighted a nice example in the other direction: when you uppercase the German eszett (ß), you replace it with a double-S.
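Both directions are easy to check in Python 3. This is only a quick sketch, with the characters written as escapes so nothing gets mangled by copy and paste; the variable names are mine.

```python
# U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE; U+00DF is the eszett.
dotted_capital_i = "\u0130"
eszett = "\u00df"

lower_i = dotted_capital_i.lower()  # 'i' plus U+0307 COMBINING DOT ABOVE
upper_ss = eszett.upper()           # 'SS'

print(len(dotted_capital_i), "->", len(lower_i))
print(len(eszett), "->", len(upper_ss))
```

In both cases one code point goes in and two come out, so any code that assumes len(s) == len(s.lower()) or len(s) == len(s.upper()) is on shaky ground.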

Finally, Rob Wells wrote a follow-on post that explains this problem in more detail. He also points out the potential confusion of len(): should it count visible characters, or Unicode code points? The Swift String API does a rather good job here: if you haven’t used it, check out Apple’s introductory blog post.
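As a small illustration of that len() confusion, here’s a sketch in Python 3 that counts the same string two ways. The count_base_characters helper is a hypothetical name, and skipping combining marks is only a rough approximation of “visible characters”: proper grapheme segmentation (Unicode’s UAX #29) handles many more cases than this does.

```python
import unicodedata


def count_base_characters(text):
    """Count code points that are not combining marks.

    A rough proxy for visible characters, not a full grapheme counter.
    """
    return sum(1 for ch in text if not unicodedata.combining(ch))


lowered = "\u0130".lower()  # 'i' followed by U+0307 COMBINING DOT ABOVE

code_points = len(lowered)
visible = count_base_characters(lowered)
print(code_points, visible)
```

Python’s len() answers the “code points” question; if you care about what a reader would count on screen, you need something closer to the second number.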

Some low-tech ways to get more ideas

So on Tuesday, I saw this tweet from David MacIver:

What are some good low tech devices for intelligence expansion? Current list I have:
* Pencil and paper
* Cards/dice
* A second person

“Intelligence expansion” is a phrase that here means “anything you can use to understand things or solve problems that would be hard with an unaided brain”.

I wanted to reply, but I have more to say than will fit in 140 characters. So instead, in blog post form, here are my low-tech suggestions for tackling tricky topics. This isn’t everything you could try, but these are the techniques I’ve found work best for me.

Read more →