Assume worst intent (designing against the abusive ex)

This is a talk I gave last Sunday at PyCon UK 2018.

Gail Ollis invited me to give a talk about online harassment in April, based on my talk about privilege and inclusion at the previous year’s conference. We were chatting afterwards, and I realised that with a bit of tidying, I could reuse it. This is a refined and shortened (and hopefully better!) version of the April talk.

Here’s the abstract:

Apps and services often build features with good intent, trying to improve interactivity or connections between our users. But what if one of your users has a stalker, or an abusive ex? You may have given them another way to hurt or harass your user.

This session will help you identify common threat models – who is most at risk, and who is a threat to your most vulnerable users. Then we’ll look at some good practices that improve the safety of your users, and how to design with these risks in mind. There’s no silver bullet that totally eliminates risk, but you can make design decisions that give more control and safety to your users.

The talk was recorded, and you can watch it on YouTube:

You can read the slides and transcript on this page, or download the slides as a PDF. The transcript is based on the captions on the YouTube video, with some light tweaking and editorial notes where required.


Building trust in an age of suspicious minds

This is a talk I gave on Saturday as the opening keynote of PyCon UK 2018.

I was already planning to do a talk about online harassment, and Daniele, the conference director, encouraged me to expand the idea, and turn it into a talk about positive behaviour. This is my abstract:

In 2018, trust is at a low. Over the past several decades, trust has declined – fewer and fewer people trust each other, and the reputation of big institutions (government, media, politicians, even non-profits) is in tatters.

What happened?

In this talk, I’ll explain how the design of certain systems has been exploited for abuse, and how this has corroded our trust – and conversely, how we can build environments that encourage and sustain good behaviour.

The talk was recorded, and thanks to the wizardry of Tim Williams, it was on YouTube within a day:

You can read the slides and transcript on this page, or download the slides as a PDF. The transcript is based on the captions on the YouTube video, with some light editing and editorial notes where required.


Signs of the time

One of the changes at this year’s PyCon UK was that I printed a bunch of signs for the venue. I talked about this in my post on inclusive conferences, but I’d never actually done it!

Overall the signs have been very well received – people have said some very nice things, they’ve been helpful, and the production mostly went off without a hitch. We got a lot of stuff right off the bat:

But this was a first attempt, and inevitably I didn’t get it all right – for the benefit of my future self, and anybody else making venue signage, here are a few of the things I’ve learnt this year.

#PyConUK “Help me, I’m trapped in a sign printing factory!” pic.twitter.com/KlJnGDHjjz


PyCon UK 2018

I’m sitting in Gorsedd Gardens, a stone’s throw from Cardiff City Hall. In my hand I have a fresh cup of Brodies coffee. In front of me is a bed of pink flowers, and an over-excitable squirrel. There’s a water fountain and a seagull behind me. Everything is quiet.

For now.

I’m back in Cardiff for PyCon UK 2018. This is my third year helping to organise the conference. Tomorrow is the set-up and A/V installation, and the conference itself starts on Saturday – the most hectic weekend of my year. It’s also one of the most fun – I’ve only been in Cardiff for an hour, and I’m already grinning at the thought of the long days ahead. The UK Python community is a lovely bunch of people, and it’s great to have so many of them in one place.

I’m doing three sessions this year:

I’ll update that list with links to slides and notes soon after the conference.

If you’re around, please do come and say hi, even if I look busy or shy! I’m never too busy for a chat, especially if you’re a first timer. And I’m sure all the other organisers would agree – it makes all the difference when somebody comes up to us and makes us feel like our work was worthwhile.

Whether or not we run into each other, I hope you have a lovely conference.

~ Alex x

A basic error logger for Python Lambdas

At work, we use AWS Lambda functions for a bunch of “glue” pieces between different services. Sometimes, a quick Python script in a Lambda function is the most convenient way to run something.

This post has a snippet that I wrote to make it easier to debug errors.

As a quick reminder, this is the basic structure of a Lambda:

import json


def lambda_handler(event, context):
    # TODO implement
    return {
        "statusCode": 200,
        "body": json.dumps('Hello from Lambda!')
    }

The body of your code goes in the handler function, which is passed an event and a context. The event is the trigger for the function (for example, if you have a Lambda triggered by an SNS message, it would contain the SNS body). The context object contains some runtime information about the Lambda.
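To make the event concrete: for an SNS trigger, the message arrives inside a list of records. Below is a minimal sketch of a handler that pulls out each message body. The sample event is hand-written and heavily trimmed, assuming the standard SNS event layout (`Records`, then `Sns`, then `Message`); real events carry many more fields.

```python
import json


def lambda_handler(event, context):
    # An SNS-triggered event carries one or more records; each record's
    # "Sns" entry holds the published message as a string.
    messages = [record["Sns"]["Message"] for record in event.get("Records", [])]
    return {
        "statusCode": 200,
        "body": json.dumps({"messages": messages}),
    }


# A hand-written, trimmed-down sample event (real SNS events also include
# fields such as the topic ARN, message ID and timestamp).
sample_event = {
    "Records": [
        {"EventSource": "aws:sns", "Sns": {"Message": "hello from SNS"}},
    ]
}

print(lambda_handler(sample_event, context=None))
```

Passing `context=None` is enough for local experiments like this, because the handler never touches the context.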

When I was first writing Lambdas, I’d log the event so I could see what the trigger was:

def lambda_handler(event, context):
    print('event = %r' % event)
    ...

This is fine when experimenting, but when the Lambda was run at scale, it got expensive – some of our events are quite large, and logging the entire event racks up quite a CloudWatch bill. And in 99% of cases, the log is unnecessary – I only ever looked at the log when I was developing something new, or if there’d been an error. If the Lambda completed successfully, I’d never read the log.

So I took it out, and the bill went back down – but now I can’t see what the trigger was if the Lambda has an error. This is annoying when I’m trying to debug an error.

So I wrote this snippet, which logs the event if and only if the Lambda throws an exception:

import functools


def log_event_on_error(handler):
    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception:
            print('event = %r' % event)
            raise

    return wrapper

This is a decorator that runs the original handler; if the handler throws an exception, it prints the trigger event and re-raises the exception. You could also log the context here. I don’t, because I’ve yet to find anything in the context that I’d want when debugging.

To use the decorator, I add @log_event_on_error to the handler function. Like so:

import json


@log_event_on_error
def lambda_handler(event, context):
    # TODO implement
    return {
        "statusCode": 200,
        "body": json.dumps('Hello from Lambda!')
    }

This gives me the best of both worlds: no logging if the handler completes successfully (and no logging costs!), and it logs the event if something unexpected goes wrong.
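You can check that the decorator only logs on failure before deploying it. A minimal sketch, using the standard library’s `contextlib.redirect_stdout` to capture what gets printed; the `succeeds` and `fails` handlers and the `captured_output` helper are hypothetical, written only for this check.

```python
import contextlib
import functools
import io


def log_event_on_error(handler):
    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            return handler(event, context)
        except Exception:
            print('event = %r' % event)
            raise

    return wrapper


@log_event_on_error
def succeeds(event, context):
    return {"statusCode": 200}


@log_event_on_error
def fails(event, context):
    raise ValueError("something went wrong")


def captured_output(handler, event):
    """Run the handler, returning (captured stdout, exception or None)."""
    buf = io.StringIO()
    error = None
    with contextlib.redirect_stdout(buf):
        try:
            handler(event, context=None)
        except Exception as exc:
            error = exc
    return buf.getvalue(), error


print(captured_output(succeeds, {"id": 1}))
print(captured_output(fails, {"id": 2}))
```

The successful handler prints nothing, while the failing one prints the event and still raises, so Lambda would record the invocation as an error.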

This snippet has been running in prod for several months, and it’s been a useful addition to my collection.

Making the venue maps for PyCon UK

We’ve just published the venue information for this year’s PyCon UK. The page includes maps of Cardiff City Hall (where the conference is hosted), which were a suggestion from Kirk, and a lot of fun to make.

This is what the maps look like:

Internal maps of Cardiff City Hall. Left: ground floor. Right: first floor. The yellow and blue are the same colours as used on the PyCon UK website.

A good internal map helps people find their way around an unfamiliar venue, and can save organisers a lot of questions! Chloe runs the registration desk, and I think she spent half of last year’s conference giving out directions. A map can also translate between venue room names and conference terminology, which don’t always match.

Since a few people were asking, this is how I made the maps:

  1. The Cardiff City Hall website has floor plan PDFs showing the location of each room. For example, here’s their floorplan showing the Marble Hall:

    I saved a floorplan for both floors, and used Acorn to remove the text at the top and the background image of City Hall. This gave me a pair of blank floor plans.

    Here’s the blank map I had for the first floor:

  2. I coloured in all the rooms we’re using, and added text labels. This is what I labelled:

    • Every room being used for sessions
    • The toilets, and which ones are accessible
    • The quiet room
    • The creche
    • The lift

    I used the conference blue and yellow, rather than the City Hall shade of red – a bit of branding makes it clearer these are PyCon-specific.

    Note that they’re labelled according to the PyCon terminology, not the City Hall room names. We use Room G for the creche, but most people don’t know that, so it’s labelled “creche”. Similarly, the quiet room is really the council chamber, but not everybody would know to look for “council chamber” on a map.

    In a few cases, I tweaked the layout to make the label fit, or make something easier to find. For example, the lift is larger than it appears on the plans.

    (I considered labelling the toilets as male/female, but we have non-binary and trans attendees, and our bathroom policy is “use whichever toilet you feel most comfortable with”. We’re stuck with the gendered bathrooms because of City Hall, but I figured I could avoid perpetuating that on the maps.)

  3. I added the help desk, which doesn’t appear on any City Hall floorplans, but is a key part of the conference!

  4. I lightly shaded the stairs, hallways, and bits of corridor I expect people to use to get to the rooms. I’m trying to say “you shouldn’t need to go outside these areas”.

  5. These floorplans are very accurate depictions of the building, showing all the internal rooms and doorways. Outside the main conference area, this detail is just noise that clutters the map, so I erased a bunch of internal walls to clean up the map.

    If you compare the PDF for the Marble Hall and my first floor map, you might see some of the walls I removed.

For now, the maps are just on the conference website. I’m hoping to get some printouts done, and put them around the venue with little red “you are here” dots – but I don’t know if we’re allowed to put up posters yet. Even so, this was a nice little project to work on, and I think it’s an improvement on what we had last year.

Implementing parallel scan in DynamoDB with Scanamo

At work, we use DynamoDB for storing large collections of records – these get processed by the catalogue pipeline that feeds our API, which ultimately powers search on the new Wellcome Collection website.

All of our models are defined as Scala case classes, and we use Scanamo to interact with DynamoDB. Scanamo is a wrapper around the DynamoDB SDK which hides the work of serialising and deserialising case classes into the DynamoDB internal format.

When we change the pipeline, we want to reprocess all the existing records in DynamoDB (we call this “reindexing”). If you want to iterate over the records in DynamoDB, you have to do a Scan operation. A Scan returns the records in sequence, so you can only run one worker at a time – this is pretty slow. We want to process the table in parallel, so we have a DIY mechanism for dividing the table into “shards”, and then we process each shard separately.

DynamoDB tables can produce an event stream of updates to the table. We connect this stream to a Lambda function, which picks a “reindex shard” for a row, and writes that shard back to the table. The shard ID is copied to a global secondary index (GSI), which allows us to efficiently work out which rows are in a particular reindex shard.

When we want to reindex the table, we run one worker per reindex shard – every row is in exactly one reindex shard, and the GSI lets us look up the contents of each shard. It runs significantly faster than processing the table in sequence.
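The post doesn’t say how the Lambda picks a shard. One plausible approach, sketched here as an assumption rather than our actual code, is to hash each row’s ID and take it modulo the number of shards. That’s deterministic, so a row always lands in the same shard, and every row ends up in exactly one shard. The names `shard_for`, `group_into_shards` and `SHARD_COUNT` are hypothetical.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 8  # hypothetical fixed number of reindex shards


def shard_for(row_id, shard_count=SHARD_COUNT):
    """Deterministically map a row ID to a reindex shard ID."""
    # Use a stable hash (md5) rather than Python's built-in hash(),
    # which is randomised per process for strings.
    digest = hashlib.md5(row_id.encode("utf-8")).hexdigest()
    return "shard-%d" % (int(digest, 16) % shard_count)


def group_into_shards(row_ids, shard_count=SHARD_COUNT):
    """Group rows by shard, as a GSI query per shard would let a worker do."""
    shards = defaultdict(list)
    for row_id in row_ids:
        shards[shard_for(row_id, shard_count)].append(row_id)
    return dict(shards)


rows = ["b%07d" % i for i in range(100)]
shards = group_into_shards(rows)
for shard_id in sorted(shards):
    print(shard_id, len(shards[shard_id]))
```

Because the shard ID depends on the shard count, changing the number of shards reassigns many rows, which is why a different shard size would mean resharding the whole table.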

It’s also pretty brittle. It relies on the DynamoDB stream and the Lambda working correctly (both of which can be flaky), it’s extra infrastructure for us to maintain, and we’re stuck with a fixed shard size. If we decide to change the shard size later, we need to go back and reshard the entire table.

This has a whiff of Not Invented Here syndrome. We can’t be the only people who want to process a DynamoDB table in parallel!

Yesterday, I stumbled across an old blog post announcing parallel scans in DynamoDB. This is exactly what we need – it’s a supported API, doesn’t require extra infrastructure from us, and it lets us pick a different shard size on each scan. It’s worth a look.
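A parallel scan is an ordinary Scan with two extra parameters: `TotalSegments` (how many pieces to split the table into) and `Segment` (which piece this worker reads). Each worker pages through its own segment independently. Our version uses Scala and Scanamo; what follows is only a Python sketch of the same idea, written against any object with a boto3-style `scan(**kwargs)` method that returns `Items` and, when more pages remain, `LastEvaluatedKey`. `parallel_scan`, `scan_segment` and `FakeTable` are hypothetical names, and the fake table stands in for DynamoDB so the sketch runs locally.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(table, segment, total_segments):
    """Read every item in one segment, following pagination."""
    items = []
    kwargs = {"Segment": segment, "TotalSegments": total_segments}
    while True:
        response = table.scan(**kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        # Resume from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def parallel_scan(table, total_segments):
    """Scan all segments concurrently, one worker per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, table, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for f in futures for item in f.result()]


class FakeTable:
    """Local stand-in for a DynamoDB table, returning small pages."""

    def __init__(self, items, page_size=2):
        self.items = items
        self.page_size = page_size

    def scan(self, Segment, TotalSegments, ExclusiveStartKey=None):
        # Split items round-robin into segments (real DynamoDB divides
        # the table itself, and its keys are primary keys, not offsets).
        segment_items = self.items[Segment::TotalSegments]
        start = ExclusiveStartKey["offset"] if ExclusiveStartKey else 0
        page = segment_items[start:start + self.page_size]
        response = {"Items": page}
        if start + self.page_size < len(segment_items):
            response["LastEvaluatedKey"] = {"offset": start + self.page_size}
        return response


table = FakeTable([{"id": i} for i in range(10)])
print(len(parallel_scan(table, total_segments=3)))
```

With a real table you would pass a boto3 `Table` resource instead of `FakeTable`, since its `scan` accepts the same keyword arguments. The segment count is chosen per call, which is exactly the flexibility the fixed reindex shards lack.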

I couldn’t find an implementation of parallel scan that also uses Scanamo and case classes, so I decided to write my own. (I did Google it before diving in!) It’s a useful standalone component, so I thought I’d write up what I found.

Note: this is a prototype, not production code. We’ll probably put it in production at some point, but I don’t know how long that’ll be.


Avoiding the automatic redirect on Tumblr posts

Recently I’ve noticed Tumblr being much more aggressive about redirecting me to the dashboard.

I often save permalinks to read later, but when I try to follow them I get a 303 redirect, and I’m bounced back to my dashboard. (I think it’s trying to redirect me to the Tumblr app, but I’m often in a desktop browser, and there isn’t a desktop Tumblr app.)

It doesn’t happen consistently, which makes it even more infuriating.

The best workaround I’ve found is to change my User-Agent. If I pretend to be Googlebot when I’m browsing Tumblr [Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)], I skip being shown any privacy policies or being sent to my dashboard.
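If you fetch Tumblr pages from a script rather than a browser, the same trick is just a request header. A minimal sketch using the standard library’s `urllib.request`; `build_tumblr_request` and `fetch` are hypothetical helpers, the URL is a placeholder, and it assumes Tumblr treats scripted requests the same way it treats a browser.

```python
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_tumblr_request(url):
    """Build a GET request that identifies itself as Googlebot."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch(url):
    # urlopen follows redirects automatically; geturl() reports where you
    # ended up, so you can spot a bounce to the dashboard.
    with urllib.request.urlopen(build_tumblr_request(url)) as response:
        return response.geturl(), response.read()


# Placeholder URL; substitute a real post permalink before calling fetch().
req = build_tumblr_request("https://example.tumblr.com/post/123")
print(req.get_header("User-agent"))
```

Note that `urllib` normalises header names with `str.capitalize()`, so the header is stored and looked up as `User-agent`.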

Ideas for inclusive conferences and events

I semi-regularly do a braindump of my ideas for running inclusive conferences, and I realised it would be useful to have it all written up in one place. Last week, I had a meeting with Lauren Couch, the head of Diversity and Inclusion at Wellcome, and she was kind enough to lend me the notes she wrote – this is the tidied-up version.

For better or for worse, conferences can be really important for career development. Sharing ideas, having conversations, meeting new people – if you can’t attend conferences, you miss out on a lot of these opportunities. It’s important to open these events to as wide a range of people as possible, and make them feel welcome when they attend. It addresses a serious unfairness, and everyone benefits from having a wider diversity of people and ideas.

This is mostly based on my experience with small, community conferences in the European tech industry. Underrepresented groups in tech include women, people with disabilities, people on low income, people who don’t have European or US citizenship – and the suggestions reflect that. It definitely isn’t a complete list.

If you’re an event organiser, you can take these ideas to make your events more inclusive and accessible.

If you attend, speak at, or sponsor an event, you have power – you can ask for these accommodations where they don’t already exist. Be picky about where you choose to participate, and walk away from events that don’t meet your standards.


Finding slow builds in Travis

For a while, I’ve been whinging on Twitter about Scala compile times – mostly driven by the ever-increasing length of the Travis builds at work.

Somebody made a comment on Friday about how we don’t really track our build times, just our gut feeling. “They seem to be getting slower.” Since the Travis API provides a whole heap of information about builds, I decided to dig into the data to find a pattern.

This post contains the code I used, in case it’s useful to somebody else.
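The full post has the actual code; as a rough sketch of the core analysis only, you can take build records with start and finish times, work out each build’s duration, and compare averages over time. This assumes each build is a dict with ISO 8601 `started_at` and `finished_at` timestamps (loosely modelled on the Travis API; treat the field names as an assumption). The sample builds are made-up numbers for illustration, not our data.

```python
from datetime import datetime
from statistics import mean


def parse_time(timestamp):
    # Timestamps like "2018-09-01T10:00:00Z"; swap the trailing Z for an
    # explicit UTC offset so fromisoformat() accepts it before Python 3.11.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def build_duration_minutes(build):
    started = parse_time(build["started_at"])
    finished = parse_time(build["finished_at"])
    return (finished - started).total_seconds() / 60


def rolling_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def slowest_builds(builds, n):
    """Return the n longest-running builds, slowest first."""
    return sorted(builds, key=build_duration_minutes, reverse=True)[:n]


# Made-up builds, oldest first, purely for illustration.
builds = [
    {"started_at": "2018-09-01T10:00:00Z", "finished_at": "2018-09-01T10:20:00Z"},
    {"started_at": "2018-09-02T10:00:00Z", "finished_at": "2018-09-02T10:24:00Z"},
    {"started_at": "2018-09-03T10:00:00Z", "finished_at": "2018-09-03T10:31:00Z"},
    {"started_at": "2018-09-04T10:00:00Z", "finished_at": "2018-09-04T10:35:00Z"},
]

durations = [build_duration_minutes(b) for b in builds]
print(durations)
print(rolling_average(durations, window=2))
```

A steadily rising rolling average turns the gut feeling that builds “seem to be getting slower” into something you can actually measure.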
