A tool for backing up your message history from Slack

I’ve just pushed a small tool to PyPI for backing up message history from Slack. It downloads your message history as a collection of JSON files, including public/private channels and DM threads.

This is mainly scratching my own itch: I don’t like having my data tied up in somebody’s proprietary system. Luckily, Slack provides an API that lets you get this data out into a plaintext form. This allows me to correct what I see as two deficiencies in the data exports provided by Slack:

Installation is pip install slack_history, then run slack_history --help for usage instructions.


Another example of why strings are terrible

Here’s a programming assumption I used to make, that until today I’d never really thought about: changing the case of a string won’t change its length.

Now, thanks to Hypothesis, I know better:

>>> x = u'İ'
>>> len(x)
1
>>> len(x.lower())
2

I’m not going to pretend I understand enough about Unicode or Python’s string handling to say what’s going on here.

I discovered this while testing a moderately fiddly normalisation routine – this routine would normalise the string to lowercase, unexpectedly tripping a check that it was the right length. If you’d like to see this for yourself, here’s a minimal example:

from hypothesis import given, strategies as st

# Ask Hypothesis to generate arbitrary Unicode strings for the test.
@given(st.text())
def test_changing_case_preserves_length(xs):
    assert len(xs) == len(xs.lower())

Update, 2 December 2016: David MacIver asked whether this affects Python 2, 3, or both, which I forgot to mention. The behaviour is different: Python 2 lowercases İ to an ASCII i, whereas Python 3 adds a second dot, as a combining dot above (U+0307): i̇.

This means that only Python 3 has the bug where the length changes under case folding (whereas Python 2 commits a different sin of throwing away information).

Cory Benfield pointed out that the Unicode standard has explicit character mappings that add or remove characters when changing case, and highlighted a nice example in the other direction: when you uppercase the German eszett (ß), you replace it with a double-S.
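
As a small illustration of both mappings, here is a sketch that assumes Python 3 (Python 2 behaves differently, as noted above). The variable names are made up for this example.

```python
# Case mappings that change the length of a string (Python 3 behaviour).
dotted_i = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
eszett = "\u00df"    # ß, LATIN SMALL LETTER SHARP S

lowered = dotted_i.lower()  # "i" followed by U+0307 COMBINING DOT ABOVE
uppered = eszett.upper()    # "SS"

print(len(dotted_i), len(lowered))     # 1 2
print(len(eszett), len(uppered))       # 1 2
print([hex(ord(c)) for c in lowered])  # ['0x69', '0x307']
```

In both cases a single code point maps to two, so any length check made before the case change no longer holds afterwards.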

Finally, Rob Wells wrote a follow-on post that explains this problem in more detail. He also points out the potential confusion of len(): should it count visible characters, or Unicode code points? The Swift String API does a rather good job here: if you haven’t used it, check out Apple’s introductory blog post.
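
To make the len() question concrete, here is a rough sketch in Python 3. The helper count_base_characters is a name invented for this example, and skipping combining marks is only an approximation of "visible characters": proper grapheme cluster segmentation has more rules than this.

```python
import unicodedata


def count_base_characters(s):
    """Approximate the number of visible characters by skipping combining marks."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


word = "\u0130stanbul".lower()  # "İstanbul" lowercased in Python 3

print(len(word))                    # 9 code points
print(count_base_characters(word))  # 8 characters as a reader would count them
```

len() reports code points, so the combining dot added by lower() is counted as a character of its own.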

Some low-tech ways to get more ideas

So on Tuesday, I saw this tweet from David MacIver:

What are some good low tech devices for intelligence expansion? Current list I have:

* Pencil and paper
* Cards/dice
* A second person

“Intelligence expansion” is a phrase that here means “anything you can use to understand things or solve problems that would be hard with an unaided brain”.

I wanted to reply, but I have more to say than will fit in 140 characters. So instead, in blog post form, here are my low-tech suggestions for tackling tricky topics. This isn’t everything you could try, but these are the techniques I’ve found work best for me.

Use keyring to store your credentials

I write a lot of Python scripts that interact with online services, which usually means they require my passwords and API keys. But how to store them?

The simplest approach would be to save my password as a variable in my unencrypted source code:

PASSWORD = 'password!'

This is a terrible idea. Don’t do this.

This password is now trivially accessible to anybody who has access to the source code. If I ever want to share my code (and I often do), I have to remember to carefully scrub it of sensitive information. If I use a version control system like Git, the password is permanently baked into the history of the repository.

So what’s the alternative? If I don’t want to put secrets directly in the source code, how can I make them available at runtime? I use the keyring module.
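
A minimal sketch of the idea, not necessarily how the full post does it: keyring.get_password and keyring.set_password read and write the system keychain, so the secret never appears in the source. The helper get_secret and its backend parameter are assumptions made for this example; the parameter exists only so the function can be exercised without a real keychain.

```python
import getpass


def get_secret(service, username, backend=None):
    """Look up a secret in the system keychain, prompting once if it is absent."""
    if backend is None:
        import keyring  # third-party: pip install keyring
        backend = keyring

    secret = backend.get_password(service, username)
    if secret is None:
        # First run: ask for the secret interactively, then store it for next time.
        secret = getpass.getpass("Password for %s@%s: " % (username, service))
        backend.set_password(service, username, secret)
    return secret


# Hypothetical usage in a script:
# PASSWORD = get_secret("example.com", "my_username")
```

The script can now be shared or committed to Git as-is, because the only things in the source are the service name and username.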

The A stands for Asexual

Today is a more personal post: I want to talk about asexuality.

The idea of asexuality is still quite a new one for a lot of people. Many people haven’t even heard of it, and among those who have, there are still some common misunderstandings and mistakes. Compared to other orientations, asexuality doesn’t get much time in the spotlight.

I’ve identified as asexual for the last couple of years, and so for this year’s Asexual Awareness Week, I’d like to give you a brief introduction to my orientation. (I spent a lot of time worrying over whether to post this, so much so that I missed the end of AAW. Oh well, better late than never.)

Creating low contrast wallpapers with Pillow

In my last post, I explained how I’d been using Pillow to draw regular tilings of the plane. What I was actually trying to do was get some new desktop wallpapers, and getting to use a new Python library was just a nice bonus.

A while back, the Code Review Stack Exchange got a fresh design that featured, among other things, a low-contrast background of coloured squares:

I was quite keen on the effect, and wanted to use it as my desktop wallpaper, albeit in different colours. I like using low contrast wallpapers, and this was a nice pattern to try to mimic. My usual work is entirely text-based; this was a fun way to dip my toe into the world of graphics. And a few hours of Python later, I could generate these patterns in arbitrary colours:

In this post, I’ll explain how I went from having a tiling of the plane, to generating these wallpapers in arbitrary colours.
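
One way to get a low-contrast palette, sketched here under my own assumptions rather than taken from the post, is to start from a base colour and nudge each RGB channel by a small random amount. The function names and the default spread of 8 are invented for this example; with Pillow, each resulting tuple could be passed as the fill colour of one tile.

```python
import random


def random_shade(base, spread=8, rng=random):
    """Return an RGB tuple close to `base`, each channel moved by at most `spread`."""
    return tuple(
        max(0, min(255, channel + rng.randint(-spread, spread)))
        for channel in base
    )


def shades_for_tiles(base, count, spread=8, seed=None):
    """Return `count` shades of `base`; pass a seed for a repeatable wallpaper."""
    rng = random.Random(seed)
    return [random_shade(base, spread, rng) for _ in range(count)]


for shade in shades_for_tiles((200, 30, 30), count=3, seed=1):
    print(shade)
```

A small spread keeps neighbouring tiles close in colour, which is what makes the pattern low contrast.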

Tiling the plane with Pillow

On a recent yak-shaving exercise, I’ve been playing with Pillow, an imaging library for Python. I’ve been creating some simple graphics: a task for which I usually use PGF or TikZ, but those both require LaTeX. In this case, I didn’t have a pre-existing LaTeX installation, so I took the opportunity to try Pillow, which is just a single pip install.

Along the way, I had to create a regular tiling with Pillow. In mathematics, a tiling is any arrangement of shapes that completely covers the 2D plane (a flat canvas), without leaving any gaps. A regular tiling is one in which every shape is a regular polygon – that is, a polygon in which every angle is equal, and every side has the same length.

There are just three regular tilings of the plane: with squares, equilateral triangles, and regular hexagons. Here’s what they look like, courtesy of Wikipedia:

In this post, I’ll explain how I reproduced this effect with Pillow. This is a stepping stone for something bigger, which I’ll write about in a separate post.

If you just want the code, it’s all in a script you can download.
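
For a flavour of the approach, here is a minimal sketch of the simplest case, the square tiling. It is not the downloadable script: the function names are invented for this example, and it assumes each tile is described by a list of corner coordinates, which is the form Pillow's ImageDraw.polygon accepts.

```python
def square_tiling(width, height, side):
    """Yield the four corners of each square needed to cover a width x height canvas."""
    for top in range(0, height, side):
        for left in range(0, width, side):
            yield [
                (left, top),
                (left + side, top),
                (left + side, top + side),
                (left, top + side),
            ]


def draw_tiling(shapes, size, path):
    """Draw outlined polygons with Pillow (third-party: pip install Pillow)."""
    from PIL import Image, ImageDraw

    im = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(im)
    for corners in shapes:
        draw.polygon(corners, outline="black")
    im.save(path)


# Hypothetical usage:
# draw_tiling(square_tiling(400, 400, side=50), size=(400, 400), path="squares.png")
```

Keeping the coordinate generator separate from the drawing code means the triangle and hexagon tilings only need a different generator.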

Why I use py.test

A question came up in Slack at work recently: “What’s your favorite/recommended unit test framework [for Python]?” I gave a brief recommendation at the time, but I thought it’d be worth writing up my opinions properly.

In Python, the standard library has a module for writing unit tests – the aptly-named unittest – but I tend to eschew this in favour of py.test. There are a few reasons I like py.test: my tests tend to be cleaner and have less boilerplate, and I get better test results. If you aren’t using py.test already, maybe I can persuade you to start.
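
To show what "less boilerplate" can look like, here is the same check written both ways. The normalise function is a hypothetical function under test, invented for this example.

```python
import unittest


def normalise(name):
    """Hypothetical function under test."""
    return name.strip().lower()


# unittest style: a TestCase subclass, a method, and an assertion helper.
class TestNormalise(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalise("  Alice "), "alice")


# py.test style: a plain function and a bare assert.
def test_strips_and_lowercases():
    assert normalise("  Alice ") == "alice"
```

Saved as, say, test_normalise.py, both tests are collected when you run py.test on the file, so existing unittest suites don't need rewriting to switch runners.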

I’m assuming you’re already somewhat familiar with the idea of unit testing. If not, I’d recommend Ned Batchelder’s talk Getting Started Testing and Eevee’s post Testing, for people who hate testing.

So, why do I prefer py.test?

A shell alias for tallying data

Here’s a tiny shell alias that I find useful when going through data on the command line.

Suppose I have a big collection of data, and I’d like to know which items occur most frequently: I want to build a tally. I have this shell alias defined that lets me build such a tally:

alias tally='sort | uniq -c | sort'

Here’s an example of the sort of output returned by piping to tally, a nice tabular format:

$ cat colors.txt | tally
   8 yellow
  45 red
  68 green
 100 blue

(Note: on some Linuxes, sort uses alphabetical sorting, so you’ll want to replace the second sort with sort -h to get a tally that sorts numerically.)

If you want to get the most common items from a tally, that’s just another pipe: send the output from tally to tail -n 5, replacing 5 with the number of most common items you’d like to see.

Another example: let’s see the five most common HTTP status codes in my Apache log. I read the entire log, use awk to extract the status code, and then pass the output to tally:

$ cat access.log | awk '{print $9}' | tally | tail -n5
  15804 302
  31955 204
  39115 301
  88825 404
 952709 200

This is one of the simplest aliases in my shell config, but I still like having it around. Anything that saves me a bit of typing and thinking is usually worthwhile.
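
If the data is already inside a Python script rather than a shell pipeline, a rough equivalent of the alias can be sketched with collections.Counter. The function name tally simply mirrors the alias; the sample data is made up.

```python
from collections import Counter


def tally(items):
    """Return (count, item) pairs, least common first, like sort | uniq -c | sort."""
    counts = Counter(items)
    # Sorting the (count, item) tuples orders by count, then by item for ties.
    return sorted((count, item) for item, count in counts.items())


colours = ["red", "blue", "red", "green", "blue", "red"]
for count, item in tally(colours):
    print("%4d %s" % (count, item))
```

Because the counts are integers, the sort here is always numeric, and tally(items)[-5:] plays the role of tail -n 5.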

My travelling tech bag

I have a small bag I carry whenever I’m travelling and taking my laptop or phone with me. It includes all the adapters and power cables I usually expect to need. The idea is that I could pick it up at any time, and have it be ready to go. I don’t have to faff around finding parts if I’m in a hurry.

I got a few questions about this at PyCon last week, so I thought I’d make a quick list of what it currently contains. Not everybody needs everything in this bag, but it’s worth thinking about how much (or little!) you could carry and always have what you need.

This is what my bag looks like, straight after PyCon:

A photograph of my tech bag. A rectangular pouch with two compartments, stuffed with electronics equipment.
