How to set the clock on a Worcester 28CDi boiler

Every month or so, there’s a power cut in my flat, and my boiler clock resets. My boiler is a Worcester 28CDi – these instructions explain how to change the clock on that boiler.

I’m writing these for the sake of my future self and to populate Google – if you don’t have this boiler, this won’t be of any interest. These instructions are available in the boiler manual, but digging that out and finding the right page gets tedious if you do it regularly.

Continue reading →


aspell, a command-line spell checker

At this month’s WriteTheDocs London, there was a discussion of “docs-as-code”. This is the idea of using plain-text formats for your documentation, and storing it alongside your code — as opposed to using a wiki or another proprietary format. This allows you to use the same tools for code and for docs: version control, code review, text editors, and so on. By making it easier to move between the two, it’s more likely that docs will be written and updated with code changes.

But one problem is that text editors for programmers tend to disable spellcheck. This is sensible for code: program code bears little resemblance to prose, and the spellcheck would be too noisy to be helpful. But what about writing prose? Where are the red and green squiggles to warn you of spelling mistakes?

To plug the gap, I’m a fan of the command-line spellchecker aspell.
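
As a quick flavour before reading on (draft.md is just a stand-in filename): aspell check opens an interactive session that steps through every word it doesn't recognise, while aspell list reads from stdin and simply prints the unrecognised words.

$ aspell check draft.md
$ aspell list < draft.md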

Continue reading →


Silence is golden

As I write this, it’s the last day of PyCon UK. The air is buzzing with the sound of sprints and productivity. I’ll write a blog post about everything that happened at PyCon later (spoiler: I’ve had a great time), but right now I’d like to write about one specific feature – an idea I’d love to see at every conference. I’ve already talked about live captioning – now let’s talk about quiet rooms.

I’m an introvert. Don’t get me wrong: I enjoy socialising at conferences and meetups. I get to meet new people, or put faces to names I’ve seen online. Everybody I’ve met this week has been lovely and nice, but there’s still a limit to how much socialising I can do. Being in social situations is quite draining, and a full day of conference is more than I can manage in one go. At some point, I need to step back and recharge.

I don’t think this is atypical in the tech/geek communities.

So I’ve been incredibly grateful that the conference provides a quiet room. It’s exactly what the name suggests – a space set aside for quiet working and sitting. Whenever I’ve been feeling a bit overwhelmed by the bustle of the main conference, I can step into the quiet room. Some clear head space helps me through the day.

PyCon was held in Cardiff City Hall, and the designated quiet room was the Council Chamber. It’s a really nice and large space:

The council chamber at Cardiff City Hall

If there hadn’t been a quiet room, I’d have worn out much faster and probably been miserable towards the end of the conference. It made a big difference to my experience. I think it’s a great feature, and I’ll be looking for it at the next conference I attend.


Live captioning at conferences

This weekend, I’ve been attending PyCon UK in Cardiff. This is my first time at a PyCon (or indeed, at any tech conference), and one nice surprise has been the live captioning of the talks.

At the front of the main room, there are two speech-to-text reporters transcribing the talk in real-time. Their transcription is shown as live, scrolling text on several large screens throughout the room, and shows up within seconds of the speaker finishing a word.

Here’s what one of those screens looks like:

Photo by @drvinceknight on Twitter. Used with permission.

I’m an able-bodied person. I appreciate the potential value of live captioning for people with hearing difficulties – but my hearing is fine. I wasn’t expecting to use the transcription.

Turns out – live captioning is really useful, even if you can already hear what the speaker is saying!

Maintaining complete focus for a long time is remarkably hard. Inevitably, my focus slips, and I miss something the speaker says – a momentary distraction, a wandering mind, or somebody coughing at the wrong moment. Without the transcript, I have to fill in the blank myself, and there are a few seconds of confusion before I get back into the talk. With the transcript, I can see what I missed and jump straight back in, without losing my place. I’ve come to rely on the transcript, and I miss it in talks that don’t have it. (Unfortunately, live captioning is only in one of the three rooms running talks.)

And I’m sure I wasn’t the only person who found them helpful. I saw and heard comments from lots of other people about the value of the live captioning, and it was great for them to get a call-out in Saturday’s opening remarks. This might be pitched as an accessibility feature, but it can help everybody.

If you’re running a conference (tech or otherwise), I would strongly recommend providing this service.


Python snippet: dealing with query strings in URLs

I spend a lot of time dealing with URLs: in particular, with URL query strings. The query string is the set of key-value pairs that comes after the question mark in a URL. For example:

http://example.net?name=alex&color=red

Typically I want to do one of two things: get the value(s) associated with a particular key, or create a new URL with a different key-value pair.

This is possible with the Python standard library’s urllib.parse module, but it’s a bit fiddly and requires chaining several functions together. Since I do this fairly often, I have a pair of helper functions that I copy and paste into new projects as needed. And since they’re fairly generic, I thought they might be worth sharing more widely.
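
The helpers themselves are in the full post; as a rough sketch of the approach (the function names here are illustrative, not the exact code), it’s mostly a matter of chaining urlsplit, parse_qs, urlencode and urlunsplit:

from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def get_query_values(url, key):
    """Return the list of values associated with ``key`` in the query string."""
    return parse_qs(urlsplit(url).query).get(key, [])


def set_query_value(url, key, value):
    """Return a new URL with ``key`` set to ``value`` in the query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qs(query)
    params[key] = [value]
    return urlunsplit((scheme, netloc, path, urlencode(params, doseq=True), fragment))

On the example URL above, get_query_values(url, 'color') gives ['red'], and set_query_value(url, 'color', 'blue') returns a new URL with the colour swapped and everything else left intact.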

Continue reading →


Python snippet: Is a URL from a Tumblr post?

I’ve been writing some code recently that takes a URL, and performs some special actions if that URL is a Tumblr post. The problem is working out whether a given URL points to Tumblr.

Most Tumblrs use a consistent naming scheme: username.tumblr.com, so I can detect them with a regular expression. But some Tumblrs use custom URLs, and mask their underlying platform: for example, http://travelingcolors.net or http://wordstuck.co.vu. Unfortunately, I encounter enough of these that I can’t just hard-code them, and I really should handle them properly.

So how can I know if an arbitrary URL belongs to Tumblr?

I’ve had to do this a couple of times now, so I thought it was worth writing up what to do – partly for my future reference, partly in case anybody else finds it useful.

In the HTTP headers on a Tumblr page, there are a couple of “X-Tumblr” headers. These are custom headers, defined by Tumblr – they aren’t part of the official HTTP spec. They aren’t documented anywhere, but it’s clear who’s sending them, and I’d be quite surprised to see another site send them. For my purposes, this is a sufficiently reliable indicator.

So this is the function I use to detect Tumblr URLs:

try:
    from urllib.parse import urlparse
except ImportError:  # Python 2
    from urlparse import urlparse

import requests


def is_tumblr_url(url):
    # The common case: a standard *.tumblr.com subdomain.
    if urlparse(url).netloc.endswith('.tumblr.com'):
        return True
    else:
        # Custom domains: look for Tumblr's X-Tumblr-* headers on a HEAD request.
        req = requests.head(url)
        return any(h.startswith('X-Tumblr') for h in req.headers)

It’s by no means perfect, but it’s a step up from basic string matching, and it’s accurate and fast enough that I can usually get by.


Python snippets: Cleaning up empty/nearly empty directories

Last month, I wrote about some tools I’d been using to clear disk space on my Mac. I’ve been continuing to clean up my mess of files and folders as I try to simplify my hard drive, and there are two new scripts I’ve been using to help me. Neither is particularly complicated, but I thought they were worth writing up properly.

Depending on how messy your disk is, these may or may not be useful to you – but they’ve saved a lot of time for me.

Of course, you should always be very careful of code that deletes or rearranges files on your behalf, and make sure you have good backups before you start.
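
The scripts themselves are in the full post; as a rough idea of the shape of this kind of thing (not the exact code – and deliberately printing candidates rather than deleting them), walking the tree bottom-up with os.walk makes empty directories easy to spot:

import os


def find_empty_dirs(root):
    """Yield directories under ``root`` with no files and no subdirectories."""
    # Walking bottom-up means we see the deepest directories first.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if not dirnames and not filenames:
            yield dirpath


if __name__ == '__main__':
    for path in find_empty_dirs('.'):
        print(path)

Printing a list and deleting by hand is slower, but it’s a lot safer than letting a script loose on the filesystem unsupervised.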

Continue reading →


Python snippets: Chasing redirects and URL shorteners

Quick post today. A few years back, there was a proliferation of link shorteners on Twitter: tinyurl, bit.ly, j.mp, goo.gl, and so on. When characters are precious, you don’t want to waste them with a long URL. This is frustrating for several reasons:

  • It becomes harder to see where a particular link goes.
  • If the link shortener goes away, all the links break, even if the pages behind the links are still up.
  • Often the same link would be wrapped multiple times: a j.mp link would redirect to goo.gl, then adf.ly, before finally getting to the destination.

Twitter have tried to address this with their t.co link shortener. All links in Twitter get wrapped with t.co, so long URLs no longer penalise your character count, and the displayed link shows a short preview of the destination URL. But this is still fragile – Twitter may not last forever – and people still wrap links in multiple shorteners.

When I’m storing data with shortened links, I like to record where the link is supposed to go. I keep the shortened and the resolved link, which tends to be pretty future-proof.

To find out where a shortened URL goes, I could just open it in a web browser. But that’s slow and manual, and doesn’t work if I want to save the URL as part of a scripted pipeline. So I have a couple of utility functions to help me out.
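
The functions themselves are in the full post; as a sketch of the general idea (not the exact code), the requests library will happily follow a chain of redirects for you, and remembers every hop along the way:

import requests


def resolve_url(url):
    """Follow any redirects and return the final destination URL."""
    resp = requests.head(url, allow_redirects=True)
    return resp.url


def redirect_chain(url):
    """Return every URL visited on the way to the final destination."""
    resp = requests.head(url, allow_redirects=True)
    return [r.url for r in resp.history] + [resp.url]

One caveat: a few shorteners don’t respond well to HEAD requests, in which case falling back to a GET (with stream=True, so the body isn’t actually downloaded) does the trick.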

Continue reading →


Clearing disk space on OS X

Over the weekend, I’ve been trying to clear some disk space on my Mac. I’ve been steadily accumulating lots of old and out-of-date files, and I just wanted a bit of a spring clean. Partly to get back the disk space, partly so I didn’t have to worry about important files getting lost in the noise.

Over the course of a few hours, I was able to clean up over half a terabyte of old files. This wasn’t just trawling through the Finder by hand – I had a couple of tools and apps to help me do this – and I thought it would be worth writing down what I used.

Backups: Time Machine, SuperDuper! and CrashPlan

Embarking on an exercise like this without good backups would be foolhardy: what if you get cold feet, or accidentally delete a critical file? Luckily, I already have three layers of backup:

  • Time Machine backups to a local hard drive
  • Overnight SuperDuper! clones to a local hard drive
  • Backups to the CrashPlan cloud

The Time Machine backups go back to last October, the CrashPlan copies further still. I haven’t looked at the vast majority of what I deleted in months (some stuff years), so I don’t think I’ll miss it – but if I change my mind, I’ve got a way out.

For finding the big folders: DaisyDisk

DaisyDisk can analyse a drive or directory, and it presents you with a pie-chart-like diagram showing which folders are taking up the most space. You can drill down into the pie segments to see the contents of each folder. For example, this shows the contents of my home directory:

This is really helpful for making big space savings – it’s easy to see which folders have become bloated, and target my cleaning accordingly. If you want quick gains, this is a great app.

It’s also fast: scanning my entire 1TB boot drive took less than ten seconds.

For finding duplicate files: Gemini

Once I’ve found a large directory, I need to decide what (if anything) I want to delete. Sometimes I can look for big files that I know I don’t want any more, and move them straight to the Trash. But the biggest waste of space on my computer is multiple copies of the same file. Whenever I reorganise my hard drive, files get copied around, and I don’t always clean them up.

Gemini is a tool that can find duplicate files or folders within a given set of directories. For example, running it over a selection of my virtualenvs:

Once it’s found the duplicates, you can send files straight to the Trash from within the app. It has some handy filters for choosing which dupes to drop – oldest, newest, from within a specific directory – so the whole process is pretty quick.

This is another fast way to reclaim space: deleting dupes saves space, but doesn’t lose any information.

Gemini isn’t perfect: it gets slow when scanning large directories (100k+ files), and it sometimes missed duplicates. I often had to run it several times before it found all of the dupes in a directory. Note that I’m only running v1: these problems may be fixed in the new version.

File-by-file comparisons: Python’s filecmp module

Sometimes I wanted to compare a couple of individual files, not an entire directory. For this, I turned to Python’s filecmp module. This module contains a number of functions for comparing files and directories. This let me write a shell function for doing the comparisons on the command-line (this is the fish shell):

function filecmp
    python -c "import filecmp; print(filecmp.cmp('''$argv[1]''', '''$argv[2]'''))"
end

Fish passes the two arguments to the function as $argv[1] and $argv[2]. The -c flag tells Python to run a command passed in as a string, and that command prints the result of calling filecmp.cmp() on the two files – True if they match, False if they don’t.

I’m using triple-quoted strings in the Python, so that filenames containing quote characters don’t prematurely terminate the string. I could still be bitten by a filename that contains a triple quote, but that would be very unusual. And unlike Python, where quote characters are interchangeable, it’s important that I use double-quotes for the string in the shell: shells only expand variables inside double-quoted strings, not single-quoted strings.

Usage is as follows:

$ filecmp hello.txt hello.txt
True

$ filecmp hello.txt foo.txt
False

I have this in my fish config file, so it’s available in any Terminal window. If you drag a file from the Finder to the Terminal, it auto-inserts the full path to that file, so it’s really easy to do comparisons – I type filecmp, and then drag in the two files I want to compare.

This is great if I only want to compare a few files at a time. I didn’t use it much on the big clean, but I’m sure it’ll be useful in the future.

Photos library: Duplicate Photos Cleaner

Part of this exercise was trying to consolidate my photo library. I’ve tried a lot of tools for organising my photos – iPhoto, Aperture, Lightroom, a folder hierarchy – and so photos are scattered across my disk. I’ve settled on using iCloud Photo Library for now, but I still had directories with photos that I hadn’t imported.

When I found a directory with new pictures, I just loaded everything into Photos. That was faster than cherry-picking the photos I didn’t already have, and ensured I didn’t miss anything – but of course, it also meant I imported duplicates along the way.

Once I’d finished importing photos from the far corners of my disk, I was able to use this app to find duplicates in my photo library, and throw them away. It scans your entire Photo Library (it can do iPhoto and Photo Booth as well), and moves any duplicates to a dedicated album, for you to review/delete at will.

I chose the app by searching the Mac App Store; there are plenty of similar apps, and I don’t know how this one compares. I don’t have anything to particularly recommend it compared to other options, but it found legitimate duplicates, so it’s fine for my purposes.

Honourable mentions: find, du and df

There are a couple of other command-line utilities that I find useful.

If I wanted to find out which directories contain the most files – not necessarily the most space – I could use find. This isn’t about saving disk space, it’s about reducing the sheer number of unsorted files I keep. There were two commands I kept using:

  • Count all the files below the current directory: both files in this directory, and all of its subdirectories.
    $ find . | wc -l
            2443
  • Find out which of the subdirectories of the current directory contain the most files.
    $ for l in (ls); if [ -d $l ]; echo (find $l | wc -l)"  $l"; end; end
             627  _output
             262  content
              31  screenshots
               3  talks
              33  theme
              11  util

These two commands let me focus on processing directories that had a lot of files. It’s nice to clear away a large chunk of these unsorted files, so that I don’t have to worry about what they might contain.

And when I’m using Linux, I can mimic the functions of DaisyDisk with df and du. The df (display free space) command lets me see how much space is free on each of my disk partitions:

$ df -h
Filesystem      Size   Used  Avail Capacity   iused     ifree %iused  Mounted on
/dev/disk2     1.0Ti  295Gi  741Gi    29%  77290842 194150688   28%   /
devfs          206Ki  206Ki    0Bi   100%       714         0  100%   /dev
map -hosts       0Bi    0Bi    0Bi   100%         0         0  100%   /net
map auto_home    0Bi    0Bi    0Bi   100%         0         0  100%   /home
/dev/disk3s4   2.7Ti  955Gi  1.8Ti    35% 125215847 241015785   34%

And du (display disk usage) lets me see what’s using up space in a single directory:

$ du -hs *
 24K    experiments
 32K    favicon-a.acorn
 48K    favicon.acorn
 24K    style
 56K    templates
 40K    touch-icon-a.acorn

I far prefer DaisyDisk when I’m on the Mac, but it’s nice to have these tools in my back pocket.

Closing thought

These days, disk space is cheap (and even large SSDs are fairly affordable). So I don’t need to do this: I wasn’t running out of space, and it would be easy to get more if I was. But it’s useful for clearing the noise, and finding old files that have been lost in the bowels of my hard drive.

I do a really big cleanup about once a year, and having these tools always makes me much faster. If you ever need to clear large amounts of disk space, I’d recommend any of them.


A two-pronged iOS release cycle

One noticeable aspect of this year’s WWDC keynote was the lack of any new features focused on the iPad. Federico Viticci has written about this on MacStories, where he said:

I wouldn’t be surprised to see Apple move from a monolithic iOS release cycle to two major iOS releases in the span of six months – one focused on foundational changes, interface refinements, performance, and iPhone; the other primarily aimed at iPad users in the Spring.

I think this is a very plausible scenario, and between iOS 9.3 and WWDC, it seems like it might be coming true. Why? Education.

Apple doesn’t release breakdowns, but a big chunk of iPad sales seems to come from the education market. Education runs to a fixed schedule: the academic year starts in the autumn and continues over winter and spring, with a long break in the summer. A lot of work happens in that summer break, including writing lesson plans for the coming year and deploying new tech.

The traditional iOS release cycle – preview the next big release at WWDC, release in the autumn – isn’t great for education. By the time the release ships, the school year is already underway. That can make it difficult for schools to adopt new features, often forcing them to wait for the next academic year.

If you look at the features introduced in iOS 9.3 – things like Shared iPad, Apple School Manager, or Managed Apple ID – these aren’t things that can be rolled out mid-year. They’re changes at the deployment stage. Once students have the devices, it’s too late. Even smaller things, like changes to iTunes U, can’t be used immediately, because they weren’t available when lesson plans were made over the summer. (And almost no teachers are going to run developer previews.)

This means there’s less urgency to get education-specific iPad features into the autumn release, because it’s often a year before they can be deployed. In a lot of cases, deferring that work for a later release (like with iOS 9.3) doesn’t make much of a difference for schools. And if you do that, it’s not a stretch to defer the rest of the iPad-specific work, and bundle it all into one big release that focuses on the iPad. Still an annual cycle, but six months offset from WWDC.

Moving to this cycle would have other benefits. Splitting the releases gives Apple more flexibility: they can spread the engineering work across the whole year, rather than focusing on one massive release for WWDC. It’s easier to slip an incomplete feature if the next major release is six months away, not twelve. And it’s a big PR item for the spring, a time that’s usually quiet for official Apple announcements.

I don’t know if this is Apple’s strategy; I’m totally guessing. But it seems plausible, and I’ll be interested to see if it pans out into 2017.

