Atomic, cross-filesystem moves in Python

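If you want to move a file around in Python, the standard library gives you at least two options: os.rename() or shutil.move(). Both of them work in certain circumstances, but they make different tradeoffs:

*   os.rename() is atomic, but it fails if the source and destination are on different filesystems.
*   shutil.move() works across filesystems, but it isn’t atomic.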

Sometimes you want both of those properties: you want to move across a filesystem boundary and have an atomic move.

For example: Loris, an image server. When a user requests an image, we start by downloading it from the source to a temporary folder. If the download succeeds, we move the saved image into another cache, and that known-good cache is used to serve the image to the user. We want that move to be atomic – so we won’t serve a partial image from the cache – and in some setups, the temporary download folder and the image cache are on different filesystems. We need a move function that can be both atomic and work across filesystems.

I’ve had to write code for this a couple of times now, so I’m writing it up here both as a reminder to myself and as a reference for other people, in case it’s useful.

Writing the code

If we’re moving within the same filesystem, os.rename() gives us everything we need. Let’s try that first, and only do something different if we get an error:

import os


def safe_move(src, dst):
    try:
        os.rename(src, dst)
    except OSError:
        # do something else...
        pass  # placeholder so the snippet parses

This except clause is quite broad – it catches and retries any error thrown by os.rename(). There are lots of errors that have nothing to do with a cross-filesystem move – for example, if the source file just doesn’t exist! We should only catch and retry the specific error that comes from copying across a filesystem boundary.
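As a quick illustration of why the broad clause is a problem, here is a minimal sketch that renames a file which doesn’t exist and inspects the error. The helper name rename_error_for_missing_source is hypothetical, made up for this example. The error is a subclass of OSError, so a bare except OSError would catch it and wrongly trigger the fallback.

```python
import errno
import os
import tempfile


def rename_error_for_missing_source():
    """Try to rename a file that does not exist and return the raised error."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        missing = os.path.join(tmp_dir, "does-not-exist.txt")
        target = os.path.join(tmp_dir, "target.txt")
        try:
            os.rename(missing, target)
        except OSError as err:
            # FileNotFoundError is a subclass of OSError, so it lands here.
            return err
    return None


err = rename_error_for_missing_source()
print(type(err).__name__)           # the concrete exception class
print(err.errno == errno.ENOENT)    # "no such file", not a cross-device error
```

A missing source has nothing to do with filesystem boundaries, so retrying it with a copy would only hide the real problem. The cross-filesystem case shows up with a different error number.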

If you try a rename across a filesystem boundary, this is the error you get:

>>> import os
>>> os.rename("/mnt/semele/hello.txt", "/mnt/dionysus/hello.txt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 18] Cross-device link: '/mnt/semele/hello.txt' -> '/mnt/dionysus/hello.txt'

Error code 18 is the one we want to retry: it’s the standard errno value EXDEV, which means “invalid cross-device link”. We can use the errno module to refer to it by name rather than as a magic number, like so:

import errno
import os


def safe_move(src, dst):
    try:
        os.rename(src, dst)
    except OSError as err:
        if err.errno == errno.EXDEV:
            # do something else...
            pass  # placeholder so the snippet parses
        else:
            raise
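If you want to see what errno.EXDEV looks like on your own platform, the following sketch prints its numeric value, symbolic name and message. It also pulls the check into a small predicate; is_cross_device_error is a hypothetical name used only for illustration.

```python
import errno
import os

# errno.EXDEV is the symbolic name for the "cross-device link" error.
print(errno.EXDEV)                   # the platform's numeric code
print(errno.errorcode[errno.EXDEV])  # the symbolic name as a string
print(os.strerror(errno.EXDEV))      # the human-readable message


def is_cross_device_error(err):
    """Return True if ``err`` is the OSError raised by a cross-filesystem rename."""
    return isinstance(err, OSError) and err.errno == errno.EXDEV
```

Comparing against the named constant rather than the literal 18 keeps the intent readable, and doesn’t rely on the number being the same everywhere.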

So now we need to decide what “something else” looks like.

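To get the file onto the same filesystem, we can use shutil.copyfile() to copy it into the same directory as the intended destination, but with a different filename. Once the copy is complete, renaming it to the final name is a same-filesystem operation, so os.rename() is atomic again. As a first pass, we might try something like: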

import shutil


def safe_move(src, dst):
    ...
    # do something else
    tmp_dst = dst + ".tmp"
    shutil.copyfile(src, tmp_dst)
    os.rename(tmp_dst, dst)
    # The rename consumed tmp_dst, so the file left to clean up is the source.
    os.unlink(src)

This could be okay in certain circumstances, but if you have multiple worker processes you could end up with a corrupted destination. If two processes try to copy to the same temporary path at once, you could get garbage data in that file: one process might think it’s completed the copy and rename the file while the other is still writing to it.

To avoid processes treading on each other’s toes, add a unique ID to each copy – that way they can’t overlap. Closer to:

import uuid


def safe_move(src, dst):
    ...
    # do something else
    copy_id = uuid.uuid4()
    tmp_dst = "%s.%s.tmp" % (dst, copy_id)
    shutil.copyfile(src, tmp_dst)
    os.rename(tmp_dst, dst)
    # The rename consumed tmp_dst, so the file left to clean up is the source.
    os.unlink(src)

This is an idea I originally got from a Stack Overflow answer about lock-free copy algorithms. This isn’t quite the same problem as that question – in particular, I don’t care if the file already exists – but the answer and the linked paper make interesting reading.
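A UUID suffix isn’t the only way to get a unique temporary name. As an alternative sketch (not what the code in this post uses), tempfile.mkstemp() can create a uniquely named file directly in the destination directory. The helper name make_tmp_path is hypothetical.

```python
import os
import tempfile


def make_tmp_path(dst):
    """Create an empty, uniquely named file next to ``dst`` and return its path."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    prefix = os.path.basename(dst) + "."
    # mkstemp() creates the file itself, so two callers can never be handed
    # the same path.  It returns an open file descriptor plus the path.
    fd, tmp_path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=dst_dir)
    os.close(fd)
    return tmp_path
```

One caveat with this approach: mkstemp() creates the file readable and writable only by its owner, and shutil.copyfile() copies contents but not permissions, so the final file may need its mode fixed up afterwards. The UUID version doesn’t have that wrinkle, because copyfile() creates the temporary file with the default permissions.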

Putting it all together

If you just want the code, here’s the final version (with comments):

import errno
import os
import shutil
import uuid


def safe_move(src, dst):
    """Rename a file from ``src`` to ``dst``.

    *   Moves must be atomic.  ``shutil.move()`` is not atomic.
        Note that multiple threads may try to write to the cache at once,
        so atomicity is required to ensure the serving on one thread doesn't
        pick up a partially saved image from another thread.

    *   Moves must work across filesystems.  Often temp directories and the
        cache directories live on different filesystems.  ``os.rename()`` can
        throw errors if run across filesystems.

    So we try ``os.rename()``, but if we detect a cross-filesystem move, we
    fall back to ``shutil.copyfile()`` into a temporary file next to ``dst``,
    followed by an atomic ``os.rename()``.
    """
    try:
        os.rename(src, dst)
    except OSError as err:
        if err.errno == errno.EXDEV:
            # Generate a unique ID, and copy `<src>` to the target directory
            # with a temporary name `<dst>.<ID>.tmp`.  Because we're copying
            # across a filesystem boundary, this initial copy may not be
            # atomic.  We intersperse a random UUID so if different processes
            # are copying into `<dst>`, they don't overlap in their tmp copies.
            copy_id = uuid.uuid4()
            tmp_dst = "%s.%s.tmp" % (dst, copy_id)
            shutil.copyfile(src, tmp_dst)

            # Then do an atomic rename onto the new name, and clean up the
            # source image.
            os.rename(tmp_dst, dst)
            os.unlink(src)
        else:
            raise

I’ve been running code like this in production for over a year (as part of our Loris installation at Wellcome), and used it in a few other places with no issues.
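If you don’t have two filesystems handy, one way to exercise the fallback branch is to patch os.rename() so that the first call fails with EXDEV. The sketch below does that with unittest.mock. It repeats a compact version of safe_move() so it runs on its own, and check_fallback_branch is a hypothetical helper name.

```python
import errno
import os
import shutil
import tempfile
import uuid
from unittest import mock


def safe_move(src, dst):
    """Compact copy of the function above, repeated so this sketch runs alone."""
    try:
        os.rename(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        tmp_dst = "%s.%s.tmp" % (dst, uuid.uuid4())
        shutil.copyfile(src, tmp_dst)
        os.rename(tmp_dst, dst)
        os.unlink(src)


def check_fallback_branch():
    """Run safe_move() with the first rename forced to fail with EXDEV."""
    real_rename = os.rename
    calls = []

    def fake_rename(src, dst):
        # Fail the first call as if src and dst were on different
        # filesystems; pass every later call through to the real rename.
        calls.append((src, dst))
        if len(calls) == 1:
            raise OSError(errno.EXDEV, os.strerror(errno.EXDEV))
        real_rename(src, dst)

    with tempfile.TemporaryDirectory() as tmp_dir:
        src = os.path.join(tmp_dir, "src.txt")
        dst = os.path.join(tmp_dir, "dst.txt")
        with open(src, "w") as f:
            f.write("hello")

        with mock.patch("os.rename", fake_rename):
            safe_move(src, dst)

        with open(dst) as f:
            moved = f.read()
        leftovers = sorted(os.listdir(tmp_dir))
        return moved, os.path.exists(src), leftovers, len(calls)
```

After a successful run, the destination should hold the original contents, the source should be gone, and no .tmp file should be left behind. This only simulates the error, so it checks the control flow rather than real cross-filesystem behaviour.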