Create space-saving clones on macOS with Python
The standard Mac filesystem, APFS, has a feature called space-saving clones. This allows you to create multiple copies of a file without using additional disk space – the filesystem only stores a single copy of the data.
Although cloned files share data, they’re independent – you can edit one copy without affecting the other (unlike symlinks or hard links). APFS uses a technique called copy-on-write to store the data efficiently on disk – the cloned files continue to share any pieces they have in common.
Cloning files is both faster and more space-efficient than copying. If you’re working with large files – like photos, videos, or datasets – space-saving clones can be a big win.
Several filesystems support cloning, but in this post, I’m focusing on macOS and APFS.
For a recent project, I wanted to clone files using Python. There’s an open ticket to support file cloning in the Python standard library. In Python 3.14, there’s a new Path.copy() method which adds support for cloning on Linux – but there’s nothing yet for macOS.
In this post, I’ll show you two ways to clone files in APFS using Python.
What are the benefits of cloning?
There are two main benefits to using clones rather than copies.
Cloning files uses less disk space than copying
Because the filesystem only has to keep one copy of the data, cloning a file doesn’t use more space on disk. We can see this with an experiment. Let’s start by creating a random file with 1GB of data, and checking how much free disk space we have:
$ dd if=/dev/urandom of=1GB.bin bs=64M count=16
16+0 records in
16+0 records out
1073741824 bytes transferred in 2.113280 secs (508092550 bytes/sec)
$ df -h -I /
Filesystem Size Used Avail Capacity Mounted on
/dev/disk3s1s1 460Gi 14Gi 43Gi 25% /
My disk currently has 43GB available.
Let’s copy the file, and check the free disk space after it’s done. Notice that it decreases to 42GB, because the filesystem is now storing a second copy of this 1GB file:
$ # Copying
$ cp 1GB.bin copy.bin
$ df -h -I /
Filesystem Size Used Avail Capacity Mounted on
/dev/disk3s1s1 460Gi 14Gi 42Gi 25% /
Now let’s clone the file by passing the -c flag to cp. Notice that the free disk space stays the same, because the filesystem is just keeping a single copy of the data between the original and the clone:
$ # Cloning
$ cp -c 1GB.bin clone.bin
$ df -h -I /
Filesystem Size Used Avail Capacity Mounted on
/dev/disk3s1s1 460Gi 14Gi 42Gi 25% /
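If you’d rather run this experiment from Python than read df output, shutil.disk_usage() reports the same kind of numbers. This is a minimal sketch: free_gib() is a hypothetical helper, and I’m assuming its "free" figure (space available to unprivileged users) lines up with the Avail column from df.

```python
import shutil


def free_gib(path: str = "/") -> float:
    """Hypothetical helper: free space on the volume containing `path`, in GiB.

    shutil.disk_usage() reports the space available to unprivileged users,
    which should roughly match the "Avail" column printed by `df -h`.
    """
    return shutil.disk_usage(path).free / 2**30


before = free_gib()
# ... run `cp` or `cp -c` here ...
after = free_gib()
print(f"free space changed by {after - before:+.2f} GiB")
```

Other activity on the disk can nudge these numbers too, so a single before/after reading is only a rough check.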
Cloning files is faster than copying
When you clone a file, the filesystem only has to write a small amount of metadata about the new clone. When you copy a file, it needs to write all the bytes of the entire file. This means that cloning a file is much faster than copying, which we can see by timing the two approaches:
$ # Copying
$ time cp 1GB.bin copy.bin
Executed in 260.07 millis
$ # Cloning
$ time cp -c 1GB.bin clone.bin
Executed in 6.90 millis
This roughly 38× difference (260.07 ms vs 6.90 ms) is with my Mac’s internal SSD. In my experience, the speed difference is even more pronounced on slower disks, like external hard drives.
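The same comparison can be timed from Python with time.perf_counter(). This is a sketch, assuming the 1GB.bin file from earlier and macOS for the cp -c call; time_ms(), copy_1gb() and clone_1gb() are hypothetical helpers, and a single run is only a rough guide because caching can skew it.

```python
import shutil
import subprocess
import time
from typing import Callable


def time_ms(fn: Callable[[], object]) -> float:
    """Hypothetical helper: run fn() once and return elapsed wall-clock milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000


def copy_1gb() -> None:
    # A regular copy: every byte of the file gets written again.
    shutil.copyfile("1GB.bin", "copy.bin")


def clone_1gb() -> None:
    # `cp -c` is macOS-specific: it clones when the filesystem supports it.
    subprocess.check_call(["cp", "-c", "1GB.bin", "clone.bin"])
```

Calling time_ms(copy_1gb) and time_ms(clone_1gb) and comparing the two results gives the Python equivalent of the time commands above.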
How do you clone files on macOS?
Using the “Duplicate” command in Finder
If you use the Duplicate command in Finder (File > Duplicate or ⌘D), it clones the file.
Using cp -c on the command line
If you use the cp (copy) command with the -c flag, and it’s possible to clone the file, you get a clone rather than a copy. If it’s not possible to clone the file – for example, if you’re on a non-APFS volume that doesn’t support cloning – you get a regular copy.
Here’s what that looks like:
$ cp -c src.txt dst.txt
Using the clonefile() function
There’s a macOS syscall clonefile() which creates space-saving clones. It was introduced alongside APFS.
Syscalls are quite low level, and they’re how programs are meant to interact with the operating system. I don’t think I’ve ever made a syscall directly – I’ve used wrappers like the Python os module, which make syscalls on my behalf, but I’ve never written my own code to call them.
Here’s a rudimentary C program that uses clonefile() to clone a file:
#include <stdio.h>
#include <stdlib.h>
#include <sys/clonefile.h>
int main(void) {
    const char *src = "1GB.bin";
    const char *dst = "clone.bin";
    /* clonefile(2) supports several options related to symlinks and
     * ownership information, but for this example we'll just use
     * the default behaviour */
    const int flags = 0;
    if (clonefile(src, dst, flags) != 0) {
        perror("clonefile failed");
        return EXIT_FAILURE;
    }
    printf("clonefile succeeded: %s ~> %s\n", src, dst);
    return EXIT_SUCCESS;
}
You can compile and run this program like so:
$ gcc clone.c
$ ./a.out
clonefile succeeded: 1GB.bin ~> clone.bin
$ ./a.out
clonefile failed: File exists
But I don’t use C in any of my projects – can I call this function from Python instead?
How do you clone files with Python?
Shelling out to cp -c using subprocess
The easiest way to clone a file in Python is by shelling out to cp -c with the subprocess module. Here’s a short example:
import subprocess
# Adding the `-c` flag means the file is cloned rather than copied,
# if possible. See the man page for `cp`.
subprocess.check_call(["cp", "-c", "1GB.bin", "clone.bin"])
I think this snippet is pretty simple, and a new reader could understand what it’s doing. If they’re unfamiliar with file cloning on APFS, they might not immediately understand why this is different from shutil.copyfile, but they could work it out quickly.
This approach gets all the nice behaviour of the cp command – for example, if you try to clone on a volume that doesn’t support cloning, it falls back to a regular file copy instead. There’s a bit of overhead from spawning an external process, but the overall impact is negligible (and easily offset by the speed increase of cloning).
The problem with this approach is that error handling gets harder. The cp command fails with exit code 1 for every error, so you need to parse the stderr to distinguish different errors, or implement your own error handling.
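To make the stderr-parsing option concrete, here’s a minimal sketch. It assumes that cp includes the standard C library error text (the string os.strerror() returns, such as "No such file or directory") somewhere in its message, and that its messages are in English; neither is a documented interface. run_raising_oserror() and KNOWN_ERRNOS are hypothetical names.

```python
import errno
import os
import subprocess
from typing import Sequence

# Hypothetical, deliberately short list of errno codes to look for in stderr.
KNOWN_ERRNOS = [errno.ENOENT, errno.EACCES, errno.ENOTDIR, errno.ENOSPC]


def run_raising_oserror(argv: Sequence[str]) -> None:
    """Hypothetical helper: run a command and turn recognisable failures into OSErrors.

    Assumes the command's stderr contains os.strerror() text for the failure.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return

    for code in KNOWN_ERRNOS:
        if os.strerror(code) in result.stderr:
            # OSError(errno, message) returns the matching subclass,
            # e.g. FileNotFoundError for ENOENT.
            raise OSError(code, result.stderr.strip())

    # Nothing recognised: fall back to the generic subprocess error.
    raise subprocess.CalledProcessError(
        result.returncode, list(argv), output=result.stdout, stderr=result.stderr
    )
```

Usage would be run_raising_oserror(["cp", "-c", "src.txt", "dst.txt"]). Matching on message text is brittle, which is part of why explicit checks before calling cp can be the more predictable option.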
In my project, I wrapped this cp call in a function which had some additional checks to spot common types of error, and throw them as more specific exceptions. Any remaining errors get thrown as a generic subprocess.CalledProcessError. Here’s an example:
from pathlib import Path
import subprocess
def clonefile(src: Path, dst: Path) -> None:
    """Clone a file on macOS by using the `cp` command."""
    # Check a couple of common error cases so we can get nice exceptions,
    # rather than relying on the `subprocess.CalledProcessError` from `cp`.
    if not src.exists():
        raise FileNotFoundError(src)
    if not dst.parent.exists():
        raise FileNotFoundError(dst.parent)
    # Adding the `-c` flag means the file is cloned rather than copied,
    # if possible. See the man page for `cp`.
    subprocess.check_call(["cp", "-c", str(src), str(dst)])
    assert dst.exists()
For me, this code strikes a nice balance between being readable and returning good errors.
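If the same code also has to run on a machine without cp -c (the flag is macOS-specific), one option is to branch on sys.platform. This is a minimal sketch that assumes a regular copy is an acceptable fallback elsewhere; clone_or_copy_with_cp() is a hypothetical name, not part of the project described here.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clone_or_copy_with_cp(src: Path, dst: Path) -> None:
    """Hypothetical helper: clone via `cp -c` on macOS, plain copy elsewhere."""
    # Same up-front checks as above, so callers get specific exceptions.
    if not src.exists():
        raise FileNotFoundError(src)
    if not dst.parent.exists():
        raise FileNotFoundError(dst.parent)

    if sys.platform == "darwin":
        # `cp -c` clones if possible and falls back to copying otherwise.
        subprocess.check_call(["cp", "-c", str(src), str(dst)])
    else:
        # No `cp -c` on this platform, so make a regular copy.
        shutil.copyfile(src, dst)
```

Keeping the platform check in one function means callers never need to know whether they got a clone or a copy.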
Calling the clonefile() function using ctypes
What if we want detailed error codes, and we don’t want the overhead of spawning an external process? Although I know it’s possible to make syscalls from Python using the ctypes library, I’ve never actually done it. This is my chance to learn!
Following the documentation for ctypes, these are the steps:
1. Import ctypes and load a dynamic link library. This is the first thing we need to do – in this case, we’re loading the macOS library that contains the clonefile() function.
      import ctypes
      libSystem = ctypes.CDLL("libSystem.B.dylib")
   I worked out that I need to load libSystem.B.dylib by looking at other examples of ctypes code on GitHub. I couldn’t find an explanation of it in Apple’s documentation.
   I later discovered that I can use otool to see the shared libraries that a compiled executable is linking to. For example, I can see that cp is linking to the same libSystem.B.dylib:
      $ otool -L /bin/cp
      /bin/cp:
          /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1351.0.0)
   This CDLL() call only works on macOS, which makes sense – it’s loading macOS libraries. If I run this code on my Debian web server, I get an error: OSError: libSystem.B.dylib: cannot open shared object file: No such file or directory.
2. Tell ctypes about the function signature. If we look at the man page for clonefile(), we see the signature of the C function:
      int clonefile(const char * src, const char * dst, int flags);
   We need to tell ctypes to find this function inside libSystem.B.dylib, then describe the arguments and return type of the function:
      clonefile = libSystem.clonefile
      clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
      clonefile.restype = ctypes.c_int
   Although ctypes can call C functions if you don’t describe the signature, it’s good practice and gives you some safety rails.
   For example, now ctypes knows that the clonefile() function takes three arguments. If I try to call the function with one or two arguments, I get a TypeError. If I didn’t specify the signature, I could call it with any number of arguments, but it might behave in weird or unexpected ways.
3. Define the inputs for the function. This function needs three arguments.
   In the original C function, src and dst are char* – pointers to a null-terminated string of char values. In Python, this means the inputs need to be bytes values. Then flags is a regular Python int.
      # Source and destination files
      src = b"1GB.bin"
      dst = b"clone.bin"
      # clonefile(2) supports several options related to symlinks and
      # ownership information, but for this example we'll just use
      # the default behaviour
      flags = 0
4. Call the function. Now that we have the function available in Python, and the inputs in C-compatible types, we can call the function:
      import os
      if clonefile(src, dst, flags) != 0:
          errno = ctypes.get_errno()
          raise OSError(errno, os.strerror(errno))
      # Decode the bytes paths so they print as text rather than b'...'
      print(f"clonefile succeeded: {os.fsdecode(src)} ~> {os.fsdecode(dst)}")
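The argtypes safety rails described above are easy to try without a Mac. This sketch applies the same argtypes/restype pattern to strlen() from the C library, standing in for clonefile(). It assumes a POSIX system where ctypes.CDLL(None) exposes the C library symbols already loaded into the Python process (true on Linux, and I believe on macOS too).

```python
import ctypes

# CDLL(None) opens the running process itself, which on POSIX systems
# includes the C library (assumption: Linux or macOS).
libc = ctypes.CDLL(None)

# size_t strlen(const char *s);
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

print(strlen(b"clone.bin"))  # 9

try:
    strlen()  # too few arguments for the declared signature
except TypeError as e:
    print("TypeError:", e)

try:
    strlen("clone.bin")  # a str, not bytes, so c_char_p rejects it
except ctypes.ArgumentError as e:
    print("ArgumentError:", e)
```

One nuance: for functions loaded through CDLL, ctypes only rejects too few arguments. Extra arguments are still passed through, because ctypes allows that for the cdecl calling convention CDLL uses.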
If the clone succeeds, this program runs successfully. But if the clone fails, we get an unhelpful error: OSError: [Errno 0] Undefined error: 0.
The point of calling the C function is to get useful error codes, but we need to opt in to receiving them. In particular, we need to add the use_errno parameter to our CDLL call:
libSystem = ctypes.CDLL("libSystem.B.dylib", use_errno=True)
Now, when the clone fails, we get different errors depending on the type of failure. The exception includes the numeric error code, and Python will throw named subclasses of OSError like FileNotFoundError, FileExistsError, or PermissionError. This makes it easier to write try … except blocks for specific failures.
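Here’s the same use_errno pattern in a form that runs on Linux as well as macOS, using rmdir() from the C library on a path that doesn’t exist. It’s a sketch assuming ctypes.CDLL(None) exposes the C library (as on Linux and, I believe, macOS); checked_rmdir() is a hypothetical name.

```python
import ctypes
import os

# use_errno=True makes ctypes save the C errno after each call,
# so ctypes.get_errno() can read it afterwards.
libc = ctypes.CDLL(None, use_errno=True)

# int rmdir(const char *path);
rmdir = libc.rmdir
rmdir.argtypes = [ctypes.c_char_p]
rmdir.restype = ctypes.c_int


def checked_rmdir(path: bytes) -> None:
    """Hypothetical helper: call rmdir() and raise a specific OSError on failure."""
    if rmdir(path) != 0:
        err = ctypes.get_errno()
        # OSError(errno, strerror, filename) returns the matching subclass,
        # e.g. FileNotFoundError for ENOENT.
        raise OSError(err, os.strerror(err), os.fsdecode(path))


try:
    checked_rmdir(b"/this/path/does/not/exist")
except FileNotFoundError as e:
    print(type(e).__name__, e.errno, e.filename)
```

Without use_errno=True, ctypes.get_errno() would keep returning 0, which is where the unhelpful "[Errno 0]" error comes from.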
Here’s the complete script, which clones a single file:
import ctypes
import os
# Load the libSystem library
libSystem = ctypes.CDLL("libSystem.B.dylib", use_errno=True)
# Tell ctypes about the function signature
# int clonefile(const char * src, const char * dst, int flags);
clonefile = libSystem.clonefile
clonefile.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
clonefile.restype = ctypes.c_int
# Source and destination files
src = b"1GB.bin"
dst = b"clone.bin"
# clonefile(2) supports several options related to symlinks and
# ownership information, but for this example we'll just use
# the default behaviour
flags = 0
# Actually call the clonefile() function
if clonefile(src, dst, flags) != 0:
    errno = ctypes.get_errno()
    raise OSError(errno, os.strerror(errno))
# Decode the bytes paths so they print as text rather than b'...'
print(f"clonefile succeeded: {os.fsdecode(src)} ~> {os.fsdecode(dst)}")
I wrote this code for my own learning, and it’s definitely not production-ready. It works in the happy case and helped me understand ctypes, but if you actually wanted to use this, you’d want proper error handling and testing.
In particular, there are cases where you’d want to fall back to shutil.copyfile or similar if the clone fails – say if you’re on an older version of macOS, or you’re copying files on a volume which doesn’t support cloning. Both those cases are handled by cp -c, but not the clonefile() syscall.
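Here’s one way that fallback might look. It’s a sketch under a few assumptions: that clonefile() reports ENOTSUP when the filesystem can’t clone and EXDEV when the paths are on different volumes (both listed in the clonefile(2) man page), that failing to load libSystem.B.dylib or find clonefile in it means cloning isn’t available, and that shutil.copyfile() is an acceptable fallback. clone_with_fallback() is a hypothetical name.

```python
import ctypes
import errno
import os
import shutil


def _load_clonefile():
    """Return a ctypes wrapper for clonefile(), or None if it's unavailable."""
    try:
        lib = ctypes.CDLL("libSystem.B.dylib", use_errno=True)
        fn = lib.clonefile  # AttributeError if this macOS has no clonefile()
    except (OSError, AttributeError):
        return None  # not macOS, or no clonefile() symbol
    fn.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    fn.restype = ctypes.c_int
    return fn


_clonefile = _load_clonefile()

# Assumed: errno values meaning "cloning isn't possible here, copy instead".
_FALLBACK_ERRNOS = {errno.ENOTSUP, errno.EXDEV}


def clone_with_fallback(src, dst) -> str:
    """Hypothetical helper: clone src to dst if possible, else copy.

    Returns "cloned" or "copied".
    """
    if _clonefile is not None:
        if _clonefile(os.fsencode(src), os.fsencode(dst), 0) == 0:
            return "cloned"
        err = ctypes.get_errno()
        if err not in _FALLBACK_ERRNOS:
            raise OSError(err, os.strerror(err), os.fspath(src), None, os.fspath(dst))

    # clonefile() refuses to overwrite an existing destination, while
    # shutil.copyfile() would silently overwrite it, so check first.
    if os.path.lexists(dst):
        raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), os.fspath(dst))
    shutil.copyfile(src, dst)
    return "copied"
```

The existence check before copying keeps both paths consistent about existing destinations, although it isn’t atomic: another process could create dst between the check and the copy.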
In practice, how am I cloning files in Python?
In my project, I used cp -c with a wrapper like the one described above. It’s a short amount of code, pretty readable, and returns useful errors for common cases.
Calling clonefile() directly with ctypes might be slightly faster than shelling out to cp -c, but the difference is probably negligible. The downside is that it’s more fragile and harder for other people to understand – it would have been the only part of the codebase that was using ctypes.
File cloning made a noticeable difference. The project involved copying lots of files on an external USB hard drive, and cloning instead of copying full files made it much faster. Tasks that used to take over an hour were now completing in less than a minute. (The files were copied between folders on the same drive – cloned files have to be on the same APFS volume.)
I’m excited to see how file cloning works on Linux in Python 3.14 with Path.copy(), and I hope macOS support isn’t far behind.