Skip to main content

emptydir: look for (nearly) empty directories and delete them

I’ve posted a new command-line tool on GitHub: emptydir, which looks for directories which are empty or nearly empty, and deletes them.

This isn’t a completely trivial problem, because emptiness is deceptive. Consider the following folder. Finder tells us it has 0 items, so it must be empty, right?

A Finder window for a folder 'totally_empty' which doesn’t show any files, and the status bar says '0 items, 262.9 GB available'.

What you can’t see is the invisible .DS_Store file – this is a file that keeps some information about how you want the folder to appear in Finder. For example, if you arrange the icons on your Desktop, their positions get stored in a .DS_Store file. If you delete the file and relaunch Finder, your Desktop icons will revert to the default grid layout.

If there are files in the folder, the .DS_Store is a useful file to keep around. If the folder is empty, it’s not worth saving.

Because I don’t care about lonely .DS_Store files, I wrote emptydir with the following rules:

If a folder is completely empty, delete it.

If the only thing in a folder is a .DS_Store file, delete the entire folder.

If there’s anything else in the folder, leave it as-is.

This means that emptydir will clean up this apparently-empty folder and the hidden .DS_Store file it contains – but leave the .DS_Store file in place for folders where I want to keep it, like the Desktop.

There are a couple of other things which I’m similarly happy to delete if they’re the only thing in a folder – .venv (my Python virtual environments) and __pycache__ (compiled Python byte code), both of which are transient folders I can easily recreate.

Why not use find?

Deleting empty directories isn’t a new problem. For example, there’s an answer on Unix Stack Exchange with hundreds of upvotes that suggests the following command:

find . -type d -empty -delete

This is what I used for a while, but it only deletes folders that are completely empty – it will miss folders with .DS_Store files.

You could use find to delete all the .DS_Store files also:

find . -type f -name .DS_Store -delete

but this is too aggressive – it would also delete .DS_Store in non-empty folders where I’ve set some view options that I want to keep. If I’ve taken the time to arrange my icons carefully, I don’t want to reset them!

Maybe there’s a way to do what I want with find, but I couldn’t work out how to do it.

How does it work?

I started by looking for a Rust function that could walk a directory tree and recursively find all the subdirectories – the Rust equivalent of Python’s os.walk function.

I quickly stumbled upon the walkdir crate, which provides precisely this functionality. Adapting one of the examples from the README, I was able to build a simple iterator that prints all the subdirectories of the current directory:

use walkdir::WalkDir;

let directories = WalkDir::new(".")
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.file_type().is_dir());

for dir in directories {
    println!("{}", dir.path().display());
}

// .
// ./target
// ./target/debug
// ./target/debug/.fingerprint
// …

While I was reading the documentation, I discovered the contents_first() method – when you set this to true, it yields the contents of a directory before the directory itself. And my use case is called out explicitly: “this is useful when, e.g. you want to recursively delete a directory”.

let directories = WalkDir::new(".")
    .contents_first(true)
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.file_type().is_dir());
    
// ./target/debug/.fingerprint/example-7dcfb8b698ea9da0
// ./target/debug/.fingerprint
// ./target/debug
// …

(It turns out that Python’s os.walk has a similar argument topdown, which I’d never come across before writing this Rust code. Because I’ve been using os.walk for years and I “knew” how to use it, it’s been a long time since I looked at the Python docs.)

This iterator generates every directory, but I only want to get directories which are safe to delete. How do I know if a directory is empty, or only contains files/folders which are safe to delete?

I started with a function that lists all the entries in a given directory:

use std::collections::HashSet;
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

fn get_names_in_directory(dir: &Path) -> io::Result<HashSet<OsString>> {
    let mut names = Vec::new();

    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        names.push(entry.file_name());
    }

    Ok(HashSet::from_iter(names))
}

println!("{:?}", get_names_in_directory(Path::from(".")));
// Ok({"Cargo.toml", "target", ".git", "src", ".gitignore", "Cargo.lock"})

println!("{:?}", get_names_in_directory(Path::from("/dev/null")));
// Err(Os { code: 20, kind: NotADirectory, message: "Not a directory" })

This returns a HashSet because sets are easy to compare.

I can also create a set of the names I consider safe to delete if they’re the only thing in a directory:

let deletable_names = HashSet::from([
    OsString::from(".DS_Store"),
    OsString::from("__pycache__"),
    OsString::from(".venv"),
]);

Then I can compare these two sets, to tell me if a directory is same to delete:

fn can_be_deleted(path: &Path) -> bool {
    let deletable_names = HashSet::from([
        OsString::from(".DS_Store"),
        OsString::from("__pycache__"),
        OsString::from(".venv"),
    ]);

    match get_names_in_directory(path) {
        Ok(names) if names.is_empty() => true,
        Ok(names) => names.is_subset(&deletable_names),
        Err(_) => false,
    }
}

If for some reason we can’t get a list of entries in a directory, we leave it as-is – we can’t be sure that it’s safe to delete, so err on the side of caution and don’t do anything.

I can add this new function on the end of my iterator:

let directories_to_delete = WalkDir::new(".")
    .contents_first(true)
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.file_type().is_dir())
    .filter(|e| can_be_deleted(e.path()));

and then I can iterate over this filtered list, and delete any directories which are safe to delete. It prints the path as it deletes a directory, so I can see what it’s doing:

for dir in directories_to_delete {
    match fs::remove_dir_all(dir.path()) {
        Ok(_) => println!("{}", dir.path().display()),
        Err(_) => (),
    };
}

To make this into a standalone tool, I added some tests, documentation, and a basic command-line interface using the clap crate. The CLI interface allows me to choose which directory will be searched for empty directories – either the working directory, or another directory of my choice:

$ emptydir
$ emptydir /path/to/other/directory

If you want to see the full code or install it yourself, I’ve put everything on GitHub.

Why did you make this?

Beyond the fact that finding and deleting empty directories is something I do on a semi-regular basis, there are a few reasons why I made this as a standalone project and wrote this article:

I want to make my code easier to find. I have a lot of handy tools and utilities, which I used to put in my scripts repo. But that repo is a grab bag of loosely related code, there’s not much reason for anybody else to look at it, and it’s hard for them to find the useful parts if they do.

Standalone projects with a clear purpose are more discoverable than a miscellaneous bag of bits.

Explaining my code makes it better. If I take the time to write an article that explains my code in more detail, the code always gets better. I read it more carefully, and every line gets more attention than it does during normal programming. I spot parts that are tricky or confusing, and I improve them. I also gather reference links, and I often discover something new as I do – like when I learnt that Python’s os.walk has a topdown argument as I wrote this article!

This is particularly important right now, because:

I wanted to get more practice with Rust. I like Rust as a way to write fast tools, and I want to use it more often. Informal benchmarking suggests this tool is 4–12× faster than a previous Python implementation – but more than just clock speed, this new version feels much snappier. It’s approaching the threshold where it feels instantaneous.

Although I first wrote Rust in 2016, I’m still pretty much a novice. I have no experience working in large or shared Rust codebases, and a lot of my code is fragile or unidiomatic. I’m getting the speed of Rust, but not the safety.

In this project, I tried to write more idiomatic Rust, and I’m proud of the result. For example, my older code makes liberal use of unwrap(), but this project uses proper Result types. This was a nice, small, self-contained task to get some Rust practice, and I learnt a lot.

I wrote about my Python projects in some of the earliest articles on this site, and I wince at that code now. I was still a beginner, I was still learning, and my initial code was clumsy and verbose. Today I’m a confident Python programmer, and writing those articles helped me get here. I hope to do the same with Rust, albeit over a longer period.

Today, at least, I’m proud of this code and I think it’s the best Rust I’ve written so far.