Creating preview thumbnails of PDF documents
Whenever I get an important document, I scan it or save it as a PDF. It’s a format that seems pretty likely to remain readable in the medium-term, even if I start using a different computer or operating system. I’ve created a small app for managing my scans (docstore, code is on GitHub), and as part of the app I create small thumbnails of each PDF. The thumbnails make it easier to skim a list of documents, especially when companies use a consistent letterhead.
Here’s an example of some thumbnails, for the documents for a recent trip:
If I’m searching the list, the turquoise of the Trainline email stands out against, say, the dark green stripe used by my bank.
When I was working out how to do this, I found a lot of Google results for PDF thumbnails in other applications – the macOS Finder, Windows Explorer, Adobe Acrobat – but not much on creating them if you’re writing your own code.
After experimenting with a couple of different tools, I found one that I like and that works consistently. The tool I use is pdftocairo, a command-line tool that converts PDFs to images.
Here’s the command I use:
$ pdftocairo my_document.pdf -jpeg -singlefile -scale-to-x 400 -scale-to-y -1
This creates a new file my_document.jpg in the same directory, which is a 400-pixel wide preview of the first page.
I’m using the following options:
-jpeg creates a JPEG output file. I’ve experimented a bit and the format doesn’t seem to make much difference for size/quality, so I picked JPEG somewhat arbitrarily.
-singlefile is an option that just gets the first page.
-scale-to-x 400 resizes the image to 400 pixels wide. This doesn’t preserve the aspect ratio automatically – it just squishes the document without changing the height. Adding -scale-to-y -1 gets it to resize the height to match.
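If you want to call this from code, here’s a rough sketch of what a Python wrapper might look like. This isn’t the code from docstore: the function names thumbnail_command and create_thumbnail are made up for illustration, and it assumes pdftocairo is on your PATH. It uses the -jpeg spelling of the JPEG flag from the pdftocairo man page, puts the options before the filenames, and passes an explicit output name (without the extension) so the thumbnail always lands next to the PDF.

```python
import subprocess
from pathlib import Path


def thumbnail_command(pdf_path, width=400):
    """Build the pdftocairo argument list for a first-page JPEG thumbnail.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    pdf_path = Path(pdf_path)
    # pdftocairo adds ".jpg" to the output name itself, so drop the ".pdf".
    output_root = pdf_path.with_suffix("")
    return [
        "pdftocairo",
        "-jpeg",               # write a JPEG
        "-singlefile",         # first page only, no page number in the name
        "-scale-to-x", str(width),
        "-scale-to-y", "-1",   # pick the height from the page's aspect ratio
        str(pdf_path),
        str(output_root),
    ]


def create_thumbnail(pdf_path, width=400):
    """Run pdftocairo and return the path where the thumbnail should be."""
    # check=True raises CalledProcessError if pdftocairo exits with an error.
    subprocess.run(thumbnail_command(pdf_path, width), check=True)
    return Path(pdf_path).with_suffix(".jpg")
```

Calling create_thumbnail("my_document.pdf") should leave my_document.jpg alongside the original. Keeping the command in its own function means you can check the arguments without having pdftocairo installed.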
The quality varies at larger sizes (particularly with font rendering if you don’t have the right fonts installed), but for creating small thumbnails the images look fine. I’ve used a wrapper around this utility for several thousand documents now, and they’ve all worked a treat.
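For a whole folder of scans, a batch version might look something like the sketch below. Again, this is a guess at a reasonable shape rather than what docstore does: pdfs_missing_thumbnails and create_missing_thumbnails are hypothetical names. It assumes thumbnails live next to their PDFs with a .jpg extension, that pdftocairo is on your PATH, and that the JPEG flag is spelled -jpeg as in the pdftocairo man page.

```python
import subprocess
from pathlib import Path


def pdfs_missing_thumbnails(directory):
    """Yield every PDF under `directory` that has no sibling .jpg file yet."""
    # Note: the "*.pdf" pattern is case-sensitive on most Linux filesystems.
    for pdf_path in sorted(Path(directory).rglob("*.pdf")):
        if not pdf_path.with_suffix(".jpg").exists():
            yield pdf_path


def create_missing_thumbnails(directory, width=400):
    """Run pdftocairo once per PDF that lacks a thumbnail.

    Returns a list of (pdf_path, error_message) pairs for any failures.
    """
    failures = []
    for pdf_path in pdfs_missing_thumbnails(directory):
        result = subprocess.run(
            [
                "pdftocairo", "-jpeg", "-singlefile",
                "-scale-to-x", str(width), "-scale-to-y", "-1",
                str(pdf_path),
                str(pdf_path.with_suffix("")),  # pdftocairo appends ".jpg"
            ],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # Keep going so one bad PDF doesn't block the rest of the batch.
            failures.append((pdf_path, result.stderr.strip()))
    return failures
```

Skipping PDFs that already have a thumbnail means you can re-run it after adding new documents without regenerating everything.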