
Using Loris for IIIF at Wellcome

I wrote this article while I was working at Wellcome Collection. It was originally published on their Stacks blog under a CC BY 4.0 license, and is reposted here in accordance with that license.

We’ve recently finished making over 100k images available to search on the new Wellcome Collection website, freely available and CC-licensed.

But how do we serve these images? Many of these are high-resolution archival or medical images, meant to capture lots of detail. The image files are big: routinely thousands of pixels wide and several megabytes apiece. If you wanted to skim a book, and the book had a hundred pages, you’d download over a gigabyte of data. Ouch!

We’re serving lots of images, and we want them to be fast. We want to send small, tightly focused files (just the data you want, and nothing else), so we provide our images through a IIIF Image server.

In this post, I’ll explain a little of what IIIF is, and how we provide it at Wellcome.

What is IIIF?

IIIF is the International Image Interoperability Framework, which is a set of standards for serving large images. It’s an open standard used by many libraries and archives, and it provides a way to combine resources from different collections. When a client asks for an image, they can make a very particular request (a specific size, crop, or rotation), and the server prepares the exact image they want and sends that along. It’s a practical way to deal with really large images.
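To make that concrete, here is a minimal sketch of how a client might build an Image API request. The order of the path segments (region, size, rotation, quality, format) comes from the IIIF Image API. The function name, host name and image identifier are made up for illustration, and the "full" size keyword is the one used by version 2 of the API (version 3 uses "max").

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API URL.

    The path follows the pattern defined by the IIIF Image API:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers may contain slashes, which must be percent-encoded.
    safe_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{safe_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
base = "https://iiif.example.org/image"

# A thumbnail 300 pixels wide, with the height scaled to match.
thumbnail = iiif_image_url(base, "V0001234.jpg", size="300,")

# A 512x512 crop whose top-left corner is at (1024, 2048), rotated 90 degrees.
detail = iiif_image_url(base, "V0001234.jpg",
                        region="1024,2048,512,512", rotation="90")

print(thumbnail)
print(detail)
```

The same source image answers both requests; only the URL changes.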

Let’s look at a few examples of how we’re already using it.

In a page of search thumbnails, we request very small images, which load much faster than the full-sized files.

A page of results if you search for “london”.

On a page showing a single image, we can display a bigger image, but still not the full-sized version.

An individual works page, showing vaccination points for smallpox. Credit: Science Museum, London. CC BY.

If we really want to see all the detail, we can zoom in further. Because a IIIF server allows us to request not just specific sizes, but specific crops, even this doesn’t need us to load the whole image. We can load the parts of the image we’re focusing on, and skip the rest.

A close-up view of an individual smallpox packet. Text: “Liverpool Station, Dec 4th 1913, No 48 (?), 50 Points.” Credit: Science Museum, London. CC BY.

In this example, we’ve loaded a full resolution tile for this particular card, but only that card. We’ve not downloaded the rest of the image yet, and we won’t until we move to look at a different region.
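As a rough sketch of the idea, the function below works out which tiles overlap the part of the image a user is looking at, and returns the IIIF region string for each one. This is not how OpenSeadragon is implemented: the function name, the 512-pixel tile size and the image dimensions are assumptions for illustration, and a real viewer also picks a scale factor for each zoom level.

```python
def tile_regions(image_width, image_height, viewport, tile_size=512):
    """Return IIIF region strings for the tiles that overlap a viewport.

    `viewport` is (x, y, width, height) in full-resolution pixels.
    """
    vx, vy, vw, vh = viewport

    # Clamp the viewport to the bounds of the image.
    x0, y0 = max(vx, 0), max(vy, 0)
    x1, y1 = min(vx + vw, image_width), min(vy + vh, image_height)

    regions = []
    if x0 >= x1 or y0 >= y1:
        return regions  # the viewport is entirely outside the image

    # Walk the tile grid, starting at the tile containing the top-left corner.
    for ty in range(y0 // tile_size * tile_size, y1, tile_size):
        for tx in range(x0 // tile_size * tile_size, x1, tile_size):
            # Tiles on the right and bottom edges may be smaller than tile_size.
            w = min(tile_size, image_width - tx)
            h = min(tile_size, image_height - ty)
            regions.append(f"{tx},{ty},{w},{h}")
    return regions


# A hypothetical 4000x3000 image, viewed through a 500x300 window.
needed = tile_regions(4000, 3000, viewport=(1000, 600, 500, 300))
print(needed)
```

For these made-up numbers the viewport touches two tiles, while the full image would be split into 48, so most of the image is never downloaded.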

Because IIIF is a widely-used standard, there are lots of tools that can use it. For example, the zoom above is provided by OpenSeadragon, a client which knows how IIIF works, and how to request specific tiles. If we’d written our own bespoke image API, we’d have to write those tools ourselves.

Wellcome was a founder and early adopter of the IIIF ecosystem.

Picking a IIIF Image server

We knew we wanted to use an existing (ideally open source) IIIF Image server.

Writing our own was out of the question—it would have taken months to develop and test, and at the end of it we’d be the only user. It’s much better to use an existing server that other people are already using—we know it works in production, we all share the benefits of testing and bugfixes, and we can contribute to the wider community. Image manipulation is a massive problem space, with lots of edge cases. It’s where open source really shines: together, we’re more likely to find (and fix!) all the weird corners.

An illustration of two Lorises. “The Slender Loris, in waking and sleeping posture.” Taken from Royal Natural History volume 1, page 230. Public domain.

We decided to use Loris, a IIIF Image API server available under a 2-clause BSD license. It’s a Python web application, with a lot of processing managed by libraries like Pillow, Werkzeug and cryptography. That means Loris itself is a fairly small wrapper around these libraries, focused on providing the IIIF API—it does one thing, and it does it well.

Loris provides a standard WSGI application, which we run with uWSGI. This made it really easy to get up-and-running when we first tried it, which is one of the reasons we chose it. I sat down to play with it for an afternoon, and accidentally created our first production instance!
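If you haven’t met WSGI before: it’s the standard interface between Python web applications and the servers that host them. The sketch below is a toy WSGI application, not Loris’s code. It only splits a IIIF-style path into its parts and echoes them back, and the identifier is made up, but it shows the shape of the callable that a server like uWSGI expects.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A toy WSGI application: it parses a IIIF-style path but renders no image."""
    # Expected shape: /{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    # (This toy version does not handle identifiers that contain slashes.)
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 5 or "." not in parts[4]:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Could not parse the image request\n"]

    identifier, region, size, rotation, filename = parts
    quality, fmt = filename.rsplit(".", 1)
    body = (
        f"identifier={identifier} region={region} size={size} "
        f"rotation={rotation} quality={quality} format={fmt}\n"
    ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Call the application directly, the way a WSGI server would.
environ = {"PATH_INFO": "/V0001234.jpg/full/300,/0/default.jpg"}
setup_testing_defaults(environ)  # fills in the other keys a server supplies
statuses = []
response = b"".join(
    application(environ, lambda status, headers: statuses.append(status))
)
print(statuses[0], response.decode("utf-8"))
```

Saved as app.py, something like this could be served with `uwsgi --http :8000 --wsgi-file app.py`, since uWSGI looks for a callable named `application` by default. Because the server and the application only meet at this interface, you can swap one without touching the other.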

Architecture

All our applications run as Docker containers inside Amazon ECS, and Loris is no different.

Our architecture diagram. Requests arrive from users on the right, where they’re received by our CDN/caching layer (CloudFront). If the request isn’t cached, it’s sent to Loris running inside Docker containers on ECS, which fetch the full-sized image from S3.

In fact, we run two containers per Loris instance: a Python container which runs uWSGI, and an Nginx container which acts as a proxy. This is a common pattern, both for Python web applications generally, and in our platform. Nginx provides robust HTTP support and connection handling, and runs in front of Loris and our Scala applications.

Our full-sized image files are stored in Amazon S3. When Loris receives a request, it starts by fetching the full-size image from S3, before it produces the specific image that it sends to the client. Loris fetches the images over HTTP, rather than using the AWS APIs, so we could easily swap out S3 as a backend if we wanted to.
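Here is a minimal sketch of why plain HTTP keeps the backend swappable. It is not Loris’s resolver code: the function names, bucket name and URL prefix are all hypothetical. The point is that the only storage-specific detail is a URL prefix.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical bucket URL. Moving to a different storage backend would
# only mean changing this prefix.
SOURCE_PREFIX = "https://example-bucket.s3.amazonaws.com/images/"


def source_url(identifier, prefix=SOURCE_PREFIX):
    """Map an image identifier to the HTTP URL of its full-sized source file."""
    # quote() leaves slashes alone by default, so nested keys still work.
    return prefix + quote(identifier)


def fetch_source(identifier, prefix=SOURCE_PREFIX, timeout=10):
    """Download the full-sized image bytes over plain HTTP(S)."""
    url = source_url(identifier, prefix)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


print(source_url("V0001234.jpg"))
# The same code works against any HTTP file server:
print(source_url("V0001234.jpg", prefix="http://files.example.org/"))
```

Nothing here imports an AWS library, so any server that can hand out files over HTTP would do.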

In front of Loris we have Amazon CloudFront. This acts as a CDN and a caching layer. If a user asks for an image that has already been requested once, it gets served from CloudFront instead of Loris—so it arrives faster, and reduces load on our containers.

Monitoring

We monitor Loris with CloudWatch alarms. Whenever Loris returns a 500 error, it trips an alarm, which posts a message in our Slack channel:
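As an illustration, an alarm like this could be described with the settings below. This is a sketch rather than our actual configuration: the alarm name, load balancer identifier and SNS topic are made up, and the metric shown counts every 5xx response from the containers behind the load balancer, not only 500s. How the notification gets from the alarm to Slack is outside the scope of this sketch.

```python
def alarm_settings(load_balancer, sns_topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation."""
    return {
        "AlarmName": "loris-alb-5xx-errors",  # hypothetical name
        # Metrics published by Application Load Balancers live in this namespace.
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        # Alarm as soon as a one-minute window contains at least one error.
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data means no errors were recorded, which is the healthy state.
        "TreatMissingData": "notBreaching",
        # Where the alarm sends its notification when it trips.
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers, for illustration only.
settings = alarm_settings(
    load_balancer="app/loris/0123456789abcdef",
    sns_topic_arn="arn:aws:sns:eu-west-1:123456789012:loris-alarms",
)
print(settings["MetricName"], settings["Threshold"])
```

With boto3 installed and AWS credentials configured, these settings could be applied with `boto3.client("cloudwatch").put_metric_alarm(**settings)`.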

A screenshot of one of our Slack alarms. Text: “The ALB spotted a 500 error in Loris at 02:45:00 on 13 Dec 2017.” “ALB” is “Application Load Balancer”, an AWS service which routes requests to individual Docker containers.

This means we hear about server errors as they happen, and we can respond quickly.

We’ve used this information to make a number of stability fixes, and because Loris is open source, we can share them with everybody. Since we started using Loris, I’ve become one of the Loris maintainers because of my patches.

The future

We’re very happy with our current Loris setup, and we’re not planning any immediate changes. As we expose more works on the new Wellcome Collection website, we’ll serve many more images through Loris, but this setup will probably be around for a while.

If you want to run Loris yourself, you can find our Dockerfile, Loris configuration and Terraform config in our GitHub repo. You can also find Loris itself on GitHub, with all the code and installation instructions.

Thanks to Jonathan Tweed, Natalie Pollecutt, Robert Kenny, and Tom Scott for reviewing drafts of this post.