Listing keys in an S3 bucket with Python

A lot of my recent work has involved batch processing of files stored in Amazon S3. It’s been very useful to have a list of the files (or rather, keys) in a bucket: for example, to get an idea of how many files there are to process, or whether they follow a particular naming scheme.

The AWS APIs (via boto3) do provide a way to get this information, but the results are paginated (a single list call returns at most 1,000 keys) and the key names are buried inside a larger response structure rather than returned as a plain list. It’s a bit fiddly, and I don’t generally care about the details of the AWS APIs when using this list, so I wrote a wrapper function to do it for me. In general use, all the messiness of dealing with the S3 API stays hidden.

Since this function has been useful in lots of places, I thought it would be worth writing it up properly.
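
To give a flavour of the pagination involved, here is a minimal sketch of such a wrapper. It is an illustration under some assumptions rather than a definitive implementation: the name iter_s3_keys and the optional client parameter are made up for this example, and it relies on boto3's list_objects_v2 call, following the continuation token from page to page.

```python
def iter_s3_keys(bucket, prefix="", client=None):
    """Yield every key in an S3 bucket, optionally restricted to a prefix.

    `iter_s3_keys` and the `client` parameter are illustrative names,
    not part of boto3 itself.
    """
    if client is None:
        # Imported lazily so a stand-in client can be passed in tests
        # without boto3 or AWS credentials being available.
        import boto3
        client = boto3.client("s3")

    kwargs = {"Bucket": bucket, "Prefix": prefix}

    while True:
        resp = client.list_objects_v2(**kwargs)

        # "Contents" is missing entirely when a page has no matching keys.
        for obj in resp.get("Contents", []):
            yield obj["Key"]

        # "IsTruncated" says whether there are more pages to fetch.
        if not resp.get("IsTruncated"):
            break

        # Ask for the next page on the following call.
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]
```

Calling list(iter_s3_keys("my-bucket", prefix="images/")) would then collect the matching keys into a list (the bucket name and prefix here are placeholders). Writing it as a generator means keys can be processed as they arrive, instead of waiting for every page to be fetched first.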

