Using zipstream to stream new zip files to object storage with boto3

You can construct a zipstream.ZipFile, add files, then wrap it in a file-like object to upload it with S3.upload_fileobj.

I’m writing some code to build large ZIP files and store them in S3-compatible object storage.

I could build the ZIP as a local file, and then upload it to the cloud – but that means I need to have enough local disk space to store the entire file before I begin uploading. Alternatively, I could store the zip in an in-memory buffer – but now I need lots of memory instead of lots of disk. What if I could generate the ZIP file piece-by-piece, so I don’t need to hold the whole thing at once?

I found Allan Lei’s zipstream module, which does exactly what I want:

Like Python’s ZipFile module, except it works as a generator that provides the file in many small chunks.

It took a bit of work to integrate it with boto3.

(I was using Linode object storage for all of these examples, but I think it should be the same for any S3-compatible storage that works with boto3.)

I’m using boto3 1.38.19 and zipstream 1.1.4.

You need to pass a file-like object

Here’s my first attempt, based on the code in the python-zipstream README. I construct the ZipFile, add files, then pass it directly to upload_fileobj.

import boto3
import zipstream

s3_client = boto3.client("s3")

zf = zipstream.ZipFile(mode="w")

zf.writestr(arcname="greeting.txt", data=b"Hello world!")
zf.writestr(arcname="numbers.json", data=b"[1, 2, 3, 4, 5]")

# ❌ Does not work ❌
s3_client.upload_fileobj(
    Fileobj=zf,
    Bucket="example-bucket",
    Key="example.zip",
)

This fails with an exception:

KeyError: 'There is no item named 8388608 in the archive'
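That cryptic message makes more sense once you know that zipstream's ZipFile is built on the standard library's zipfile.ZipFile, whose read() method takes the name of an archive member rather than a byte count – so when boto3 calls read(8388608) (its 8 MB chunk size), that gets interpreted as a request for a member named 8388608. You can reproduce the same error with plain zipfile:

```python
import io
import zipfile

# Build a tiny in-memory ZIP with the standard library.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("greeting.txt", b"Hello world!")

# ``zipfile.ZipFile.read()`` takes the *name* of an archive member,
# not a number of bytes -- so read(8388608) goes looking for a
# member called 8388608 and fails.
reader = zipfile.ZipFile(io.BytesIO(buf.getvalue()))

try:
    reader.read(8388608)
except KeyError as err:
    message = str(err)

print(message)  # 'There is no item named 8388608 in the archive'
```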

This is because upload_fileobj expects a file-like object with a read() method, whereas zipstream.ZipFile gives it an iterable of bytes. We can write a wrapper class that presents the iterable as a file-like object, which gets us our first working upload.

class FileLikeObject:
    """
    Wrap an iterable of ``bytes`` and turn it into a file-like object
    that can be passed to ``S3Client.upload_fileobj``.
    """

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.buffer = b""

    def read(self, size=-1) -> bytes:
        """
        Read up to ``size`` bytes from the object and return them.

        If ``size`` is unspecified or -1, all bytes until EOF are returned.
        Fewer than ``size`` bytes may be returned if fewer than ``size``
        bytes remain in the iterator.
        """
        # ``read()`` may be called with ``size=None``, meaning "read to EOF".
        # (Using ``size or -1`` here would be a bug: ``read(0)`` should
        # return ``b""``, not the whole stream.)
        if size is None:
            size = -1

        # Fill the buffer with enough bytes to fulfil the request.
        while size < 0 or len(self.buffer) < size:
            try:
                chunk = next(self.iterator)
                self.buffer += chunk
            except StopIteration:
                break

        if size < 0:
            result, self.buffer = self.buffer, b""
        else:
            result, self.buffer = self.buffer[:size], self.buffer[size:]

        return result


s3_client.upload_fileobj(
    Fileobj=FileLikeObject(zf),
    Bucket="example-bucket",
    Key="example.zip",
)

This wrapper class also gives an opportunity to capture other data as the stream is being uploaded – for example, if you wanted to get the size or checksum of the ZIP file as it’s being created.
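For instance, here's a hypothetical extension of the wrapper (my own sketch, not part of zipstream or boto3) that records the total size and a SHA-256 checksum of everything it hands to the uploader:

```python
import hashlib


class HashingFileLikeObject:
    """
    Like FileLikeObject, but records the size and SHA-256 checksum
    of the bytes as they're read for upload.
    """

    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.buffer = b""
        self.size = 0
        self.sha256 = hashlib.sha256()

    def read(self, size=-1) -> bytes:
        if size is None:
            size = -1

        # Fill the buffer with enough bytes to fulfil the request.
        while size < 0 or len(self.buffer) < size:
            try:
                self.buffer += next(self.iterator)
            except StopIteration:
                break

        if size < 0:
            result, self.buffer = self.buffer, b""
        else:
            result, self.buffer = self.buffer[:size], self.buffer[size:]

        # Record stats about the bytes going to the uploader.
        self.size += len(result)
        self.sha256.update(result)

        return result
```

After upload_fileobj returns, the wrapper's `size` and `sha256.hexdigest()` describe the ZIP file that was just uploaded, without ever holding the whole thing in memory.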

Set allowZip64=True if you want big ZIP files

The classic ZIP format has limits of 4 GB (both per file and for the archive as a whole) and 65,535 entries. If you want bigger ZIP files, you need the ZIP64 extension – which zipstream disables by default.

If you try to upload a ZIP file that exceeds these limits:

zf = zipstream.ZipFile(mode="w")

for i in range(100000):
    s = str(i).zfill(6)

    zf.writestr(arcname=f'numbers/{s[1]}/{s[2]}/{s}.txt', data=s.encode("utf8"))

s3_client.upload_fileobj(
    Fileobj=FileLikeObject(zf),
    Bucket="example-bucket",
    Key="numbers.zip",
)

then the upload fails with an error:

zipfile.LargeZipFile: Files count would require ZIP64 extensions

You need to set allowZip64=True:

zf = zipstream.ZipFile(mode="w", allowZip64=True)

and then the upload succeeds.
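The LargeZipFile exception comes from the standard library – zipstream builds on zipfile, which enforces the same limits. You can reproduce the error with zipfile alone, no upload required (this writes 65,536 tiny entries to an in-memory buffer, one more than the classic limit):

```python
import io
import zipfile

buf = io.BytesIO()

try:
    with zipfile.ZipFile(buf, mode="w", allowZip64=False) as zf:
        # One entry past the classic ZIP limit of 65,535 files.
        # The error is raised when the archive is finalised on close.
        for i in range(65536):
            zf.writestr(f"{i}.txt", b"x")
    message = None
except zipfile.LargeZipFile as err:
    message = str(err)

print(message)  # Files count would require ZIP64 extensions
```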