Listing deleted secrets in AWS Secrets Manager with boto3 and the AWS CLI

Posted 5 July 2021
Tagged with aws, aws-secrets-manager, python

If you delete a secret from AWS Secrets Manager, it isn’t deleted immediately – instead, it gets scheduled for deletion. This gives you a recovery window, so you can retrieve the secret if it was deleted accidentally – but it also prevents you creating a new secret with the same name.

If you want to delete a secret more quickly, you can call the DeleteSecret API with the ForceDeleteWithoutRecovery parameter. To use this API, you need to know the ID of the secret – but what if it’s already been scheduled for deletion, and you want to speed that deletion up?
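
As a minimal sketch of that call in boto3, assuming you already have a Secrets Manager client and know the secret’s ID (the helper name force_delete_secret is my own, not part of boto3):

```python
def force_delete_secret(client, secret_id):
    """Delete a secret immediately, skipping the recovery window.

    `client` is assumed to be a boto3 Secrets Manager client, e.g.
    boto3.Session().client("secretsmanager"). `secret_id` can be the
    name or the ARN of the secret.
    """
    return client.delete_secret(
        SecretId=secret_id,
        ForceDeleteWithoutRecovery=True,
    )
```

Taking the client as an argument keeps the helper easy to reuse across sessions and accounts.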

You can see deleted secrets in the AWS Console (notice the “Deleted on” column):

Screenshot of the Secrets Manager console. There's a table with one secret per row, and the rightmost column is titled 'Deleted on'. This column contains a date for the top five rows.

To see deleted secrets, select the gear icon in the top right-hand corner for settings, then make sure you have "Show disabled secrets" selected.

But if you call the ListSecrets API, they don’t appear. How can we retrieve deleted secrets programmatically?

In this post, I’ll explain how I get a list of deleted secrets using boto3. It takes us on a surprisingly twisty journey through browser dev tools, AWS API definitions, and botocore loaders.

If you just want the answer, skip to the end.

Motivation

I can see the secrets in the console, so why not delete them there? It’s only a few clicks, and it’d be much faster than writing a script.

I wrote a script because I want to list and delete secrets repeatedly, which makes the reliability and speed of a script more useful.

For a current project, I’m trying to write a Terraform module that somebody can use to spin up a large collection of services in a new AWS account. I’m repeatedly creating my resources, then tearing them down so I can try again. This includes some secrets in Secrets Manager.

Because the secrets aren’t getting deleted immediately, Terraform can’t recreate the secret on the next attempt. I want a way to clear out Secrets Manager quickly, so I can get back to an empty account.

(I discovered Terraform will delete secrets immediately if you set recovery_window_in_days = 0, but I don’t want to use it because it’s not a setting I’d otherwise use. I don’t love the idea of putting a workaround for my development process in the Terraform module.)

My initial attempt

I started by looking at the boto3 documentation for list_secrets(). The response schema includes a DeletedDate field which is only present on secrets that are scheduled for deletion, so I thought something like this would work:

import boto3


def list_secrets(session):
    client = session.client("secretsmanager")

    for page in client.get_paginator("list_secrets").paginate():
        yield from page["SecretList"]


if __name__ == "__main__":
    session = boto3.Session()

    for secret in list_secrets(session):
        if "DeletedDate" in secret:
            print(secret)

But when I ran it, I didn’t see anything, even though I knew I had deleted secrets – the list_secrets() method was only finding active secrets. Hmm.

A secret parameter

I figured I couldn’t be the only person who wanted to do this, and maybe somebody else had made more progress. Searching around, I found someone with the same problem on Stack Overflow. Although they didn’t have an answer, Max Allan had found a useful clue:

In the AWS console I can see deleted secrets. A quick look at the dev tools and I can see my request payload on the Secrets Manager endpoint looks like:
{
  "method": "POST",
  "path": "/",
  "headers": {
    "Content-Type": "application/x-amz-json-1.1",
    "X-Amz-Target": "secretsmanager.ListSecrets",
    "X-Amz-Date": "Fri, 27 Nov 2020 13:19:06 GMT"
  },
  "operation": "ListSecrets",
  "content": {
    "MaxResults": 100,
    "IncludeDeleted": true,
    "SortOrder": "asc"
  },
  "region": "eu-west-2"
}
Is there any way to pass "IncludeDeleted": true to the CLI?

The other two parameters – MaxResults and SortOrder – are both documented parameters for ListSecrets. What if we pass that undocumented parameter into list_secrets()?

client.list_secrets(IncludeDeleted=True)

Unfortunately, that throws an error:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: "IncludeDeleted", must be one of: MaxResults, NextToken, Filters, SortOrder

Even though that parameter would probably do the right thing if we could get it into the HTTP request, boto3 rejects it. So what if we don’t go through boto3?

Bypassing boto3, briefly

Under the hood, boto3 and the other language-specific AWS SDKs are all making HTTP requests against the same APIs. If you look at AWS API docs, the first examples are always an HTTP request/response pair. The SDKs provide a convenient wrapper that builds, signs, and sends the requests, and in turn interprets the responses – but you don’t have to use them. What if you sent the HTTP request directly?
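
To make that concrete, here’s a rough sketch of the pieces such a request would contain, based on the payload from the dev tools above. The function name is hypothetical, and I’m assuming the usual regional endpoint pattern for the URL. The request it builds is unsigned, so it would be rejected if you sent it as-is:

```python
import json


def build_list_secrets_request(region, include_deleted=True):
    """Return the URL, headers and body of an unsigned ListSecrets request.

    This mirrors the payload seen in the browser dev tools. It is NOT
    signed, so Secrets Manager would refuse it if it were sent unchanged.
    """
    # Assumption: the standard regional endpoint for Secrets Manager.
    url = f"https://secretsmanager.{region}.amazonaws.com/"
    headers = {
        "Content-Type": "application/x-amz-json-1.1",
        "X-Amz-Target": "secretsmanager.ListSecrets",
    }
    body = json.dumps({"MaxResults": 100, "IncludeDeleted": include_deleted})
    return url, headers, body
```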

To me, the hard part is signing the requests. I’ve never done it before, and it seems like the sort of fiddly process that would take a few attempts to get right.

I tried to find a way to get boto3 to do the request signing for me, on an arbitrary request – a function that would take a URL, some headers and a body, and do the signing process – but if one exists, I couldn’t find it. There’s nothing to suggest that’s possible in the documentation, I couldn’t find examples in Google, and the code itself is pretty complicated.

In fact, looking through the code, I couldn’t even find a method called list_secrets() in boto3 – so where does that come from?

How the language-specific AWS SDKs work

At time of writing, AWS has nine language-specific SDKs which have to support over 200 different services. Each SDK contains a client for each service, and the methods on those clients mirror the underlying HTTP APIs. It would be impractical to maintain those clients by hand – so they don’t.

Instead, AWS publish “service models” that describe each service. These models are JSON files that contain a complete description of the endpoints, the models, the documentation text, and so on. These models are used to autogenerate the service-specific clients in the SDKs, reducing the effort required to keep everything up-to-date. This approach has also allowed other people to write SDKs in languages that AWS don’t support, like Haskell and Clojure.

These models are a bit like Swagger, although they use an AWS-specific syntax. Here’s part of the Secrets Manager service model:

{
  "operations": {
    "ListSecrets": {
      "name": "ListSecrets",
      "http": {"method": "POST", "requestUri": "/"},
      "input": {"shape": "ListSecretsRequest"},
      "output": {"shape": "ListSecretsResponse"},
      "errors":[
        {"shape": "InvalidParameterException"},
        {"shape": "InvalidNextTokenException"},
        {"shape": "InternalServiceError"}
      ],
      "documentation": "Lists all of the secrets that are stored by Secrets Manager in the AWS account. …"
    },
    …
  },
  "shapes": {
    "ListSecretsRequest": {
      "type": "structure",
      "members": {
        "MaxResults": { … },
        "NextToken": { … },
        "Filters": { … },
        "SortOrder": { … }
      }
    },
    …
  },
  "documentation": "<fullname>AWS Secrets Manager API Reference</fullname> …",
  …
}

Looking at the operations object, we can see that the ListSecrets API requires a POST request to /, and it takes an instance of the ListSecretsRequest model (here called a “shape”). It returns an instance of ListSecretsResponse, or one of three different error types.

In turn, the shapes object tells us that the ListSecretsRequest model takes four different parameters (along with their types, and a description of how to use them).
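
This also explains why boto3 rejected IncludeDeleted earlier: the parameter isn’t among the members of the request shape. Here’s a much-simplified sketch of that kind of check. It isn’t botocore’s actual validator, and unknown_parameters is a name I’ve made up for illustration:

```python
# A cut-down copy of the request shape from the service model above,
# with the per-parameter details left out.
LIST_SECRETS_REQUEST = {
    "type": "structure",
    "members": {"MaxResults": {}, "NextToken": {}, "Filters": {}, "SortOrder": {}},
}


def unknown_parameters(shape, params):
    """Return the names in `params` that the shape doesn't declare."""
    return sorted(set(params) - set(shape["members"]))


# IncludeDeleted isn't a member of the shape, so it gets flagged.
rejected = unknown_parameters(
    LIST_SECRETS_REQUEST, {"IncludeDeleted": True, "MaxResults": 100}
)
```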

You’ll find a copy of these service models in every SDK – for the Python SDK, they’re in the “data” directory of the botocore library. The methods are generated at runtime, which is why I couldn’t find a list_secrets() method in the codebase. For compiled languages like Java or C++, these models are used to generate source code.
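
To give a feel for what “generated at runtime” means, here’s a toy sketch that builds a client class from the operations object of a model. It’s an illustration of the idea only: the names are hypothetical, the name conversion is a simple stand-in, and the real botocore code does far more (validation, signing, sending the request):

```python
import re


def to_snake_case(operation_name):
    """Convert an operation name like 'ListSecrets' to 'list_secrets'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", operation_name).lower()


def make_client_class(service_model):
    """Build a class with one method per operation in the model."""

    def make_method(operation_name):
        def method(self, **params):
            # A real client would validate, sign and send an HTTP request
            # here; this toy version just reports what it was asked to do.
            return {"operation": operation_name, "params": params}

        method.__name__ = to_snake_case(operation_name)
        return method

    methods = {
        to_snake_case(name): make_method(name)
        for name in service_model["operations"]
    }
    return type("SketchClient", (object,), methods)
```

Nothing called list_secrets appears in this source either, but a class built from a model containing ListSecrets ends up with that method.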

I couldn’t find much publicly available documentation that describes how these service models work. I found a comment from Norm Johanson, an AWS employee, on the .NET AWS SDK repo:

AWS services use a few different internal tools and technologies to model their APIs. None of them are swagger. You have to remember many AWS services predate swagger. The json files are created using some tools we have to translate the models service teams write into a common format that all of the AWS SDKs can use.

Long term wise is to have services use our new open source protocol-agnostic interface definition language called Smithy.

I also found a Reddit comment from FatherJohnKissMe, where an AWS Principal Engineer explains a bit more of their internal process.

Scant documentation aside, the existence of service models suggests a way forward: what if we modified this file to add the IncludeDeleted parameter?

Modifying the API definition with loaders

I started by modifying the service model in my installed copy of botocore. I added a new member to the ListSecretsRequest shape:

       "members": {
+        "IncludeDeleted": {
+          "shape": "BooleanType",
+          "documentation": "<p>(Optional) If set, includes secrets that are disabled.</p>"
+        },
         "MaxResults": {

Once I’d done that, I could call client.list_secrets(IncludeDeleted=True), the function would work, and the response included deleted secrets – but this is pretty brittle. Next time I update botocore, this change will likely be reverted.

After digging through the botocore code some more, I discovered the idea of “loaders”, which have their own documentation. The loaders are classes that look for files with service models, and make them available to the boto3 clients:

def _load_service_model(self, service_name, api_version=None):
    json_model = self._loader.load_service_model(service_name, 'service-2',
                                                 api_version=api_version)
    service_model = ServiceModel(json_model, service_name=service_name)
    return service_model

So when you call, say, boto3.client("s3"), at some point a loader is called with load_service_model("s3", "service-2"). The service-2 tells the loader to look for a v2 service model.

The loader documentation explains that there are two default paths where it searches for these data definitions:

~/.aws/models
<botocore root>/data

The first path is for users to drop in new models that override the models that ship with botocore – which is just what I want to do.
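
Here’s a simplified sketch of that lookup, assuming each search path uses the service/API version/type layout and that the first match wins. The function is my own illustration, not the real loader code:

```python
from pathlib import Path


def find_model_file(search_paths, service, api_version, type_name="service-2"):
    """Return the first matching model file, or None if no root has one.

    `search_paths` is an ordered list of root directories; earlier
    entries take priority over later ones.
    """
    for root in search_paths:
        candidate = Path(root) / service / api_version / f"{type_name}.json"
        if candidate.is_file():
            return candidate
    return None
```

With the user directory listed first, a file dropped in there shadows the copy that ships with botocore.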

I reverted my change to botocore proper, and instead copied my modified copy of the Secrets Manager service model to ~/.aws/models. Now my changes will survive an update to botocore, but I’m still duplicating the entire service model, even though I only want a small change. If the service model changes, I’ll need to update my patched copy.

Reading the rest of the loader documentation, there’s an intriguing notion of “extras” as a way to add additional parameters:

The sdk-extras and similar files represent extra data that needs to be applied to the model after it is loaded. Data in these files might represent information that doesn’t quite fit in the original models, but is still needed for the sdk. For instance, additional operation parameters might be added here which don’t represent the actual service api.

Like everything else around AWS service models, the documentation is pretty sparse, but it was the final clue I needed. By looking at examples of extras for other service models, I was able to write a Secrets Manager extra that describes the undocumented parameter:

{
  "version": 1.0,
  "merge": {
    "shapes": {
      "ListSecretsRequest": {
        "members": {
          "IncludeDeleted": {
            "shape": "BooleanType",
            "documentation": "<p>If set, includes secrets that are disabled.</p>"
          }
        }
      }
    }
  }
}

Because this extra is only describing the change, rather than reproducing the entire service model, it should be more robust to changes in botocore or the underlying service model.
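
Conceptually, the merge section is folded into the loaded model as a recursive dictionary merge. Here’s a small sketch of that idea. It’s a simplified stand-in that I wrote for illustration, not botocore’s own implementation, and the function names are mine:

```python
import copy


def apply_merge_extra(model, extra):
    """Return a copy of `model` with the extra's "merge" section folded in."""
    merged = copy.deepcopy(model)
    _deep_merge(merged, extra.get("merge", {}))
    return merged


def _deep_merge(base, extra):
    """Recursively copy keys from `extra` into `base`, in place."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are dicts: descend so existing keys are kept.
            _deep_merge(base[key], value)
        else:
            # New key, or a non-dict value: the extra wins.
            base[key] = value
```

Because the merge descends into nested dictionaries, the existing members of ListSecretsRequest stay in place and IncludeDeleted is added alongside them.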

As a nice side benefit, because the AWS CLI is written in Python and built on botocore, this change also affects the CLI. It gets a new --include-deleted/--no-include-deleted flag, which is even added to the CLI help text.

This turned out to be way, way more complicated than I thought it would be. I thought I’d write a quick script, a short blog post, and be done. Instead, I found myself knee-deep in rabbit holes – but I learnt a lot, and I got something that works!

Putting it all together

First, save the following file to ~/.aws/models/secretsmanager/2017-10-17/service-2.sdk-extras.json:

{
  "version": 1.0,
  "merge": {
    "shapes": {
      "ListSecretsRequest": {
        "members": {
          "IncludeDeleted": {
            "shape": "BooleanType",
            "documentation": "<p>If set, includes secrets that are disabled.</p>"
          }
        }
      }
    }
  }
}
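
If you’d rather create that file from Python than by hand, a sketch along these lines should do it. The write_extras helper is my own name; pass it the models directory you want to write into, which would normally be ~/.aws/models:

```python
import json
from pathlib import Path

# The same extra as above, expressed as a Python dictionary.
EXTRAS = {
    "version": 1.0,
    "merge": {
        "shapes": {
            "ListSecretsRequest": {
                "members": {
                    "IncludeDeleted": {
                        "shape": "BooleanType",
                        "documentation": "<p>If set, includes secrets that are disabled.</p>",
                    }
                }
            }
        }
    },
}


def write_extras(models_root):
    """Write the Secrets Manager extras file under `models_root`."""
    path = (
        Path(models_root)
        / "secretsmanager"
        / "2017-10-17"
        / "service-2.sdk-extras.json"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(EXTRAS, indent=2))
    return path
```

For example, write_extras(Path.home() / ".aws" / "models") would write to the path given above.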

Then use the following script:

import boto3


def list_secrets(session, **kwargs):
    client = session.client("secretsmanager")

    for page in client.get_paginator("list_secrets").paginate(**kwargs):
        yield from page["SecretList"]


if __name__ == "__main__":
    session = boto3.Session()

    for secret in list_secrets(session, IncludeDeleted=True):
        if "DeletedDate" in secret:
            print(secret)

Or run the following CLI command:

aws secretsmanager list-secrets --include-deleted
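
To get back to an empty account, the listing can be combined with the force-delete call from the start of the post. This is a sketch rather than a polished tool: it assumes the extras file is installed so the paginator accepts IncludeDeleted, and purge_deleted_secrets is a name I’ve chosen for illustration:

```python
def purge_deleted_secrets(client):
    """Force-delete every secret that is already scheduled for deletion.

    `client` is assumed to be a boto3 Secrets Manager client. Returns the
    names of the secrets that were purged.
    """
    purged = []
    paginator = client.get_paginator("list_secrets")

    for page in paginator.paginate(IncludeDeleted=True):
        for secret in page["SecretList"]:
            # Only touch secrets that are already scheduled for deletion;
            # active secrets have no DeletedDate and are left alone.
            if "DeletedDate" in secret:
                client.delete_secret(
                    SecretId=secret["ARN"],
                    ForceDeleteWithoutRecovery=True,
                )
                purged.append(secret["Name"])

    return purged
```

You’d call it with something like purge_deleted_secrets(boto3.Session().client("secretsmanager")). There’s no recovery after this, so it’s worth printing the list and checking it before running it against an account you care about.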