Whose code am I running in GitHub Actions?

A week ago, somebody added malicious code to the tj-actions/changed-files GitHub Action. If you used the compromised action, it would leak secrets to your build log. Those build logs are public for public repositories, so anybody could see your secrets. Scary!

Mutable vs immutable references

This attack was possible because it’s common practice to refer to tags in a GitHub Actions workflow, for example:

jobs:
  changed_files:
    ...
    steps:
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v2
      ...

At a glance, this looks like an immutable reference to an already-released “version 2” of this action, but actually this is a mutable Git tag. If somebody changes the v2 tag in the tj-actions/changed-files repo to point to a different commit, this action will run different code the next time it runs.

If you specify a Git commit ID instead (e.g. a5b3abf, abbreviated here; GitHub Actions expects the full 40-character SHA), that’s an immutable reference that will run the same code every time.
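To see the difference between a tag and a commit ID, here's a small sketch that moves a tag in a throwaway local repo. The repo, the commit messages and the identity passed to git are all made up for illustration; it says nothing about how the tj-actions tag was actually changed.

```shell
# Throwaway repo to show that a tag can be moved but a commit ID cannot.
repo="$(mktemp -d)"
git -C "$repo" init --quiet

# Identity and signing are set inline so the sketch does not depend on
# any global git config. The name and email are placeholders.
commit() {
  git -C "$repo" -c user.name="Example" -c user.email="example@example.com" \
    -c commit.gpgsign=false commit --quiet --allow-empty -m "$1"
}

commit "original release"
git -C "$repo" tag v2
before="$(git -C "$repo" rev-parse v2)"

commit "some later change"
git -C "$repo" tag --force v2 >/dev/null   # move the existing tag
after="$(git -C "$repo" rev-parse v2)"

echo "v2 used to point at: $before"
echo "v2 now points at:    $after"
rm -rf "$repo"
```

The name v2 stays the same, but it resolves to two different commits. A workflow pinned to the first commit ID would keep running the original code.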

Tags vs commit IDs is a tradeoff between convenience and security. Specifying an exact commit ID means the code won’t change unexpectedly, but tags are easier to read and compare.

Do I have any mutable references?

I wasn’t worried about this particular attack because I don’t use tj-actions, but I was curious about what other GitHub Actions I’m using. I ran a short shell script in the folder where I have local clones of all my repos:

find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \
  | xargs -0 grep --no-filename "uses:" \
  | sed 's/\- uses:/uses:/g' \
  | tr '"' ' ' \
  | awk '{print $2}' \
  | sed 's/\r//g' \
  | sort \
  | uniq --count \
  | sort --numeric-sort

This prints a tally of all the actions I’m using. Here’s a snippet of the output:

 1 hashicorp/setup-terraform@v3
 2 dtolnay/rust-toolchain@v1
 2 taiki-e/create-gh-release-action@v1
 2 taiki-e/upload-rust-binary-action@v1
 4 actions/setup-python@v4
 6 actions/cache@v4
 9 ruby/setup-ruby@v1
31 actions/setup-python@v5
58 actions/checkout@v4
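One possible follow-up is to filter a list like this down to the references that aren't pinned to a commit. This is a rough sketch, assuming that anything not ending in a 40-character hex string is a tag or branch; the example-org action and its commit ID are made-up placeholders.

```shell
# Sample list of action references; the 40-character ID is a placeholder.
actions='actions/checkout@v4
ruby/setup-ruby@v1
example-org/example-action@0123456789abcdef0123456789abcdef01234567'

# Keep only references that do NOT end in a full 40-character hex commit ID.
mutable="$(printf '%s\n' "$actions" | grep -Ev '@[0-9a-f]{40}$')"
printf '%s\n' "$mutable"
```

This is only a heuristic: it checks the shape of the reference, not whether the commit actually exists in the action's repo.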

I went through the entire list and thought about how much I trust each action and its author.

I feel pretty good about my list. Most of my actions come from large organisations. The rest are a few actions specific to my Rust command-line tools, which are non-critical toys, so the impact of a compromised GitHub repo would be relatively slight.

How this script works

This is a classic use of Unix pipelines, where I’m chaining together a bunch of built-in text processing tools. Let’s step through how it works.

find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0

This looks for any GitHub Actions workflow file – any file whose name ends with .yml in a folder like .github/workflows/. It prints a list of filenames, like:

./alexwlchan.net/.github/workflows/build_site.yml
./books.alexwlchan.net/.github/workflows/build_site.yml
./concurrently/.github/workflows/main.yml

It prints them with a null byte (\0) between them, which makes it possible to split the filenames in the next step. By default it uses a newline, but a null byte is a bit safer, in case you have filenames which include newline characters.

I know that I always use .yml as a file extension, but if you sometimes use .yaml, you can replace -name '*.yml' with \( -name '*.yml' -o -name '*.yaml' \)

I have a bunch of local repos that are clones of open-source projects, and not my code, so I care less about what GitHub Actions they’re using. I excluded them by adding extra -path rules, like -not -path './cpython/*'.
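Here's a sketch that combines both tweaks, matching either extension and skipping one cloned repo, run against a throwaway directory. The my-project folder and the file names are hypothetical, and it uses the default newline output rather than -print0 so the result is easy to print.

```shell
# Build a throwaway directory tree; the repo and file names are made up.
root="$(mktemp -d)"
mkdir -p "$root/my-project/.github/workflows" "$root/cpython/.github/workflows"
touch "$root/my-project/.github/workflows/build.yml" \
      "$root/my-project/.github/workflows/release.yaml" \
      "$root/my-project/.github/workflows/notes.txt" \
      "$root/cpython/.github/workflows/tests.yml"

# Match both extensions, and skip the cloned third-party repo.
found="$(cd "$root" && find . -path '*/.github/workflows/*' -type f \
  \( -name '*.yml' -o -name '*.yaml' \) \
  -not -path './cpython/*' | sort)"
printf '%s\n' "$found"
rm -rf "$root"
```

Only the two workflow files in my-project are listed; notes.txt fails the name test and everything under cpython is excluded.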

xargs -0 grep --no-filename "uses:"

Then we use xargs to pass the filenames to grep. The -0 flag tells it to split on the null byte, and then it runs grep to look for lines that include "uses:" – this is how you use an action in your workflow file.

The --no-filename option means this just prints the matching line, and not the name of the file it comes from. Not all of my files are formatted or indented consistently, so the output is quite messy:

    - uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1
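To show what --no-filename changes, here's a small sketch that runs the same search with and without it. The two workflow files are hypothetical and live in a temporary folder.

```shell
# Two tiny stand-in workflow files in a temporary folder.
dir="$(mktemp -d)"
printf '    - uses: actions/checkout@v4\n' > "$dir/a.yml"
printf '      uses: ruby/setup-ruby@v1\n' > "$dir/b.yml"

# With several files, grep prefixes each match with its filename by default.
with_names="$(find "$dir" -name '*.yml' -print0 | xargs -0 grep "uses:" | sort)"

# --no-filename drops that prefix, leaving only the matching lines.
without_names="$(find "$dir" -name '*.yml' -print0 \
  | xargs -0 grep --no-filename "uses:" | sort)"

printf '%s\n' "$with_names" "$without_names"
rm -rf "$dir"
```

The filename prefix would get in the way of the later steps, which expect each line to start with the YAML itself.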

sed 's/\- uses:/uses:/g'

Sometimes there's a leading hyphen, sometimes there isn’t – it depends on whether uses: is the first key in the YAML dictionary. This sed command replaces "- uses:" with "uses:" to start tidying up the data.

    uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1

I know sed is a pretty powerful tool for making changes to text, but I only know a couple of simple commands, like this pattern for replacing text: sed 's/old/new/g'.
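Here's the same substitution run on the three sample lines, as a self-contained sketch. The hyphen is written without a backslash here, since a hyphen outside a bracket expression is an ordinary character in a sed pattern.

```shell
# The three messy lines from the grep output.
messy='    - uses: actions/checkout@v4
        uses: "actions/cache@v4"
      uses: ruby/setup-ruby@v1'

# s/old/new/g replaces every match of "old" on each line with "new".
tidied="$(printf '%s\n' "$messy" | sed 's/- uses:/uses:/g')"
printf '%s\n' "$tidied"
```

Lines without a leading hyphen pass through unchanged, because sed only rewrites the parts that match.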

tr '"' ' '

Sometimes the name of the action is quoted, sometimes it isn’t. This command replaces any double quotes with spaces, which the next step ignores.

    uses: actions/checkout@v4
        uses: actions/cache@v4
      uses: ruby/setup-ruby@v1

Now that I’m writing this post, it occurs to me I could use sed to make this substitution as well. I reached for tr because I’ve been using it for longer, and the syntax is simpler for doing single character substitutions: tr '<oldchar>' '<newchar>'
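As a quick check that the two tools are interchangeable here, this sketch runs both on one sample line and compares the results. The sed command is one way to write the substitution, not something taken from the original script.

```shell
line='        uses: "actions/cache@v4"'

# tr swaps one character for another, everywhere in its input.
with_tr="$(printf '%s\n' "$line" | tr '"' ' ')"

# The sed equivalent uses the same s/old/new/g pattern as before.
with_sed="$(printf '%s\n' "$line" | sed 's/"/ /g')"

[ "$with_tr" = "$with_sed" ] && echo "same output"
```

Both leave a space where each quote used to be, so the action name ends up surrounded by extra whitespace rather than joined to its neighbours.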

awk '{print $2}'

This splits each line on whitespace, and prints the second field, which is the name of the action:

actions/checkout@v4
actions/cache@v4
ruby/setup-ruby@v1

awk is another powerful text utility that I’ve never learnt properly – I only know how to print the nth word in a string. It has a lot of pattern-matching features I’ve never tried.
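Here's a short sketch of the field splitting, plus one small example of the pattern matching mentioned above. The second command isn't part of the original pipeline; it's only there to show what a pattern in front of the braces does.

```shell
# Leading spaces and doubled spaces are typical after the earlier steps.
line='        uses:  actions/cache@v4 '

# awk treats any run of whitespace as one separator, so $2 is still the action.
action="$(printf '%s\n' "$line" | awk '{print $2}')"
printf '%s\n' "$action"

# A pattern before the braces limits the action to lines that match it,
# here lines whose first field is exactly "uses:".
filtered="$(printf 'name: Build\nuses: actions/checkout@v4\n' \
  | awk '$1 == "uses:" {print $2}')"
printf '%s\n' "$filtered"
```

Because runs of whitespace collapse into a single separator, the inconsistent indentation and the spaces left behind by tr don't affect which field is printed.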

sed 's/\r//g'

I had a few workflow files which were using carriage returns (\r), and those were included in the awk output. This command gets rid of them, which makes the data more consistent for the final step.
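This sketch shows why the stray carriage returns matter for counting. It uses tr -d '\r' as an alternative to the sed command above, on the assumption that your sed might not understand \r; the input lines are made up.

```shell
# Two copies of the same action, one of them with a Windows-style line ending.
lines="$(printf 'actions/checkout@v4\r\nactions/checkout@v4\n')"

# Without stripping, the two lines differ by an invisible character.
without_fix="$(printf '%s\n' "$lines" | sort | uniq | wc -l | tr -d ' ')"

# tr -d deletes every carriage return, so the lines become identical.
with_fix="$(printf '%s\n' "$lines" | tr -d '\r' | sort | uniq | wc -l | tr -d ' ')"

echo "distinct lines without stripping: $without_fix"
echo "distinct lines after stripping:   $with_fix"
```

Left alone, the same action would show up as two separate rows in the final tally.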

sort | uniq --count | sort --numeric-sort

This sorts the lines so identical lines are adjacent, then it groups and counts the lines, and finally it re-sorts to put the most frequent lines at the bottom.

I have this as a shell alias called tally.

   6 actions/cache@v4
   9 ruby/setup-ruby@v1
  58 actions/checkout@v4
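A helper like tally could be sketched as a shell function. This is a guess at its shape rather than the actual alias, and it uses the short flags -c and -n in place of --count and --numeric-sort.

```shell
# Count identical lines on stdin and list the most frequent last.
tally() {
  sort | uniq -c | sort -n
}

# The last line of the tally is the most common input line.
top="$(printf 'actions/cache@v4\nactions/checkout@v4\nactions/checkout@v4\n' \
  | tally | tail -n 1 | awk '{print $1, $2}')"
printf '%s\n' "$top"
```

The first sort matters because uniq only merges identical lines that are next to each other.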

This step-by-step approach is how I build Unix text pipelines: I can write a step at a time, and gradually refine and tweak the output until I get the result I want. There are lots of ways to do it, and because this is a script I’ll use once and then discard, I don’t have to worry too much about doing it in the “purest” way – as long as it gets the right result, that’s good enough.

If you use GitHub Actions, you might want to use this script to check your own actions, and see what you’re using. But more than that, I recommend becoming familiar with the Unix text processing tools and pipelines – even in the age of AI, they’re still a powerful and flexible way to cobble together one-off scripts for processing data.