Whose code am I running in GitHub Actions?
A week ago, somebody added malicious code to the tj-actions/changed-files GitHub Action. If you used the compromised action, it would leak secrets to your build log. Those build logs are public for public repositories, so anybody could see your secrets. Scary!
Mutable vs immutable references
This attack was possible because it’s common practice to refer to tags in a GitHub Actions workflow, for example:
jobs:
  changed_files:
    ...
    steps:
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@v2
        ...
At a glance, this looks like an immutable reference to an already-released “version 2” of this action, but actually this is a mutable Git tag. If somebody changes the v2 tag in the tj-actions/changed-files repo to point to a different commit, this action will run different code the next time it runs.
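If you specify a Git commit ID instead (e.g. a5b3abf), that’s an immutable reference that will run the same code every time. GitHub’s own security guidance recommends writing out the full 40-character commit ID, rather than a short one like this.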
Tags vs commit IDs is a tradeoff between convenience and security. Specifying an exact commit ID means the code won’t change unexpectedly, but tags are easier to read and compare.
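To make the distinction concrete, here’s a minimal sketch of a check you could run on a single action reference. The helper name is_pinned is hypothetical, and it deliberately treats only a full 40-character lowercase hex commit ID as pinned; that’s an assumption based on GitHub’s advice to pin actions to a full-length commit SHA. Anything else, including a tag like v2, counts as mutable.

```shell
#!/bin/sh
# is_pinned is a hypothetical helper. It succeeds (exit status 0) if an action
# reference like "owner/repo@<version>" is pinned to a full 40-character
# lowercase hex commit ID, and fails for anything else, such as a tag like "@v2".
is_pinned() {
  # "${1##*@}" strips everything up to the last "@", leaving just the version:
  # a tag, a branch name or a commit ID.
  printf '%s\n' "${1##*@}" | grep -Eq '^[0-9a-f]{40}$'
}

# Example: the first reference is a mutable tag; the second uses a made-up
# placeholder commit ID, not a real release of actions/checkout.
for ref in "tj-actions/changed-files@v2" \
           "actions/checkout@0123456789abcdef0123456789abcdef01234567"; do
  if is_pinned "$ref"; then
    echo "immutable: $ref"
  else
    echo "mutable:   $ref"
  fi
done
```

The check only looks at the shape of the version string; it doesn’t ask GitHub whether that commit actually exists.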
Do I have any mutable references?
I wasn’t worried about this particular attack because I don’t use tj-actions, but I was curious about what other GitHub Actions I’m using. I ran a short shell script in the folder where I have local clones of all my repos:
find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \
| xargs -0 grep --no-filename "uses:" \
| sed 's/\- uses:/uses:/g' \
| tr '"' ' ' \
| awk '{print $2}' \
| sed 's/\r//g' \
| sort \
| uniq --count \
| sort --numeric-sort
This prints a tally of all the actions I’m using. Here’s a snippet of the output:
1 hashicorp/setup-terraform@v3
2 dtolnay/rust-toolchain@v1
2 taiki-e/create-gh-release-action@v1
2 taiki-e/upload-rust-binary-action@v1
4 actions/setup-python@v4
6 actions/cache@v4
9 ruby/setup-ruby@v1
31 actions/setup-python@v5
58 actions/checkout@v4
I went through the entire list and thought about how much I trust each action and its author.
- Is it from a large organisation like actions or ruby? They’re not perfect, but they’re likely to have good security procedures in place to protect against malicious changes.
- Is it from an individual developer or small organisation? Here I tend to be more wary, especially if I don’t know the author personally. That’s not to say that individuals can’t have good security, but there’s more variance in the security setup of random developers on the Internet than among big organisations.
- Do I need to use somebody else’s action, or could I write my own script to replace it? Writing my own is what I generally prefer, especially if I’m only using a small subset of the functionality offered by the action. It’s a bit more work upfront, but then I know exactly what it’s doing and there’s less churn and risk from upstream changes.
I feel pretty good about my list. Most of my actions are from large organisations, and the rest are a few actions specific to my Rust command-line tools which are non-critical toys, where the impact of a compromised GitHub repo would be relatively slight.
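The script above tallies every reference, pinned or not. If you only care about the mutable ones, here’s a minimal sketch that wraps the same pipeline in a function and adds one final grep to drop anything pinned to a full 40-character commit ID. The name mutable_actions is made up for this example, and treating only the full-length ID as pinned is an assumption that follows GitHub’s advice to pin actions to a full-length commit SHA. The other commands are unchanged, so it should run wherever the original script does.

```shell
#!/bin/sh
# mutable_actions is a hypothetical name. It runs the same pipeline as the
# script above over a directory (default: the current one), then drops every
# line whose reference ends in a full 40-character commit ID, so only tags,
# branches and other mutable references are left in the tally.
mutable_actions() {
  find "${1:-.}" -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \
    | xargs -0 grep --no-filename "uses:" \
    | sed 's/\- uses:/uses:/g' \
    | tr '"' ' ' \
    | awk '{print $2}' \
    | sed 's/\r//g' \
    | sort \
    | uniq --count \
    | sort --numeric-sort \
    | grep -Ev '@[0-9a-f]{40}$'
}
```

You could run it as mutable_actions ~/repos, pointing it at the folder that holds your clones. Local actions referenced by path (uses: ./some/path) have no @ at all, so they’ll show up in this list too, even though their code lives in your own repo.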
How this script works
This is a classic use of Unix pipelines, where I’m chaining together a bunch of built-in text processing tools. Let’s step through how it works.
- find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0

  This looks for any GitHub Actions workflow file – any file whose name ends with .yml in a folder like .github/workflows/. It prints a list of filenames, like:

  ./alexwlchan.net/.github/workflows/build_site.yml
  ./books.alexwlchan.net/.github/workflows/build_site.yml
  ./concurrently/.github/workflows/main.yml

  It prints them with a null byte (\0) between them, which makes it possible to split the filenames in the next step. By default it uses a newline, but a null byte is a bit safer, in case you have filenames which include newline characters.

  I know that I always use .yml as a file extension, but if you sometimes use .yaml, you can replace -name '*.yml' with \( -name '*.yml' -o -name '*.yaml' \).

  I have a bunch of local repos that are clones of open-source projects, and not my code, so I care less about what GitHub Actions they’re using. I excluded them by adding extra -path rules, like -not -path './cpython/*'.
- xargs -0 grep --no-filename "uses:"

  Then we use xargs to pass the filenames to grep. The -0 flag tells it to split its input on the null byte, and then it runs grep to look for lines that include "uses:" – this is how you use an action in your workflow file.

  Because xargs hands several filenames to each grep call, grep would normally prefix every match with the name of the file it came from; the --no-filename option turns that off, so it just prints the matching line. Not all of my files are formatted or indented consistently, so the output is quite messy:

  - uses: actions/checkout@v4
  uses: "actions/cache@v4"
  uses: ruby/setup-ruby@v1
- sed 's/\- uses:/uses:/g'

  Sometimes there’s a leading hyphen, sometimes there isn’t – it depends on whether uses: is the first key in the YAML dictionary. This sed command replaces "- uses:" with "uses:" to start tidying up the data:

  uses: actions/checkout@v4
  uses: "actions/cache@v4"
  uses: ruby/setup-ruby@v1

  I know sed is a pretty powerful tool for making changes to text, but I only know a couple of simple commands, like this pattern for replacing text: sed 's/old/new/g'.
- tr '"' ' '

  Sometimes the name of the action is quoted, sometimes it isn’t. This command replaces any double quotes with spaces; the extra spaces don’t matter, because the next step ignores them:

  uses: actions/checkout@v4
  uses: actions/cache@v4
  uses: ruby/setup-ruby@v1

  Now that I’m writing this post, it occurs to me I could use sed to make this substitution as well. I reached for tr because I’ve been using it for longer, and the syntax is simpler for doing single-character substitutions: tr '<oldchar>' '<newchar>'.
- awk '{print $2}'

  This splits each line on runs of spaces and tabs, and prints the second field, which is the name of the action:

  actions/checkout@v4
  actions/cache@v4
  ruby/setup-ruby@v1

  awk is another powerful text utility that I’ve never learnt properly – I only know how to print the nth word in a string. It has a lot of pattern-matching features I’ve never tried.
- sed 's/\r//g'

  I had a few workflow files which were using carriage returns (\r), and those were included in the awk output. This command gets rid of them, which makes the data more consistent for the final step.
- sort | uniq --count | sort --numeric-sort

  This sorts the lines so identical lines are adjacent, then it groups and counts the lines, and finally it re-sorts to put the most frequent lines at the bottom. I have this as a shell alias called tally.

  6 actions/cache@v4
  9 ruby/setup-ruby@v1
  59 actions/checkout@v4
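The tally alias itself isn’t shown here, so this is only a guess at an equivalent definition. It’s written as a shell function rather than an alias, because bash doesn’t expand aliases in non-interactive scripts unless the expand_aliases option is set, whereas a function works in both places. It uses -c and -n, the POSIX spellings of --count and --numeric-sort.

```shell
#!/bin/sh
# A guess at an equivalent of the "tally" alias, written as a function so it
# also works in scripts. It counts how many times each distinct line appears
# on stdin and prints the most frequent lines last.
tally() {
  # uniq only merges adjacent duplicates, so sort first; -c is the short form
  # of --count and -n is the short form of --numeric-sort.
  sort | uniq -c | sort -n
}

# Example: prints a count of 1 for actions/cache@v4, then 2 for
# ruby/setup-ruby@v1, then 3 for actions/checkout@v4.
printf '%s\n' actions/checkout@v4 ruby/setup-ruby@v1 actions/cache@v4 \
  actions/checkout@v4 ruby/setup-ruby@v1 actions/checkout@v4 | tally
```

With a function like this, the end of the pipeline shrinks to ... | sed 's/\r//g' | tally.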
This step-by-step approach is how I build Unix text pipelines: I can write a step at a time, and gradually refine and tweak the output until I get the result I want. There are lots of ways to do it, and because this is a script I’ll use once and then discard, I don’t have to worry too much about doing it in the “purest” way – as long as it gets the right result, that’s good enough.
If you use GitHub Actions, you might want to use this script to check your own actions, and see what you’re using. But more than that, I recommend becoming familiar with the Unix text processing tools and pipelines – even in the age of AI, they’re still a powerful and flexible way to cobble together one-off scripts for processing data.