
Playing with 404 pages

Until yesterday, mistyped or broken URLs would just show the generic GitHub Pages 404 page. It conveys the error, but it’s not very useful.

Brett Terpstra does something rather clever with his 404 pages: he reads the URL, and tries to guess where you were trying to go. Single-character typos or transpositions get redirected automatically, and if it’s not obvious where you were trying to go, he gives a list of suggestions.

He wrote about some of this in “Fun with intelligent 404 pages”, and I decided to try to build a version of my own. My system isn’t as sophisticated as Brett’s, but it was still a fun problem to tackle.

The site starts as a collection of Markdown files, which get processed by Pelican (my blogging engine) and turned into HTML. I have a few small scripts which tidy up the HTML, and then the directory of HTML outputs gets pushed to GitHub Pages, where it gets served to the web.

I’ve added a new script which walks the output directory and gets a list of every URL on the site. This list gets saved into a file called search.json¹, and that file gets uploaded to GitHub.
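For illustration, here’s a minimal sketch of what such a script could look like. It assumes Node.js and Pelican’s default `output` directory, and it is not necessarily the script this site uses. The function names, the base URL, and the `var urls = [...]` file format are assumptions. That format matches how the 404 page loads search.json with a plain `<script>` tag.

```javascript
// Sketch only: assumes Node.js and a Pelican output directory on disk.
// Function names and the "var urls = [...]" format are hypothetical.
const fs = require("fs");
const path = require("path");

// Recursively collect every .html file under `dir`.
function listHtmlFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...listHtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".html")) {
      found.push(fullPath);
    }
  }
  return found;
}

// Turn output/posts/foo/index.html into the URL path /posts/foo/.
function fileToUrlPath(outputDir, filePath) {
  const relative = path.relative(outputDir, filePath).split(path.sep).join("/");
  return "/" + relative.replace(/(^|\/)index\.html$/, "$1");
}

// Write a JavaScript file defining a global `urls` array, so the 404 page
// can load it with <script src="/search.json">. Returns the URL list.
function writeSearchFile(outputDir, baseUrl) {
  const urls = listHtmlFiles(outputDir)
    .map((file) => baseUrl + fileToUrlPath(outputDir, file))
    .sort();
  const body = "var urls = " + JSON.stringify(urls, null, 2) + ";\n";
  fs.writeFileSync(path.join(outputDir, "search.json"), body);
  return urls;
}
```

It would run after Pelican and the tidy-up scripts, e.g. `writeSearchFile("output", "https://example.com")`, with the real site root in place of the placeholder domain. Full URLs are stored because the 404 page matches against `window.location.href`.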

On the 404 page itself, I load search.json, and then I do fuzzy matching between the actual URL and the list of valid URLs. I’m using Glen Chiacchieri’s fuzzyset.js library to do the fuzzy matching. It was really easy to get the matching working in pure JavaScript:

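<script type="text/javascript" src="/theme/js/fuzzyset.js"></script>
<!-- search.json defines a global `urls` array of every URL on the site -->
<script type="text/javascript" src="/search.json"></script>

<script>
  // Build a fuzzy-matching index over every valid URL
  var a = FuzzySet(urls);

  // An array of [score, url] pairs, best match first (null if nothing is close)
  var matching_urls = a.get(window.location.href);

  // display the results on the page
</script>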
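For a rough picture of what that `get` call computes: fuzzyset.js scores strings by comparing their character n-grams with cosine similarity. Below is a simplified, self-contained sketch of that idea using trigrams only. It is not the library’s code, which also combines several gram sizes and can optionally re-rank candidates by Levenshtein distance. The names `trigramCounts`, `cosineSimilarity` and `rankCandidates`, and the 0.33 cutoff, are illustrative.

```javascript
// Simplified illustration of n-gram cosine similarity (trigrams only).
// Not fuzzyset.js's implementation; names and cutoff are hypothetical.

// Count character trigrams, padding with "-" so the ends of the string count.
function trigramCounts(text) {
  const padded = "-" + text.toLowerCase() + "-";
  const counts = new Map();
  for (let i = 0; i + 3 <= padded.length; i++) {
    const gram = padded.slice(i, i + 3);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

function sumOfSquares(counts) {
  let total = 0;
  for (const c of counts.values()) total += c * c;
  return total;
}

// Cosine similarity of two count vectors: 1 for identical, 0 for no overlap.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [gram, count] of a) dot += count * (b.get(gram) || 0);
  const denominator = Math.sqrt(sumOfSquares(a) * sumOfSquares(b));
  return denominator === 0 ? 0 : dot / denominator;
}

// Score every candidate against the query and return [score, url] pairs,
// best first, in the same shape fuzzyset.js uses for its results.
function rankCandidates(query, candidates, minScore = 0.33) {
  const queryCounts = trigramCounts(query);
  return candidates
    .map((url) => [cosineSimilarity(queryCounts, trigramCounts(url)), url])
    .filter(([score]) => score >= minScore)
    .sort((x, y) => y[0] - x[0]);
}
```

A transposed pair of letters only disturbs a handful of trigrams, so a URL like `/posts/hello-wrold/` still scores highly against `/posts/hello-world/`.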

After that, it’s just a matter of displaying the results on the page.
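One possible way to do that, sketched under assumptions: a hypothetical `renderSuggestions` helper turns the `[score, url]` pairs into a list of links, dropping weak matches and capping how many are shown. The 0.5 cutoff and five-result limit are guesses for illustration, not tuned values. Because `get` can come back empty-handed, a missing result is treated as an empty list.

```javascript
// Hypothetical helper: render [score, url] pairs as an HTML list of links.
// The default cutoff and result limit are illustrative guesses.

// Escape characters that matter inside HTML text and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderSuggestions(matches, { minScore = 0.5, maxResults = 5 } = {}) {
  // Treat a null/undefined result (no matches) as an empty list.
  const kept = (matches || [])
    .filter(([score]) => score >= minScore)
    .slice(0, maxResults);
  if (kept.length === 0) {
    return "<p>No similar pages found.</p>";
  }
  const items = kept.map(([, url]) => {
    const safe = escapeHtml(url);
    return `  <li><a href="${safe}">${safe}</a></li>`;
  });
  return "<ul>\n" + items.join("\n") + "\n</ul>";
}
```

On the 404 page this might be wired up as `document.getElementById("suggestions").innerHTML = renderSuggestions(matching_urls);`, where the `suggestions` element is a hypothetical placeholder in the page template.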

The next step is adding those “smart” redirects, like Brett’s. Unfortunately, I can’t find a post explaining exactly what he’s doing, but all of these URLs will redirect correctly:

I have yet to experiment with redirects on GitHub Pages, so that’s an idea I’ll put off for another day.
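Since GitHub Pages serves static files, a redirect like this would most likely happen in JavaScript on the 404 page itself. Here’s a hypothetical sketch of the decision logic, and it’s a guess at the behaviour rather than Brett’s method. It redirects only when exactly one known URL is a single edit away, where an edit is an insertion, deletion, substitution, or swap of two adjacent characters. That distance is the optimal string alignment (restricted Damerau–Levenshtein) distance. The names `osaDistance` and `findRedirect` are made up for this example.

```javascript
// Hypothetical smart-redirect check; not Brett's implementation.

// Optimal string alignment distance (restricted Damerau-Levenshtein):
// counts insertions, deletions, substitutions and adjacent transpositions.
function osaDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) d[i][0] = i;
  for (let j = 0; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + cost // substitution (or match)
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap
      }
    }
  }
  return d[a.length][b.length];
}

// Return the one unambiguous target, or null to fall back to suggestions.
function findRedirect(requested, urls) {
  const close = urls.filter((url) => osaDistance(requested, url) === 1);
  return close.length === 1 ? close[0] : null;
}
```

On the page, something like `var target = findRedirect(window.location.href, urls); if (target) window.location.replace(target);` would jump straight to the intended page. Requiring a unique match avoids guessing when two URLs are equally close.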

You can see my simple URL matching version at 404.html. I’ll continue to tweak the parameters to improve the matching (and hide some of the more extreme results). If I ever implement the smart redirection feature, I’ll write about it again here.

  1. Technically this is a JavaScript file, not JSON, but I just copied the name from Brett and can’t think of a better one.