Text and Unicode

Storing text in computers gets tricky, especially when you want to represent more than the basic Latin alphabet A–Z. Unicode is the most widely used standard for encoding text, and it can handle 172 different scripts.

These entries are about interesting things I’ve learnt while working with text and Unicode.

3 articles

Operations on strings don’t always commute · Text and Unicode · 22 Sep 2021
Is uppercasing then reversing a string the same as reversing and then uppercasing? Of course not.
Using fuzzy string matching to find duplicate tags · Python · 6 Aug 2020
Another example of why strings are terrible · Text and Unicode · 1 Dec 2016
Pop quiz: if I lowercase a string, does it still have the same length as the original string?
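
As a rough illustration of both questions above, here is a small Python sketch. The example strings (a word starting with the "ﬁ" ligature, and the Turkish dotted capital "İ") and the helper names are my own picks for illustration, not necessarily the ones used in the articles, and the slice-based reversal works on code points rather than on what a reader would see as characters.

```python
def upper_then_reverse(s: str) -> str:
    """Uppercase first, then reverse the code points."""
    return s.upper()[::-1]


def reverse_then_upper(s: str) -> str:
    """Reverse the code points first, then uppercase."""
    return s[::-1].upper()


# U+FB01 LATIN SMALL LIGATURE FI is one code point that uppercases to two ("FI").
word = "\ufb01x"
a = upper_then_reverse(word)   # "FIX" reversed
b = reverse_then_upper(word)   # "x" + ligature, then uppercased
print(a, b, a == b)            # XIF XFI False

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to "i" + U+0307.
dotted = "\u0130"
lowered = dotted.lower()
print(len(dotted), len(lowered))  # 1 2
```

The two orders disagree because the ligature expands into two letters when uppercased, so where it sits at the time of the reversal matters.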

4 notes

When fixing mojibake, use ftfy.fix_and_explain() to understand how it’s fixing a piece of text · Python · 26 Apr 2025
Editing a filename in Finder will convert it to NFD · macOS · 21 Dec 2024
Even if the filename looks the same, it may be invisibly converted to a different sequence of bytes.
How to create flag emojis for countries in Python · Python · 10 Jan 2024
Use Unicode property escapes to detect emoji in JavaScript · JavaScript · 6 Sep 2023
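
Two of the notes above are easy to sketch in Python using only the standard library. The filename and the `flag_emoji` helper are hypothetical examples of mine, and the snippet only shows what NFD normalisation does to a string; it does not reproduce Finder's behaviour itself.

```python
import unicodedata

# The same visible filename, as NFC (precomposed) and NFD (decomposed).
name_nfc = "caf\u00e9.txt"                          # "é" as one code point
name_nfd = unicodedata.normalize("NFD", name_nfc)   # "e" + U+0301 combining acute

print(name_nfc == name_nfd)             # False
print(len(name_nfc.encode("utf-8")))    # 9
print(len(name_nfd.encode("utf-8")))    # 10


def flag_emoji(country_code: str) -> str:
    """Map a two-letter country code to a pair of regional indicator symbols.

    Hypothetical helper: assumes the input is two ASCII letters A-Z.
    """
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in country_code.upper())


print(flag_emoji("GB") == "\U0001F1EC\U0001F1E7")  # True
```

Whether the pair of regional indicators is drawn as a flag depends on the font and platform; the string itself is just two code points.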
