The COUNT(X) function only counts non-null values
I was running a tally in a photos table with a text column photos, and I got a surprising result:
sqlite> SELECT media, COUNT(media) FROM photos GROUP BY media;
|0
photo|1813253
video|1131
Why is there a value with 0 results in the tally? That seems odd!
The issue turned out to be a misunderstanding on my part about the COUNT(X) function – it only counts the number of times that X is not NULL.
I had a number of null rows in the database, but they weren’t being counted:
sqlite> SELECT COUNT(*) FROM photos WHERE media IS NULL;
340
The tally query I actually want uses COUNT(*), not COUNT(X):
sqlite> SELECT media, COUNT(*) FROM photos GROUP BY media;
|340
photo|1813253
video|1131
Other thoughts/clues
- I initially asked ChatGPT, which told me that - GROUP BYcan find deleted rows. I haven’t been able to find an independent source for this, so I think it might be a hallucination.- It did suggest running the - VACUUMcommand as a way to trigger some sort of database rebuild, but that didn’t change the behaviour I was seeing.
- One of ChatGPT’s other suggestions was creating a new table – that felt like a more helpful suggestion, because it might force a table rebuild and flush any transient data. - I ended up running the following lines repeatedly, adjusting the limit/offset and trying to cut down the table to find the rows that triggered the problem: - DROP TABLE new_photos; CREATE TABLE new_photos AS SELECT id, media FROM photos LIMIT 5 OFFSET 54850; SELECT media, COUNT(media) FROM new_photos GROUP BY media;- Once I’d got a small enough set, I inspected those rows manually and found the data that was causing the mysterious tally behaviour: - sqlite> SELECT id, media FROM photos LIMIT 5 OFFSET 54850; 14541672258|photo 14541680439|photo 14728030192|photo 14725963694|photo 52966470900|- This was the clue I needed to go double-check how - COUNT(X)works.
- The tally should count every row in the database. If I’d cross-checked the total of the tally with the number of rows in the database, I’d have noticed the missing rows: - sqlite> SELECT COUNT(*) FROM photos; 1814724- but the initial tally has 1813253 + 1131 = 1814384, which is short.