HTML strings may not be equivalent if you minify them

There’s a lot of whitespace in HTML which looks irrelevant at first glance, but may be significant and cause the document to render differently.

I was working on some HTML templates, and I wanted to test they were creating the right HTML. In particular, I wanted to write an assertion that the actual HTML and the expected HTML were the same.

The actual HTML was minified, for example:

<ul><li>Hello world</li><li>Hello world</li><li>Hello world</li></ul>

This is tricky to read, and I wanted the expected HTML to be prettified, such as:

<ul>
  <li>
    Hello world
  </li>
  <li>
    Hello world
  </li>
  <li>
    Hello world
  </li>
</ul>

These two strings aren’t equal, but I thought they might be equivalent because the whitespace inside the <li> tags gets collapsed and they look visually the same:

minified:

  • Hello world
  • Hello world
  • Hello world

prettified:

  • Hello world
  • Hello world
  • Hello world

For a while I tried various approaches to parse the HTML so I could treat it as something other than a string, and compare it that way – e.g. parsing it as a DOM element and using isEqualNode() – but I couldn’t find a way to prove the HTML was equivalent without minifying it.
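To illustrate why a structural comparison treats the two strings as different, here is a minimal sketch in Python. It uses the standard library's `html.parser` rather than the DOM API mentioned above, and the names `TextCollector` and `text_nodes` are invented for this example. It collects the text between tags exactly as written, which is roughly what a DOM comparison sees as text nodes.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every run of text between tags, whitespace included."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once per run of text between two tags.
        self.chunks.append(data)


def text_nodes(html):
    """Return the text runs found in an HTML string (hypothetical helper)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


minified = "<ul><li>Hello world</li><li>Hello world</li><li>Hello world</li></ul>"

prettified = """<ul>
  <li>
    Hello world
  </li>
  <li>
    Hello world
  </li>
  <li>
    Hello world
  </li>
</ul>"""

# The minified string only contains the three "Hello world" runs.
print(text_nodes(minified))

# The prettified string also carries the newlines and indentation,
# both inside each <li> and between the tags.
print(text_nodes(prettified))

print(text_nodes(minified) == text_nodes(prettified))  # False
```

The tags are the same in both strings, but the text between them is not, so any comparison that looks at the text as written will report a difference.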

And then it hit me: I can’t prove these two HTML elements are equivalent, because they’re not. They render the same right now, but only because my browser is collapsing the whitespace. Consider what happens if I tell the browser to treat the whitespace as significant:

li { white-space: pre; }

Now the two lists look different:

minified:

  • Hello world
  • Hello world
  • Hello world

prettified (each item keeps its leading line break and indentation):

  •
        Hello world

  •
        Hello world

  •
        Hello world


So the whitespace can be significant!
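The difference between the two modes can be sketched in a few lines of Python. This is only a rough model, assuming that `white-space: normal` collapses each run of whitespace to a single space and drops it at the start and end of the line, while `white-space: pre` keeps the text untouched. Real CSS whitespace processing has more rules than this, and `render_text` is a made-up name for the example.

```python
import re


def render_text(text, white_space="normal"):
    """Approximate the visible text of an element (hypothetical helper)."""
    if white_space == "pre":
        # Every space and newline is kept as written.
        return text
    # "normal": collapse runs of whitespace into one space,
    # then drop the space left at either end of the line.
    return re.sub(r"[ \t\n\r\f]+", " ", text).strip(" ")


# The text inside one <li> from each version of the list.
minified_item = "Hello world"
prettified_item = "\n    Hello world\n  "

# With the default whitespace handling, the two items look the same.
print(render_text(minified_item) == render_text(prettified_item))  # True

# With white-space: pre, the extra newlines and indentation show up.
print(repr(render_text(prettified_item, "pre")))
print(render_text(minified_item, "pre") == render_text(prettified_item, "pre"))  # False
```

Under this model the two items only match while the whitespace is being collapsed, which is a property of the styling rather than of the HTML strings themselves.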

This is one of those things that makes sense now I think about it, but it’s helpful to have a working example.

I already knew that <pre> is whitespace-sensitive, but that feels like a special snowflake tag. It occurred to me that this could affect other tags when I remembered that <pre>’s special handling of whitespace is really just a CSS rule, white-space: pre, which browsers apply by default.