Posts tagged with “unicode”:

Another example of why strings are terrible

Here’s a programming assumption I used to make, that until today I’d never really thought about: changing the case of a string won’t change its length.

Now, thanks to Hypothesis, I know better:

>>> x = u'İ'
>>> len(x)
1
>>> len(x.lower())
2

I’m not going to pretend I understand enough about Unicode or Python’s string handling to say what’s going on here.

I discovered this while testing a moderately fiddly normalisation routine – this routine would normalise the string to lowercase, unexpectedly tripping a check that it was the right length. If you’d like to see this for yourself, here’s a minimal example:

from hypothesis import given, strategies as st

@given(st.text())
def test_changing_case_preserves_length(xs):
    assert len(xs) == len(xs.lower())