September 26, 2023

How Errors Compound

I was messing around with a project recently, and I thought everything was going swimmingly. My unit tests were passing, and the output on stdout looked like I’d expect. And then I made one small change, and instead of the binary number I was expecting, I saw:

1010101001001020

“Ones and zeroes everywhere. And I think I saw a two!”; “It was just a dream, Bender. There’s no such thing as a two.” I added a bunch more unit tests but they didn’t seem to be finding the two. It took me quite a while to figure out what the problem was because there were in fact two problems: one with the function code, and one with the tests.

Background

First, a little background. I’ve seen a few examples of so-called “binary clocks” that just give you a binary representation of the digits in a standard 24-hour clock. There’s a binary representation of the hours, a binary representation of the minutes, and one for the seconds. That’s no fun. What I wanted to do was create a binary clock that gives you a single binary number representing how far through the day you are. So it starts at 000000 at midnight and ticks through until hitting 111111 just before midnight the next day. 100000 is midday. It is currently 110101, which is about 8pm in old money. So that’s what I wanted to do: take a number of seconds elapsed since midnight, and turn it into a binary number that represents how far through the day you are.
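As a quick check on those readings, the conversion back to an ordinary time of day is just the binary fraction times 24 hours:

```python
# Convert a 6-bit clock reading back to an ordinary time of day
reading = "110101"
fraction = int(reading, 2) / 2**6         # 53/64 of the way through the day
hours = fraction * 24                     # 19.875 hours
print(int(hours), round(hours % 1 * 60))  # roughly 19:52

# And midday comes out exactly right
print(int("100000", 2) / 2**6 * 24)       # 12.0
```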

Now, six digits only gives you one tick every 22 and a half minutes, so that’s no good. There are 86400 seconds in a day (give or take leap seconds) and $2^{16} = 65536$ which is near enough. Better than one tick every two seconds. Sixteen bits is probably a good enough resolution for a clock, I think.
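The arithmetic behind those resolution figures:

```python
SECONDS_IN_DAY = 24 * 60 * 60        # 86400

# Six bits: one tick per 1/64 of a day
print(SECONDS_IN_DAY / 2**6)         # 1350.0 seconds, i.e. 22.5 minutes

# Sixteen bits: one tick per 1/65536 of a day
print(SECONDS_IN_DAY / 2**16)        # about 1.32 seconds
```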

The code

It’s a small enough project that I’m just going to replicate most of it here. Before continuing, see if you can find both problems.

import pytest
from hypothesis import given
from hypothesis.strategies import integers, floats

SECONDS_IN_DAY = 86400  # 24*60*60

# The actual function I'm testing
def binclock(seconds, precision=6):
    """Represent elapsed time as binary number."""
    digits = []
    rem = seconds
    divisor = SECONDS_IN_DAY
    for i in range(1, precision + 1):
        divisor >>= 1
        digits.append(rem // divisor)
        rem %= divisor
    return "".join([str(int(x)) for x in digits])

# Fixtures and helper functions for tests
seconds_ints = integers(min_value=0, max_value=86399)

def check_binary(x, precision=6):
    return all([y in "01" for y in binclock(x)])

# Tests
class TestBinClock:
    def test_first_division(self):
        assert binclock(0, precision=1) == "0"
        assert binclock(43201, precision=1) == "1"

    def test_second_division(self):
        assert binclock(0, precision=2) == "00"
        assert binclock(43201, precision=2) == "10"
        assert binclock(21601, precision=2) == "01"

    @given(seconds_ints)
    def test_binary_six(self, x):
        assert check_binary(x)

    @given(seconds_ints)
    def test_binary_seven(self, x):
        assert check_binary(x, 7)

    @given(seconds_ints)
    def test_binary_eight(self, x):
        assert check_binary(x, 8)

    @given(seconds_ints)
    def test_binary_sixteen(self, x):
        assert check_binary(x, 16)

    def test_nearby_range(self):
        seconds = 57066
        for i in range(300):
            assert check_binary(seconds + i)

    @given(floats(min_value=0, max_value=86399.999))
    def test_fractions(self, x):
        assert check_binary(x)

    def test_specific_fraction(self):
        assert check_binary(57468.627816)

What’s going on?

Now, all these tests pass. And you can see me getting more desperate as I add tests trying to make it fail. But I saw a two! There shouldn’t be a two.

I decided to try using the hypothesis module to test a bunch of values. This didn’t seem to help.

Do some debugging… I saw a two at about 57066 seconds! Nope. Oh, maybe it’s something to do with floats? Nope. OK, it’s this precise value that gives me trouble in the output. Nope, the test passes.

What’s going on here? Maybe it’s obvious, but I wasn’t seeing the wood for the trees. The more obvious problem is that I added a precision parameter to the check_binary helper, but I didn’t actually do anything with it. So when I thought I was testing higher precision values, I was not: I was always testing the default precision=6. Once I modified check_binary to actually use its argument, all my tests failed, except for the tests with precision=6 and precision=7. So it all works with the default precision, but turn it up at all and it falls over.
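To make the failure concrete, here is binclock reproduced from above, alongside a check_binary that actually passes its precision parameter through. At precision 16 a two shows up for very small inputs:

```python
SECONDS_IN_DAY = 86400

# The buggy binclock from above, reproduced so this snippet runs standalone
def binclock(seconds, precision=6):
    digits = []
    rem = seconds
    divisor = SECONDS_IN_DAY
    for i in range(1, precision + 1):
        divisor >>= 1
        digits.append(rem // divisor)
        rem %= divisor
    return "".join([str(int(x)) for x in digits])

# The corrected helper: precision is actually forwarded
def check_binary(x, precision=6):
    return all(y in "01" for y in binclock(x, precision))

print(binclock(4, 16))      # 0000000000000020 -- there's the two
print(check_binary(4, 16))  # False
print(check_binary(4, 6))   # True: the default precision masks the bug
```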

The unlucky thing for me is that I had accidentally picked a default value for the precision parameter that masks the flaw in my way of calculating the binary fraction. I’m going to go through how I arrived at my implementation of binclock before explaining exactly what’s going wrong.

What is binclock doing?

I want to explain how I ended up with the version of the algorithm in binclock: why I thought it should work. You might recognise this algorithm as kind of similar to the algorithm you use to turn an integer into its binary representation. In that case, you do something like this:

def make_binary1(num):
    digits = []
    while num > 0:
        digits.append(str(num % 2))
        num //= 2
    return "".join(digits[::-1])

Keep dividing your number by two, keeping track of all the remainders. Flip that string around and that’s your binary representation. So for the number 6 you do the following: divide it by 2 and get 3 remainder 0. So after one loop, the digits list is holding [0] and num is now 3. (digits is actually a list of strings, because join isn’t smart enough to cast what gets passed into it, but let’s ignore that for now). Divide 3 by 2 – we’re doing integer division – so that’s 1 remainder 1. So digits is now [0, 1] and num is 1. Repeat again and we add another 1 to digits and num is now 0, so the while loop ends. Reverse [0, 1, 1] and turn it into a string to get: 110, which is 6 in binary.
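To check that trace (reproducing make_binary1 so the snippet runs on its own), we can compare it against Python’s built-in binary formatting:

```python
def make_binary1(num):
    digits = []
    while num > 0:
        digits.append(str(num % 2))
        num //= 2
    return "".join(digits[::-1])

print(make_binary1(6))  # 110

# It agrees with the built-in conversion for positive integers
assert all(make_binary1(n) == format(n, "b") for n in range(1, 1000))
```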

Now, here’s another algorithm that gives the same output, but is doing something slightly different internally.

def make_binary2(num):
    divisor = 2
    while divisor <= num:
        divisor <<= 1

    digits = []
    while divisor > 1:
        divisor >>= 1
        digits.append(str(num // divisor))
        num %= divisor
    return "".join(digits)

The first part of this function finds the smallest power of two strictly greater than the number we’re looking at. Then, instead of generating the digits of the binary representation in reverse order (smallest to biggest), we generate them in the order they appear in the final output. Note here that instead of using divisor *= 2, I use divisor <<= 1. This is a useful trick if you’re dealing with powers of two, but if you wanted to make your function able to output, say, ternary representations of numbers as well, then you’d be better off sticking with multiplication and division, instead of bit shifts.

This function gives the same outputs as make_binary1 for positive integer inputs. (They give different outputs for 0; see if you can figure out how they differ.)

Why would you want to do conversion to binary in this way rather than the first way? Well, one reason is that this second approach can be more easily modified to generate binary representations of floats, rather than just integers. Here’s a first attempt at doing that.

# THIS CODE DOESN'T WORK
def make_binary3(num,precision=6):
    divisor = 2
    power = 1
    while divisor <= num:
        divisor <<= 1
        power += 1

    digits = []
    for i in range(power+precision):
        if i==power:
            digits.append(".")
        divisor >>= 1
        digits.append(str(int(num // divisor)))
        num %= divisor
    return "".join(digits)

Obviously we need to specify the precision we want to generate binary representations of floats with: how else would we prevent looping forever? But there is something wrong with the code above. You might be able to guess what it is from what I said about the binclock code above.

The problem is that at some point, divisor >>= 1 yields a zero, and thus num // divisor causes a divide-by-zero error. Let’s try working through make_binary3(1.5). The first part of the function doesn’t do anything, because divisor is already larger than num. Now, for the second loop. i is 0, but power is 1, so that first if gets skipped. We right shift divisor by 1 so it’s now 1. Then we append 1.5 // 1.0 to digits (that is, 1). We then assign 1.5 % 1.0 – which is 0.5 – to num. (Yeah “integer” division and modulo work fine with floats, little bit surprising, but once you think about what they’re doing, it kind of makes sense). The next time through the loop, i == power so we add a “.” to digits (this is the decimal point, or, I guess, the binary point). Then we right shift divisor by 1. Now, 1 >> 1 == 0, so we get a divide by zero error when we come to do num // divisor which, at this stage is 0.5 // 0.
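Running the broken function confirms the trace (the function is reproduced here so the snippet is self-contained):

```python
# Reproducing the broken make_binary3 from above to demonstrate the crash
def make_binary3(num, precision=6):
    divisor = 2
    power = 1
    while divisor <= num:
        divisor <<= 1
        power += 1
    digits = []
    for i in range(power + precision):
        if i == power:
            digits.append(".")
        divisor >>= 1                           # eventually 1 >> 1 == 0
        digits.append(str(int(num // divisor)))
        num %= divisor
    return "".join(digits)

try:
    make_binary3(1.5)
except ZeroDivisionError:
    print("crashed, as the trace predicts")
```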

It was this incorrect implementation of a function to create binary representations of floats that I used as the basis for binclock. I’ll explain how binclock is different from this once I’ve laid out how the correct version of make_binary works.

The correct thing to do is to replace divisor >>= 1 with divisor /= 2 so that instead of rounding to zero, the divisor keeps getting smaller. Let’s work through just that second part of the calculation again with an amended loop.

    digits = []
    for i in range(power+precision):
        if i==power:
            digits.append(".")
        divisor /= 2
        digits.append(str(int(num // divisor)))
        num %= divisor
    return "".join(digits)

The first go through the loop works just the same. At the end of the first iteration, digits is [1], divisor is 1 and num is 0.5. Now, on the second iteration, we add our binary point as before, and then we divide divisor by 2, giving 0.5. 0.5 // 0.5 == 1 so we append 1 to digits. We then assign 0.5 % 0.5 – which is 0 – to num. All the remaining iterations have num equal to 0, so although we keep halving divisor, we keep adding 0 to digits. Then we gather up our digits to produce 1.100000.
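Putting that fix into the whole function gives something like the following sketch (I’ve called it make_binary here):

```python
def make_binary(num, precision=6):
    """Binary representation of a non-negative number,
    with `precision` digits after the binary point."""
    divisor = 2
    power = 1
    while divisor <= num:
        divisor <<= 1
        power += 1
    digits = []
    for i in range(power + precision):
        if i == power:
            digits.append(".")
        divisor /= 2                            # true division: no rounding to zero
        digits.append(str(int(num // divisor)))
        num %= divisor
    return "".join(digits)

print(make_binary(1.5))  # 1.100000
print(make_binary(6))    # 110.000000
```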

Fixing binclock

The algorithm in binclock is pretty much the same as our make_binary3 algorithm, except we already know the maximum number that will be passed in: it’s the number of seconds in a day, 86400. We’re actually calculating the fraction that input / 86400 represents, but we’ve dropped the leading binary point. So rather than divide the input by 86400, we just multiply the initial divisor by 86400. But other than that, it’s the same calculation.

Now we can see that binclock has basically the same problem as make_binary3: the right shift operator >>= rounds down, and that rounding loses information once you start shifting out digits that are set. So the algorithm in binclock works so long as you’re shifting out a zero (so long as you’re dividing a number that’s divisible by 2). But at some point that stops being the case, you get some rounding, things start coming off the rails, and you start seeing twos.

Here’s an example. Let’s say that our divisor is 14, and we’re trying to see what proportion of 14 is 6. That is, imagine there are 14 seconds in a day, and we’re at second 6: what time is it in binary? Step through the main loop of binclock. 14 is 1110 which means we can safely right shift without rounding down. So 14 >> 1 is 7. We append 6 // 7 to digits (that’s a 0). Then we assign 6 % 7 to rem (that’s a 6). So far so good. But now the trouble starts on the second time through the loop. 7 is 111 and so, right shifting, we chop off that rightmost 1 to leave ourselves with 11. Thus, 7 >> 1 is 3. Then we do 6//3 which gives us our 2. The rest of the calculation proceeds as you’d expect. What we should have done, instead, is divide 7 by 2, yielding 3.5, and then 6//3.5 is 1, while 6 % 3.5 is 2.5.
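The same change repairs binclock itself: swap the right shift for a true division. A sketch of the fixed function:

```python
SECONDS_IN_DAY = 86400

def binclock(seconds, precision=6):
    """Represent elapsed time of day as a binary fraction (fixed version)."""
    digits = []
    rem = seconds
    divisor = SECONDS_IN_DAY
    for _ in range(precision):
        divisor /= 2                  # was: divisor >>= 1
        digits.append(rem // divisor)
        rem %= divisor
    return "".join(str(int(x)) for x in digits)

print(binclock(43201, precision=1))  # 1
print(binclock(4, 16))               # only ones and zeroes now
```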

At what point do you start seeing twos in binclock? Well, (86400 >> 7) << 7 == 86400, but (86400 >> 8) << 8 != 86400. Or, to put it another way, f"{86400:b}"[-8:] == "10000000". What I’m saying is that you can right shift 86400 seven times and you won’t lose any information, but that eighth shift, that eighth digit of precision, starts causing problems. So the fact that I’d picked the value of 6 for the default precision, coupled with the fact that I’d forgotten to make my test helper function actually use its parameter, meant that a bunch of tests passed when they shouldn’t have.
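Those boundary claims are easy to verify directly:

```python
# 86400 = 675 * 2**7: seven trailing zero bits, then a set bit
assert 86400 == 675 * 2**7
assert (86400 >> 7) << 7 == 86400   # seven shifts are lossless
assert (86400 >> 8) << 8 != 86400   # the eighth shifts out a set bit
assert f"{86400:b}".endswith("10000000")
print("all checks pass")
```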

In hindsight, this was an obvious error to make, but, then, in hindsight, aren’t they all?

© Seamus Bradley 2021–3
