January 27, 2023

Python's Counter

The problem

Python has a Counter object – it’s found in the collections module – and I’ve become convinced that it’s… a bit weird? The basic idea is fine: it’s a dictionary but the keys are things you want to count, and the values are the amount of the thing you have counted. The neat thing is if you initialise a counter with an iterable like a string or a list, it’ll give you the counts for the number of times each element appears. And it gives you convenience methods for finding the most common elements and that sort of thing. It’s pretty neat and I’ve used it a few times.

The main problem with the Counter object is that it has some fairly inconsistent behaviour when it comes to negative numbers.

Here’s a warmup.

>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a-b
Counter()

Create two counters, and subtract one away from the other. Because the count would go negative if we subtract 2 from 1, the outcome is the empty counter. And the same thing happens if we do the subtraction inplace:

>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a-=b
>>> a
Counter()

OK, so Counter just doesn’t want to permit negative counts, well, maybe that’s fine. I’d want, perhaps a warning that the count went negative, but I guess that’s fair enough. But wait:

>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a.subtract(b)
>>> a
Counter({'z': -1})

So Counter is willing to permit negative counts if we use subtract rather than -= (or __isub__). And it won’t make any kind of complaint if we do Counter(z=-1). So it is sort of permitting negative counts, but only sometimes? And it’s not permitting negative counts in any way that really permits them to be used in any sensible way.

>>> a = Counter(z=-1)
>>> b = Counter(z=-2)
>>> a+b
Counter()

So negative counts get dropped whenever we do addition or subtraction, unless we use subtract or update, in which case they don’t. Why permit negative values in Counters at all if they’re going to be dealt with kind of inconsistently? (I mean, Counter(z = 3.5) is also something that python does not complain about, but that’s a different story). Wouldn’t it be better to build a Counter class that handles negative numbers better?

Well, part of the issue there is that it isn’t clear that there’s one obviously right answer to the question “How should a counter respond to negative numbers?” One option would be to raise an error whenever a Counter encounters a negative value. Another would be to raise a warning. Another would be to just let negative values through. Any of these options seems preferable to the current implementation.

A Solution

For a project, I wanted a counter that would throw an error when it hit a negative number, and so I decided to try subclassing the Counter object. At first I just explored the code for the collections module, but this wasn’t quite enough, since Counter is a subclass of dict and so there might be methods of dict that get used by Counter that I wouldn’t necessarily notice if I’m just skimming the method definitions in Counter. It was part of this exercise that led me to learn about inspect and do the stuff I talked about in a previous post. I’m not sure whether the final project will make use of this solution, since it’s kind of slow and I’m exploring other implementations, but I’ll discuss it here.

One interesting thing is that each way of subtracting numbers (__sub__, __isub__, and subtract) is defined explicitly: none is defined in terms of the others. I presume this is done for efficiency reasons, to remove one layer of indirection from some method calls? But it means that I needed to redefine all of those methods.

So using the MethodInspector I defined in a previous post, I poked about in the definition of Counter and copied out the methods I needed to modify. The modifications are all rather minor (except that I just defined subtract in terms of __isub__ because I’m lazy). For your viewing pleasure, here’s the subclass.

from collections import Counter


class NegativeNumberError(ValueError):
    def __init__(self, message="values must be non-negative"):
        super().__init__(message)


class SubtractionError(ValueError):
    def __init__(self, message="subtraction would drop count below zero"):
        super().__init__(message)


class NNCounter(Counter):
    def __init__(self, iterable=None, /, **kwds):
        """
        Create a new NNCounter object.
        """
        if (
            iterable is not None
            and issubclass(type(iterable), dict)
            and any([x < 0 for x in iterable.values()])
        ):
            raise NegativeNumberError

        if kwds and any([x < 0 for x in kwds.values()]):
            raise NegativeNumberError
        super().__init__()
        self.update(iterable, **kwds)

    def __isub__(self, other):
        """
        Inplace subtract counter.

        Raises a SubtractionError if any count drops below 0.
        """
        for elem, count in other.items():
            newcount = self[elem] - count
            if newcount < 0:
                raise SubtractionError
            else:
                self[elem] = newcount
        return self

    def __setitem__(self, k, v):
        if v < 0:
            raise NegativeNumberError
        else:
            super().__setitem__(k, v)

    def __sub__(self, other):
        """
        Subtract one NNCounter from another.

        Raises a SubtractionError if any count drops below 0.
        """
        if not isinstance(other, NNCounter):
            return NotImplemented
        result = NNCounter()
        for elem, count in self.items():
            newcount = count - other[elem]
            if newcount > 0:
                result[elem] = newcount
            elif newcount < 0:
                raise SubtractionError
        for elem, count in other.items():
            if elem not in self and count > 0:
                raise SubtractionError

        return result

    def subtract(self, *args, **kwargs):
        other = NNCounter(*args, **kwargs)
        self -= other

Looking at this again, I think I would probably prefer to have one new kind of error, and then maybe adjust the message depending on whether it’s a subtraction or an init problem. Or at least maybe have both errors subclass a “negative number error” superclass so you can catch both without catching all other kinds of ValueError. I’m not going ot change it because, as I said, I’m probably not going to use this in the project I built it for.

The other limitation is that I haven’t redefined the addition methods for NNCounter, and so adding two NNCounters together returns a Counter. This is because the definition of Counter explicitly initialises the counters it returns in addition and subtraction operations as Counter() rather than being aware it might be subclassed and doing something like type(self)(). For my use case, this isn’t an issue, but if I were being thorough, I would have to redefine __add__, __iadd__ and __update__ as well, and possibly __and__, __or__ etc which also hardcode the type of their output.

At some point, I hope to get round to actually explaining what the project is that I was using this for, but that will have to wait.

© Seamus Bradley 2021–3

Powered by Hugo & Kiss.