The problem
Python has a Counter object (it’s found in the collections module) and I’ve become convinced that it’s… a bit weird?
The basic idea is fine:
it’s a dictionary whose keys are the things you want to count, and whose values are the number of times each thing has been counted.
The neat thing is if you initialise a counter with an iterable like a string or a list,
it’ll give you the counts for the number of times each element appears.
And it gives you convenience methods for finding the most common elements and that sort of thing.
It’s pretty neat and I’ve used it a few times.
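As a quick illustration of the behaviour described above, here is a minimal sketch that counts the characters of a string. The string "banana" is just an arbitrary example input.

```python
from collections import Counter

# Counting the elements of an iterable (here, the characters of a string).
letters = Counter("banana")

print(letters["a"])            # 3
print(letters.most_common(2))  # [('a', 3), ('n', 2)]
print(letters["q"])            # 0: missing keys report a count of zero
```

Looking up a key that was never counted returns 0 rather than raising a KeyError, which is part of what makes subtraction behave the way it does below.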
The main problem with the Counter object is that it has some fairly inconsistent behaviour when it comes to negative numbers.
Here’s a warmup.
>>> from collections import Counter
>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a-b
Counter()
Create two counters, and subtract one from the other. Because the count would go negative if we subtract 2 from 1, the outcome is the empty counter. And the same thing happens if we do the subtraction in place:
>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a-=b
>>> a
Counter()
OK, so Counter just doesn’t want to permit negative counts. Well, maybe that’s fine.
I’d perhaps want a warning that the count went negative, but I guess that’s fair enough.
But wait:
>>> a = Counter(z=1)
>>> b = Counter(z=2)
>>> a.subtract(b)
>>> a
Counter({'z': -1})
So Counter is willing to permit negative counts if we use subtract rather than -= (or __isub__).
And it won’t make any kind of complaint if we do Counter(z=-1).
So it is sort of permitting negative counts, but only sometimes?
And it’s not permitting negative counts in a way that lets them be used sensibly.
>>> a = Counter(z=-1)
>>> b = Counter(z=-2)
>>> a+b
Counter()
So negative counts get dropped whenever we do addition or subtraction, unless we use subtract or update, in which case they don’t.
Why permit negative values in Counters at all if they’re going to be dealt with kind of inconsistently?
(I mean, Counter(z=3.5) is also something that Python does not complain about, but that’s a different story.)
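To make the contrast concrete, here is a small sketch that puts the two families of operations side by side: the subtract and update methods keep negative counts, while the arithmetic operators drop them. The specific counts are arbitrary.

```python
from collections import Counter

a = Counter(z=1)

a.subtract(Counter(z=3))  # method: the negative count is kept
print(a)                  # Counter({'z': -2})

a.update(Counter(z=-1))   # method: adds counts, negatives included
print(a)                  # Counter({'z': -3})

print(a + Counter())      # Counter(): binary + drops non-positive counts
print(+a)                 # Counter(): unary + does the same
```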
Wouldn’t it be better to build a Counter class that handles negative numbers better?
Well, part of the issue there is that it isn’t clear that there’s one obviously right answer to the question “How should a counter respond to negative numbers?”
One option would be to raise an error whenever a Counter encounters a negative value.
Another would be to raise a warning.
Another would be to just let negative values through.
Any of these options seems preferable to the current implementation.
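The three options could be sketched as a single checking function with a switch. This is purely illustrative: check_counts and its policy argument are hypothetical names, not part of the collections module or of the subclass described later.

```python
import warnings
from collections import Counter


def check_counts(counter, policy="error"):
    """Apply one of three hypothetical policies to negative counts."""
    negatives = {k: v for k, v in counter.items() if v < 0}
    if not negatives or policy == "allow":
        return counter                      # let negative values through
    if policy == "warn":
        warnings.warn(f"negative counts: {negatives}")
        return counter
    if policy == "error":
        raise ValueError(f"negative counts: {negatives}")
    raise ValueError(f"unknown policy: {policy!r}")
```

A check like this only inspects a counter after the fact; enforcing a policy on every operation means overriding the methods themselves.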
A Solution
For a project, I wanted a counter that would throw an error when it hit a negative number, and so I decided to try subclassing the Counter object.
At first I just explored the code for the collections module, but this wasn’t quite enough, since Counter is a subclass of dict and so there might be methods of dict that get used by Counter that I wouldn’t necessarily notice if I’m just skimming the method definitions in Counter.
It was this exercise that led me to learn about inspect and do the stuff I talked about in a previous post.
I’m not sure whether the final project will make use of this solution,
since it’s kind of slow and I’m exploring other implementations,
but I’ll discuss it here.
One interesting thing is that each way of subtracting numbers (__sub__, __isub__, and subtract) is defined explicitly:
none is defined in terms of the others.
I presume this is done for efficiency reasons, to remove one layer of indirection from some method calls?
But it means that I needed to redefine all of those methods.
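A small experiment shows why all three need redefining. The LoudSub class below is a hypothetical subclass that overrides only __sub__; the in-place operator and the subtract method carry on using Counter’s own definitions.

```python
from collections import Counter


class LoudSub(Counter):
    """Hypothetical subclass that overrides only the binary - operator."""

    def __sub__(self, other):
        raise ArithmeticError("__sub__ was called")


a = LoudSub(z=1)
b = LoudSub(z=2)

a -= b          # uses Counter.__isub__, so the override is never reached
print(len(a))   # 0: the negative count was silently dropped

c = LoudSub(z=1)
c.subtract(b)   # uses Counter.subtract, likewise untouched
print(c["z"])   # -1
```

Only the expression a - b would actually reach the overridden method.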
So using the MethodInspector I defined in a previous post, I poked about in the definition of Counter and copied out the methods I needed to modify.
The modifications are all rather minor (except that I just defined subtract in terms of __isub__ because I’m lazy).
For your viewing pleasure, here’s the subclass.
from collections import Counter


class NegativeNumberError(ValueError):
    def __init__(self, message="values must be non-negative"):
        super().__init__(message)


class SubtractionError(ValueError):
    def __init__(self, message="subtraction would drop count below zero"):
        super().__init__(message)


class NNCounter(Counter):
    def __init__(self, iterable=None, /, **kwds):
        """
        Create a new NNCounter object.
        """
        if (
            iterable is not None
            and issubclass(type(iterable), dict)
            and any([x < 0 for x in iterable.values()])
        ):
            raise NegativeNumberError
        if kwds and any([x < 0 for x in kwds.values()]):
            raise NegativeNumberError
        super().__init__()
        self.update(iterable, **kwds)

    def __isub__(self, other):
        """
        Inplace subtract counter.
        Raises a SubtractionError if any count drops below 0.
        """
        for elem, count in other.items():
            newcount = self[elem] - count
            if newcount < 0:
                raise SubtractionError
            else:
                self[elem] = newcount
        return self

    def __setitem__(self, k, v):
        if v < 0:
            raise NegativeNumberError
        else:
            super().__setitem__(k, v)

    def __sub__(self, other):
        """
        Subtract one NNCounter from another.
        Raises a SubtractionError if any count drops below 0.
        """
        if not isinstance(other, NNCounter):
            return NotImplemented
        result = NNCounter()
        for elem, count in self.items():
            newcount = count - other[elem]
            if newcount > 0:
                result[elem] = newcount
            elif newcount < 0:
                raise SubtractionError
        for elem, count in other.items():
            if elem not in self and count > 0:
                raise SubtractionError
        return result

    def subtract(self, *args, **kwargs):
        other = NNCounter(*args, **kwargs)
        self -= other
Looking at this again, I think I would probably prefer to have one new kind of error, and then maybe adjust the message depending on whether it’s a subtraction or an init problem.
Or at least maybe have both errors subclass a “negative number error” superclass so you can catch both without catching all other kinds of ValueError.
I’m not going to change it because, as I said, I’m probably not going to use this in the project I built it for.
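For what it’s worth, the shared-superclass idea could look something like the sketch below. NegativeCountError and the classify helper are hypothetical names introduced only for illustration; the two existing error classes keep their default messages.

```python
class NegativeCountError(ValueError):
    """Hypothetical shared base class for both failure modes."""


class NegativeNumberError(NegativeCountError):
    def __init__(self, message="values must be non-negative"):
        super().__init__(message)


class SubtractionError(NegativeCountError):
    def __init__(self, message="subtraction would drop count below zero"):
        super().__init__(message)


def classify(exc_type):
    """Raise exc_type and report which except clause caught it."""
    try:
        raise exc_type
    except NegativeCountError as err:
        return f"counter problem: {err}"
    except ValueError as err:
        return f"other ValueError: {err}"


print(classify(SubtractionError))     # counter problem: subtraction would drop count below zero
print(classify(NegativeNumberError))  # counter problem: values must be non-negative
```

Because the base class still derives from ValueError, any existing code that catches ValueError keeps working.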
The other limitation is that I haven’t redefined the addition methods for NNCounter, and so adding two NNCounters together returns a Counter.
This is because the definition of Counter explicitly initialises the counters it returns in addition and subtraction operations as Counter() rather than being aware it might be subclassed and doing something like type(self)().
For my use case, this isn’t an issue, but if I were being thorough, I would have to redefine __add__, __iadd__ and update as well, and possibly __and__, __or__ etc., which also hardcode the type of their output.
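Here is a minimal sketch of what the type(self)() approach might look like for addition. SubclassAwareCounter and Child are hypothetical names, and this is not how the standard library defines __add__; it only mirrors Counter’s rule of keeping positive totals.

```python
from collections import Counter


class SubclassAwareCounter(Counter):
    """Hypothetical subclass whose + returns an instance of its own type."""

    def __add__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        result = type(self)()  # instead of a hardcoded Counter()
        for elem in self.keys() | other.keys():
            total = self[elem] + other[elem]
            if total > 0:      # keep Counter's rule of dropping non-positive totals
                result[elem] = total
        return result


class Child(SubclassAwareCounter):
    pass


total = Child(a=1) + Child(a=2, b=1)
print(type(total).__name__)  # Child
print(total["a"])            # 3
```

The same pattern would need repeating for each operator that builds a new counter.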
At some point, I hope to get round to actually explaining what the project is that I was using this for, but that will have to wait.