
[Source Code Reading] A Detailed Look at the Counter Class in Python's Standard Module collections

Posted: 2021-01-06 06:31:44


Contents

1. Introduction
2. Usage
3. Analysis

1. Introduction

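Counter, a class in Python's standard module collections, is a subclass of dict. It is typically used to count hashable objects and is sometimes called a multiset (or bag). A Counter stores each hashable object as a key and that object's count as the corresponding value.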
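To make this storage layout concrete, here is a minimal sketch. The input string 'hello' is an arbitrary example chosen for illustration, not something from the original article:

```python
from collections import Counter

c = Counter('hello')

# Counter subclasses dict, so every ordinary dict operation is available.
print(isinstance(c, dict))  # True

# Each distinct element becomes a key; its number of occurrences is the value.
print(dict(c))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
```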

2. Usage

Create a Counter object:

```python
>>> from collections import Counter
>>> c = Counter('abcdeabcdabcaba')
```
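The __init__ docstring in the source accepts several kinds of input besides a string. The sketch below exercises those forms; the variable names are illustrative only. Because __init__ simply calls self.update(iterable, **kwds), and update() adds counts, an iterable and keyword arguments can also be combined:

```python
from collections import Counter

# Constructor forms mirroring the __init__ docstring in the source.
empty = Counter()                        # no counts at all
from_iterable = Counter('gallahad')      # count each character
from_mapping = Counter({'a': 4, 'b': 2}) # counts taken from a mapping
from_kwargs = Counter(a=4, b=2)          # counts taken from keyword args

print(from_mapping == from_kwargs)       # True
print(from_iterable['a'])                # 3

# The iterable is tallied first, then the keyword counts are added on top.
mixed = Counter('ab', a=2)
print(mixed)                             # Counter({'a': 3, 'b': 1})
```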

Get the most frequent elements:

```python
>>> c.most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
```
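In the source analyzed in section 3, most_common() sorts all items when n is None and otherwise calls heapq.nlargest. The function my_most_common below is a hypothetical re-implementation written for this article to show that logic in isolation; it is a sketch, not the library code itself:

```python
import heapq
from collections import Counter
from operator import itemgetter

def my_most_common(counter, n=None):
    """Hypothetical helper mirroring the logic of Counter.most_common."""
    if n is None:
        # Full ranking: sort every (element, count) pair by count, descending.
        return sorted(counter.items(), key=itemgetter(1), reverse=True)
    # Top-n only: nlargest avoids fully sorting the item list.
    return heapq.nlargest(n, counter.items(), key=itemgetter(1))

c = Counter('abcdeabcdabcaba')
print(my_most_common(c, 3))                   # [('a', 5), ('b', 4), ('c', 3)]
print(my_most_common(c) == c.most_common())   # True
```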

List all keys (each distinct element once):

```python
>>> sorted(c)
['a', 'b', 'c', 'd', 'e']
```

List all elements, each repeated as many times as its count:

```python
>>> ''.join(sorted(c.elements()))
'aaaaabbbbcccdde'
```
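In the source, elements() is a one-liner built from itertools: chain.from_iterable(starmap(repeat, self.items())). The sketch below rebuilds that expression by hand; the counts are made up for illustration. It also shows why elements() skips zero and negative counts: itertools.repeat yields nothing when its count is zero or negative.

```python
from collections import Counter
from itertools import chain, repeat, starmap

c = Counter(a=2, b=0, c=-1, d=1)

# starmap calls repeat(elem, count) for each item; repeat yields elem
# `count` times and yields nothing when count <= 0.
manual = list(chain.from_iterable(starmap(repeat, c.items())))
print(manual)                          # ['a', 'a', 'd']
print(manual == list(c.elements()))    # True
```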

Compute the total of all counts:

```python
>>> sum(c.values())
15
```
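One caveat worth a sketch: sum(c.values()) adds every stored count, including zero and negative ones, while elements() skips anything below one. The subtraction of 'eeee' below is an arbitrary example chosen to push one count negative:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')
print(sum(c.values()))            # 15, one per input character

# subtract() may drive a count below zero; sum() still includes it,
# whereas elements() ignores counts below one.
c.subtract('eeee')                # 'e' goes from 1 to -3
print(sum(c.values()))            # 11
print(len(list(c.elements())))    # 14
```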

Get the count of a particular hashable object:

```python
>>> c['a']
5
```
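Looking up a key that was never counted does not raise KeyError, because the source defines __missing__ to return 0. The sketch below illustrates this. Unlike collections.defaultdict, the lookup does not insert the missing key; the key 'z' is just an example:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

# Counter.__missing__ returns 0 for absent keys instead of raising KeyError...
print(c['z'])        # 0
# ...and it does not store the key, unlike defaultdict.
print('z' in c)      # False

# A plain dict holding the same counts still raises KeyError.
try:
    dict(c)['z']
except KeyError:
    print('plain dict raised KeyError')
```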

Update counts from an iterable:

```python
>>> for elem in 'shazam':
...     c[elem] += 1
>>> c['a']
7
```
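The explicit loop in the example above works because c[elem] starts at 0 for new keys. Passing the same iterable to update() produces the same result, since update() tallies a non-mapping iterable one element at a time. A minimal sketch comparing the two:

```python
from collections import Counter

base = 'abcdeabcdabcaba'

# Variant 1: increment each element's count by hand.
c1 = Counter(base)
for elem in 'shazam':
    c1[elem] += 1      # new keys start at 0 thanks to __missing__

# Variant 2: let update() tally the iterable.
c2 = Counter(base)
c2.update('shazam')

print(c1 == c2)   # True
print(c2['a'])    # 7
```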

Delete an element (removing its key and count):

```python
>>> del c['b']
>>> c['b']
0
```
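The source also overrides __delitem__ so that it checks membership before deleting. As a result, deleting a key that is not present is silently ignored, whereas a plain dict raises KeyError. A short sketch, using the made-up key 'missing':

```python
from collections import Counter

c = Counter('aab')
del c['b']          # removes the key and its count entirely
del c['missing']    # no KeyError: Counter.__delitem__ checks membership first
print(c)            # Counter({'a': 2})

d = {'a': 2}
try:
    del d['missing']   # a plain dict raises here
except KeyError:
    print('plain dict raised KeyError')
```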

Merge two Counter objects (update() adds the counts together):

```python
>>> d = Counter('simsalabim')
>>> c.update(d)
>>> c['a']
9
>>> c['i']
2
```
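Unlike dict.update(), which replaces values, Counter.update() adds them, and its counterpart subtract() subtracts them while keeping zero and negative results. The counts below are arbitrary; the sketch just contrasts the three behaviors:

```python
from collections import Counter

c = Counter(a=3, b=1)

# Counter.update() adds counts...
c.update({'a': 2})
print(c['a'])                 # 5

# ...whereas dict.update() on the same data replaces them.
plain = dict(a=3, b=1)
plain.update({'a': 2})
print(plain['a'])             # 2

# subtract() is the mirror image and may leave zero or negative counts.
c.subtract(Counter(a=5, b=2))
print(c)                      # Counter({'a': 0, 'b': -1})
```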

Clear a Counter object:

```python
>>> c.clear()
>>> c['a']
0
```

Note in particular that when a hashable object's count is set to 0 or reduced to 0, the object remains in the Counter until it is deleted or the Counter itself is cleared:

```python
>>> c = Counter('aaabbc')
>>> c['b'] -= 2
>>> c.most_common()
[('a', 3), ('c', 1), ('b', 0)]
>>> del c['b']
>>> c.most_common()
[('a', 3), ('c', 1)]
```
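Deleting keys one by one is not the only option. The source comments note that adding an empty counter strips zero and negative counts, and it also defines unary plus for the same purpose. The sketch below uses both; the negative count for 'c' is added here for illustration:

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2                   # count becomes 0
c['c'] -= 3                   # counts may also go negative

# Unary plus (__pos__) builds a new Counter keeping only positive counts.
print(+c)                     # Counter({'a': 3})

# In-place addition of an empty Counter ends with _keep_positive(),
# which deletes the non-positive entries from c itself.
c += Counter()
print(c)                      # Counter({'a': 3})
```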

For further details on using Counter objects, see /python/python-collections-counter.html .

3. Analysis

```python
# Module-level imports from the top of Lib/collections/__init__.py
# that the Counter code below relies on.
import _collections_abc
import heapq as _heapq
from itertools import chain as _chain, repeat as _repeat, starmap as _starmap
from operator import itemgetter as _itemgetter

########################################################################
###  Counter
########################################################################

def _count_elements(mapping, iterable):
    'Tally elements from the iterable.'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1

try:                                    # Load C helper function if available
    from _collections import _count_elements
except ImportError:
    pass

class Counter(dict):
    '''Dict subclass for counting hashable items.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> c = Counter('abcdeabcdabcaba')  # count elements from a string

    >>> c.most_common(3)                # three most common elements
    [('a', 5), ('b', 4), ('c', 3)]
    >>> sorted(c)                       # list all unique elements
    ['a', 'b', 'c', 'd', 'e']
    >>> ''.join(sorted(c.elements()))   # list elements with repetitions
    'aaaaabbbbcccdde'
    >>> sum(c.values())                 # total of all counts
    15

    >>> c['a']                          # count of letter 'a'
    5
    >>> for elem in 'shazam':           # update counts from an iterable
    ...     c[elem] += 1                # by adding 1 to each element's count
    >>> c['a']                          # now there are seven 'a'
    7
    >>> del c['b']                      # remove all 'b'
    >>> c['b']                          # now there are zero 'b'
    0

    >>> d = Counter('simsalabim')       # make another counter
    >>> c.update(d)                     # add in the second counter
    >>> c['a']                          # now there are nine 'a'
    9

    >>> c.clear()                       # empty the counter
    >>> c
    Counter()

    Note:  If a count is set to zero or reduced to zero, it will remain
    in the counter until the entry is deleted or the counter is cleared:

    >>> c = Counter('aaabbc')
    >>> c['b'] -= 2                     # reduce the count of 'b' by two
    >>> c.most_common()                 # 'b' is still in, but its count is zero
    [('a', 3), ('c', 1), ('b', 0)]

    '''
    # References:
    #   /wiki/Multiset
    #   /software/smalltalk/manual-base/html_node/Bag.html
    #   /Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm
    #   /recipes/259174/
    #   Knuth, TAOCP Vol. II section 4.6.3

    def __init__(self, iterable=None, /, **kwds):
        '''Create a new, empty Counter object.  And if given, count elements
        from an input iterable.  Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                   # a new, empty counter
        >>> c = Counter('gallahad')         # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})   # a new counter from a mapping
        >>> c = Counter(a=4, b=2)           # a new counter from keyword args

        '''
        super(Counter, self).__init__()
        self.update(iterable, **kwds)

    def __missing__(self, key):
        'The count of elements not in the Counter is zero.'
        # Needed so that self[missing_item] does not raise KeyError
        return 0

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.

        >>> Counter('abracadabra').most_common(3)
        [('a', 5), ('b', 2), ('r', 2)]

        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))

    def elements(self):
        '''Iterator over elements repeating each as many times as its count.

        >>> c = Counter('ABCABC')
        >>> sorted(c.elements())
        ['A', 'A', 'B', 'B', 'C', 'C']

        # Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
        >>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
        >>> product = 1
        >>> for factor in prime_factors.elements():     # loop over factors
        ...     product *= factor                       # and multiply them
        >>> product
        1836

        Note, if an element's count has been set to zero or is a negative
        number, elements() will ignore it.

        '''
        # Emulate Bag.do from Smalltalk and Multiset.begin from C++.
        return _chain.from_iterable(_starmap(_repeat, self.items()))

    # Override dict methods where necessary

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because the semantics
        # would be ambiguous in cases such as Counter.fromkeys('aaabbc', v=2).
        # Initializing counters to zero values isn't necessary because zero
        # is already the default value for counter lookups.  Initializing
        # to one is easily accomplished with Counter(set(iterable)).  For
        # more exotic cases, create a dictionary first using a dictionary
        # comprehension or dict.fromkeys().
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

    def update(self, iterable=None, /, **kwds):
        '''Like dict.update() but add counts instead of replacing them.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.update('witch')           # add elements from another iterable
        >>> d = Counter('watch')
        >>> c.update(d)                 # add elements from another counter
        >>> c['h']                      # four 'h' in which, witch, and watch
        4

        '''
        # The regular dict.update() operation makes no sense here because the
        # replace behavior results in the some of original untouched counts
        # being mixed-in with all of the other counts for a mismash that
        # doesn't have a straight-forward interpretation in most counting
        # contexts.  Instead, we implement straight-addition.  Both the inputs
        # and outputs are allowed to contain zero and negative counts.

        if iterable is not None:
            if isinstance(iterable, _collections_abc.Mapping):
                if self:
                    self_get = self.get
                    for elem, count in iterable.items():
                        self[elem] = count + self_get(elem, 0)
                else:
                    super(Counter, self).update(iterable) # fast path when counter is empty
            else:
                _count_elements(self, iterable)
        if kwds:
            self.update(kwds)

    def subtract(self, iterable=None, /, **kwds):
        '''Like dict.update() but subtracts counts instead of replacing them.
        Counts can be reduced below zero.  Both the inputs and outputs are
        allowed to contain zero and negative counts.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.subtract('witch')             # subtract elements from another iterable
        >>> c.subtract(Counter('watch'))    # subtract elements from another counter
        >>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
        0
        >>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
        -1

        '''
        if iterable is not None:
            self_get = self.get
            if isinstance(iterable, _collections_abc.Mapping):
                for elem, count in iterable.items():
                    self[elem] = self_get(elem, 0) - count
            else:
                for elem in iterable:
                    self[elem] = self_get(elem, 0) - 1
        if kwds:
            self.subtract(kwds)

    def copy(self):
        'Return a shallow copy.'
        return self.__class__(self)

    def __reduce__(self):
        return self.__class__, (dict(self),)

    def __delitem__(self, elem):
        'Like dict.__delitem__() but does not raise KeyError for missing values.'
        if elem in self:
            super().__delitem__(elem)

    def __repr__(self):
        if not self:
            return '%s()' % self.__class__.__name__
        try:
            items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
            return '%s({%s})' % (self.__class__.__name__, items)
        except TypeError:
            # handle case where values are not orderable
            return '{0}({1!r})'.format(self.__class__.__name__, dict(self))

    # Multiset-style mathematical operations discussed in:
    #       Knuth TAOCP Volume II section 4.6.3 exercise 19
    #       and at /wiki/Multiset
    #
    # Outputs guaranteed to only include positive counts.
    #
    # To strip negative and zero counts, add-in an empty counter:
    #       c += Counter()
    #
    # Rich comparison operators for multiset subset and superset tests
    # are deliberately omitted due to semantic conflicts with the
    # existing inherited dict equality method.  Subset and superset
    # semantics ignore zero counts and require that p≤q ∧ p≥q → p=q;
    # however, that would not be the case for p=Counter(a=1, b=0)
    # and q=Counter(a=1) where the dictionaries are not equal.

    def __add__(self, other):
        '''Add counts from two counters.

        >>> Counter('abbb') + Counter('bcc')
        Counter({'b': 4, 'c': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem, count in self.items():
            newcount = count + other[elem]
            if newcount > 0:
                result[elem] = newcount
        for elem, count in other.items():
            if elem not in self and count > 0:
                result[elem] = count
        return result

    def __sub__(self, other):
        ''' Subtract count, but keep only results with positive counts.

        >>> Counter('abbbc') - Counter('bccd')
        Counter({'b': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem, count in self.items():
            newcount = count - other[elem]
            if newcount > 0:
                result[elem] = newcount
        for elem, count in other.items():
            if elem not in self and count < 0:
                result[elem] = 0 - count
        return result

    def __or__(self, other):
        '''Union is the maximum of value in either of the input counters.

        >>> Counter('abbb') | Counter('bcc')
        Counter({'b': 3, 'c': 2, 'a': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem, count in self.items():
            other_count = other[elem]
            newcount = other_count if count < other_count else count
            if newcount > 0:
                result[elem] = newcount
        for elem, count in other.items():
            if elem not in self and count > 0:
                result[elem] = count
        return result

    def __and__(self, other):
        ''' Intersection is the minimum of corresponding counts.

        >>> Counter('abbb') & Counter('bcc')
        Counter({'b': 1})

        '''
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter()
        for elem, count in self.items():
            other_count = other[elem]
            newcount = count if count < other_count else other_count
            if newcount > 0:
                result[elem] = newcount
        return result

    def __pos__(self):
        'Adds an empty counter, effectively stripping negative and zero counts'
        result = Counter()
        for elem, count in self.items():
            if count > 0:
                result[elem] = count
        return result

    def __neg__(self):
        '''Subtracts from an empty counter.  Strips positive and zero counts,
        and flips the sign on negative counts.

        '''
        result = Counter()
        for elem, count in self.items():
            if count < 0:
                result[elem] = 0 - count
        return result

    def _keep_positive(self):
        '''Internal method to strip elements with a negative or zero count'''
        nonpositive = [elem for elem, count in self.items() if not count > 0]
        for elem in nonpositive:
            del self[elem]
        return self

    def __iadd__(self, other):
        '''Inplace add from another counter, keeping only positive counts.

        >>> c = Counter('abbb')
        >>> c += Counter('bcc')
        >>> c
        Counter({'b': 4, 'c': 2, 'a': 1})

        '''
        for elem, count in other.items():
            self[elem] += count
        return self._keep_positive()

    def __isub__(self, other):
        '''Inplace subtract counter, but keep only results with positive counts.

        >>> c = Counter('abbbc')
        >>> c -= Counter('bccd')
        >>> c
        Counter({'b': 2, 'a': 1})

        '''
        for elem, count in other.items():
            self[elem] -= count
        return self._keep_positive()

    def __ior__(self, other):
        '''Inplace union is the maximum of value from either counter.

        >>> c = Counter('abbb')
        >>> c |= Counter('bcc')
        >>> c
        Counter({'b': 3, 'c': 2, 'a': 1})

        '''
        for elem, other_count in other.items():
            count = self[elem]
            if other_count > count:
                self[elem] = other_count
        return self._keep_positive()

    def __iand__(self, other):
        '''Inplace intersection is the minimum of corresponding counts.

        >>> c = Counter('abbb')
        >>> c &= Counter('bcc')
        >>> c
        Counter({'b': 1})

        '''
        for elem, count in self.items():
            other_count = other[elem]
            if other_count < count:
                self[elem] = other_count
        return self._keep_positive()
```
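The multiset operators defined near the end of the source (+, -, |, &) always drop non-positive results, while the subtract() method keeps them. The sketch below reuses the inputs from the operator docstrings, 'abbb' and 'bcc', to put all four operators side by side and contrast - with subtract():

```python
from collections import Counter

p = Counter('abbb')   # a:1, b:3
q = Counter('bcc')    # b:1, c:2

print(p + q)   # Counter({'b': 4, 'c': 2, 'a': 1})  sum of counts
print(p | q)   # Counter({'b': 3, 'c': 2, 'a': 1})  max of counts
print(p & q)   # Counter({'b': 1})                  min of counts, 'a' drops out
print(p - q)   # Counter({'b': 2, 'a': 1})          difference, positives only

# subtract() works in place and keeps zero/negative results.
r = p.copy()
r.subtract(q)
print(dict(r))   # {'a': 1, 'b': 2, 'c': -2}
```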
