
A Simple Way to Count Word Frequencies in English Text with Python

Date: 2023-09-05 02:06:18


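1. Problem description:

Given a string of English text, count how many times each word occurs and store the result in a dictionary.

2. Approach:

Methods used:

replace("a", "b") returns a copy of the string in which every occurrence of a is replaced with b.

split(), called with no arguments, splits the string on whitespace (spaces, tabs, and newlines) and returns the pieces as a list.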
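As a quick illustration of these two methods, the snippet below applies them to a short made-up string. The sample text and the variable names are assumptions chosen for demonstration and are not part of the original example.

```python
# A made-up sample string, used only to illustrate the two methods.
sample = "Oh, yes!\tShe was\na coquettish creature!"

# replace() returns a new string; the original string is left unchanged.
no_punct = sample.replace(",", " ").replace("!", " ")

# split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines) and never produces empty strings.
words = no_punct.split()
print(words)
# ['Oh', 'yes', 'She', 'was', 'a', 'coquettish', 'creature']
```

Because split() treats consecutive whitespace as a single separator, the extra spaces left behind by replace() do not produce empty list entries.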

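Steps:

Step 1. Because the text is English, words can be separated by spaces and punctuation. Handle the punctuation first: use replace() to turn each punctuation mark that appears in the text into a space (a space rather than an empty string, so that adjacent words are not glued together), then use split() to break the text into words and collect them in a list.

Step 2. Create an empty dictionary and iterate over the list. For each element, check whether it is already a key in the dictionary: if not, add it as a key with the value 1; if it is, increase that key's value by 1. The resulting key-value pairs are the words and the number of times each one occurs.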
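The two steps can be sketched as one small function. This is a minimal sketch rather than the article's own listing: the function name count_words, the sample sentence, and the choice to handle only the marks ",", "." and "!" are assumptions made for illustration.

```python
def count_words(text):
    """Return a dict mapping each word in text to its number of occurrences."""
    # Step 1: turn punctuation into spaces, then split on whitespace.
    for mark in (",", ".", "!"):
        text = text.replace(mark, " ")
    words = text.split()

    # Step 2: add unseen words with a count of 1, otherwise increment.
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


print(count_words("one by one, day by day."))
# {'one': 2, 'by': 2, 'day': 2}
```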

3. Implementation and results

This example uses an excerpt from The Little Prince as the test text.

    # data holds the test text
    data = '''The shrub soon stopped growing, and began to get ready to produce a flower. The little prince, who was present at the first appearance of a huge bud, felt at once that some sort of miraculous apparition must emerge from it. But the flower was not satisfied to complete the preparations for her beauty in the shelter of her green chamber. She chose her colours with the greatest care. She adjusted her petals one by one. She did not wish to go out into the world all rumpled, like the field poppies. It was only in the full radiance of her beauty that she wished to appear. Oh, yes! She was a coquettish creature! And her mysterious adornment lasted for days and days.'''

    # Replace the punctuation marks that occur in the text with spaces
    str_data = data.replace("!", " ").replace(",", " ").replace(".", " ")
    # Split the string into a list of words
    list_data = str_data.split()

After this step, list_data holds the individual words of the text.

    dic_data = {}
    # Iterate over the list and record each word with its count in the dictionary
    for i in list_data:
        if i in dic_data:
            dic_data[i] += 1
        else:
            dic_data[i] = 1

Inspect the data stored in the dictionary: each key is a counted word and each value is the number of times that word occurs, i.e. {"word": count}.
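Note that the approach above is case-sensitive, so "The" and "the" (or "She" and "she") are counted as different words, and only the three punctuation marks named in the replace() calls are handled. One possible variant, shown here as a sketch and not as the article's method, lower-cases the text, replaces all ASCII punctuation in a single pass with str.translate(), and uses collections.Counter from the standard library for the counting. The function name word_frequencies and the sample sentence are assumptions for illustration. Treating every punctuation mark as a space also splits contractions such as "don't" into two tokens, which may or may not be what is wanted.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Count words case-insensitively, treating ASCII punctuation as spaces."""
    # Build a table that maps every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # Lower-case first so that "She" and "she" become the same word.
    words = text.lower().translate(table).split()
    # Counter is a dict subclass that does the counting loop for us.
    return Counter(words)


freq = word_frequencies("She chose her colours. She adjusted her petals!")
print(freq["she"], freq["her"], freq["petals"])
# 2 2 1
```

Since Counter is a dict subclass, the result can be used anywhere the plain dictionary was used, and freq.most_common(n) returns the n most frequent words with their counts.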
