
Secondary Classification of Product Attributes in Python, with Multi-Level Nested Dictionary Output

Date: 2018-09-19 09:10:47


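The title is a bit long and probably does not explain things very well, but the general idea is this. When you browse a site such as Tmall, you see categories like "女装" (women's clothing), "男鞋" (men's shoes), and "手机" (mobile phones). Clicking into a category shows its brands: women's clothing has brands such as "snidle" and "伊芙丽", and men's shoes has "nike" and "adidas". If a user searches for nike, the matching label should include "男鞋"; put simply, a suggestion like "search for nike in 男鞋" pops up below the input box. This article is about predicting, for a brand the user types in, the probability of each corresponding first-level category.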

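Unfortunately, I do not have Tmall's data, only my company's, and that data must not be leaked. Making up data is tedious too, so I will not cover how to compute these probabilities with machine learning algorithms. That part is not hard, though; when I have time I will write a crawler to collect the data and then write it up, hehe.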

In short, the finished prediction data should look like this:

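Here is how to read the table: the first row holds the first-level categories, and the first column holds the second-level categories (the brands). Taking the third row as an example, the brand "scofield" is classified as "女装/内衣" (women's clothing/underwear) with probability 0.87473829, as "女鞋/男鞋/箱包" (women's shoes/men's shoes/bags) with probability 0.03394293, and as "化妆品/个人护理" (cosmetics/personal care) with probability 0.21392374. So if you search for "scofield" in Tmall's search box, the suggestion most likely to pop up is "search for scofield in 女装/内衣".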
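As a minimal sketch of how the suggestion is chosen from one row, the snippet below stores the three probabilities quoted above in a dict and picks the key with the largest value. It is written for Python 3, and the names `scofield` and `best_category` are illustrative only.

```python
# Probabilities for one brand, copied from the example row in the text.
scofield = {
    "女装/内衣": 0.87473829,
    "女鞋/男鞋/箱包": 0.03394293,
    "化妆品/个人护理": 0.21392374,
}

# The suggested category is the key with the largest probability.
best_category = max(scofield, key=scofield.get)
print("search for scofield in " + best_category)
```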

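This table has a flaw, though: it contains too many 0 values and is not sorted, so it looks messy. We therefore clean it up and sort it using Python dictionaries.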
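To make the target structure concrete, here is a small Python 3 sketch that turns a flat table into a nested dictionary of brand -> {category: probability} and skips the zero entries along the way. The two-row table and the helper name `to_nested_dict` are hypothetical stand-ins for the real data file, which is not reproduced here.

```python
# Hypothetical miniature of the prediction table: a header row of
# first-level categories followed by one (brand, probabilities) row each.
header = ["女装/内衣", "女鞋/男鞋/箱包", "化妆品/个人护理"]
rows = [
    ("scofield", [0.87473829, 0.03394293, 0.21392374]),
    ("兰芝", [0.09238374, 0.0, 0.78423234]),
]


def to_nested_dict(header, rows):
    """Map brand -> {category: probability}, skipping zero entries."""
    nested = {}
    for brand, probs in rows:
        nested[brand] = {
            category: p for category, p in zip(header, probs) if p != 0
        }
    return nested


table = to_nested_dict(header, rows)
print(table["兰芝"])
```

Filtering while building the inner dict means no separate deletion pass is needed afterwards.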

Without further ado, here is the code:

# coding: utf-8
import numpy as np
import pandas as pd
from odo import odo
from odo import convert
import json
from operator import itemgetter
import collections
from collections import OrderedDict
import sys

# Python 2 only: make utf8 the default string encoding
reload(sys)
sys.setdefaultencoding('utf8')

# Load the dataset
result = pd.read_table('tmalltest.txt', header=None)
listall = odo(result, list)
result1 = pd.read_table('tmalltest.txt')
result2 = result1.drop('class', axis=1)
listvalue = odo(result2, list)
count = len(range(result.shape[1]))
# First-level category names, taken from the header row
id = result.iloc[0, 1:16]
# Brand names, taken from the first column
out = [result.iloc[y, 0] for y in range(1, result.shape[0])]
outid = tuple(out)
d = [dict(zip(id, tuple(listvalue[i]))) for i in range(0, len(listvalue))]

# Invert the key-value pairs of each dict
func = lambda b: dict([(x, y) for y, x in b.items()])
dd = [func(d[i]) for i in range(len(d))]
# Delete the pair whose key is 0 (the None default avoids a KeyError when a row has no 0)
delete = [dd[i].pop(0.0, None) for i in range(len(d))]
# Invert the dicts back
ddvalue = [func(dd[i]) for i in range(len(d))]
# Combine the two lists into a dict
dictall = dict(zip(out, ddvalue))
# Make Chinese characters readable when printing to the console
print json.dumps(dictall).decode("unicode-escape")

# Sort each brand's items by value and collect them in a new list
lista = []
for k in dictall.keys():
    sorted_d = sorted(dictall[k].iteritems(), key=itemgetter(1), reverse=True)
    print sorted_d
    lista.append(sorted_d)

# Keep only the three categories with the highest predicted values
listb = [lista[i][0:3] for i in range(len(lista))]
listc = [json.dumps(listb[i]).decode("unicode-escape") for i in range(len(listb))]
# Sort the second level; an OrderedDict keeps the sorted order
dictorder = [OrderedDict(lista[i]) for i in range(0, len(lista))]
print json.dumps(dictorder).decode("unicode-escape")
# Recombine the sorted lists into a dict
dictall_sort = dict(zip(dictall.keys(), listc))


# A function that formats a nested dict more readably
def pretty_dict(obj, indent=' '):
    def _pretty(obj, indent):
        for i, tup in enumerate(obj.items()):
            k, v = tup
            # Wrap strings in double quotes
            if isinstance(k, basestring): k = '"%s"' % k
            if isinstance(v, basestring): v = '"%s"' % v
            # Recurse into nested dicts, computing the indent of the next level
            if isinstance(v, dict):
                v = ''.join(_pretty(v, indent + ' ' * len(str(k) + ': {')))
            # Decide what to emit based on the position of the (k, v) pair
            if i == 0:  # first pair: opening brace
                if len(obj) == 1:
                    yield '{%s: %s}' % (k, v)
                else:
                    yield '{%s: %s,\n' % (k, v)
            elif i == len(obj) - 1:  # last pair: closing brace
                yield '%s%s: %s}' % (indent, k, v)
            else:  # middle pairs
                yield '%s%s: %s,\n' % (indent, k, v)
    # Return the string so that the callers below do not also print "None"
    return ''.join(_pretty(obj, indent))


# Original unsorted dictionary, prettified
print pretty_dict(dictall)
# Sorted dictionary, before prettifying
print json.dumps(dictall_sort).decode("unicode-escape")
# Sorted dictionary, prettified
print pretty_dict(dictall_sort)
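For readers on Python 3, where `reload(sys)`, `basestring`, and `.decode("unicode-escape")` are not available, the standard `json` module can probably cover both the readable-Chinese output and most of what `pretty_dict` does. This is a sketch of that alternative, not what the listing above uses, and the small `nested` dict is made up for illustration.

```python
import json

# Hypothetical two-brand excerpt of the nested dictionary.
nested = {
    "scofield": {"女装/内衣": 0.87473829, "化妆品/个人护理": 0.21392374},
    "兰芝": {"化妆品/个人护理": 0.78423234},
}

# ensure_ascii=False keeps Chinese characters readable instead of emitting
# \uXXXX escapes; indent=2 puts every nested key on its own line.
text = json.dumps(nested, ensure_ascii=False, indent=2)
print(text)
```

The layout differs from `pretty_dict` (fixed two-space indentation rather than alignment under the parent key), so it is a substitute only if that alignment does not matter.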

Output:

# Original unsorted dictionary, prettified
{"太平鸟": {"男装/户外运动/": 0.847823719,"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.11242904,"腕表/珠宝饰品/眼镜": 0.05923729},
"博士伦": {"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.11323213,"医药保健": 0.89348974},
"a02": {"家纺/家饰/鲜花": "0","女装/内衣": 0.984447322,"女鞋/男鞋/箱包": 0.12493492},
"周黑鸭": {"零食/进口食品/茶酒": 0.87323123,"家纺/家饰/鲜花": "0","厨具/收纳/宠物": 0.12432232},
"3M": {"家纺/家饰/鲜花": "0","厨具/收纳/宠物": 0.32344534,"家居建材": 0.68213814},
"博士": {"家纺/家饰/鲜花": "0"},
"sk-II": {"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.98843487,"腕表/珠宝饰品/眼镜": 0.02324442},
"洗洁精": {"图书音像": 0.02124194,"家纺/家饰/鲜花": "0"},
"finity": {"家纺/家饰/鲜花": "0","女装/内衣": 0.93392424,"女鞋/男鞋/箱包": 0.07323483},
"selected": {"男装/户外运动/": 0.934439842,"家纺/家饰/鲜花": "0","女鞋/男鞋/箱包": 0.07438472},
"scofield": {"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.21392374,"女鞋/男鞋/箱包": 0.03394293,"女装/内衣": 0.87473829},
"米其林": {"家纺/家饰/鲜花": "0.02432412","汽车/配件/用品": 0.98233342},
"好奇": {"零食/进口食品/茶酒": 0.11321412,"母婴玩具": 0.89472934,"家纺/家饰/鲜花": "0"},
"佐卡伊": {"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.13232944,"腕表/珠宝饰品/眼镜": 0.87342324},
"波司登": {"母婴玩具": 0.02134243,"家纺/家饰/鲜花": "0","女装/内衣": 0.78765673,"化妆品/个人护理": 0.3924},
"breadbutter": {"零食/进口食品/茶酒": 0.29434974,"家纺/家饰/鲜花": "0","女鞋/男鞋/箱包": 0.03329473,"女装/内衣": 0.684728232},
"北极绒": {"家纺/家饰/鲜花": "0.84932498","大家电/生活电器": 0.05213923,"家居建材": 0.11321321},
"Adidas": {"男装/户外运动/": 0.829743434,"家纺/家饰/鲜花": "0","女鞋/男鞋/箱包": 0.14974892,"手机/数码/电脑办公": 0.04232553},
"当当网": {"图书音像": 0.78947234,"家纺/家饰/鲜花": "0"},
"snidle": {"家纺/家饰/鲜花": "0","女装/内衣": 0.83927289,"女鞋/男鞋/箱包": 0.15237234,"腕表/珠宝饰品/眼镜": 0.02432324},
"TISSOT": {"家纺/家饰/鲜花": "0","大家电/生活电器": 0.13942309,"腕表/珠宝饰品/眼镜": 0.87545234},
"曼妮芬": {"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.07239742,"女装/内衣": 0.93837427},
"New Balance": {"母婴玩具": 0.43237442,"家纺/家饰/鲜花": "0","女鞋/男鞋/箱包": 0.57823432},
"Jackjones": {"男装/户外运动/": 0.883293743,"家纺/家饰/鲜花": "0","女鞋/男鞋/箱包": 0.10343298,"手机/数码/电脑办公": 0.02234927},
"ZARA": {"女鞋/男鞋/箱包": 0.12429483,"家纺/家饰/鲜花": "0","女装/内衣": 0.78283128,"腕表/珠宝饰品/眼镜": 0.10213943},
"海尔": {"家纺/家饰/鲜花": "0。1323243","厨具/收纳/宠物": 0.09354832,"大家电/生活电器": 0.79103821},
"nike": {"男装/户外运动/": 0.891232313,"家纺/家饰/鲜花": "0","化妆品/个人护理": 0.06163211,"手机/数码/电脑办公": 0.04293713},
"双立人": {"家纺/家饰/鲜花": "0","厨具/收纳/宠物": 0.98943242,"医药保健": 0.01943242},
"苹果": {"手机/数码/电脑办公": 0.89232342,"家纺/家饰/鲜花": "0","汽车/配件/用品": 0.05293713,"腕表/珠宝饰品/眼镜": 0.05230971},
"兰芝": {"家纺/家饰/鲜花": "0","女装/内衣": 0.09238374,"化妆品/个人护理": 0.78423234,"腕表/珠宝饰品/眼镜": 0.13213232}}

# Sorted dictionary, before prettifying
{"太平鸟": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8478237190000001], ["化妆品/个人护理", 0.11242904]]",
"博士伦": "[["家纺/家饰/鲜花", "0"], ["医药保健", 0.89348974], ["化妆品/个人护理", 0.11323213]]",
"a02": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.9844473220000001], ["女鞋/男鞋/箱包", 0.12493492]]",
"周黑鸭": "[["家纺/家饰/鲜花", "0"], ["零食/进口食品/茶酒", 0.87323123], ["厨具/收纳/宠物", 0.12432232]]",
"3M": "[["家纺/家饰/鲜花", "0"], ["家居建材", 0.68213814], ["厨具/收纳/宠物", 0.32344534]]",
"博士": "[["家纺/家饰/鲜花", "0"]]",
"sk-II": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.98843487], ["腕表/珠宝饰品/眼镜", 0.02324442]]",
"洗洁精": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.02124194]]",
"finity": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93392424], ["女鞋/男鞋/箱包", 0.07323483]]",
"selected": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.9344398420000001], ["女鞋/男鞋/箱包", 0.07438472]]",
"scofield": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.87473829], ["化妆品/个人护理", 0.21392374]]",
"米其林": "[["家纺/家饰/鲜花", "0.02432412"], ["汽车/配件/用品", 0.98233342]]",
"好奇": "[["家纺/家饰/鲜花", "0"], ["母婴玩具", 0.89472934], ["零食/进口食品/茶酒", 0.11321412]]",
"佐卡伊": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87342324], ["化妆品/个人护理", 0.13232944]]",
"波司登": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78765673], ["化妆品/个人护理", 0.3924]]",
"breadbutter": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.684728232], ["零食/进口食品/茶酒", 0.29434974]]",
"北极绒": "[["家纺/家饰/鲜花", "0.84932498"], ["家居建材", 0.11321321], ["大家电/生活电器", 0.05213923]]",
"Adidas": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8297434340000001], ["女鞋/男鞋/箱包", 0.14974892]]",
"当当网": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.78947234]]",
"snidle": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.83927289], ["女鞋/男鞋/箱包", 0.15237234]]",
"TISSOT": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87545234], ["大家电/生活电器", 0.13942309]]",
"曼妮芬": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93837427], ["化妆品/个人护理", 0.07239742]]",
"New Balance": "[["家纺/家饰/鲜花", "0"], ["女鞋/男鞋/箱包", 0.57823432], ["母婴玩具", 0.43237442]]",
"Jackjones": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.883293743], ["女鞋/男鞋/箱包", 0.10343298]]",
"ZARA": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78283128], ["女鞋/男鞋/箱包", 0.12429483]]",
"海尔": "[["家纺/家饰/鲜花", "0。1323243"], ["大家电/生活电器", 0.79103821], ["厨具/收纳/宠物", 0.09354832]]",
"nike": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8912323129999999], ["化妆品/个人护理", 0.06163211]]",
"双立人": "[["家纺/家饰/鲜花", "0"], ["厨具/收纳/宠物", 0.98943242], ["医药保健", 0.01943242]]",
"苹果": "[["家纺/家饰/鲜花", "0"], ["手机/数码/电脑办公", 0.89232342], ["汽车/配件/用品", 0.05293713]]",
"兰芝": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.78423234], ["腕表/珠宝饰品/眼镜", 0.13213232]]"}

# Sorted dictionary, prettified
{"太平鸟": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8478237190000001], ["化妆品/个人护理", 0.11242904]]",
"博士伦": "[["家纺/家饰/鲜花", "0"], ["医药保健", 0.89348974], ["化妆品/个人护理", 0.11323213]]",
"a02": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.9844473220000001], ["女鞋/男鞋/箱包", 0.12493492]]",
"周黑鸭": "[["家纺/家饰/鲜花", "0"], ["零食/进口食品/茶酒", 0.87323123], ["厨具/收纳/宠物", 0.12432232]]",
"3M": "[["家纺/家饰/鲜花", "0"], ["家居建材", 0.68213814], ["厨具/收纳/宠物", 0.32344534]]",
"博士": "[["家纺/家饰/鲜花", "0"]]",
"sk-II": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.98843487], ["腕表/珠宝饰品/眼镜", 0.02324442]]",
"洗洁精": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.02124194]]",
"finity": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93392424], ["女鞋/男鞋/箱包", 0.07323483]]",
"selected": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.9344398420000001], ["女鞋/男鞋/箱包", 0.07438472]]",
"scofield": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.87473829], ["化妆品/个人护理", 0.21392374]]",
"米其林": "[["家纺/家饰/鲜花", "0.02432412"], ["汽车/配件/用品", 0.98233342]]",
"好奇": "[["家纺/家饰/鲜花", "0"], ["母婴玩具", 0.89472934], ["零食/进口食品/茶酒", 0.11321412]]",
"佐卡伊": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87342324], ["化妆品/个人护理", 0.13232944]]",
"波司登": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78765673], ["化妆品/个人护理", 0.3924]]",
"breadbutter": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.684728232], ["零食/进口食品/茶酒", 0.29434974]]",
"北极绒": "[["家纺/家饰/鲜花", "0.84932498"], ["家居建材", 0.11321321], ["大家电/生活电器", 0.05213923]]",
"Adidas": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8297434340000001], ["女鞋/男鞋/箱包", 0.14974892]]",
"当当网": "[["家纺/家饰/鲜花", "0"], ["图书音像", 0.78947234]]",
"snidle": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.83927289], ["女鞋/男鞋/箱包", 0.15237234]]",
"TISSOT": "[["家纺/家饰/鲜花", "0"], ["腕表/珠宝饰品/眼镜", 0.87545234], ["大家电/生活电器", 0.13942309]]",
"曼妮芬": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.93837427], ["化妆品/个人护理", 0.07239742]]",
"New Balance": "[["家纺/家饰/鲜花", "0"], ["女鞋/男鞋/箱包", 0.57823432], ["母婴玩具", 0.43237442]]",
"Jackjones": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.883293743], ["女鞋/男鞋/箱包", 0.10343298]]",
"ZARA": "[["家纺/家饰/鲜花", "0"], ["女装/内衣", 0.78283128], ["女鞋/男鞋/箱包", 0.12429483]]",
"海尔": "[["家纺/家饰/鲜花", "0。1323243"], ["大家电/生活电器", 0.79103821], ["厨具/收纳/宠物", 0.09354832]]",
"nike": "[["家纺/家饰/鲜花", "0"], ["男装/户外运动/", 0.8912323129999999], ["化妆品/个人护理", 0.06163211]]",
"双立人": "[["家纺/家饰/鲜花", "0"], ["厨具/收纳/宠物", 0.98943242], ["医药保健", 0.01943242]]",
"苹果": "[["家纺/家饰/鲜花", "0"], ["手机/数码/电脑办公", 0.89232342], ["汽车/配件/用品", 0.05293713]]",
"兰芝": "[["家纺/家饰/鲜花", "0"], ["化妆品/个人护理", 0.78423234], ["腕表/珠宝饰品/眼镜", 0.13213232]]"}

The results do not display very well here; on Linux the output is actually quite clear, see the image:

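The difficulties here are printing Python's multi-level nested dictionaries and deleting values from a Python dictionary, which in this case means deleting the entries whose value = 0. At first I extracted the values into a list and deleted the zeros there, but at least one 0 always remained afterwards. Later I realized I could invert the dictionary's keys and values and use dict.pop to delete the pair with key = 0. The second difficulty is sorting a multi-level nested dictionary. In Python 2, which this code uses, dictionaries are unordered, so the only option is to sort each dictionary's items by value, store the sorted result in a list, and then combine that list with the list of original keys into a dictionary again, which makes things much easier.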
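One caveat about the inversion trick, with a possible alternative sketched in Python 3: inverting keys and values keeps only one key per distinct value, so two categories with the same probability collapse into one. Filtering with a dict comprehension avoids that, and `sorted` on the items gives the top three directly. The `row` data and the helper names `top_n` and `rank_all` are hypothetical. In Python 3.7 and later, plain dicts also preserve insertion order.

```python
from operator import itemgetter

# Hypothetical row with two zero entries and a tie between two categories.
row = {
    "女装/内衣": 0.5,
    "女鞋/男鞋/箱包": 0.5,
    "图书音像": 0.0,
    "医药保健": 0.0,
    "母婴玩具": 0.2,
    "家居建材": 0.1,
}

# Inversion approach: the tie at 0.5 silently loses one category.
inverted = {v: k for k, v in row.items()}
inverted.pop(0.0, None)
restored = {k: v for v, k in inverted.items()}

# Direct approach: keep every pair whose value is not zero.
nonzero = {k: v for k, v in row.items() if v != 0}


def top_n(probs, n=3):
    """Return the n (category, probability) pairs with the highest values."""
    return sorted(probs.items(), key=itemgetter(1), reverse=True)[:n]


def rank_all(nested, n=3):
    """Apply the zero filter and top_n to every brand of a nested dict."""
    return {
        brand: top_n({k: v for k, v in probs.items() if v != 0}, n)
        for brand, probs in nested.items()
    }


print(len(restored), len(nonzero))
print(top_n(nonzero))
```

Because the sorted pairs stay in a list per brand, the order survives regardless of how the outer dictionary is stored.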

I am recording last week's work here so I can come back to it when I forget. If you have a better approach, feel free to share~

PS: I made up this Tmall data; I can share it if anyone needs it = =
