失眠网,内容丰富有趣,生活中的好帮手!
失眠网 > 利用 Python 实现 K-means 算法

利用 Python 实现 K-means 算法

时间:2019-03-23 05:08:31

相关推荐

利用 Python 实现 K-means 算法

利用 Python 实现 K-means 算法

使用 Python 实现K-means算法,采用随机函数随机在二维平面上生成100个点,然后使用所写程序对这100个点进行聚类【可以采用SSE(Sum of the Squared Errors,误差平方和)来确定最佳聚类数,即确定K值】。

问题的聚类算法分析:

①程序先随机在二维平面生成100个点,再随机从中选取k个点作为初始化质心;

②计算每个点到每个质心的距离,加入到距离最近的点中;

③计算每个聚类中所有点的距离平均值作为该聚类的新的质心;

④对比原质心与新质心是否改变,如果不改变则聚类已稳定,否则就重复2-4步,直到重复指定的n次,防止不收敛。

K值选取过程分析:

利用for循环指定K值范围,然后遍历计算每个K值的SSE,可以绘制下图,选择该图的拐点作为最终K值即可。本题中选题K=4较为合适。

实验结果:

****************************************k: 1center_point_new:{0: [43.46, 47.88]} cluster:{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]} estimator: 171203.39999999997****************************************k: 2center_point_new:{0: [42.86538461538461, 23.692307692307693], 1: [44.104166666666664, 74.08333333333333]} cluster:{0: [0, 2, 4, 5, 6, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 21, 23, 26, 27, 31, 36, 40, 41, 42, 44, 46, 47, 49, 51, 52, 56, 58, 60, 61, 62, 64, 65, 66, 69, 70, 71, 73, 80, 81, 82, 83, 87, 88, 89, 90, 91, 93], 1: [1, 3, 7, 10, 11, 20, 22, 24, 25, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 43, 45, 48, 50, 53, 54, 55, 57, 59, 63, 67, 68, 72, 74, 75, 76, 77, 78, 79, 84, 85, 86, 92, 94, 95, 96, 97, 98, 99]} estimator: 107785.28044871795****************************************k: 3center_point_new:{0: [77.28571428571429, 18.523809523809526], 1: [57.84848484848485, 73.24242424242425], 2: [17.695652173913043, 43.08695652173913]} cluster:{0: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 83, 89, 90, 91], 1: [3, 7, 10, 11, 20, 28, 29, 30, 34, 35, 37, 38, 39, 48, 50, 54, 55, 63, 68, 72, 74, 76, 78, 79, 84, 85, 92, 94, 95, 96, 97, 98, 99], 2: [0, 1, 4, 5, 9, 13, 15, 18, 19, 21, 22, 24, 25, 27, 31, 32, 33, 40, 41, 42, 43, 44, 45, 46, 47, 49, 51, 52, 53, 56, 57, 59, 60, 66, 67, 69, 70, 75, 77, 80, 81, 82, 86, 87, 88, 93]} estimator: 69427.21814417467****************************************k: 4center_point_new:{0: [78.75, 18.3], 1: [64.6923076923077, 69.26923076923077], 2: [20.333333333333332, 81.04761904761905], 3: [20.060606060606062, 27.848484848484848]} cluster:{0: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 89, 90, 91], 1: [3, 7, 10, 11, 20, 25, 28, 29, 30, 34, 35, 39, 48, 50, 54, 55, 72, 74, 78, 79, 84, 85, 92, 94, 95, 98], 2: [1, 24, 32, 33, 37, 38, 43, 45, 53, 57, 59, 63, 67, 68, 75, 76, 77, 86, 96, 97, 99], 3: [0, 4, 5, 9, 13, 15, 18, 19, 21, 22, 27, 31, 40, 41, 42, 44, 46, 47, 49, 51, 52, 56, 60, 66, 69, 70, 80, 81, 82, 83, 87, 88, 93]} estimator: 39536.34410589411****************************************k: 5center_point_new:{0: [73.0, 74.33333333333333], 1: [77.28571428571429, 18.523809523809526], 2: [15.75, 14.6875], 3: [20.77777777777778, 85.33333333333333], 4: [29.0, 47.77777777777778]} cluster:{0: [3, 7, 11, 20, 28, 29, 30, 34, 48, 50, 55, 78, 79, 85, 92, 94, 95, 98], 1: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 83, 89, 90, 91], 2: [4, 15, 18, 19, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 3: [1, 24, 32, 37, 38, 43, 45, 53, 59, 63, 67, 68, 75, 76, 84, 96, 97, 99], 4: [0, 5, 9, 10, 13, 22, 25, 33, 35, 39, 40, 41, 46, 47, 51, 52, 54, 57, 60, 70, 72, 74, 77, 81, 82, 86, 88]} estimator: 30705.73908730159****************************************k: 6center_point_new:{0: [44.42857142857143, 47.0], 1: [88.0, 34.125], 2: [77.36363636363636, 8.0], 3: [20.952380952380953, 82.04761904761905], 4: [14.16, 24.92], 5: [76.0, 78.14285714285714]} cluster:{0: [0, 3, 9, 10, 11, 25, 35, 39, 40, 46, 47, 54, 58, 72, 73, 74, 77, 81, 82, 83, 92], 1: [6, 8, 23, 65, 71, 78, 89, 90], 2: [2, 12, 14, 16, 17, 26, 36, 61, 62, 64, 91], 3: [1, 24, 32, 33, 37, 38, 43, 45, 53, 57, 59, 63, 67, 68, 75, 76, 84, 86, 96, 97, 99], 4: [4, 5, 13, 15, 18, 19, 21, 22, 27, 31, 41, 42, 44, 49, 51, 52, 56, 60, 66, 69, 70, 80, 87, 88, 93], 5: [7, 20, 28, 29, 30, 34, 48, 50, 55, 79, 85, 94, 95, 98]} estimator: 26203.382359307354****************************************k: 7center_point_new:{0: [46.857142857142854, 41.5], 1: [81.22222222222223, 17.0], 2: [47.84615384615385, 72.92307692307692], 3: [83.9090909090909, 78.36363636363636], 4: [13.615384615384615, 86.76923076923077], 5: [15.666666666666666, 13.533333333333333], 6: [16.9375, 47.5]} cluster:{0: [9, 25, 39, 40, 46, 47, 54, 58, 72, 73, 74, 82, 83, 92], 1: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 61, 62, 64, 65, 71, 89, 90, 91], 2: [3, 10, 11, 20, 30, 35, 37, 48, 68, 76, 84, 85, 99], 3: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98], 4: [1, 24, 32, 38, 43, 45, 53, 59, 63, 67, 75, 96, 97], 5: [4, 15, 18, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 6: [0, 5, 13, 19, 22, 33, 41, 51, 52, 57, 60, 70, 77, 81, 86, 88]} estimator: 19237.784108946606****************************************k: 8center_point_new:{0: [83.9090909090909, 78.36363636363636], 1: [18.125, 47.9375], 2: [38.44444444444444, 83.22222222222223], 3: [20.105263157894736, 16.63157894736842], 4: [79.94736842105263, 17.42105263157895], 5: [10.0, 74.66666666666667], 6: [8.0, 92.85714285714286], 7: [50.0, 55.5625]} cluster:{0: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98], 1: [0, 5, 13, 22, 33, 40, 41, 51, 52, 57, 60, 70, 77, 81, 86, 88], 2: [30, 37, 38, 68, 76, 84, 96, 97, 99], 3: [4, 15, 18, 19, 21, 27, 31, 42, 44, 46, 47, 49, 56, 66, 69, 80, 83, 87, 93], 4: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 61, 62, 64, 65, 71, 73, 89, 90, 91], 5: [1, 43, 67], 6: [24, 32, 45, 53, 59, 63, 75], 7: [3, 9, 10, 11, 20, 25, 35, 39, 48, 54, 58, 72, 74, 82, 85, 92]} estimator: 19305.17060644034****************************************k: 9center_point_new:{0: [53.5, 20.5], 1: [15.785714285714286, 12.428571428571429], 2: [26.857142857142858, 54.142857142857146], 3: [12.090909090909092, 40.81818181818182], 4: [35.45454545454545, 82.63636363636364], 5: [85.13333333333334, 18.866666666666667], 6: [49.6, 56.93333333333333], 7: [5.25, 89.25], 8: [83.9090909090909, 78.36363636363636]} cluster:{0: [16, 36, 46, 47, 58, 73, 83, 91], 1: [15, 18, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 2: [0, 33, 40, 57, 77, 81, 86], 3: [4, 5, 13, 19, 22, 41, 51, 52, 60, 70, 88], 4: [1, 30, 37, 38, 63, 68, 76, 84, 96, 97, 99], 5: [2, 6, 8, 12, 14, 17, 23, 26, 61, 62, 64, 65, 71, 89, 90], 6: [3, 9, 10, 11, 20, 25, 35, 39, 48, 54, 72, 74, 82, 85, 92], 7: [24, 32, 43, 45, 53, 59, 67, 75], 8: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98]} estimator: 14449.77272727272

实验源码:

# coding=utf-8import numpy as npfrom numpy import randomimport matplotlib.pyplot as plt# K-meansdef k_means(matrix, center_point, k, n):# print('k=', k, ' n=', n)# 2.对每个样本点,计算得到距其最近的质心,将其类别标为该质心所对应的clustercluster = {}for i in range(0, k):cluster[i] = []for i in range(0, 100):# 记录所有距离d = []# 记录下标信息index = {}for j in range(0, k):s = ((matrix[0][i] - center_point[j][0]) ** 2 + (matrix[1][i] - center_point[j][1]) ** 2)**0.5index[s] = jd.append(s)# 排序d.sort()# 将 i 点加入到距离最近的质心中cluster[index[d[0]]].append(i)# 3.重新计算k个 cluser 对应的质心center_point_new = {}# for i in range(0, k):#center_point_new[i] = center_point[i]# print(cluster)for i in cluster:if len(cluster[i]) != 0:x = 0y = 0for point in cluster[i]:x += matrix[0][point]y += matrix[1][point]center_point_new[i] = [x/len(cluster[i]), y/len(cluster[i])]else:center_point_new[i] = center_point[i]# print(center_point_new)# print(center_point)is_same = True# 遍历判断是否质心不再改变for i in center_point:# print(center_point[i])# print(center_point_new[i])if center_point[i] != center_point_new[i]:is_same = False# 如果质心相同,或者已经迭代 n 次即不再继续if is_same or n >= 10:# print(center_point_new, '\n', n)# print(cluster)estimator = computeSSE(center_point_new, cluster)return center_point_new, cluster, estimatorelse:return k_means(matrix, center_point_new, k=k, n=n+1)# 计算SSEdef computeSSE(center_point, cluster):estimator = 0for i in cluster:if len(cluster[i]) != 0:for point in cluster[i]:estimator += (matrix[0][point] - center_point[i][0]) ** 2 + (matrix[1][point] - center_point[i][1]) ** 2return estimatorif __name__ == '__main__':matrix = np.array(random.randint((100), size=(100, 100)))# 利用SSE选择最优kSSE = [] # 存放每次结果的误差平方和for k in range(1, 10): # K的范围 : 1-10# k = 10print('*'*40)print('k:', k)# 1.随机选定 聚类中心center_point_index = np.array(random.randint((100), size=(k)))center_point = {}for i in range(0, k):center_point[i] = [matrix[0][center_point_index[i]], matrix[1][center_point_index[i]]]center_point_new, cluster, estimator = k_means(matrix, center_point, k=k, n=1)print('center_point_new:\n', center_point_new, '\ncluster:\n', cluster, '\nestimator:', estimator)SSE.append(estimator)X = range(1, 10)plt.xlabel('k')plt.ylabel('SSE')plt.plot(X, SSE, 'o-')plt.show()

如果觉得《利用 Python 实现 K-means 算法》对你有帮助,请点赞、收藏,并留下你的观点哦!

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。