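Pseudocode for the k-nearest neighbors (kNN) algorithm:
For each point in the dataset whose class is unknown, perform the following steps in turn:
(1) Compute the distance between each point in the labeled dataset and the current point;
(2) Sort the labeled points in order of increasing distance;
(3) Select the k points closest to the current point;
(4) Count how often each class occurs among these k points;
(5) Return the most frequent class among these k points as the predicted class for the current point.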
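The five steps can be sketched in plain Python, one line per step. This is only an illustrative sketch: the function name knn_predict and the sample data are made up for this example, Euclidean distance is assumed, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(point, samples, labels, k):
    """Hypothetical helper: predict the class of `point` by majority vote."""
    # (1) Euclidean distance from every labeled sample to the query point
    distances = [math.dist(point, s) for s in samples]
    # (2) sample indices ordered by increasing distance
    order = sorted(range(len(samples)), key=lambda i: distances[i])
    # (3) keep the k nearest samples
    nearest = order[:k]
    # (4) count how often each class occurs among them
    votes = Counter(labels[i] for i in nearest)
    # (5) return the most frequent class
    return votes.most_common(1)[0][0]


# Made-up sample data: two points near (1, 1) and two near (0, 0)
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
prediction = knn_predict((0.0, 0.0), samples, labels, 3)
print(prediction)
```

With k = 3 the three nearest samples to (0, 0) carry the labels B, B and A, so the vote goes to B.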
Python implementation:
'''
Created on Sep 16,
kNN: k Nearest Neighbors

Input:
    inX: vector to compare to existing dataset (1xN)
    dataSet: size m data set of known vectors (NxM)
    labels: data set labels (1xM vector)
    k: number of neighbors to use for comparison (should be an odd number)

Output:
    the most popular class label

@author: pbharrin
'''
import operator
from numpy import tile


def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows (samples) in the training set dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet  # repeat inX so its shape matches dataSet, then subtract
    sqDiffMat = diffMat ** 2  # element-wise squared differences
    sqDistances = sqDiffMat.sum(axis=1)  # sum the squares along each row
    distances = sqDistances ** 0.5  # square root gives the Euclidean distance
    sortedDistIndicies = distances.argsort()  # indices ordered by increasing distance
    classCount = {}
    for i in range(k):  # tally the labels of the k nearest points
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # sort classes by vote count, highest first (Python 3: items() replaces iteritems())
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
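To see what the tile step in the function does, the distance computation can be traced on a small array. The data below is made up for illustration, and the broadcasting variant is an alternative shown for comparison, not part of the original function.

```python
import numpy as np

# Made-up training samples (4 rows, 2 features) and one query vector
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
inX = np.array([0.0, 0.0])

# tile repeats inX once per training row so the two arrays have the same shape
tiled = np.tile(inX, (group.shape[0], 1))

# Same arithmetic as in classify0: square, sum along each row, square root
distances = (((tiled - group) ** 2).sum(axis=1)) ** 0.5

# NumPy broadcasting gives the same distances without building the tiled copy
broadcast = np.sqrt(((group - inX) ** 2).sum(axis=1))

# Indices of the training rows from nearest to farthest
order = distances.argsort()
print(tiled.shape, distances, order)
```

Row 2 is the query point itself, so its distance is 0 and it comes first in the ordering.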