[Statistical Learning Methods] Naive Bayes

Date: 2024-01-07 15:25:21

1. Introduction

We start with the formula at the heart of naive Bayes, Bayes' theorem:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

Reference: Bayes' theorem (Wikipedia)
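As a quick numeric illustration of Bayes' theorem, the sketch below applies it to a made-up spam-filter scenario. The function name `bayes_posterior` and all the numbers are assumptions chosen for illustration; they do not come from the book.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return prior_a * likelihood_b_given_a / evidence_b


# Hypothetical numbers: A = "message is spam", B = "message contains 'free'".
p_spam = 0.2               # P(A)
p_free_given_spam = 0.5    # P(B|A)
p_free_given_ham = 0.05    # P(B|not A)

# P(B) via the law of total probability.
p_free = p_spam * p_free_given_spam + (1 - p_spam) * p_free_given_ham

posterior = bayes_posterior(p_spam, p_free_given_spam, p_free)
print(round(posterior, 4))  # 0.7143
```

With these made-up numbers, seeing the word raises the probability of spam from 0.2 to about 0.71.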

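Naive Bayes is a classification method based on Bayes' theorem and the assumption that the features are conditionally independent given the class. The "naive" in the name comes from this conditional-independence assumption. Given a training set, the method first learns the joint probability distribution of input and output under that assumption; then, for a given input x, it uses Bayes' theorem to find the output y with the largest posterior probability. Concretely, the conditional-independence assumption is: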

$$P(X = x \mid Y = c_k) = P(X^{(1)} = x^{(1)}, \ldots, X^{(n)} = x^{(n)} \mid Y = c_k) = \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$
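One way to see what the assumption buys is to count the table entries needed to describe $P(X = x \mid Y = c_k)$ when feature $j$ takes $S_j$ distinct values and there are $K$ classes. Without the assumption the full conditional table has $K \prod_j S_j$ entries; with it, $K \sum_j S_j$ per-feature entries suffice. The sketch below counts both; the helper names and the feature sizes are hypothetical.

```python
from math import prod


def full_joint_entry_count(num_classes, values_per_feature):
    # One entry per (class, combination of feature values).
    return num_classes * prod(values_per_feature)


def naive_bayes_entry_count(num_classes, values_per_feature):
    # One entry per (class, feature, feature value).
    return num_classes * sum(values_per_feature)


# Hypothetical problem: 2 classes, 10 features with 3 values each.
values_per_feature = [3] * 10
print(full_joint_entry_count(2, values_per_feature))   # 118098
print(naive_bayes_entry_count(2, values_per_feature))  # 60
```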

2. The naive Bayes algorithm

Given the training data

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}, \quad x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})^T, \quad x_i^{(j)} \in \{a_{j1}, a_{j2}, \ldots, a_{jS_j}\}$$

where $a_{jl}$ is the $l$-th value that the $j$-th feature can take and $y_i \in \{c_1, c_2, \ldots, c_K\}$, the algorithm proceeds as follows.

Step 1: Compute the prior and conditional probabilities.

Prior probability:

$$P(Y = c_k) = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \ldots, K$$

where $K$ is the number of classes of $Y$.

Conditional probability:

$$P(X^{(j)} = a_{jl} \mid Y = c_k) = \frac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\, y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}, \quad j = 1, 2, \ldots, n;\ l = 1, 2, \ldots, S_j;\ k = 1, 2, \ldots, K$$

Step 2: For a given instance $x = (x^{(1)}, x^{(2)}, \ldots, x^{(n)})^T$, compute:

$$P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k), \quad k = 1, 2, \ldots, K$$

Step 3: Determine the class of instance $x$:

$$y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$
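The decision rule can be traced back to Bayes' theorem. As a short derivation in the notation of this section, the posterior probability of class $c_k$ is:

```latex
\begin{aligned}
P(Y = c_k \mid X = x)
  &= \frac{P(X = x \mid Y = c_k)\, P(Y = c_k)}
          {\sum_{k'} P(X = x \mid Y = c_{k'})\, P(Y = c_{k'})} \\
  &= \frac{P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)}
          {\sum_{k'} P(Y = c_{k'}) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_{k'})}
\end{aligned}
```

The second line substitutes the conditional-independence assumption. The denominator is the same for every $c_k$, so maximizing the posterior over $c_k$ is equivalent to maximizing the numerator alone, which is exactly the quantity in the decision rule.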

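Note: if some feature value never co-occurs with some class in the training set, estimating the probabilities directly with the algorithm above causes a problem. The corresponding conditional probability is estimated as zero, so the product computed in step 2 (the prior times the conditional probabilities) is zero as well, and the information carried by all the other features is wiped out by a feature value that simply did not appear in the training set. To avoid this, the probability estimates are usually smoothed; Laplace smoothing (also called the Laplace correction) is the common choice.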
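Additive smoothing adds a constant $\lambda \ge 0$ to every count: the conditional estimate becomes $(\text{count} + \lambda) / (\text{class count} + S_j \lambda)$ and the prior becomes $(\text{class count} + \lambda) / (N + K \lambda)$; $\lambda = 1$ gives Laplace smoothing. The following is a minimal sketch of these formulas on the Example 4.1 training data. The function name `smoothed_scores` is hypothetical, `lam` is assumed to be an integer (or a `Fraction`), and $S_j$ is taken to be the number of distinct values of feature $j$ seen in the training data.

```python
from collections import Counter
from fractions import Fraction

# Training data of Example 4.1 (two features, class labels -1 / 1).
X = [(1, 'S'), (1, 'M'), (1, 'M'), (1, 'S'), (1, 'S'),
     (2, 'S'), (2, 'M'), (2, 'M'), (2, 'L'), (2, 'L'),
     (3, 'L'), (3, 'M'), (3, 'M'), (3, 'L'), (3, 'L')]
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]


def smoothed_scores(X, y, sample, lam=1):
    """Return {class: smoothed P(Y=c) * prod_j smoothed P(X_j = sample_j | Y=c)}."""
    n_samples = len(y)
    class_counts = Counter(y)
    n_classes = len(class_counts)
    scores = {}
    for c, n_c in class_counts.items():
        # Smoothed prior: (count + lam) / (N + K * lam)
        score = Fraction(n_c + lam, n_samples + n_classes * lam)
        for j, value in enumerate(sample):
            n_values = len({row[j] for row in X})  # S_j, assumed from the data
            joint = sum(1 for row, label in zip(X, y)
                        if row[j] == value and label == c)
            # Smoothed conditional: (count + lam) / (class count + S_j * lam)
            score *= Fraction(joint + lam, n_c + n_values * lam)
        scores[c] = score
    return scores


scores = smoothed_scores(X, y, sample=(2, 'S'), lam=1)
print(scores)                       # exact fractions per class
print(max(scores, key=scores.get))  # predicted class
```

With `lam=1` the scores are 5/153 for class 1 and 28/459 for class -1, so the sample (2, 'S') is assigned to class -1. Setting `lam=0` recovers the unsmoothed estimates.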

3. Code implementation

For categorical variables, such as counts or labels, a multinomial distribution can be used. If the variables are binary, such as yes/no or true/false, a binomial distribution can be used. If a variable is numerical, such as a measurement, a Gaussian distribution is often used.

- Binary: Binomial distribution.
- Categorical: Multinomial distribution.
- Numeric: Gaussian distribution.

These three distributions are so common that the Naive Bayes implementation is often named after the distribution. For example:

- Binomial Naive Bayes: Naive Bayes that uses a binomial distribution.
- Multinomial Naive Bayes: Naive Bayes that uses a multinomial distribution.
- Gaussian Naive Bayes: Naive Bayes that uses a Gaussian distribution.

Using one of the three common distributions is not mandatory; for example, if a real-valued variable is known to have a different specific distribution, such as exponential, then that specific distribution may be used instead. If a real-valued variable does not have a well-defined distribution, such as bimodal or multimodal, then a kernel density estimator can be used to estimate the probability distribution instead.
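To make the last option concrete, here is a minimal Gaussian kernel density estimator in plain Python. It is only a sketch: the function name `gaussian_kde_pdf`, the bandwidth and the sample values are hypothetical. In a naive Bayes classifier, one such estimate per feature and class would stand in for the parametric density.

```python
import math


def gaussian_kde_pdf(x, samples, bandwidth):
    """Estimate the density at x as the average of Gaussian kernels centred on the samples."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


# Hypothetical bimodal feature values observed for one class.
samples = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
for x in (-2.0, 0.0, 2.0):
    print(x, gaussian_kde_pdf(x, samples, bandwidth=0.5))
```

The estimate is high near both clusters and low between them, which a single Gaussian fitted to the same values could not express.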

3.1 Discrete data

Below is a Python implementation of Example 4.1 from the book (p. 63). The data are first stored in a pandas.DataFrame:

3.1.1 Data preparation

    import pandas as pd

    # Training data
    x = [[1, 'S'], [1, 'M'], [1, 'M'], [1, 'S'], [1, 'S'],
         [2, 'S'], [2, 'M'], [2, 'M'], [2, 'L'], [2, 'L'],
         [3, 'L'], [3, 'M'], [3, 'M'], [3, 'L'], [3, 'L']]
    y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
    # Test sample
    t = [2, 'S']

    x = pd.DataFrame(x)                     # feature columns are named 0 and 1
    y = pd.DataFrame(y, columns=['class'])
    data = pd.concat([x, y], axis=1)

3.1.2 Python implementation

    def prior_prob(y_class):
        # Prior probability P(Y = y_class)
        return sum(data['class'] == y_class) / len(data)

    def cond_prob(y_class, x_j):
        # Conditional probabilities P(X^(x_j) = value | Y = y_class)
        # for every value of feature x_j seen in the training data
        prob = {}
        n_class = sum(data['class'] == y_class)
        for i in data[x_j].unique():
            prob[i] = sum((data[x_j] == i) & (data['class'] == y_class)) / n_class
        return prob

    def predict(k):
        # k is the sample to classify
        temp = {}
        for j in data['class'].unique():
            x = prior_prob(j)
            for i in range(len(k)):
                x = x * cond_prob(j, i)[k[i]]
            temp[j] = x
        # Return the class with the largest score
        return max(temp, key=temp.get)

    print(predict(t))  # -1

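3.2 Continuous data

This section implements Gaussian naive Bayes in Python. It is suited to continuous variables and assumes that each feature $x_i$ is normally distributed within each class: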

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

Here $\mu_y$ is the sample mean of feature $x_i$ among the samples of class $y$, and $\sigma_y$ is the standard deviation of feature $x_i$ among the samples of class $y$.
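The density can be written as a small standalone function. This is a minimal sketch using only the standard library; the name `gaussian_pdf` is hypothetical. Note the parentheses around $2\sigma_y^2$ in the exponent: writing `/ 2 * sigma ** 2` would multiply by $\sigma^2$ instead of dividing by it.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Peak of the standard normal density, 1 / sqrt(2 * pi).
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```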

3.2.1 Data preparation

    import pandas as pd
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split

    # Two well-separated 2-D Gaussian blobs, 100 samples in total
    features, labels = make_blobs(100, n_features=2, centers=2, cluster_std=0.5, random_state=42)
    x_train, x_test, y_train, y_test = train_test_split(features, labels, train_size=0.8, random_state=42)

    traindf = pd.DataFrame(x_train)
    testdf = pd.DataFrame(x_test)
    trainlabel = pd.DataFrame(y_train, columns=['class'])
    data = pd.concat([traindf, trainlabel], axis=1)

3.2.2 Python implementation

Without Laplace smoothing:

    import numpy as np

    class gaussian_NB:
        def __init__(self, traindata):
            self.data = traindata
            self.label_list = traindata['class'].unique()

        def prior_prob(self, y_class):
            # Prior probability P(Y = y_class)
            return sum(self.data['class'] == y_class) / len(self.data)

        def sigmu(self, x_j):
            # For each class i, the sample standard deviation and mean of
            # feature x_j among the training samples of class i.
            # Returned as a dict: {label: [sigma, mu], ...}
            prob = {}
            for i in self.label_list:
                values = self.data[x_j][self.data['class'] == i]
                prob[i] = [values.std(), values.mean()]
            return prob

        def cond_prob(self, feat):
            # Gaussian densities P(x_k | y) for every feature k and class y.
            # Returned as {label: [density of feature 0, density of feature 1, ...]}
            probs = {label: [] for label in self.label_list}
            for k in range(len(feat)):
                sigma_mu = self.sigmu(k)
                for j in sigma_mu.keys():
                    sigma, mu = sigma_mu[j]
                    temp = (1 / np.sqrt(2 * np.pi * np.square(sigma))) * \
                        np.exp(-np.square(feat[k] - mu) / (2 * np.square(sigma)))
                    probs[j].append(temp)
            return probs

        def predict(self, feat):
            # Predict the class label of a new sample by comparing log scores
            temp_cond_prob = self.cond_prob(feat)
            pred_prob = {}
            for i in self.label_list:
                temp = np.log(self.prior_prob(i))
                for j in temp_cond_prob[i]:
                    temp = temp + np.log(j)  # logs are added, not multiplied
                pred_prob[i] = temp
            return max(pred_prob, key=pred_prob.get)

        def test(self, testdata):
            # Batch prediction over the rows of a DataFrame
            pred_list = []
            for i in testdata.values:
                pred_list.append(self.predict(i))
            return pred_list

    def acc(pred_label, true_label):
        # Accuracy: fraction of predictions that match the true labels
        return np.mean(np.asarray(pred_label) == np.asarray(true_label))

    model = gaussian_NB(data)
    print(acc(model.test(testdf), y_test))
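Comparing classes in log space turns the product of the prior and the conditional probabilities into a sum, $\log P(Y = c_k) + \sum_j \log P(x^{(j)} \mid Y = c_k)$, which avoids floating-point underflow when there are many features. The sketch below illustrates the difference; the probabilities are made up for illustration.

```python
import math

# Hypothetical prior and per-feature conditional probabilities for one class.
prior = 0.5
cond_probs = [1e-5] * 100

# The direct product underflows to 0.0 in double precision.
direct = prior
for p in cond_probs:
    direct *= p

# The sum of logs stays finite, so classes can still be ranked.
log_score = math.log(prior) + sum(math.log(p) for p in cond_probs)

print(direct)     # 0.0
print(log_score)  # a finite negative number
```

Because the logarithm is monotonically increasing, the class with the largest log score is also the class with the largest product.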

4. Further reading

- How to Develop a Naive Bayes Classifier from Scratch in Python
- A Gentle Introduction to Bayes Theorem for Machine Learning
- Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm
- 统计学习方法习题解答-DataWhale (Statistical Learning Methods exercise solutions, DataWhale)
