Ordinary Least Squares Linear Regression
If the dataset $D$ is described by $n$ attributes, the hypothesis function of linear regression is:

$$h_{\boldsymbol{w}, b}(\boldsymbol{x})=\sum_{i=1}^{n} w_{i} x_{i}+b=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b$$

where $\boldsymbol{w}\in \mathbb{R}^n$ and $b\in \mathbb{R}$ are the model parameters.
For convenience, we usually absorb $b$ into the weight vector $\boldsymbol{w}$ as $w_0$, and prepend a constant 1 to the input vector $\boldsymbol{x}$ as $x_0$:

$$\boldsymbol{w}=\left(b, w_{1}, w_{2}, \ldots, w_{n}\right)^{\mathrm{T}}, \qquad \boldsymbol{x}=\left(1, x_{1}, x_{2}, \ldots, x_{n}\right)^{\mathrm{T}}$$
The hypothesis function then becomes:

$$h_{\boldsymbol{w}}(\boldsymbol{x})=\sum_{i=0}^{n} w_{i} x_{i}=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}$$

where $\boldsymbol{w}\in \mathbb{R}^{n+1}$. Once the model parameters $\boldsymbol{w}$ have been determined by training, the model can be used to predict the output for new input instances.
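As a minimal sketch of this augmented form (the weights and input below are made-up numbers for illustration), the prediction is just a dot product:

```python
import numpy as np

# Hypothetical parameters for n = 2 features: w = (b, w1, w2).
w = np.array([0.5, 2.0, -1.0])   # w0 = b = 0.5
# Input vector augmented with the constant x0 = 1.
x = np.array([1.0, 3.0, 4.0])

# h_w(x) = w^T x = 0.5*1 + 2.0*3 + (-1.0)*4 = 2.5
h = w @ x
```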
We use the mean squared error (MSE) as the loss function. Assuming the training set $D$ contains $m$ samples, the MSE loss is defined as:

$$J(\boldsymbol{w})=\frac{1}{2m} \sum_{i=1}^{m}\left(h_{\boldsymbol{w}}\left(\boldsymbol{x}_{i}\right)-y_{i}\right)^{2}=\frac{1}{2m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2}$$
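A quick sketch of this loss in NumPy (toy samples assumed for illustration), with `X` already carrying the $x_0 = 1$ column:

```python
import numpy as np

def mse_loss(w, X, y):
    """J(w) = 1/(2m) * sum_i (w^T x_i - y_i)^2."""
    m = X.shape[0]
    r = X @ w - y          # residuals w^T x_i - y_i
    return r @ r / (2 * m)

# Two toy samples with one feature each; the first column is x0 = 1.
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])

w_exact = np.array([1.0, 1.0])   # fits both samples exactly, so J = 0
```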
$J(\boldsymbol{w})$ is a convex quadratic function of $\boldsymbol{w}$, so its minimum is attained at a stationary point: we can compute the gradient of $J(\boldsymbol{w})$ with respect to $\boldsymbol{w}$, set it to zero, and solve the resulting equation.

The gradient of $J(\boldsymbol{w})$ is:

$$\begin{aligned}\nabla J(\boldsymbol{w}) &=\frac{1}{2m} \sum_{i=1}^{m} \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right)^{2} \\&=\frac{1}{2m} \sum_{i=1}^{m} 2\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \frac{\partial}{\partial \boldsymbol{w}}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \\&=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i}\end{aligned}$$
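The final summation form can be computed sample by sample; a sketch on toy data (values assumed for illustration):

```python
import numpy as np

def gradient_loop(w, X, y):
    """Gradient of J: (1/m) * sum_i (w^T x_i - y_i) x_i, summed per sample."""
    m = X.shape[0]
    grad = np.zeros_like(w)
    for xi, yi in zip(X, y):
        grad += (w @ xi - yi) * xi
    return grad / m

X = np.array([[1.0, 1.0],
              [1.0, 2.0]])   # first column is x0 = 1
y = np.array([2.0, 3.0])
w = np.array([0.0, 0.0])
# Residuals are -2 and -3, so the gradient is
# ((-2)*[1,1] + (-3)*[1,2]) / 2 = [-2.5, -4.0]
```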
This formula is more concise when written with matrix operations. Let:

$$\boldsymbol{X}=\begin{bmatrix}1 & x_{11} & x_{12} & \ldots & x_{1n} \\ 1 & x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m1} & x_{m2} & \ldots & x_{mn}\end{bmatrix}=\begin{bmatrix}\boldsymbol{x}_{1}^{\mathrm{T}} \\ \boldsymbol{x}_{2}^{\mathrm{T}} \\ \vdots \\ \boldsymbol{x}_{m}^{\mathrm{T}}\end{bmatrix}, \qquad \boldsymbol{y}=\begin{bmatrix}y_{1} \\ y_{2} \\ \vdots \\ y_{m}\end{bmatrix}, \qquad \boldsymbol{w}=\begin{bmatrix}b \\ w_{1} \\ w_{2} \\ \vdots \\ w_{n}\end{bmatrix}$$
The gradient can then be written as:

$$\begin{aligned}\nabla J(\boldsymbol{w}) &=\frac{1}{m} \sum_{i=1}^{m}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}-y_{i}\right) \boldsymbol{x}_{i} \\&=\frac{1}{m}\left[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right]\begin{bmatrix}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1}-y_{1} \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2}-y_{2} \\ \vdots \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}-y_{m}\end{bmatrix} \\&=\frac{1}{m}\left[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{m}\right]\left(\begin{bmatrix}\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{1} \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{2} \\ \vdots \\ \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{m}\end{bmatrix}-\begin{bmatrix}y_{1} \\ y_{2} \\ \vdots \\ y_{m}\end{bmatrix}\right) \\&=\frac{1}{m} \boldsymbol{X}^{\mathrm{T}}(\boldsymbol{X} \boldsymbol{w}-\boldsymbol{y})\end{aligned}$$
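The equivalence of the summation form and the matrix form $\frac{1}{m}\boldsymbol{X}^{\mathrm{T}}(\boldsymbol{X}\boldsymbol{w}-\boldsymbol{y})$ can be checked numerically on random data (sizes below are chosen arbitrarily for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
# Design matrix with the x0 = 1 column prepended.
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)
w = rng.normal(size=n + 1)

# Matrix form: 1/m * X^T (X w - y)
grad_matrix = X.T @ (X @ w - y) / m

# Summation form: 1/m * sum_i (w^T x_i - y_i) x_i
grad_sum = sum((w @ xi - yi) * xi for xi, yi in zip(X, y)) / m
```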
Setting the gradient to zero and solving (assuming $\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}$ is invertible) gives:

$$\hat{\boldsymbol{w}}=\left(\boldsymbol{X}^{\mathrm{T}} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\mathrm{T}} \boldsymbol{y}$$

$\hat{\boldsymbol{w}}$ is the $\boldsymbol{w}$ that minimizes the loss function (the mean squared error). This method of solving for the optimal $\boldsymbol{w}$ is known as Ordinary Least Squares (OLS).
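A sketch of the closed-form solution on a toy dataset (three points on the line $y = 1 + 2x$, chosen for illustration); note that solving the linear system is generally preferred to forming the inverse explicitly:

```python
import numpy as np

# Three points lying exactly on y = 1 + 2x; the first column is x0 = 1.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Normal equation: solve (X^T X) w = X^T y rather than inverting X^T X.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
# w_hat recovers b = 1, w1 = 2
```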
The complete NumPy implementation:

```python
import numpy as np


class OLSLinearRegression:

    def _ols(self, X, y):
        '''Estimate w by ordinary least squares.'''
        tmp = np.linalg.inv(np.matmul(X.T, X))
        tmp = np.matmul(tmp, X.T)
        w = np.matmul(tmp, y)
        return w

    def _preprocess_data(self, X):
        '''Preprocess the data: prepend a column of x0 = 1.'''
        m, n = X.shape
        X_ = np.ones((m, n + 1))
        X_[:, 1:] = X
        return X_

    def train(self, X, y):
        '''Train the model.'''
        X = self._preprocess_data(X)
        self.w = self._ols(X, y)

    def predict(self, X):
        '''Predict targets for new inputs.'''
        X = self._preprocess_data(X)
        y = np.matmul(X, self.w)
        return y
```
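Explicitly inverting $\boldsymbol{X}^{\mathrm{T}}\boldsymbol{X}$, as `_ols` does, can be numerically fragile when the matrix is ill-conditioned; `np.linalg.lstsq` computes the same least-squares fit more robustly. A self-contained sketch on synthetic data (shapes and coefficients made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 100, 3
X = rng.normal(size=(m, n))
X_ = np.hstack([np.ones((m, 1)), X])          # prepend x0 = 1
true_w = np.array([1.0, 2.0, -3.0, 0.5])      # assumed (b, w1, w2, w3)
y = X_ @ true_w + 0.01 * rng.normal(size=m)   # targets with small noise

# Normal equation with an explicit inverse (mirrors _ols).
w_inv = np.linalg.inv(X_.T @ X_) @ X_.T @ y

# Numerically preferred: the built-in least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X_, y, rcond=None)
```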