
Common PyTorch Parameter Initialization Methods Explained

Date: 2023-07-24 12:44:32

1. Uniform initialization

torch.nn.init.uniform_(tensor, a=0, b=1)

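Fills the tensor with values sampled from the uniform distribution U(a, b). Parameters:

tensor - the tensor to fill
a - lower bound of the uniform distribution
b - upper bound of the uniform distribution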

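Example:

import torch
from torch import nn  # the later examples in this article assume these two imports
w = torch.empty(3, 5)
nn.init.uniform_(w)
"""
tensor([[0.2116, 0.3085, 0.5448, 0.6113, 0.7697],
        [0.8300, 0.2938, 0.4597, 0.4698, 0.0624],
        [0.5034, 0.1166, 0.3133, 0.3615, 0.3757]])
"""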

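The uniform distribution in detail:

If $x$ follows a uniform distribution, i.e. $x \sim U(a, b)$, its probability density function (which describes the relative likelihood of each value of the random variable) is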

$$
f(x)=\begin{cases}\frac{1}{b-a}, & a<x<b \\ 0, & \text{otherwise}\end{cases}
$$

Its expectation $E(x)$ and variance $D(x)$ are then

$$
\begin{aligned}
E(x) &= \int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{2}(a+b) \\
D(x) &= E\left(x^{2}\right)-[E(x)]^{2} = \frac{(b-a)^{2}}{12}
\end{aligned}
$$
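A quick empirical check of these two formulas, as a sketch: fill a large tensor with nn.init.uniform_ and compare the sample mean and variance with (a + b)/2 and (b - a)²/12. The bounds a = -2 and b = 4, the sample size and the seed are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so repeated runs give the same numbers

a, b = -2.0, 4.0
w = torch.empty(200_000)
nn.init.uniform_(w, a=a, b=b)

expected_mean = (a + b) / 2       # E(x) = (a + b) / 2
expected_var = (b - a) ** 2 / 12  # D(x) = (b - a)^2 / 12

sample_mean = w.mean().item()
sample_var = w.var().item()       # unbiased sample variance
print(f"mean {sample_mean:.4f} vs {expected_mean}, var {sample_var:.4f} vs {expected_var}")
```

With 200,000 samples, both estimates typically land within a few thousandths of the closed-form values.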

2. Normal (Gaussian) initialization

torch.nn.init.normal_(tensor, mean=0.0, std=1.0)

Fills the tensor with values drawn from the normal distribution $N(\text{mean}, \text{std}^{2})$ with the given mean and standard deviation.

Parameters:

tensor - the tensor to fill
mean - mean of the normal distribution
std - standard deviation of the normal distribution

Example:

w = torch.empty(3, 5)
torch.nn.init.normal_(w, mean=0, std=1)
"""
tensor([[-1.3903,  0.4045,  0.3048,  0.7537, -0.5189],
        [-0.7672,  0.1891, -0.2226,  0.2913,  0.1295],
        [ 1.4719, -0.3049,  0.3144, -1.0047, -0.5424]])
"""

The normal distribution in detail:

If a random variable $x$ follows a normal distribution, i.e. $x \sim N(\mu, \sigma^{2})$, its probability density function is

$$
f(x)=\frac{1}{\sigma \sqrt{2 \pi}} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)
$$

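Some notable probabilities of the normal distribution:

68.268949% of the area lies within one standard deviation $\sigma$ of the mean ($\mu \pm \sigma$)
95.449974% of the area lies within two standard deviations $2\sigma$ of the mean ($\mu \pm 2\sigma$)
99.730020% of the area lies within three standard deviations $3\sigma$ of the mean ($\mu \pm 3\sigma$)
99.993666% of the area lies within four standard deviations $4\sigma$ of the mean ($\mu \pm 4\sigma$)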

The normal distribution with $\mu=0$ and $\sigma=1$ is the standard normal distribution.
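The four percentages above are the probabilities $P(\mu - k\sigma < x < \mu + k\sigma)$ for k = 1, 2, 3, 4. For any normal distribution this probability equals $\operatorname{erf}(k/\sqrt{2})$, independent of $\mu$ and $\sigma$, so the values can be reproduced with the Python standard library alone:

```python
import math

def prob_within(k: float) -> float:
    """P(mu - k*sigma < x < mu + k*sigma) for x ~ N(mu, sigma^2)."""
    return math.erf(k / math.sqrt(2))

probs = {k: prob_within(k) for k in (1, 2, 3, 4)}
for k, p in probs.items():
    print(f"mu ± {k} sigma: {p:.6%}")
```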

3. Xavier initialization

3.1 Xavier uniform initialization

torch.nn.init.xavier_uniform_(tensor, gain=1.0)

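Also known as Glorot initialization. Following the method described by Glorot, X. & Bengio, Y. in the paper Understanding the difficulty of training deep feedforward neural networks, it fills the input tensor with values sampled from the uniform distribution $U(-a, a)$. Here fan_in and fan_out are the numbers of input and output units of the layer (for a 2-D weight of shape (out_features, in_features), fan_in = in_features and fan_out = out_features), and $a$ is given by: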

$$
a=\text{gain} \times \sqrt{\frac{6}{\text{fan\_in}+\text{fan\_out}}}
$$

Example:

w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))
"""
tensor([[ 0.7695, -0.7687, -0.2561, -0.5307,  0.5195],
        [-0.6187,  0.4913,  0.3037, -0.6374,  0.9725],
        [-0.2658, -0.4051, -1.1006, -1.1264, -0.1310]])
"""
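To connect the formula with the example, the sketch below recomputes $a$ for the 3 × 5 weight. It assumes PyTorch's convention for a 2-D tensor, fan_in = size(1) and fan_out = size(0), so fan_in = 5 and fan_out = 3; with gain = calculate_gain('relu') = √2 the bound is √2 · √(6/8) ≈ 1.2247, and every sampled value must lie inside (-a, a), as the values in the example output above do.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

w = torch.empty(3, 5)
gain = nn.init.calculate_gain('relu')   # sqrt(2) for ReLU
nn.init.xavier_uniform_(w, gain=gain)

# 2-D weight of shape (out_features, in_features): fan_out = size(0), fan_in = size(1)
fan_out, fan_in = w.shape
a = gain * math.sqrt(6.0 / (fan_in + fan_out))

print(f"a = {a:.4f}, max |w| = {w.abs().max().item():.4f}")
```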

3.2 Xavier normal initialization

torch.nn.init.xavier_normal_(tensor, gain=1.0)

Also known as Glorot initialization. Following the method described by Glorot, X. & Bengio, Y. in the paper Understanding the difficulty of training deep feedforward neural networks, it fills the input tensor with values sampled from the normal distribution $N(0, \text{std}^{2})$, where std is given by:

$$
\text{std}=\text{gain} \times \sqrt{\frac{2}{\text{fan\_in}+\text{fan\_out}}}
$$

Parameters:

tensor - the tensor to initialize
gain - optional scaling factor

Example:

w = torch.arange(10, dtype=torch.float32).view(2, -1)
torch.nn.init.xavier_normal_(w)
"""
tensor([[-0.3139, -0.3557,  0.1285, -0.9556,  0.3255],
        [-0.6212,  0.3405, -0.4150, -1.3227, -0.0069]])
"""
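A rough empirical check of the std formula, as a sketch: with the default gain = 1 and a 256 × 512 weight (fan_out = 256, fan_in = 512 under the same 2-D convention), the expected std is √(2/768) ≈ 0.0510, and the sample std of the initialized tensor should be close to it. The shape is chosen only so that there are enough samples.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

w = torch.empty(256, 512)       # fan_out = 256, fan_in = 512
nn.init.xavier_normal_(w)       # gain defaults to 1.0

expected_std = 1.0 * math.sqrt(2.0 / (512 + 256))
sample_std = w.std().item()
print(f"expected std {expected_std:.4f}, sample std {sample_std:.4f}")
```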

4. Kaiming initialization

4.1 Kaiming uniform initialization

torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

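Also known as He initialization. Following the method described by He, K. et al. in the paper Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, it fills the input tensor with values sampled from the uniform distribution $U(-\text{bound}, \text{bound})$, where fan_mode is fan_in or fan_out depending on the mode argument, and bound is given by: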

$$
\text{bound}=\text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}
$$

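Parameters:

tensor - the tensor to initialize
a - the negative slope of the rectifier used after this layer, used to compute $\text{gain}=\sqrt{\frac{2}{1+a^{2}}}$ (only takes effect when nonlinearity is 'leaky_relu')
mode - either 'fan_in' (default) or 'fan_out'. 'fan_in' preserves the magnitude of the variance of the weights in the forward pass; 'fan_out' preserves the magnitudes in the backward pass
nonlinearity - the non-linear function (a function name from nn.functional); PyTorch recommends using it only with 'relu' or 'leaky_relu' (default)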

Example:

w = torch.empty(3, 5)
torch.nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
"""
tensor([[-0.4362, -0.8177, -0.7034,  0.7306, -0.6457],
        [-0.5749, -0.6480, -0.8016, -0.1434,  0.0785],
        [ 1.0369, -0.0676,  0.7430, -0.2484, -0.0895]])
"""
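The sketch below recomputes the bound for the 3 × 5 example (mode='fan_in', so fan_mode = fan_in = 5; gain = √2 for 'relu'; bound = √2 · √(3/5) ≈ 1.0954) and checks the 'leaky_relu' gain formula against nn.init.calculate_gain. The negative slope 0.2 is an arbitrary illustrative value. With the defaults a=0 and nonlinearity='leaky_relu', the gain is √(2/(1+0)) = √2, the same as for 'relu'.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

w = torch.empty(3, 5)           # fan_out = 3, fan_in = 5
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')

gain = nn.init.calculate_gain('relu')          # sqrt(2)
bound = gain * math.sqrt(3.0 / w.size(1))      # fan_mode = fan_in = 5
print(f"bound = {bound:.4f}, max |w| = {w.abs().max().item():.4f}")

# gain for leaky_relu with negative slope a: sqrt(2 / (1 + a^2))
slope = 0.2
leaky_gain = nn.init.calculate_gain('leaky_relu', slope)
print(leaky_gain, math.sqrt(2.0 / (1 + slope ** 2)))
```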

4.2 Kaiming normal initialization

torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

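Also known as He initialization. Following the method described by He, K. et al. in the paper Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, it fills the input tensor with values sampled from the normal distribution $N(0, \text{std}^{2})$, where std is given by:

$$
\text{std}=\frac{\text{gain}}{\sqrt{\text{fan\_mode}}}
$$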

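Parameters:

tensor - the tensor to initialize
a - the negative slope of the rectifier used after this layer, used to compute $\text{gain}=\sqrt{\frac{2}{1+a^{2}}}$ (only takes effect when nonlinearity is 'leaky_relu')
mode - either 'fan_in' (default) or 'fan_out'. 'fan_in' preserves the magnitude of the variance of the weights in the forward pass; 'fan_out' preserves the magnitudes in the backward pass
nonlinearity - the non-linear function (a function name from nn.functional); PyTorch recommends using it only with 'relu' or 'leaky_relu' (default)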
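A sketch checking std = gain / √fan_mode empirically for both modes, using a 512 × 256 weight (fan_out = 512, fan_in = 256) and nonlinearity='relu' (gain = √2). The shape is an arbitrary choice; the point is that 'fan_in' and 'fan_out' give different stds whenever the two fans differ.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

w = torch.empty(512, 256)   # fan_out = 512, fan_in = 256
results = {}
for mode in ('fan_in', 'fan_out'):
    nn.init.kaiming_normal_(w, mode=mode, nonlinearity='relu')
    fan = w.size(1) if mode == 'fan_in' else w.size(0)
    expected = math.sqrt(2.0) / math.sqrt(fan)      # std = gain / sqrt(fan_mode)
    results[mode] = (expected, w.std().item())
    print(f"{mode}: expected std {expected:.4f}, sample std {results[mode][1]:.4f}")
```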

5. Orthogonal initialization

torch.nn.init.orthogonal_(tensor, gain=1)

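Fills the input tensor with a (semi-)orthogonal matrix, as described in Saxe, A. et al., Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. The input tensor must have at least 2 dimensions; for tensors with more than 2 dimensions, the trailing dimensions are flattened.

Orthogonal initialization makes the initial weight vectors (the rows or columns of the flattened weight) mutually orthogonal. This removes correlation between the initial kernels, so they start out compact and non-redundant, which can make it easier for the model to learn effective parameters.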

Parameters:

tensor - the tensor to initialize
gain - optional scaling factor

Example:

w = torch.empty(3, 5)
torch.nn.init.orthogonal_(w)
"""
tensor([[ 0.7395, -0.1503,  0.4474,  0.4321, -0.2090],
        [-0.2625,  0.0112,  0.6515, -0.4770, -0.5282],
        [ 0.4554,  0.6548,  0.0970, -0.4851,  0.3453]])
"""
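A sketch verifying the (semi-)orthogonality. For the 3 × 5 tensor the 3 rows are orthonormal, so W Wᵀ is approximately the 3 × 3 identity. For a 4-D tensor (shaped here like a hypothetical conv weight with 16 output channels, 8 input channels and 3 × 3 kernels) the trailing dimensions are flattened, giving a 16 × 72 matrix with orthonormal rows. The tolerance allows for float32 rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 2-D: 3 rows < 5 columns, so the rows are orthonormal
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
print(w @ w.T)                              # approximately the 3x3 identity

# 4-D: trailing dims are flattened, i.e. treated as a 16 x (8*3*3) = 16 x 72 matrix
k = torch.empty(16, 8, 3, 3)
nn.init.orthogonal_(k)
k2d = k.reshape(16, -1)
print(torch.allclose(k2d @ k2d.T, torch.eye(16), atol=1e-4))
```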

6. Sparse initialization

torch.nn.init.sparse_(tensor, sparsity, std=0.01)

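Fills the 2-D input tensor as a sparse matrix, where the non-zero elements are drawn from the normal distribution $N(0, \text{std}^{2})$ ($N(0, 0.01^{2})$ with the default std). See Martens, J., Deep learning via Hessian-free optimization.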

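Parameters:

tensor - the 2-D tensor to fill
sparsity - the fraction of elements in each column to be set to zero
std - the standard deviation of the normal distribution used to generate the non-zero values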

Example:

w = torch.empty(3, 5)
torch.nn.init.sparse_(w, sparsity=0.1)
"""
tensor([[-0.0026,  0.0000,  0.0100,  0.0046,  0.0048],
        [ 0.0106, -0.0046,  0.0000,  0.0000,  0.0000],
        [ 0.0000, -0.0005,  0.0150, -0.0097, -0.0100]])
"""
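As far as we can tell from PyTorch's source (worth re-checking against your version), sparse_ zeroes ceil(sparsity × rows) randomly chosen entries in every column. That matches the 3 × 5 example above, where sparsity=0.1 gives ceil(0.3) = 1 zero per column. A sketch that counts the zeros per column for a 10 × 4 tensor with sparsity=0.5; the shape and sparsity are arbitrary.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

rows, cols, sparsity = 10, 4, 0.5
w = torch.empty(rows, cols)
nn.init.sparse_(w, sparsity=sparsity, std=0.01)

zeros_per_col = (w == 0).sum(dim=0)          # number of exact zeros in each column
expected_zeros = math.ceil(sparsity * rows)  # 5
print(zeros_per_col, expected_zeros)
```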

7. Constant initialization

torch.nn.init.constant_(tensor, val)

Fills the tensor with the constant value val.

Example:

w = torch.empty(3, 5)
nn.init.constant_(w, 1.2)
"""
tensor([[1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
        [1.2000, 1.2000, 1.2000, 1.2000, 1.2000],
        [1.2000, 1.2000, 1.2000, 1.2000, 1.2000]])
"""

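8. Identity matrix initialization

torch.nn.init.eye_(tensor)

Fills a 2-D tensor with the identity matrix. For a non-square tensor, ones are placed on the main diagonal and all other entries are zero, as in the 3 × 5 example below.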

Example:

w = torch.empty(3, 5)
nn.init.eye_(w)
"""
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
"""

9. Zero initialization

torch.nn.init.zeros_(tensor)

Fills the tensor with zeros.

Example:

w = torch.empty(3, 5)
nn.init.zeros_(w)
"""
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
"""
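eye_ and zeros_ can be combined to make a layer start out as the identity mapping. A small sketch, using a hypothetical square nn.Linear(4, 4): with an identity weight and a zero bias, the layer returns its input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(4, 4)
nn.init.eye_(layer.weight)    # weight = 4x4 identity
nn.init.zeros_(layer.bias)    # bias = 0

x = torch.randn(2, 4)
y = layer(x)                  # x @ I^T + 0
print(torch.allclose(y, x))
```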

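10. Applying initialization to a model

Example:

from collections import OrderedDict
from torch import nn
class FlattenLayer(nn.Module):
    # flattens a (batch, ...) input to (batch, features)
    def forward(self, x):
        return x.view(x.shape[0], -1)
# the model whose structure is printed below
model = nn.Sequential(OrderedDict([
    ('flatten', FlattenLayer()),
    ('linear1', nn.Linear(784, 512)),
    ('activation', nn.ReLU()),
    ('linear2', nn.Linear(512, 256)),
    ('linear3', nn.Linear(256, 10)),
]))
print('module-----------')
print(model)
print('setup-----------')
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight, gain=nn.init.calculate_gain('relu'))
"""
module-----------
Sequential(
  (flatten): FlattenLayer()
  (linear1): Linear(in_features=784, out_features=512, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=512, out_features=256, bias=True)
  (linear3): Linear(in_features=256, out_features=10, bias=True)
)
setup-----------
"""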

Example:

# initializes every parameter of the model, weights and biases alike, from U(0, 1)
for param in model.parameters():
    nn.init.uniform_(param)

Example:

from torch import nn
def weights_init(m):
    classname = m.__class__.__name__
    if 'Conv2d' in classname:
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:  # layers created with bias=False have no bias
            nn.init.constant_(m.bias, 0.0)
    elif 'Linear' in classname:
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)
# apply() recursively visits every submodule of the network (and the network itself) and calls the given function on each one
model.apply(weights_init)
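A sketch of an isinstance-based variant of the class-name matching used in weights_init, applied to a hypothetical small model that nests one nn.Sequential inside another, to show that apply() reaches submodules at any depth. The layer sizes (a 1-channel 28 × 28 input, a 3 × 3 convolution, 10 outputs) are arbitrary. Matching on module type also covers subclasses and does not depend on class-name strings.

```python
import torch
from torch import nn

def init_weights(m: nn.Module) -> None:
    # Match by type instead of by class-name string
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:          # layers built with bias=False have no bias
            nn.init.zeros_(m.bias)

# Hypothetical model; the Linear layer is nested inside an inner Sequential
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),     # 1x28x28 -> 4x26x26
    nn.ReLU(),
    nn.Flatten(),
    nn.Sequential(nn.Linear(4 * 26 * 26, 10)),
)
model.apply(init_weights)               # visits every submodule recursively

biases = [m.bias for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
print(len(biases), all(torch.count_nonzero(b).item() == 0 for b in biases))
out = model(torch.randn(1, 1, 28, 28))
print(out.shape)
```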
