Contents
- 1 Improving Model Performance
- 2 Fitting the Circle Data with a Linear Model
- 3 Fitting a Linear Equation with a Linear Model
- 4 Fitting the Circle Data with Non-Linear Activations
- 5 Recreating the Non-Linear Activation Functions
- 5.1 ReLU
- 5.2 Sigmoid
OK, today we're going to learn how to improve a model's performance.
1 Improving Model Performance
Here are a few ideas worth considering (the sketch after this list shows where each knob typically lives in code):
- Add more layers to the model
- Add more neurons per layer
- Train for more epochs
- Choose a better-suited loss function
- Tune the learning rate
- Choose a better optimizer
- Change the activation function
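As a rough sketch (the specific numbers and choices below are placeholders, not recommendations), here is where each of those knobs typically lives in PyTorch code:

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 32),   # more neurons: increase out_features
    nn.ReLU(),          # change the activation: swap this module
    nn.Linear(32, 32),  # more layers: add more Linear/activation pairs
    nn.ReLU(),
    nn.Linear(32, 1),
)
loss_fn = nn.BCEWithLogitsLoss()                  # choose a better-suited loss
optimizer = torch.optim.Adam(model.parameters(),  # try a different optimizer
                             lr=0.01)             # tune the learning rate
epochs = 2000                                     # train for more epochs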
Let's build a model and see. Let's have a try!
2 Fitting the Circle Data with a Linear Model
# Make the data
from sklearn.datasets import make_circles

# Create 1000 samples
n_samples = 1000

# Create our circle samples
X, y = make_circles(n_samples,
                    noise=0.03,       # a little noise on each point
                    random_state=42)  # so we get the same values every time
Let's visualise the data:
import matplotlib.pyplot as plt
plt.scatter(x=X[:,0],y=X[:,1],c=y,cmap=plt.cm.RdYlBu)
# Convert the data to tensors, using the default float format
import torch
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)

# Take a look at the first five samples
X[:5], y[:5]
(tensor([[ 0.7542, 0.2315],
[-0.7562, 0.1533],
[-0.8154, 0.1733],
[-0.3937, 0.6929],
[ 0.4422, -0.8967]]),
tensor([1., 1., 1., 1., 0.]))
# Split the data into training and test sets.
# test_size=0.2 puts 20% of the data in the test set; since the split is random,
# we set random_state=42 so the code is reproducible.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42)
len(X_train), len(y_train), len(X_test), len(y_test)
(800, 800, 200, 200)
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
device
class CircleClassificationV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=10)
        self.layer_2 = nn.Linear(in_features=10, out_features=10)
        self.layer_3 = nn.Linear(in_features=10, out_features=1)

    def forward(self, x):
        return self.layer_3(self.layer_2(self.layer_1(x)))

model_2 = CircleClassificationV2().to(device)
model_2
CircleClassificationV2(
(layer_1): Linear(in_features=2, out_features=10, bias=True)
(layer_2): Linear(in_features=10, out_features=10, bias=True)
(layer_3): Linear(in_features=10, out_features=1, bias=True)
)
You can see there are now three linear layers, and the number of out_features has gone up as well.
# Loss function (BCEWithLogitsLoss expects raw logits, so forward() applies no sigmoid)
loss_fn = nn.BCEWithLogitsLoss()

# Optimizer
import torch.optim as optim
optimizer = optim.SGD(params=model_2.parameters(), lr=0.1)
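The training loop below calls an accuracy_fn helper that these notes never define. A minimal sketch of what it is assumed to compute (the percentage of predictions that match the labels):

def accuracy_fn(y_true, y_pred):
    # Count how many predictions exactly match the labels
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100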
# Set the number of training epochs
epochs = 1000

# Put the data on the same device as the model
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)

# Training loop
for epoch in range(epochs):
    model_2.train()
    y_logits = model_2(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))

    # Loss and accuracy
    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Testing
    model_2.eval()
    with torch.inference_mode():
        test_logits = model_2(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    # Print progress
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train loss:{loss:.5f} | Train accuracy:{acc:.2f}% | Test loss:{test_loss:.4f} | Test accuracy:{test_acc:.2f}%")
Epoch:0 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:100 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:200 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:300 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:400 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:500 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:600 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:700 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:800 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
Epoch:900 | Train loss:0.69381 | Train accuracy:50.00% | Test loss:0.6957 | Test accuracy:50.00%
The classification loss isn't shrinking and accuracy is stuck at 50%, which means the model is classifying at random, just like flipping a coin: 50% heads, 50% tails. Let's visualise it.
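The plots below use a plot_decision_boundary helper that these notes never define (helpers like this are often shipped alongside course code). A minimal binary-classification sketch in case you want to run this yourself; the grid resolution of 101 is an arbitrary choice:

import numpy as np
import torch
import matplotlib.pyplot as plt

def plot_decision_boundary(model, X, y):
    # Work on the CPU so NumPy and matplotlib can handle the data
    model.to("cpu")
    X_np, y_np = X.cpu().numpy(), y.cpu().numpy()

    # Build a grid that covers the feature space
    x_min, x_max = X_np[:, 0].min() - 0.1, X_np[:, 0].max() + 0.1
    y_min, y_max = X_np[:, 1].min() - 0.1, X_np[:, 1].max() + 0.1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 101),
                         np.linspace(y_min, y_max, 101))

    # Predict a class for every point on the grid
    grid = torch.from_numpy(np.column_stack((xx.ravel(), yy.ravel()))).float()
    model.eval()
    with torch.inference_mode():
        logits = model(grid).squeeze()
        preds = torch.round(torch.sigmoid(logits)).numpy().reshape(xx.shape)

    # Colour each region by its predicted class and overlay the data
    plt.contourf(xx, yy, preds, cmap=plt.cm.RdYlBu, alpha=0.7)
    plt.scatter(X_np[:, 0], X_np[:, 1], c=y_np, s=40, cmap=plt.cm.RdYlBu)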
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Training")
plot_decision_boundary(model_2, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Testing")
plot_decision_boundary(model_2, X_test, y_test)
From the figure we can see the decision boundary is still just a straight line, up in the top-right corner. So is our model not learning anything? Remember the linear regression we studied earlier, y = weight * X + bias? Let's use that to check whether this model can learn data at all.
3 Fitting a Linear Equation with a Linear Model
# Create the data
weight = 0.7
bias = 0.3

start = 0
end = 1
step = 0.01

X = torch.arange(start, end, step).unsqueeze(dim=1)
y = weight * X + bias

print(len(X), len(y))
print(X[:5], y[:5])
100 100
tensor([[0.0000],
[0.0100],
[0.0200],
[0.0300],
[0.0400]]) tensor([[0.3000],
[0.3070],
[0.3140],
[0.3210],
[0.3280]])
# Split the data into training and test sets
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)
(80, 80, 20, 20)
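The next cell plots the splits with plot_predictions, another helper these notes don't define. A minimal sketch of the assumed behaviour:

import matplotlib.pyplot as plt

def plot_predictions(train_data, train_labels, test_data, test_labels, predictions=None):
    # Training data in blue, test data in green, model predictions (if given) in red
    plt.figure(figsize=(10, 7))
    plt.scatter(train_data, train_labels, c="b", s=4, label="Training data")
    plt.scatter(test_data, test_labels, c="g", s=4, label="Testing data")
    if predictions is not None:
        plt.scatter(test_data, predictions, c="r", s=4, label="Predictions")
    plt.legend(prop={"size": 14})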
plot_predictions(train_data=X_train,train_labels=y_train,test_data=X_test,test_labels=y_test)
# Device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'
# Set the random seed on the CPU
torch.manual_seed(42)

# Set the random seed on the GPU
torch.cuda.manual_seed(42)

# Put the data on the GPU
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
model_2
CircleClassificationV2(
(layer_1): Linear(in_features=2, out_features=10, bias=True)
(layer_2): Linear(in_features=10, out_features=10, bias=True)
(layer_3): Linear(in_features=10, out_features=1, bias=True)
)
You can see the model's input size is 2, but our linear regression input is 1, so we need to change that here.
# Build the model with nn.Sequential: the layers run in order, which keeps things
# simple, and the structure matches model_2
model_1 = nn.Sequential(
    nn.Linear(in_features=1, out_features=10),
    nn.Linear(in_features=10, out_features=10),
    nn.Linear(in_features=10, out_features=1)
)
model_1.to(device)
model_1
Sequential(
(0): Linear(in_features=1, out_features=10, bias=True)
(1): Linear(in_features=10, out_features=10, bias=True)
(2): Linear(in_features=10, out_features=1, bias=True)
)
# Loss function: this is a regression problem, so MAE is a natural choice
loss_fn = nn.L1Loss()

# Optimizer
optimizer = optim.SGD(params=model_1.parameters(), lr=0.01)
# Number of training epochs
epochs = 1000

for epoch in range(epochs):
    # Training
    model_1.train()

    # Forward pass
    y_pred = model_1(X_train)

    # Loss
    loss = loss_fn(y_pred, y_train)

    # Zero the gradients
    optimizer.zero_grad()

    # Backpropagation
    loss.backward()

    # Gradient descent
    optimizer.step()

    # Testing
    model_1.eval()
    with torch.inference_mode():
        test_pred = model_1(X_test)
        test_loss = loss_fn(test_pred, y_test)

    # Print results
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train loss:{loss:.4f} | Test loss:{test_loss:.4f}")
Epoch:0 | Train loss:0.7599 | Test loss:0.9110
Epoch:100 | Train loss:0.0286 | Test loss:0.0008
Epoch:200 | Train loss:0.0253 | Test loss:0.0021
Epoch:300 | Train loss:0.0214 | Test loss:0.0031
Epoch:400 | Train loss:0.0196 | Test loss:0.0034
Epoch:500 | Train loss:0.0194 | Test loss:0.0039
Epoch:600 | Train loss:0.0190 | Test loss:0.0038
Epoch:700 | Train loss:0.0188 | Test loss:0.0038
Epoch:800 | Train loss:0.0184 | Test loss:0.0033
Epoch:900 | Train loss:0.0180 | Test loss:0.0036
Let's visualise it:
model_1.eval()
with torch.inference_mode():
    y_pred = model_1(X_test)

plot_predictions(train_data=X_train.cpu(),
                 train_labels=y_train.cpu(),
                 test_data=X_test.cpu(),
                 test_labels=y_test.cpu(),
                 predictions=y_pred.cpu())
Tweak the learning rate a little and the red points quickly move closer to the green ones, so the model we defined clearly is learning the data. Then why did it do so badly on the circles? Think about it: circle data is obviously nothing like linear data. This brings up an important concept: non-linearity. If you've looked at machine learning before, you've surely heard of ReLU(), Sigmoid() and Tanh(). These are non-linear functions, and only by adding non-linearity can the model learn data like this.
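It's worth seeing why stacking nn.Linear layers without an activation can never fit the circles: the composition of linear maps is itself linear, i.e. W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A quick sketch to verify this numerically:

import torch
import torch.nn as nn

torch.manual_seed(42)
layer_1 = nn.Linear(2, 10)
layer_2 = nn.Linear(10, 1)

# Collapse the two layers into a single linear map
W = layer_2.weight @ layer_1.weight
b = layer_2.weight @ layer_1.bias + layer_2.bias

x = torch.randn(5, 2)
stacked = layer_2(layer_1(x))
collapsed = x @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-6))  # True: two linear layers act as one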
4 Fitting the Circle Data with Non-Linear Activations
# Create the data
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000,
                    noise=0.03,
                    random_state=42)

# Convert X and y to tensors
X = torch.from_numpy(X).type(torch.float)
y = torch.from_numpy(y).type(torch.float)
X[:5], y[:5]
(tensor([[ 0.7542, 0.2315],
[-0.7562, 0.1533],
[-0.8154, 0.1733],
[-0.3937, 0.6929],
[ 0.4422, -0.8967]]),
tensor([1., 1., 1., 1., 0.]))
# Plot the data to take a look
import matplotlib.pyplot as plt
plt.scatter(x = X[:,0],y = X[:,1],c=y,cmap=plt.cm.RdYlBu)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
len(X_train), len(y_train), len(X_test), len(y_test)
(800, 800, 200, 200)
# Device-agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'
# Put the data on the same device
X_train, y_train = X_train.to(device), y_train.to(device)
X_test, y_test = X_test.to(device), y_test.to(device)
Create the model, instantiate it, and put it on our chosen device:
class CircleClassificationV3(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(in_features=2, out_features=10)
        self.layer2 = nn.Linear(in_features=10, out_features=10)
        self.layer3 = nn.Linear(in_features=10, out_features=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Note that ReLU() is applied twice here, once after each hidden layer
        return self.layer3(self.relu(self.layer2(self.relu(self.layer1(x)))))

model_3 = CircleClassificationV3().to(device)
model_3
CircleClassificationV3(
(layer1): Linear(in_features=2, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=10, bias=True)
(layer3): Linear(in_features=10, out_features=1, bias=True)
(relu): ReLU()
)
# Loss function
loss_fn = nn.BCEWithLogitsLoss()

# Optimizer
optimizer = optim.SGD(params=model_3.parameters(), lr=0.1)
print((model_3(X_train).squeeze()).shape)
print(y_train.shape)
torch.Size([800])
torch.Size([800])
# Set the random seeds
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Set the number of training epochs
epochs = 1000

for epoch in range(epochs):
    # Training phase
    model_3.train()
    y_logits = model_3(X_train).squeeze()
    y_pred = torch.round(torch.sigmoid(y_logits))

    loss = loss_fn(y_logits, y_train)
    acc = accuracy_fn(y_true=y_train, y_pred=y_pred)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Testing phase
    model_3.eval()
    with torch.inference_mode():
        test_logits = model_3(X_test).squeeze()
        test_pred = torch.round(torch.sigmoid(test_logits))
        test_loss = loss_fn(test_logits, y_test)
        test_acc = accuracy_fn(y_true=y_test, y_pred=test_pred)

    # Print progress
    if epoch % 100 == 0:
        print(f"Epoch:{epoch} | Train Loss:{loss:.4f} | Train Accuracy:{acc:.2f}% | Test Loss:{test_loss:.4f} | Test Accuracy:{test_acc:.2f}%")
Epoch:0 | Train Loss:0.6929 | Train Accuracy:50.00% | Test Loss:0.6932 | Test Accuracy:50.00%
Epoch:100 | Train Loss:0.6912 | Train Accuracy:52.88% | Test Loss:0.6910 | Test Accuracy:52.50%
Epoch:200 | Train Loss:0.6898 | Train Accuracy:53.37% | Test Loss:0.6894 | Test Accuracy:55.00%
Epoch:300 | Train Loss:0.6879 | Train Accuracy:53.00% | Test Loss:0.6872 | Test Accuracy:56.00%
Epoch:400 | Train Loss:0.6852 | Train Accuracy:52.75% | Test Loss:0.6841 | Test Accuracy:56.50%
Epoch:500 | Train Loss:0.6810 | Train Accuracy:52.75% | Test Loss:0.6794 | Test Accuracy:56.50%
Epoch:600 | Train Loss:0.6751 | Train Accuracy:54.50% | Test Loss:0.6729 | Test Accuracy:56.00%
Epoch:700 | Train Loss:0.6666 | Train Accuracy:58.38% | Test Loss:0.6632 | Test Accuracy:59.00%
Epoch:800 | Train Loss:0.6516 | Train Accuracy:64.00% | Test Loss:0.6476 | Test Accuracy:67.50%
Epoch:900 | Train Loss:0.6236 | Train Accuracy:74.00% | Test Loss:0.6215 | Test Accuracy:79.00%
Let's plot it and take a quick look:
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title("Train")
plot_decision_boundary(model_3, X_train, y_train)
plt.subplot(1, 2, 2)
plt.title("Test")
plot_decision_boundary(model_3, X_test, y_test)
Wow, this plot is genuinely great, we actually drew the circles! What does that prove? That our activation functions are doing their job. We can follow the ideas listed above to keep tuning model performance. Next, we'll put all of these pieces together!
5 Recreating the Non-Linear Activation Functions
We've just seen how adding activation functions to a model lets it fit non-linear data. Now let's recreate those activation functions ourselves.
# Create a simple tensor
A = torch.arange(-10, 10, 1, dtype=torch.float)
A
tensor([-10., -9., -8., -7., -6., -5., -4., -3., -2., -1., 0., 1.,
2., 3., 4., 5., 6., 7., 8., 9.])
plt.plot(A)
Next, let's see how ReLU affects it.
5.1 ReLU
def relu(x):
    return torch.maximum(torch.tensor(0), x)

relu(A)
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 2., 3., 4., 5., 6., 7.,
8., 9.])
All the negative values have become 0.
plt.plot(relu(A))
5.2 Sigmoid
def sigmoid(x):
    return 1 / (1 + torch.exp(-x))

sigmoid(A)
tensor([4.5398e-05, 1.2339e-04, 3.3535e-04, 9.1105e-04, 2.4726e-03, 6.6929e-03,
1.7986e-02, 4.7426e-02, 1.1920e-01, 2.6894e-01, 5.0000e-01, 7.3106e-01,
8.8080e-01, 9.5257e-01, 9.8201e-01, 9.9331e-01, 9.9753e-01, 9.9909e-01,
9.9966e-01, 9.9988e-01])
plt.plot(sigmoid(A))
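As a quick sanity check (purely illustrative), the handwritten versions should agree with PyTorch's built-ins:

print(torch.equal(relu(A), torch.relu(A)))           # True
print(torch.allclose(sigmoid(A), torch.sigmoid(A)))  # True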
OK, today's learning task went very smoothly!
BB, did you eat well today? The bullfrog at dinner was delicious, and the instant noodles were super nice too! Totally stuffed.
BB, if these notes helped you, remember to give me a like! Sending hearts~
Thank you, thank youuu~