Comparing Hardware Specifications
In deep learning, single-GPU performance is usually tested on image or natural-language datasets, measuring the time t consumed by training and inference, computed as:

t = batchsize / v

where:

- batchsize is the batch size, which is directly tied to memory (VRAM) capacity
- v is the compute speed, i.e. the number of samples that can be processed per unit time; it is directly related to floating-point performance (TFLOPS), and indirectly related to data-access speed (memory bandwidth and clock frequency) and compute efficiency
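To make the relation concrete, here is a minimal sketch of the formula in Python; the batch size and the throughput value v below are illustrative assumptions, not measurements of any particular card:

# Minimal sketch of t = batchsize / v; the numbers are assumptions
# for illustration, not measurements of a real GPU.
batch_size = 64        # samples per step, limited by VRAM capacity
v = 2000.0             # assumed compute speed, in samples per second

t = batch_size / v     # time for one training/inference step, in seconds
print(f"Estimated time per step: {t * 1000:.2f} ms")

# The same relation gives a per-epoch estimate over N samples:
num_samples = 60000    # size of the MNIST training set used below
epoch_time = num_samples / v
print(f"Estimated time per epoch: {epoch_time:.1f} s")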
Comparing via Actual Training
If you are still worried that the listed specs are inflated, or about how a GPU's other characteristics affect deep learning, then actually running a training job is a practical option. Keep in mind, though, that when training has not hit a bottleneck, the GPU's performance may not be fully exercised (the batch-size sweep after the training code below shows one way to check for this). For example:

Small model

Using the open-source MNIST handwritten-digit dataset, the following code measures the training time:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
import time

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device = torch.device("cuda")

# Hyperparameters
batch_size = 64
learning_rate = 0.001
num_epochs = 5

# Normalize MNIST images to the [-1, 1] range
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

# A small two-layer fully connected network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # flatten 28x28 images into vectors
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

start_time = time.time()
for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move data and labels to the GPU
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch + 1}/{num_epochs}, Loss: {loss.item():.4f}')
torch.cuda.synchronize()  # CUDA ops are asynchronous; wait for the GPU to finish before stopping the timer
end_time = time.time()

print(f'Training time: {end_time - start_time:.2f} seconds')
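As mentioned above, a job this small may never push the GPU to its limit, so the measured time can say more about data loading than about the card. One way to check is to sweep the batch size and watch whether throughput keeps growing; the sketch below reuses the model and dataset defined above (the batch sizes are arbitrary choices for illustration):

# Rough sketch: time one epoch at several batch sizes and compare
# throughput. If samples/s stops growing as the batch size increases,
# the GPU is likely saturated; if it keeps growing, something else
# (e.g. the DataLoader) was the bottleneck at the smaller sizes.
for bs in [64, 256, 1024]:  # arbitrary batch sizes for illustration
    loader = torch.utils.data.DataLoader(train_dataset, batch_size=bs, shuffle=True)
    model_bs = SimpleNN().to(device)
    opt = optim.Adam(model_bs.parameters(), lr=learning_rate)
    torch.cuda.synchronize()
    t0 = time.time()
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        opt.zero_grad()
        loss = criterion(model_bs(data), target)
        loss.backward()
        opt.step()
    torch.cuda.synchronize()  # wait for queued GPU work before reading the clock
    elapsed = time.time() - t0
    print(f"batch_size={bs}: {len(train_dataset) / elapsed:.0f} samples/s")

On a tiny fully connected network like this one, throughput is often capped by the Python-side DataLoader rather than by the GPU itself; raising num_workers in the DataLoader or using a larger model tends to change the numbers more than a faster card would.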