目录
1.前言
2.代码
3.数据形态【分类用】
4.配置文件
5.训练
6.测试-分析-混淆矩阵等等,测试图片效果等
7.导出onnx
8.onnx推理
9.docker环境简单补充
1.前言
好久没有做图像分类了,于是想用商汤的mmclassification快速搞一波,发现已经没有了。现在是mmpretrain集成。
2.代码
截止到我写文章,我是下载的GITHUB中的mmpretrain,我是main分支,是1.2版本。https://github.com/open-mmlab/mmpretrainhttps://github.com/open-mmlab/mmpretrain 安装环境:
(1)跟着文档来就好
1.2依赖环境 — MMPretrain 1.2.0 文档https://mmpretrain.readthedocs.io/zh-cn/latest/get_started.html 主要是这两步: cd mmpretrain --> pip install -U openmim && mim install -e .
open-mmlab喜欢用mim来装东西,又快,又对。包括mmcv、mmdeploy、mmdet等。
(2)自己搞一个docker,我文章最后做补充文档~
3.数据形态【分类用】
可以看出,data下是训练集和验证集,然后是类名,类名下是各自图片,就这样就行了。
4.配置文件
代码里有个config文件,下面的resnet下面的,resnet50_8xb32_in1k.py抄一个过来做自己的,它里边还有如下一些配置文件:
依次把所有内容抄过来,做一个自己的配置文件。我放在config_me下边,叫my_resnet50_8xb32_in1k.py,最终内容如下边代码:
这里有两点需要注意,一个是去模型库下载预训练权重【读readme找模型库,对应配置文件下载的对应预训练pth】,第二个是dataset_type = 'CustomDataset'这里用自定义就行了,数据形态上边那样就行,不用、不用去改dateset下的imagenet、 coco啥的标签......
CLASS_NUMS = 8 # 你要分类的数量,比如我是8类
BATCH_SIZE = 20
TRAIN_NUM_WORKERS = 8
VAL_NUM_WORKERS = 4
TR_DATA_ROOT = "/xx/data/train" # 训练集
VAL_DATA_ROOT = "/xx/data/val" # 验证集
MAX_EPOCH = 600
MultiStepLR_list = [100, 200, 300] # 学习率递减epoch分批
VAL_INTERVAL = 20 # 多少迭代验证一次
SAVE_INTERVAL = 50 # 多少迭代保存一次模型
LOG_INTERVAL = 100 # 多少迭代/批次打印一次
PRE_CHECKPOINT = "/configs_me/resnet50_8xb32_in1k_20210831-ea4938fc.pth" # 去模型库下载与config文件相对应的预训练模型权重frozen_stagesss = 2 # -1不冻结层,这里选择冻结骨干2层# model settings
model = dict(type='ImageClassifier',backbone=dict(type='ResNet',depth=50,num_stages=4,out_indices=(3, ),frozen_stages=frozen_stagesss, # 冻结主干网的层数style='pytorch'),neck=dict(type='GlobalAveragePooling'),head=dict(type='LinearClsHead',num_classes=CLASS_NUMS,in_channels=2048, # load_from后就该2048 512报错# in_channels=512,loss=dict(type='CrossEntropyLoss', loss_weight=1.0),topk=(1, 5), # 二分类啥的或者不用top5准确率的,用topk=(1, ),))# dataset settings
dataset_type = 'CustomDataset'
data_preprocessor = dict(num_classes=CLASS_NUMS,# RGB format normalization parametersmean=[123.675, 116.28, 103.53],std=[58.395, 57.12, 57.375],# convert image from BGR to RGBto_rgb=True,
)train_pipeline = [dict(type='LoadImageFromFile'),dict(type='RandomResizedCrop', scale=224),dict(type='RandomFlip', prob=0.5, direction='horizontal'),dict(type='PackInputs'),
]test_pipeline = [dict(type='LoadImageFromFile'),dict(type='ResizeEdge', scale=256, edge='short'), # 缩放短边尺寸至 256pxdict(type='CenterCrop', crop_size=224),dict(type='PackInputs'),
]train_dataloader = dict(batch_size=BATCH_SIZE,num_workers=TRAIN_NUM_WORKERS,dataset=dict(type=dataset_type,data_root=TR_DATA_ROOT,# ann_file='meta/train.txt',# split='train',pipeline=train_pipeline),sampler=dict(type='DefaultSampler', shuffle=True), # 默认采样# persistent_workers=True, # 保持进程,缩短每个epoch准备时间
)val_dataloader = dict(batch_size=BATCH_SIZE,num_workers=VAL_NUM_WORKERS,dataset=dict(type=dataset_type,data_root=VAL_DATA_ROOT,# ann_file='meta/test.txt',# split='test',pipeline=test_pipeline),sampler=dict(type='DefaultSampler', shuffle=False),# persistent_workers=True,
)val_evaluator = dict(type='Accuracy', topk=(1, 5)) # 二分类不能用top1和top5
# val_evaluator = dict(type='Accuracy', topk=(1, ))# If you want standard test, please manually configure the test dataset
test_dataloader = val_dataloader
test_evaluator = val_evaluator# optimizer
optim_wrapper = dict(optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))# learning policy
param_scheduler = dict(type='MultiStepLR', by_epoch=True, milestones=MultiStepLR_list, gamma=0.5)# train, val, test setting
train_cfg = dict(by_epoch=True, max_epochs=MAX_EPOCH, val_interval=VAL_INTERVAL)
val_cfg = dict()
test_cfg = dict()# NOTE: `auto_scale_lr` is for automatically scaling LR,
# based on the actual training batch size.
# auto_scale_lr = dict(base_batch_size=256)
# 通过默认策略自动缩放学习率,此策略适用于总批次大小 256
# 如果你使用不同的总批量大小,比如 512 并启用自动学习率缩放
# 我们将学习率扩大到 2 倍# defaults to use registries in mmpretrain
default_scope = 'mmpretrain'# configure default hooks
default_hooks = dict(# record the time of every iteration.timer=dict(type='IterTimerHook'),# print log every 100 iterations.logger=dict(type='LoggerHook', interval=LOG_INTERVAL),# enable the parameter scheduler.param_scheduler=dict(type='ParamSchedulerHook'),# save checkpoint per epoch.checkpoint=dict(type='CheckpointHook', interval=SAVE_INTERVAL),# set sampler seed in distributed evrionment.sampler_seed=dict(type='DistSamplerSeedHook'),# validation results visualization, set True to enable it.visualization=dict(type='VisualizationHook', enable=False),
)# configure environment
env_cfg = dict(# whether to enable cudnn benchmarkcudnn_benchmark=False,# set multi process parametersmp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),# set distributed parametersdist_cfg=dict(backend='nccl'),
)# set visualizer
vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(type='UniversalVisualizer', vis_backends=vis_backends)# set log level
log_level = 'INFO'# load from which checkpoint
load_from = PRE_CHECKPOINT# whether to resume training from the loaded checkpoint
resume = False# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)
5.训练
把tools文件夹下边的train.py,复制一份【PS.后边都指的复制到项目根目录下】,只改动如下代码,然后python train.py就可以训练了。【注意训练结果权重在你的work-dir指定目录下】
parser.add_argument('--config', default="config_me/my_resnet50_8xb32_in1k.py", help='train config file path')
parser.add_argument('--work-dir', default="my_train_result", help='the dir to save logs and models')
还可以加几句打印类别顺序:【还没试过,待定】
classes = runner.test_loop.dataloader.dataset.metainfo.get('classes')
print("=== 本次训练类别顺序: ======================")
print(classes)
print('=========================================')
6.测试-分析-混淆矩阵等等,测试图片效果等
同理,把tools下的test.py复制,改动如下,可以评估验证集:
parser.add_argument('--config', default="config_me/my_resnet50_8xb32_in1k.py", help='test config file path')parser.add_argument('--checkpoint', default="my_train_result/epoch_200.pth", help='checkpoint file')parser.add_argument('--work-dir', default="test_result", help='the directory to save the file containing evaluation metrics')# parser.add_argument('--out', default="test_result/res_epoch_20.pkl", help='the file to output results.') # 这个是保存为pkl可以
同理, analyze_results.py复制一份出来,改动如下,可以分析模型对测试集的效果:
parser.add_argument('--config', default=default="config_me/my_resnet50_8xb32_in1k.py", help='test config file path')parser.add_argument('--result', default="test_result/res_epoch_20.pkl", help='test result json/pkl file')parser.add_argument('--out-dir', default="test_result/analyze", help='dir to store output files')
同理, confusion_matrix.py复制一份出来,改动如下,可以计算验证集的混淆矩阵:
parser.add_argument('--config', default="config_me/my_resnet50_8xb32_in1k.py", help='test config file path')parser.add_argument('--ckpt_or_result', default="my_train_result/epoch_200.pth",type=str,help='The checkpoint file (.pth) or ''dumpped predictions pickle file (.pkl).')
运行的时候,加上 --show 和--include-values等,显示带数字的混淆矩阵
同理,把demo下边的image_demo.py复制一份,改动如下,可以测试图片推理:
parser.add_argument('--img', default="data/val/3.jpg", help='Image file')parser.add_argument('--model', default="configs_me/my_resnet50_8xb32_in1k.py", help='Model name or config file path')parser.add_argument('--checkpoint', default="xxx/epoch_400.pth", help='Checkpoint file path.')parser.add_argument('--show',action='store_true',help='Whether to show the prediction result in a window.')parser.add_argument('--show-dir',default="test_111111111111111",type=str,help='The directory to save the visualization image.')
7.导出onnx
这里用到mmdeploy, 把mmdeploy,git clone一个到本项目文件夹下,再cd到mmdeploy里,同样运行mim install -e .来安装mmdeploy。或者参考:Get Started — mmdeploy 1.3.1 文档
目前我这里是:1.3.1版本
导出onnx脚本:export_onnx.py
# === mmdeploy方式导出onnx ====================================
from mmdeploy.apis import torch2onnx
from mmdeploy.backend.sdk.export_info import export2SDKimg = '随便一张测试图路径 xxx/xx。jpg'
work_dir = '另存onnx的目录'
save_file = 'epoch_500.onnx'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_onnxruntime_static.py'
model_cfg = 'configs_me/my_resnet50_8xb32_in1k.py' # 训练的配置文件
model_checkpoint = 'train_res_1024/epoch_500.pth' # 训练的pth结果
device = 'cpu'# 1. convert model to onnx
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)# 2. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)
8.onnx推理
(1)mmdeploy推理方式
import os# === 使用mmdeploy推理onnx ===============================
from mmdeploy.apis import inference_model# 类别顺序:训练的时候,test时候,或者混淆矩阵那里可以打印出class顺序
classes = ["class1", "class2", "class3"]
data_paths = 'data/1 (1).png'model_cfg = 'configs_me/my_resnet50_8xb32_in1k.py'
deploy_cfg = 'mmdeploy/configs/mmpretrain/classification_onnxruntime_static.py'
data_paths= 'xxx/1.jpg'
backend_files = ['xxx/rscd_c8_2w_epoch_500.onnx'] # 刚导出的
device = 'cpu'# for img in os.listdir(data_paths):
img_path = data_paths
result = inference_model(model_cfg, deploy_cfg, backend_files, img_path, device)
socres = result[0].pred_score.cpu().numpy().tolist()
labels = result[0].pred_label.cpu().numpy().tolist()label = labels[0]
score = socres[label]
print("图片名:", img_path, "预测类别:", classes[label], "预测分数:", round(score, 4))
(2)onnx-runtime推理方式, 脱离框架【very nice !!!!!!!!!】
里边数据处理是参考 config文件里边图像,比如resize啥的要对。
import os
import onnxruntime
import cv2
import numpy as npdef resize_edge(image, scale=256, edge='short'):"""将图像的短边缩放到指定尺寸,保持宽高比不变"""h, w = image.shape[:2]if edge == 'short':if h < w:scale_ratio = scale / helse:scale_ratio = scale / welse:if h > w:scale_ratio = scale / helse:scale_ratio = scale / wnew_size = (int(w * scale_ratio), int(h * scale_ratio))resized_image = cv2.resize(image, new_size)return resized_imagedef center_crop(image, crop_size=224):"""从图像中心裁剪指定尺寸的区域"""h, w = image.shape[:2]center_x, center_y = w // 2, h // 2half_crop_size = crop_size // 2# 确定中心裁剪区域start_x = max(center_x - half_crop_size, 0)start_y = max(center_y - half_crop_size, 0)cropped_image = image[start_y:start_y + crop_size, start_x:start_x + crop_size]return cropped_imagedef pack_inputs(image):"""将图像转化为 3x224x224 格式并归一化"""# 调整通道顺序,变为3x224x224img_crop = image[:, :, ::-1].transpose(2, 0, 1).astype(np.float32)img_crop[0, :] = (img_crop[0, :] - 123.675) / 58.395img_crop[1, :] = (img_crop[1, :] - 116.28) / 57.12img_crop[2, :] = (img_crop[2, :] - 103.53) / 57.375return img_cropdef img_preprocess(image_path):"""图像预处理,以resnet50配置文件为例:test_pipeline = [dict(type='LoadImageFromFile'),dict(type='ResizeEdge', scale=256, edge='short'), # 缩放短边尺寸至 256pxdict(type='CenterCrop', crop_size=224),dict(type='PackInputs'),]"""image = cv2.imread(image_path)resized_image = resize_edge(image, scale=256, edge='short')cropped_image = center_crop(resized_image, crop_size=224)final_image = pack_inputs(cropped_image)return final_imagedef img_infer(onnx_model, img_path):img_crop = img_preprocess(img_path)input = np.expand_dims(img_crop, axis=0)onnx_session = onnxruntime.InferenceSession(onnx_model, providers=['CPUExecutionProvider'])input_name = []for node in onnx_session.get_inputs():input_name.append(node.name)output_name = []for node in onnx_session.get_outputs():output_name.append(node.name)input_feed = {}for name in input_name:input_feed[name] = inputpred = onnx_session.run(None, input_feed)return pred # 预测结果if __name__ == '__main__':onnx_model = "onnx_model/epoch_500.onnx"classes = ["class1", "class2", "class3"] # 混淆矩阵和测试时候可以打印出来classes_explain = ["第一类", "第二类", "第三类"]# 一张图推理img_path = "data/1 (1).png"res = img_infer(onnx_model, img_path)print("图片名:", img_path, "预测类别:", classes_explain[np.argmax(res)], "预测分数:", round(np.max(res), 4))
补充分类指标:在onnx批量推理后,也可用sklearn评估。如:
from sklearn.metrics import classification_report
import numpy as np
# 假设真实标签和模型预测结果
y_true = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 1, 2, 0, 0, 2, 2, 1, 2, 0, 1, 1, 0, 1, 2])
# 生成分类报告
report = classification_report(y_true, y_pred, target_names=['a', 'b', 'c'])
print(report)
# 计算混淆矩阵 cm = confusion_matrix(y_true, y_pred)
# 打印混淆矩阵 print("混淆矩阵:\n", cm)
9.docker环境简单补充
- dockerFile如下: 从阿里源拉一个torch的基础镜像.........
# https://www.modelscope.cn/docs/环境安装 # GPU环境镜像(python3.10) FROM registry.cn-beijing.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0 # FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0 # FROM registry.us-west-1.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.15.0 RUN mkdir /opt/code WORKDIR /opt/code |
- 构建镜像 docker build -t hezy_base_image .
- 创建容器 docker run --name hezy_mmcls -d -p 9528:22 --shm-size=1g hezy_base_image tail -f /dev/null
【-d 表示后台, -p表示端口映射 --shm-size 表示共享内存分配 tail -f /dev/null表示啥也不干】
- docker run还有些参数,可以酌情添加。
- docker exec -it 容器id /bin/bash: 命令可以进到容器
- docker images, docker ps | grep hezy: 查看镜像和容器等等
针对本次mmpretrain环境里边继续操作:
- 容器里边删除所有关于mm的环境【重装】,包括mmcls、openmim、mmdet、mmseg、mmpretrain等;
- 安装mmpretrain:https://mmpretrain.readthedocs.io/zh-cn/latest/get_started.html
- 验证:python demo/image_demo.py demo/demo.JPEG resnet18_8xb32_in1k --device cpu
- 补充:映射ssh等以及如下:
vim /etc/ssh/sshd_config 下边这些设置放开: Port 22 AddressFamily any ListenAddress 0.0.0.0 PermitRootLogin yes PermitEmptyPasswords yes PasswordAuthentication yes #重启ssh service ssh restart # 设置root密码:passwd root 外边就root/root和IP:端口登录了。【其他shel或者pycharm等idea登录用】 |