Experiment 2  Deep Learning and Applications: Human Keypoint Detection
1. Experiment Objectives
- Understand the basic pipeline of human keypoint detection
- Become familiar with the YOLOv7-pose model architecture
- Master training, fine-tuning, and inference with the YOLOv7-pose model
- Learn to apply YOLOv7-pose to practical problems and understand how to use the model in specific scenarios and tasks
2. Experiment Environment
[VM image details]
Number of virtual machines: 1 (GPU with >= 4 GB memory required)
Virtual machine information:
- Operating system: Ubuntu 20.04
- Code location: /home/zkpk/experiment/YOLOV7-POSE
- MS COCO 2017 dataset location: /home/zkpk/experiment/YOLOV7-POSE/images
(dataset download address: https://cocodataset.org/#download)
A tiny test dataset is also provided at ./data/coco128
- Installed software: Python 3.9, GPU driver, CUDA 11.3, cuDNN 8.4.1, torch==1.12.1+cu113, torchvision==0.13.1+cu113
- Configure the Python environment according to requirements.txt
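To confirm that the configured environment matches the versions listed above, the following quick check can be run in Python (a minimal sketch; the expected values are simply the versions stated above):

import torch
import torchvision

print(torch.__version__)        # expected: 1.12.1+cu113
print(torchvision.__version__)  # expected: 0.13.1+cu113
print(torch.cuda.is_available(), torch.version.cuda)  # expected: True 11.3 on the GPU VM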
3. Experiment Content
- Prepare the MS COCO 2017 dataset or the tiny dataset coco128 (./data/coco128)
- Train the YOLOv7-pose model with the given training parameters
- Test the YOLOv7-pose model
- Run the trained model on a single input image to detect the human keypoints in it
- Design an offline-video input and perform human keypoint detection on the video in real time
4. Key Points
- The dataset index files must be located at the paths specified in the dataset configuration file (coco_kpts.yaml);
- The dataset storage location must be consistent with the paths specified in the dataset index files, as shown in Figure 1 (a quick path check is also sketched after this list);
Figure 1
- If an OOM (out-of-memory) error occurs during training, set the --batch-size argument to a smaller value (usually a power of 2).
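As a quick sanity check of the two path requirements above, the snippet below can be used (a minimal sketch; it assumes the dataset configuration file uses the standard YOLO-style 'train' and 'val' keys, which may differ in your copy):

import os
import yaml

# load the dataset configuration and verify that the referenced paths exist
with open('data/coco_kpts.yaml') as f:
    cfg = yaml.safe_load(f)
for key in ('train', 'val'):
    path = cfg.get(key)
    print(key, path, 'exists' if path and os.path.exists(str(path)) else 'MISSING')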
5. Expected Results
Figure: human keypoint detection result
6. Experiment Steps
- 6.1 Download the MS COCO 2017 dataset from the address given above and store it at /home/zkpk/experiment/YOLOV7-POSE/images
(dataset download address: https://cocodataset.org/#download)
Alternatively, use the tiny dataset under data/coco128 to test the training and inference workflow.
- 6.2 Open a terminal and change into the project directory:
cd /home/zkpk/experiment/YOLOV7-POSE
- 6.3 Train the YOLOv7-pose model
# Train on GPU with the coco128 tiny dataset
python train.py --weights weights/yolov7-w6-person.pt --cfg cfg/yolov7-w6-pose.yaml --data data/coco_kpts_128.yaml --hyp data/hyp.pose.yaml --batch-size 1 --img-size 960 --device "0" --kpt-label

# Train on CPU with the coco128 tiny dataset
python train.py --weights weights/yolov7-w6-person.pt --cfg cfg/yolov7-w6-pose.yaml --data data/coco_kpts_128.yaml --hyp data/hyp.pose.yaml --batch-size 1 --img-size 960 --device "cpu" --kpt-label

# Train on CPU with the MS COCO dataset
python train.py --weights weights/yolov7-w6-person.pt --cfg cfg/yolov7-w6-pose.yaml --data data/coco_kpts.yaml --hyp data/hyp.pose.yaml --batch-size 1 --img-size 960 --device "cpu" --kpt-label

# Train on GPU with the MS COCO dataset
python train.py --weights weights/yolov7-w6-person.pt --cfg cfg/yolov7-w6-pose.yaml --data data/coco_kpts.yaml --hyp data/hyp.pose.yaml --batch-size 1 --img-size 960 --device "0" --kpt-label
The training log output looks like the following:
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
from n params module arguments
0 -1 1 0 models.common.ReOrg []
1 -1 1 7040 models.common.Conv [12, 64, 3, 1]
2 -1 1 73984 models.common.Conv [64, 128, 3, 2]
3 -1 1 8320 models.common.Conv [128, 64, 1, 1]
4 -2 1 8320 models.common.Conv [128, 64, 1, 1]
5 -1 1 36992 models.common.Conv [64, 64, 3, 1]
6 -1 1 36992 models.common.Conv [64, 64, 3, 1]
7 -1 1 36992 models.common.Conv [64, 64, 3, 1]
8 -1 1 36992 models.common.Conv [64, 64, 3, 1]
9 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
10 -1 1 33024 models.common.Conv [256, 128, 1, 1]
11 -1 1 295424 models.common.Conv [128, 256, 3, 2]
12 -1 1 33024 models.common.Conv [256, 128, 1, 1]
13 -2 1 33024 models.common.Conv [256, 128, 1, 1]
14 -1 1 147712 models.common.Conv [128, 128, 3, 1]
15 -1 1 147712 models.common.Conv [128, 128, 3, 1]
16 -1 1 147712 models.common.Conv [128, 128, 3, 1]
17 -1 1 147712 models.common.Conv [128, 128, 3, 1]
18 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
19 -1 1 131584 models.common.Conv [512, 256, 1, 1]
20 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
21 -1 1 131584 models.common.Conv [512, 256, 1, 1]
22 -2 1 131584 models.common.Conv [512, 256, 1, 1]
23 -1 1 590336 models.common.Conv [256, 256, 3, 1]
24 -1 1 590336 models.common.Conv [256, 256, 3, 1]
25 -1 1 590336 models.common.Conv [256, 256, 3, 1]
26 -1 1 590336 models.common.Conv [256, 256, 3, 1]
27 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
28 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
29 -1 1 3540480 models.common.Conv [512, 768, 3, 2]
30 -1 1 295680 models.common.Conv [768, 384, 1, 1]
31 -2 1 295680 models.common.Conv [768, 384, 1, 1]
32 -1 1 1327872 models.common.Conv [384, 384, 3, 1]
33 -1 1 1327872 models.common.Conv [384, 384, 3, 1]
34 -1 1 1327872 models.common.Conv [384, 384, 3, 1]
35 -1 1 1327872 models.common.Conv [384, 384, 3, 1]
36 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
37 -1 1 1181184 models.common.Conv [1536, 768, 1, 1]
38 -1 1 7079936 models.common.Conv [768, 1024, 3, 2]
39 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
40 -2 1 525312 models.common.Conv [1024, 512, 1, 1]
41 -1 1 2360320 models.common.Conv [512, 512, 3, 1]
42 -1 1 2360320 models.common.Conv [512, 512, 3, 1]
43 -1 1 2360320 models.common.Conv [512, 512, 3, 1]
44 -1 1 2360320 models.common.Conv [512, 512, 3, 1]
45 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
46 -1 1 2099200 models.common.Conv [2048, 1024, 1, 1]
47 -1 1 7609344 models.common.SPPCSPC [1024, 512, 1]
48 -1 1 197376 models.common.Conv [512, 384, 1, 1]
49 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
50 37 1 295680 models.common.Conv [768, 384, 1, 1]
51 [-1, -2] 1 0 models.common.Concat [1]
52 -1 1 295680 models.common.Conv [768, 384, 1, 1]
53 -2 1 295680 models.common.Conv [768, 384, 1, 1]
54 -1 1 663936 models.common.Conv [384, 192, 3, 1]
55 -1 1 332160 models.common.Conv [192, 192, 3, 1]
56 -1 1 332160 models.common.Conv [192, 192, 3, 1]
57 -1 1 332160 models.common.Conv [192, 192, 3, 1]
58 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
59 -1 1 590592 models.common.Conv [1536, 384, 1, 1]
60 -1 1 98816 models.common.Conv [384, 256, 1, 1]
61 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
62 28 1 131584 models.common.Conv [512, 256, 1, 1]
63 [-1, -2] 1 0 models.common.Concat [1]
64 -1 1 131584 models.common.Conv [512, 256, 1, 1]
65 -2 1 131584 models.common.Conv [512, 256, 1, 1]
66 -1 1 295168 models.common.Conv [256, 128, 3, 1]
67 -1 1 147712 models.common.Conv [128, 128, 3, 1]
68 -1 1 147712 models.common.Conv [128, 128, 3, 1]
69 -1 1 147712 models.common.Conv [128, 128, 3, 1]
70 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
71 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
72 -1 1 33024 models.common.Conv [256, 128, 1, 1]
73 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
74 19 1 33024 models.common.Conv [256, 128, 1, 1]
75 [-1, -2] 1 0 models.common.Concat [1]
76 -1 1 33024 models.common.Conv [256, 128, 1, 1]
77 -2 1 33024 models.common.Conv [256, 128, 1, 1]
78 -1 1 73856 models.common.Conv [128, 64, 3, 1]
79 -1 1 36992 models.common.Conv [64, 64, 3, 1]
80 -1 1 36992 models.common.Conv [64, 64, 3, 1]
81 -1 1 36992 models.common.Conv [64, 64, 3, 1]
82 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
83 -1 1 65792 models.common.Conv [512, 128, 1, 1]
84 -1 1 295424 models.common.Conv [128, 256, 3, 2]
85 [-1, 71] 1 0 models.common.Concat [1]
86 -1 1 131584 models.common.Conv [512, 256, 1, 1]
87 -2 1 131584 models.common.Conv [512, 256, 1, 1]
88 -1 1 295168 models.common.Conv [256, 128, 3, 1]
89 -1 1 147712 models.common.Conv [128, 128, 3, 1]
90 -1 1 147712 models.common.Conv [128, 128, 3, 1]
91 -1 1 147712 models.common.Conv [128, 128, 3, 1]
92 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
93 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
94 -1 1 885504 models.common.Conv [256, 384, 3, 2]
95 [-1, 59] 1 0 models.common.Concat [1]
96 -1 1 295680 models.common.Conv [768, 384, 1, 1]
97 -2 1 295680 models.common.Conv [768, 384, 1, 1]
98 -1 1 663936 models.common.Conv [384, 192, 3, 1]
99 -1 1 332160 models.common.Conv [192, 192, 3, 1]
100 -1 1 332160 models.common.Conv [192, 192, 3, 1]
101 -1 1 332160 models.common.Conv [192, 192, 3, 1]
102 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
103 -1 1 590592 models.common.Conv [1536, 384, 1, 1]
104 -1 1 1770496 models.common.Conv [384, 512, 3, 2]
105 [-1, 47] 1 0 models.common.Concat [1]
106 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
107 -2 1 525312 models.common.Conv [1024, 512, 1, 1]
108 -1 1 1180160 models.common.Conv [512, 256, 3, 1]
109 -1 1 590336 models.common.Conv [256, 256, 3, 1]
110 -1 1 590336 models.common.Conv [256, 256, 3, 1]
111 -1 1 590336 models.common.Conv [256, 256, 3, 1]
112 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
113 -1 1 1049600 models.common.Conv [2048, 512, 1, 1]
114 83 1 295424 models.common.Conv [128, 256, 3, 1]
115 93 1 1180672 models.common.Conv [256, 512, 3, 1]
116 103 1 2655744 models.common.Conv [384, 768, 3, 1]
117 113 1 4720640 models.common.Conv [512, 1024, 3, 1]
118 [114, 115, 116, 117] 1 10466036 models.yolo.IKeypoint [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 17, [256, 512, 768, 1024]]
D:\anaconda3\envs\yolo\lib\site-packages\torch\functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 641 layers, 80238452 parameters, 80238452 gradients, 102.2 GFLOPS
Transferred 634/908 items from weights/yolov7-w6-person.pt
Scaled weight_decay = 0.0005
Optimizer groups: 155 .bias, 155 conv.weight, 155 other
train: Scanning 'data\coco128\labels\train2017.cache' images and labels... 16 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 2/2 [00:00<?, ?it/s]
val: Scanning 'data\coco128\labels\train2017.cache' images and labels... 16 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 2/2 [00:00<?, ?it/s]
Plotting labels...
autoanchor: Analyzing anchors... anchors/target = 6.17, Best Possible Recall (BPR) = 1.0000
Image sizes 960 train, 960 test
Using 2 dataloader workers
Logging results to runs\train\yolov7-w6-pose13
Starting training for 300 epochs...

Epoch gpu_mem box obj cls kpt kptv total labels img_size
0/299 4.14G 0.08823 1.94 0 0.3494 0.008104 2.386 19 960: 100%|██████████| 8/8 [00:21<00:00, 2.64s/it]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 4/4 [00:06<00:00, 1.53s/it]
all 16 41 0.75 0.585 0.606 0.334
Epoch gpu_mem box obj cls kpt kptv total labels img_size
1/299 4.14G 0.08576 0.5416 0 0.3474 0.008164 0.983 8 960: 88%|████████▊ | 7/8 [00:07<00:01, 1.01s/it]
6.4 Test the trained model
Enter the test command in the shell:
python test.py --data data/coco_kpts_128.yaml --img 960 --conf 0.001 --iou 0.65 --weights yolov7-w6-pose.pt --kpt-label
To evaluate the model trained in 6.3, replace the --weights argument with the path where the trained model was saved; checkpoints are stored under runs/train/yolov7-w6-poseXX (XX is the index of the training run), as in the example below.
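For example, assuming the usual YOLO convention that the best checkpoint of a run is saved as weights/best.pt inside the run directory (check the actual filename under your run folder), the command becomes:
python test.py --data data/coco_kpts_128.yaml --img 960 --conf 0.001 --iou 0.65 --weights runs/train/yolov7-w6-poseXX/weights/best.pt --kpt-label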
6.5 Run model inference on a single image and output the detected human keypoints
import cv2, torch
import numpy as np
from torchvision import transforms
from utils.datasets import letterbox  # letterbox resize helper shipped with the repository

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weights = torch.load('yolov7-w6-pose.pt', map_location=device)  # load the checkpoint
model = weights['model']
_ = model.float().eval()
if torch.cuda.is_available():
    model = model.half().to(device)

image = cv2.imread('./person.jpg')  # path to the test image
image = letterbox(image, 960, stride=64, auto=True)[0]  # resize and pad to 960 with stride 64
image_ = image.copy()
image = transforms.ToTensor()(image)
image = torch.tensor(np.array([image.numpy()]))  # add the batch dimension
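The snippet above only prepares the input tensor. To actually obtain and draw the keypoints, the pose utilities from the repository can be used as sketched below (non_max_suppression_kpt, output_to_keypoint, and plot_skeleton_kpts are assumed to live in utils/general.py and utils/plots.py, as in the official YOLOv7 pose code; adjust the imports if your checkout differs):

from utils.general import non_max_suppression_kpt
from utils.plots import output_to_keypoint, plot_skeleton_kpts

if torch.cuda.is_available():
    image = image.half().to(device)
with torch.no_grad():
    output, _ = model(image)
# keypoint-aware NMS; the number of classes and keypoints is read from the model config
output = non_max_suppression_kpt(output, 0.25, 0.65,
                                 nc=model.yaml['nc'],
                                 nkpt=model.yaml['nkpt'],
                                 kpt_label=True)
output = output_to_keypoint(output)
nimg = (image[0].permute(1, 2, 0) * 255).cpu().numpy().astype(np.uint8)
nimg = cv2.cvtColor(nimg, cv2.COLOR_RGB2BGR)
for idx in range(output.shape[0]):
    plot_skeleton_kpts(nimg, output[idx, 7:].T, 3)  # draw the 17 COCO keypoints per person
cv2.imwrite('person_keypoints.jpg', nimg)  # save the visualization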
Run it with the following command:
python keypoints.py
The result looks like this:
6.6 Run real-time model inference on a video and output the detected human keypoints
python keypoint_video.py
You can change the input source in keypoint_video.py to use your own video:
import cv2
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
weights = torch.load('yolov7-w6-pose.pt', map_location=device)  # load the checkpoint
model = weights['model']
model = model.half().to(device)
_ = model.eval()

cap = cv2.VideoCapture('2.mp4')  # path to the input video
if not cap.isOpened():
    print('open failed.')
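A possible per-frame loop is sketched below; it assumes every frame is pre-processed, run through the model, and drawn exactly as in the single-image example of section 6.5:

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of the video
        break
    # pre-process the frame, run the model, and draw the keypoints on it,
    # following the same steps as the single-image example in section 6.5
    cv2.imshow('YOLOv7-pose', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to stop playback
        break
cap.release()
cv2.destroyAllWindows()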
The result looks like this:
7. Questions for Reflection
- In human keypoint detection, what further improvements could be made to the model architecture?
- How could the YOLOv7-pose model be applied to hand gesture and posture recognition?
- How can the model parameters and training parameters be tuned to improve the model's evaluation metrics?
8. Experiment Report
Please write the experiment report following the required report format.