Paper: FFCA-YOLO for Small Object Detection in Remote Sensing Images | IEEE Journals & Magazine | IEEE Xplore
Code: GitHub - yemu1138178251/FFCA-YOLO
The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with discretion.
目录
1. 心得
2. 论文逐段精读
2.1. Abstract
2.2. Introduction
2.3. Related Works
2.3.1. Applications of YOLO in Remote Sensing
2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection
2.3.3. Global Context Feature Representation
2.3.4. Lightweight Model Frameworks
2.4. Proposed Method
2.4.1. Overview
2.4.2. Feature Enhancement Module (FEM)
2.4.3. Feature Fusion Module (FFM)
2.4.4. Spatial Context Aware Module (SCAM)
2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)
2.5. Experimental Results
2.5.1. Experimental Dataset Description
2.5.2. Model Training and Evaluation Metrics
2.5.3. Comparisons With Previous Methods
2.5.4. Ablation Experimental Result
2.5.5. Robustness Experiment
2.5.6. Lightweight Comparison Experiment
2.6. Conclusion
3. Reference
1. Thoughts
(1) Reading computer-vision papers is such a leisurely pleasure: simple, clear figures and easy-to-follow formulas make it... truly... relaxing
(2) The authors are very honest and list every model they borrowed from
2. 论文逐段精读
2.1. Abstract
①They proposed the feature enhancement, fusion, and context-aware YOLO (FFCA-YOLO), with 3 novel modules, and also optimized the computing cost
②3 datasets: a) 2 RS datasets, VEDAI and AI-TOD; b) one self-built USOD
arduous: adj. strenuous; difficult
2.2. Introduction
①Small objects of interest are usually smaller than 32 × 32 pixels
②They aim to design models for real-time onboard detection
aliasing: n. overlap/mixing of signals; use of an alias; confusion
reconnaissance: n. military observation of an area; scouting
2.3. Related Works
2.3.1. Applications of YOLO in Remote Sensing
①Introduces the pros and cons of one-stage and two-stage detection
②One-stage detectors are suitable for onboard detection, so they list some YOLO-based models
2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection
①Introduces some feature enhancement methods
2.3.3. Global Context Feature Representation
①Introduces global context feature extraction methods
2.3.4. Lightweight Model Frameworks
①Common methods: prune excessive parameters or employ lightweight convolutions
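The saving from a lightweight convolution is easy to verify with a quick parameter count (an illustrative sketch of the general idea, not code from the paper):

```python
# Illustrative comparison: standard conv vs. depthwise-separable conv.
# Channel sizes below are arbitrary examples, not the paper's settings.

def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dwsep_params(c_in, c_out, k):
    """Depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out

c_in, c_out, k = 128, 256, 3
std = conv_params(c_in, c_out, k)    # 294912
lite = dwsep_params(c_in, c_out, k)  # 33920
print(std, lite, round(std / lite, 1))
```

For a 3 × 3 kernel the separable version is roughly k² = 9 times cheaper once the channel count is large, which is why it is the default trick for lightweight backbones.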
2.4. Proposed Method
2.4.1. Overview
①Benchmark: YOLOv5, since it is lightweight
②Overall framework (judging from the code, the connection from the first column to the second column in the figure seems to be drawn wrong; shouldn't the CSP connect to the FEM?):
2.4.2. Feature Enhancement Module (FEM)
①Schematic of FEM:
②Function:
where the superscript denotes the conv kernel size and the subscript the conv type; the remaining symbols denote concatenation, elementwise addition, and the feature map
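A toy sketch of the multi-branch idea behind FEM (my own NumPy simplification, assuming parallel dilated 3 × 3 branches whose outputs are fused and then added elementwise back to the input; the real branch layout follows the paper's figure):

```python
import numpy as np

def dilated_conv3x3(x, w, d):
    """'Same'-padded 3x3 convolution with dilation d on one 2-D channel."""
    H, W = x.shape
    xp = np.pad(x, d)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Taps at the center pixel +/- d in each direction.
            patch = xp[i:i + 2 * d + 1:d, j:j + 2 * d + 1:d]
            out[i, j] = (patch * w).sum()
    return out

def fem_sketch(x, weights, dilations):
    """Run parallel dilated branches, fuse them (mean here stands in for
    concat + 1x1 conv), then add the input back elementwise."""
    branches = [dilated_conv3x3(x, w, d) for w, d in zip(weights, dilations)]
    fused = np.stack(branches, axis=0).mean(axis=0)
    return fused + x  # elementwise addition with the identity branch

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
ws = [rng.standard_normal((3, 3)) for _ in range(3)]
y = fem_sketch(x, ws, dilations=[1, 2, 3])
print(y.shape)  # (8, 8)
```

The increasing dilation rates give the branches growing receptive fields at no extra parameter cost, which is the point of this kind of enhancement module for small objects.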
2.4.3. Feature Fusion Module (FFM)
①Structure of FFM:
where the inputs fuse two feature maps of the same shape together (the FEM output above was denoted Y, yet the input here becomes X again, which is rather poor notation; please don't imitate it. X can be understood as some level of the feature pyramid, but the overall diagram still seems to have problems)
②Equations of FFM:
where the symbols denote upsampling and a conv block including batch normalization and SiLU
③They provided 3 strategies for reweighting channels:
where the symbols denote the channel attention mechanism, the trainable weight of each feature map, the trainable weight of each channel, the number of channels in each feature map, and the total number of channels after concatenation. FFM chooses the second strategy.
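Two of the reweighting strategies can be sketched roughly like this (a NumPy toy; the softmax normalization of the trainable scalars is my assumption, and names and shapes are my own, not the paper's):

```python
import numpy as np

def reweight_per_map(maps, alphas):
    """Per-feature-map strategy (the one FFM adopts here): one trainable
    scalar per map, normalized before the weighted sum."""
    a = np.exp(alphas) / np.exp(alphas).sum()
    return sum(ai * m for ai, m in zip(a, maps))

def reweight_per_channel(cat, betas):
    """Per-channel strategy: one trainable scalar per channel of the
    concatenated feature map."""
    b = np.exp(betas) / np.exp(betas).sum()
    return cat * b[:, None, None]

rng = np.random.default_rng(1)
m1 = rng.standard_normal((16, 8, 8))
m2 = rng.standard_normal((16, 8, 8))
fused = reweight_per_map([m1, m2], alphas=np.array([0.3, -0.1]))
fused2 = reweight_per_channel(np.concatenate([m1, m2]), rng.standard_normal(32))
print(fused.shape, fused2.shape)  # (16, 8, 8) (32, 8, 8)
```

The per-map variant keeps only one weight per pyramid level, so it is the cheapest of the three; the first strategy (a full channel attention block such as SE) would add the most parameters.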
2.4.4. Spatial Context Aware Module (SCAM)
①Framework of SCAM and other:
②The pixelwise spatial context:
where the symbols denote the input and output of each pixel of each level's feature map, the total number of pixels, the linear transform matrices projecting the feature maps, and the GAP and GMP operations
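A loose NumPy sketch of the pixelwise spatial context idea (my own simplification: GAP + GMP build a global query, linear projections give per-pixel keys, and a softmax over all pixels aggregates a context vector; the exact SCAM wiring follows the paper's figure, not this code):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def scam_sketch(x, Wq, Wv):
    """Toy spatial context aggregation over a (C, H, W) feature map."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                       # pixels as columns
    q = Wq @ (flat.mean(axis=1) + flat.max(axis=1))  # global query from GAP + GMP
    keys = Wq @ flat                                 # per-pixel keys
    attn = softmax(q @ keys)                         # softmax over all pixels
    ctx = (Wv @ flat) @ attn                         # aggregated context, shape (C,)
    return (flat + ctx[:, None]).reshape(C, H, W)    # broadcast context back

rng = np.random.default_rng(4)
x = rng.standard_normal((4, 5, 5))
y = scam_sketch(x, Wq=rng.standard_normal((8, 4)), Wv=rng.standard_normal((4, 4)))
print(y.shape)  # (4, 5, 5)
```

Because the attention is computed against a single pooled query rather than pixel-by-pixel queries, this style of context module stays linear in the number of pixels, which matters for large RS images.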
2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)
①Frequent redundant memory accesses slow down DWConv
②Structure of L-FFCA-YOLO:
③Parameters of FFCA-YOLO and L-FFCA-YOLO:
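The memory-access point above can be illustrated with a partial-convolution sketch (my own toy in the spirit of FasterNet-style PConv, not necessarily the paper's exact lite layer: convolve only a fraction of the channels and copy the rest through, so most of the tensor is never read or written by the conv):

```python
import numpy as np

def conv3x3_same(x, w):
    """'Same'-padded 3x3 convolution on one 2-D channel."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + 3, j:j + 3] * w).sum()
    return out

def partial_conv(x, w, ratio=0.25):
    """Convolve only the first ratio*C channels; pass the rest through
    untouched. DWConv, by contrast, touches every channel."""
    C = x.shape[0]
    cp = max(1, int(C * ratio))
    out = x.copy()
    for c in range(cp):
        out[c] = conv3x3_same(x[c], w)
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 6, 6))
y = partial_conv(x, rng.standard_normal((3, 3)))
print(y.shape)  # (8, 6, 6)
```

With ratio = 1/4 only a quarter of the channels are processed, which cuts both FLOPs and, more importantly for throughput, memory traffic.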
2.5. Experimental Results
①Definition of small object size: 32 × 32
②Benchmark: YOLOv5m, due to its balance between speed and accuracy
2.5.1. Experimental Dataset Description
(1)VEDAI
①Pixels: about 16,000 × 16,000, all acquired from the same altitude
②Resolution: 12.5 cm × 12.5 cm per pixel
③Modality: RGB
④Data split: official, except for classes with fewer than 50 instances
(2)AI-TOD
①Average object size: 12.8 pixels
②Total images: 28,036
③Object instances: 700,621 across 8 classes
④Data split: 11,214 for training, 2,804 for validation, 14,018 for testing
(3)Unicorn small object dataset (USOD)
①Built on UNICORN 2008 with visible-light data only
②Spatial resolution: 0.4 m
③Manually filtering, segmenting, and adding annotations:
(a) original annotation, (b)–(d) manual annotations
④Images: 3000
⑤Vehicle instances: 43,378
⑥Data split: train : test = 7 : 3
⑦Proportions of object sizes:
⑧Data distribution of USOD:
photoelectric: adj. relating to the electrical effects of light
2.5.2. Model Training and Evaluation Metrics
①Optimizer: Stochastic gradient descent (SGD)
②Learning rate: 0.01
③Momentum: 0.937
④Weight decay: 0.0005
⑤Batch size: 32
⑥Loss: 0.5 × normalized Wasserstein distance (NWD) loss + 0.5 × CIoU loss
⑦Distance between bounding boxes: measured by the Wasserstein distance
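A quick sketch of NWD between two boxes, following the common formulation that models each box (cx, cy, w, h) as a 2-D Gaussian (the normalizing constant `c_norm` below is a placeholder related to the average object size, not the paper's exact value):

```python
import math

def nwd(box_a, box_b, c_norm=12.8):
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h),
    each modeled as a 2-D Gaussian N([cx, cy], diag(w^2/4, h^2/4))."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    # 2-Wasserstein distance between the two Gaussians.
    w2 = math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2
                   + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-w2 / c_norm)

same = nwd((10, 10, 4, 4), (10, 10, 4, 4))  # identical boxes -> 1.0
near = nwd((10, 10, 4, 4), (12, 10, 4, 4))
print(same, near)
```

Unlike IoU, this similarity stays smooth and nonzero even when tiny boxes do not overlap at all, which is why the training above mixes 0.5 × (1 − NWD) with 0.5 × CIoU loss.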
2.5.3. Comparisons With Previous Methods
①Visualized detection performance of FFCA-YOLO on (a) USOD, (b) VEDAI, (c) AI-TOD:
②Comparison table on VEDAI:
③Comparison table on AI-TOD:
④Comparison table on USOD:
⑤YOLOv5m, TPH-YOLO, and FFCA-YOLO in low-illumination and shadow-occlusion scenes:
2.5.4. Ablation Experimental Result
①Module ablation in USOD:
②How FEM and SCAM affect the feature map:
③Comparison of FEM module in USOD:
④SCAM versus other blocks at the same position:
2.5.5. Robustness Experiment
①Simulated degraded images in USOD:
where the symbols denote the blurring factor, the variance of the Gaussian noise, the amplitude factor of the stripe noise, and the atmospheric light parameter
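The four degradations can be simulated roughly as follows (my own toy NumPy sketch; kernel sizes and parameter values are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.uniform(0.0, 1.0, (32, 32))  # toy grayscale image in [0, 1]

# Blurring: a 3x3 box blur as a simple stand-in for the blurring factor.
xp = np.pad(img, 1, mode="edge")
blurred = sum(xp[i:i + 32, j:j + 32] for i in range(3) for j in range(3)) / 9.0

# Additive Gaussian noise with variance sigma^2.
sigma = 0.05
noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

# Stripe noise: one offset per column, scaled by an amplitude factor a.
a = 0.1
striped = np.clip(img + a * rng.standard_normal(img.shape[1]), 0.0, 1.0)

# Haze via the atmospheric scattering model I = J * t + A * (1 - t).
t, A = 0.7, 0.9  # transmission and atmospheric light
hazy = img * t + A * (1.0 - t)

print(blurred.shape, noisy.shape, striped.shape, hazy.shape)
```

Sweeping each parameter (sigma, a, t, ...) over a range while holding the others fixed is the usual way to produce the degradation levels used in a robustness table like the one below.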
②Robustness experiments of FFCA-YOLO and YOLOv5m in USOD:
2.5.6. Lightweight Comparison Experiment
①L-FFCA-YOLO compared with others in USOD:
2.6. Conclusion
①Limitations: a) speed and memory usage still need optimization; b) space-based RS should be targeted further
3. Reference
Zhang, Y. et al. (2024) FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing, 62. doi: 10.1109/TGRS.2024.3363057