Paper: FFCA-YOLO for Small Object Detection in Remote Sensing Images | IEEE Journals & Magazine | IEEE Xplore
Code: GitHub - yemu1138178251/FFCA-YOLO
The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with discretion.
目录
1. 心得
2. 论文逐段精读
2.1. Abstract
2.2. Introduction
2.3. Related Works
2.3.1. Applications of YOLO in Remote Sensing
2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection
2.3.3. Global Context Feature Representation
2.3.4. Lightweight Model Frameworks
2.4. Proposed Method
2.4.1. Overview
2.4.2. Feature Enhancement Module (FEM)
2.4.3. Feature Fusion Module (FFM)
2.4.4. Spatial Context Aware Module (SCAM)
2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)
2.5. Experimental Results
2.5.1. Experimental Dataset Description
2.5.2. Model Training and Evaluation Metrics
2.5.3. Comparisons With Previous Methods
2.5.4. Ablation Experimental Result
2.5.5. Robustness Experiment
2.5.6. Lightweight Comparison Experiment
2.6. Conclusion
3. Reference
1. Thoughts
(1) Reading computer-vision papers is such a leisurely pleasure: simple, clear figures and easy-to-follow formulas make it... truly... relaxing
(2) The authors are very honest and list every model they borrowed from
2. 论文逐段精读
2.1. Abstract
①They proposed the feature enhancement, fusion, and context-aware YOLO (FFCA-YOLO), with 3 novel modules, and also optimized the computing cost
②3 datasets: a) 2 RS datasets, VEDAI and AI-TOD; b) one self-built USOD
arduous: adj. strenuous; difficult
2.2. Introduction
①Small objects of interest are usually smaller than 32 × 32 pixels
②They aim to design models for real-time onboard detection
aliasing: n. overlap/mixing of signals; use of an alias; confusion
reconnaissance: n. military observation of an area; scouting
2.3. Related Works
2.3.1. Applications of YOLO in Remote Sensing
①Introduces the pros and cons of one-stage and two-stage detection
②One-stage detectors are suitable for onboard detection, so they list some YOLO-based models
2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection
①Introduces some feature enhancement methods
2.3.3. Global Context Feature Representation
①Introduces global context feature extraction methods
2.3.4. Lightweight Model Frameworks
①Common methods: prune excessive parameters or employ lightweight convolutions
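The saving from a lightweight convolution is easy to verify with a quick parameter count (an illustrative sketch of the general idea, not code from the paper):

```python
# Illustrative comparison: standard conv vs. depthwise-separable conv.
# Channel sizes below are arbitrary examples, not the paper's settings.

def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dwsep_params(c_in, c_out, k):
    """Depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    return k * k * c_in + c_in * c_out

c_in, c_out, k = 128, 256, 3
std = conv_params(c_in, c_out, k)    # 294912
lite = dwsep_params(c_in, c_out, k)  # 33920
print(std, lite, round(std / lite, 1))
```

For a 3 × 3 kernel the separable version is roughly k² = 9 times cheaper once the channel count is large, which is why it is the default trick for lightweight backbones.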
2.4. Proposed Method
2.4.1. Overview
①Benchmark: YOLOv5, since it is lightweight
②Overall framework (judging from the code, the connection from the first column to the second column in the figure seems to be drawn wrong; shouldn't the CSP connect to the FEM?):
2.4.2. Feature Enhancement Module (FEM)
①Schematic of FEM:
②Function:
where the superscript denotes the conv kernel size and the subscript the conv type; the remaining symbols denote concatenation, elementwise addition, and the feature map
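A toy sketch of the multi-branch idea behind FEM (my own NumPy simplification, assuming parallel dilated 3 × 3 branches whose outputs are fused and then added elementwise back to the input; the real branch layout follows the paper's figure):

```python
import numpy as np

def dilated_conv3x3(x, w, d):
    """'Same'-padded 3x3 convolution with dilation d on one 2-D channel."""
    H, W = x.shape
    xp = np.pad(x, d)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Taps at the center pixel +/- d in each direction.
            patch = xp[i:i + 2 * d + 1:d, j:j + 2 * d + 1:d]
            out[i, j] = (patch * w).sum()
    return out

def fem_sketch(x, weights, dilations):
    """Run parallel dilated branches, fuse them (mean here stands in for
    concat + 1x1 conv), then add the input back elementwise."""
    branches = [dilated_conv3x3(x, w, d) for w, d in zip(weights, dilations)]
    fused = np.stack(branches, axis=0).mean(axis=0)
    return fused + x  # elementwise addition with the identity branch

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
ws = [rng.standard_normal((3, 3)) for _ in range(3)]
y = fem_sketch(x, ws, dilations=[1, 2, 3])
print(y.shape)  # (8, 8)
```

The increasing dilation rates give the branches growing receptive fields at no extra parameter cost, which is the point of this kind of enhancement module for small objects.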
2.4.3. Feature Fusion Module (FFM)
①Structure of FFM:
where the inputs fuse two feature maps of the same shape together (the FEM output above was denoted Y, yet the input here becomes X again, which is rather poor notation; please don't imitate it. X can be understood as some level of the feature pyramid, but the overall diagram still seems to have problems)
②Equations of FFM:
where the symbols denote upsampling and a conv block including batch normalization and SiLU
③They provided 3 strategies for reweighting channels:
where the symbols denote the channel attention mechanism, the trainable weight of each feature map, the trainable weight of each channel, the number of channels in each feature map, and the total number of channels after concatenation. FFM chooses the second strategy.
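Two of the reweighting strategies can be sketched roughly like this (a NumPy toy; the softmax normalization of the trainable scalars is my assumption, and names and shapes are my own, not the paper's):

```python
import numpy as np

def reweight_per_map(maps, alphas):
    """Per-feature-map strategy (the one FFM adopts here): one trainable
    scalar per map, normalized before the weighted sum."""
    a = np.exp(alphas) / np.exp(alphas).sum()
    return sum(ai * m for ai, m in zip(a, maps))

def reweight_per_channel(cat, betas):
    """Per-channel strategy: one trainable scalar per channel of the
    concatenated feature map."""
    b = np.exp(betas) / np.exp(betas).sum()
    return cat * b[:, None, None]

rng = np.random.default_rng(1)
m1 = rng.standard_normal((16, 8, 8))
m2 = rng.standard_normal((16, 8, 8))
fused = reweight_per_map([m1, m2], alphas=np.array([0.3, -0.1]))
fused2 = reweight_per_channel(np.concatenate([m1, m2]), rng.standard_normal(32))
print(fused.shape, fused2.shape)  # (16, 8, 8) (32, 8, 8)
```

The per-map variant keeps only one weight per pyramid level, so it is the cheapest of the three; the first strategy (a full channel attention block such as SE) would add the most parameters.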
2.4.4. Spatial Context Aware Module (SCAM)
①Framework of SCAM and other:
②The pixelwise spatial context:
where the symbols denote the input and output of each pixel of each level's feature map, the total number of pixels, the linear transform matrices projecting the feature maps, and the GAP and GMP operations
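A loose NumPy sketch of the pixelwise spatial context idea (my own simplification: GAP + GMP build a global query, linear projections give per-pixel keys, and a softmax over all pixels aggregates a context vector; the exact SCAM wiring follows the paper's figure, not this code):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def scam_sketch(x, Wq, Wv):
    """Toy spatial context aggregation over a (C, H, W) feature map."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                       # pixels as columns
    q = Wq @ (flat.mean(axis=1) + flat.max(axis=1))  # global query from GAP + GMP
    keys = Wq @ flat                                 # per-pixel keys
    attn = softmax(q @ keys)                         # softmax over all pixels
    ctx = (Wv @ flat) @ attn                         # aggregated context, shape (C,)
    return (flat + ctx[:, None]).reshape(C, H, W)    # broadcast context back

rng = np.random.default_rng(4)
x = rng.standard_normal((4, 5, 5))
y = scam_sketch(x, Wq=rng.standard_normal((8, 4)), Wv=rng.standard_normal((4, 4)))
print(y.shape)  # (4, 5, 5)
```

Because the attention is computed against a single pooled query rather than pixel-by-pixel queries, this style of context module stays linear in the number of pixels, which matters for large RS images.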
2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)
①Frequent redundant memory accesses slow down DWConv
②Structure of L-FFCA-YOLO:
③Parameters of FFCA-YOLO and L-FFCA-YOLO:
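The memory-access point above can be illustrated with a partial-convolution sketch (my own toy in the spirit of FasterNet-style PConv, not necessarily the paper's exact lite layer: convolve only a fraction of the channels and copy the rest through, so most of the tensor is never read or written by the conv):

```python
import numpy as np

def conv3x3_same(x, w):
    """'Same'-padded 3x3 convolution on one 2-D channel."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + 3, j:j + 3] * w).sum()
    return out

def partial_conv(x, w, ratio=0.25):
    """Convolve only the first ratio*C channels; pass the rest through
    untouched. DWConv, by contrast, touches every channel."""
    C = x.shape[0]
    cp = max(1, int(C * ratio))
    out = x.copy()
    for c in range(cp):
        out[c] = conv3x3_same(x[c], w)
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 6, 6))
y = partial_conv(x, rng.standard_normal((3, 3)))
print(y.shape)  # (8, 6, 6)
```

With ratio = 1/4 only a quarter of the channels are processed, which cuts both FLOPs and, more importantly for throughput, memory traffic.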
2.5. Experimental Results
①Definition of small object size: 32 × 32
②Benchmark: YOLOv5m, due to its balance between speed and accuracy
2.5.1. Experimental Dataset Description
(1)VEDAI
①Pixels: about 16,000 × 16,000, all acquired from the same altitude
②Resolution: 12.5 cm × 12.5 cm per pixel
③Modality: RGB
④Data split: official, except for classes with fewer than 50 instances
(2)AI-TOD
①Average object size: 12.8 pixels
②Total images: 28,036
③Object instances: 700,621 across 8 classes
④Data split: 11,214 for training, 2,804 for validation, 14,018 for testing
(3)Unicorn small object dataset (USOD)
①Built on UNICORN 2008 with visible-light data only
②Spatial resolution: 0.4 m
③Manually filtering, segmenting, and adding annotations:
(a) original annotation, (b)–(d) manual annotations
④Images: 3000
⑤Vehicle instances: 43,378
⑥Data split: train : test = 7 : 3
⑦Proportions of object sizes:
⑧Data distribution of USOD:
photoelectric: adj. relating to the electrical effects of light
2.5.2. Model Training and Evaluation Metrics
①Optimizer: Stochastic gradient descent (SGD)
②Learning rate: 0.01
③Momentum: 0.937
④Weight decay: 0.0005
⑤Batch size: 32
⑥Loss: 0.5 × normalized Wasserstein distance (NWD) loss + 0.5 × CIoU loss
⑦Distance between bounding boxes: measured by the Wasserstein distance
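A quick sketch of NWD between two boxes, following the common formulation that models each box (cx, cy, w, h) as a 2-D Gaussian (the normalizing constant `c_norm` below is a placeholder related to the average object size, not the paper's exact value):

```python
import math

def nwd(box_a, box_b, c_norm=12.8):
    """Normalized Wasserstein distance between two boxes (cx, cy, w, h),
    each modeled as a 2-D Gaussian N([cx, cy], diag(w^2/4, h^2/4))."""
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    # 2-Wasserstein distance between the two Gaussians.
    w2 = math.sqrt((xa - xb) ** 2 + (ya - yb) ** 2
                   + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    # Exponential normalization maps the distance into (0, 1].
    return math.exp(-w2 / c_norm)

same = nwd((10, 10, 4, 4), (10, 10, 4, 4))  # identical boxes -> 1.0
near = nwd((10, 10, 4, 4), (12, 10, 4, 4))
print(same, near)
```

Unlike IoU, this similarity stays smooth and nonzero even when tiny boxes do not overlap at all, which is why the training above mixes 0.5 × (1 − NWD) with 0.5 × CIoU loss.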
2.5.3. Comparisons With Previous Methods
①Visualized detection performance of FFCA-YOLO on (a) USOD, (b) VEDAI, (c) AI-TOD:
②Comparison table on VEDAI:
③Comparison table on AI-TOD:
④Comparison table on USOD:
⑤YOLOv5m, TPH-YOLO, and FFCA-YOLO in low-illumination and shadow-occlusion scenes:
2.5.4. Ablation Experimental Result
①Module ablation in USOD:
②How FEM and SCAM affect the feature map:
③Comparison of FEM module in USOD:
④SCAM versus other blocks at the same position:
2.5.5. Robustness Experiment
①Simulated degraded images in USOD:
where the symbols denote the blurring factor, the variance of the Gaussian noise, the amplitude factor of the stripe noise, and the atmospheric light parameter
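The four degradations can be simulated roughly as follows (my own toy NumPy sketch; kernel sizes and parameter values are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.uniform(0.0, 1.0, (32, 32))  # toy grayscale image in [0, 1]

# Blurring: a 3x3 box blur as a simple stand-in for the blurring factor.
xp = np.pad(img, 1, mode="edge")
blurred = sum(xp[i:i + 32, j:j + 32] for i in range(3) for j in range(3)) / 9.0

# Additive Gaussian noise with variance sigma^2.
sigma = 0.05
noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

# Stripe noise: one offset per column, scaled by an amplitude factor a.
a = 0.1
striped = np.clip(img + a * rng.standard_normal(img.shape[1]), 0.0, 1.0)

# Haze via the atmospheric scattering model I = J * t + A * (1 - t).
t, A = 0.7, 0.9  # transmission and atmospheric light
hazy = img * t + A * (1.0 - t)

print(blurred.shape, noisy.shape, striped.shape, hazy.shape)
```

Sweeping each parameter (sigma, a, t, ...) over a range while holding the others fixed is the usual way to produce the degradation levels used in a robustness table like the one below.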
②Robustness experiments of FFCA-YOLO and YOLOv5m in USOD:
2.5.6. Lightweight Comparison Experiment
①L-FFCA-YOLO compared with others in USOD:
2.6. Conclusion
①Limitations: a) speed and memory usage still need optimization; b) space-based RS should be targeted further
3. Reference
Zhang, Y. et al. (2024) FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing, 62. doi: 10.1109/TGRS.2024.3363057