FFCA-YOLO for Small Object Detection in Remote Sensing Images (Paper Reading Notes)

Source: https://blog.csdn.net/Sherlily/article/details/144542974

Paper: FFCA-YOLO for Small Object Detection in Remote Sensing Images | IEEE Journals & Magazine | IEEE Xplore

Code: GitHub - yemu1138178251/FFCA-YOLO

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This article leans toward personal notes, so read with discretion.

Table of Contents

1. Takeaways

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Works

2.3.1. Applications of YOLO in Remote Sensing

2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection

2.3.3. Global Context Feature Representation

2.3.4. Lightweight Model Frameworks

2.4. Proposed Method

2.4.1. Overview

2.4.2. Feature Enhancement Module (FEM)

2.4.3. Feature Fusion Module (FFM)

2.4.4. Spatial Context Aware Module (SCAM)

2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)

2.5. Experimental Results

2.5.1. Experimental Dataset Description

2.5.2. Model Training and Evaluation Metrics

2.5.3. Comparisons With Previous Methods

2.5.4. Ablation Experimental Result

2.5.5. Robustness Experiment

2.5.6. Lightweight Comparison Experiment

2.6. Conclusion

3. Reference


1. Takeaways

(1)Reading vision papers is genuinely a kind of leisure: simple, clear figures and easy-to-understand formulas... it really is... relaxing

(2)The authors are admirably honest: they credit every model they borrowed from

2. Section-by-Section Close Reading

2.1. Abstract

        ①They propose feature enhancement, fusion, and context-aware YOLO (FFCA-YOLO) with three novel modules, and also optimize its computing costs

        ②Three datasets: a) two RS benchmarks, VEDAI and AI-TOD; b) one self-built dataset, USOD

arduous  adj. difficult and tiring; laborious

2.2. Introduction

        ①The small objects of interest are usually smaller than 32 × 32 pixels

        ②They aim to design a model for real-time on-board detection

aliasing  n. signal aliasing; use of an alias; confusion of distinct things

reconnaissance  n. military observation or scouting

2.3. Related Works

2.3.1. Applications of YOLO in Remote Sensing

        ①Introduces the pros and cons of one-stage and two-stage detection

        ②One-stage detectors suit on-board detection, so the authors list several YOLO-based models

2.3.2. Feature Enhancement and Fusion Methods of Small Object Detection

        ①Introduces some feature enhancement methods

2.3.3. Global Context Feature Representation

        ①Introduces global context feature extraction methods

2.3.4. Lightweight Model Frameworks

        ①Common approaches: pruning excessive parameters or employing lightweight convolutions

2.4. Proposed Method

2.4.1. Overview

        ①Baseline: YOLOv5, chosen because it is lightweight

        ②Overall framework (judging from the code, the connections from the first column to the second column in the figure seem to be drawn incorrectly; shouldn't the CSP feed into the FEM?):

2.4.2. Feature Enhancement Module (FEM)

        ①Schematic of FEM:

        ②Function:

\begin{aligned}
W_1 &= f_{\mathrm{conv}}^{3\times3}\left[f_{\mathrm{conv}}^{1\times1}(F)\right] \\
W_2 &= f_{\mathrm{diconv}}^{3\times3}\left\{f_{\mathrm{conv}}^{3\times1}\left\{f_{\mathrm{conv}}^{1\times3}\left[f_{\mathrm{conv}}^{1\times1}(F)\right]\right\}\right\} \\
W_3 &= f_{\mathrm{diconv}}^{3\times3}\left\{f_{\mathrm{conv}}^{1\times3}\left\{f_{\mathrm{conv}}^{3\times1}\left[f_{\mathrm{conv}}^{1\times1}(F)\right]\right\}\right\} \\
Y &= \mathrm{Cat}(W_1, W_2, W_3) \oplus f_{\mathrm{conv}}^{1\times1}(F)
\end{aligned}

where the superscript denotes the convolution kernel size and the subscript the convolution type, \mathrm{Cat}\left ( \cdot \right ) denotes concatenation, \oplus denotes elementwise addition, and F denotes the input feature map
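To make the three-branch structure concrete, below is a minimal PyTorch sketch written from these equations alone; it is not the authors' implementation, and the final 1 × 1 fuse convolution (added to reconcile the 3c-channel concatenation with the c-channel residual path), the padding choices, and the absence of BN/activation layers are all assumptions.

```python
import torch
import torch.nn as nn

class FEM(nn.Module):
    """Sketch of the FEM equations above: three parallel branches whose
    outputs W1, W2, W3 are concatenated and added to a 1x1 residual path."""
    def __init__(self, c: int):
        super().__init__()
        self.branch1 = nn.Sequential(                    # W1 = 3x3(1x1(F))
            nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, 3, padding=1))
        self.branch2 = nn.Sequential(                    # W2 = di3x3(3x1(1x3(1x1(F))))
            nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, (1, 3), padding=(0, 1)),
            nn.Conv2d(c, c, (3, 1), padding=(1, 0)),
            nn.Conv2d(c, c, 3, padding=2, dilation=2))   # dilated conv widens receptive field
        self.branch3 = nn.Sequential(                    # W3 mirrors W2 (3x1 before 1x3)
            nn.Conv2d(c, c, 1),
            nn.Conv2d(c, c, (3, 1), padding=(1, 0)),
            nn.Conv2d(c, c, (1, 3), padding=(0, 1)),
            nn.Conv2d(c, c, 3, padding=2, dilation=2))
        self.shortcut = nn.Conv2d(c, c, 1)               # f_conv^{1x1}(F) residual path
        self.fuse = nn.Conv2d(3 * c, c, 1)               # assumed: maps 3c -> c before the addition

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        y = torch.cat([self.branch1(f), self.branch2(f), self.branch3(f)], dim=1)
        return self.fuse(y) + self.shortcut(f)           # Cat(W1, W2, W3) ⊕ 1x1(F)
```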

2.4.3. Feature Fusion Module (FFM)

        ①Structure of FFM:

where the inputs are X_2 \in \mathbb{R}^{160 \times 160}, X_3 \in \mathbb{R}^{80 \times 80}, and X_4 \in \mathbb{R}^{40 \times 40}, and \text{CRC} fuses two maps of the same shape together (above, Y was used for the FEM output, yet the input here is called X again; this inconsistent notation is best not imitated. X can be read as a level of the feature pyramid, but the overall framework figure still seems off)

        ②Equations of FFM:

\begin{aligned}
X_2^{\prime} &= \mathrm{CSP}\left\{\mathrm{CRC}\left[f_{\mathrm{up}}^{2\uparrow}\left(\mathrm{CBS}(X_3^{\prime})\right), X_2\right]\right\} \\
X_3^{\prime\prime} &= \mathrm{CSP}\left\{\mathrm{CRC}\left[\mathrm{CBS}(X_3^{\prime}), X_3, \mathrm{CBS}(X_2^{\prime}, \mathrm{stride}=2)\right]\right\} \\
X_4^{\prime\prime} &= \mathrm{CSP}\left\{\mathrm{CRC}\left[X_4^{\prime}, \mathrm{CBS}(X_3^{\prime\prime}, \mathrm{stride}=2)\right]\right\}
\end{aligned}

where f_{\mathrm{up}}^{2\uparrow} denotes 2× upsampling, and \text{CBS} denotes a 3 \times 3 convolution followed by batch normalization and SiLU
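For reference, here is a minimal definition of the CBS block named above (conv + batch norm + SiLU, YOLOv5's basic Conv unit) together with the 2× upsampling; the nearest-neighbor mode is an assumption carried over from YOLOv5's default neck.

```python
import torch.nn as nn

def cbs(c_in: int, c_out: int, k: int = 3, stride: int = 1) -> nn.Sequential:
    """CBS as defined above: conv + BatchNorm + SiLU.
    stride=2 gives the downsampling variant in the X3'' and X4'' equations."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )

# f_up^{2↑}: 2x upsampling (nearest-neighbor assumed, as in YOLOv5's neck)
upsample_2x = nn.Upsample(scale_factor=2, mode="nearest")
```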

        ③They provided 3 strategies for reweighting channels:

\begin{aligned}
\mathrm{Output} &= \mathrm{Attention}(X)\cdot X \\
\mathrm{Output} &= \sum_j\frac{\omega_j}{\varepsilon+\sum_m\omega_m}\cdot x_j \\
\mathrm{Output} &= \sum_i\sum_j\frac{\omega_i}{\varepsilon+\sum_k\omega_k}\cdot\frac{\omega_j}{\varepsilon+\sum_{m_i}\omega_{m_i}}\cdot x_j
\end{aligned}

where \mathrm{Attention}\left ( \cdot \right ) denotes the channel attention mechanism, \omega _i denotes the trainable weight of the i-th feature map, \omega _j denotes the trainable weight of the j-th channel, m_i denotes the total number of channels in the i-th feature map, m denotes the total number of channels after concatenation, and \varepsilon =0.0001. FFM adopts the second strategy (sketched below).
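A minimal sketch of the second strategy, assuming the normalized weights are applied as a per-channel scaling of the concatenated tensor (with the sum over j then absorbed by the convolution that follows inside CRC); the ReLU that keeps the weights non-negative mirrors BiFPN-style fast normalized fusion and is also an assumption.

```python
import torch
import torch.nn as nn

class NormalizedReweight(nn.Module):
    """Strategy 2: one trainable weight per channel, normalized over all
    channels of the concatenated tensor, with eps = 1e-4 for stability."""
    def __init__(self, channels: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(channels))   # ω_j, one per channel
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        w = self.w.relu()                              # non-negative weights (assumed)
        w = w / (self.eps + w.sum())                   # ω_j / (ε + Σ_m ω_m)
        return x * w.view(1, -1, 1, 1)                 # reweight each channel x_j
```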

2.4.4. Spatial Context Aware Module (SCAM)

        ①Framework of SCAM and other:

        ②The pixelwise spatial context:

\begin{gathered}
Q_i^j = P_i^j + a_i^j\sum_{n=1}^{N_i}\left[\frac{\exp(\omega_{qk}P_i^n)}{\sum_{m=1}^{N_i}\exp(\omega_{qk}P_i^m)}\cdot\omega_v P_i^n\right] \\
a_i^j = \frac{\exp\left(\left[\mathrm{avg}(P_i);\max(P_i)\right]P_i^j\right)}{\sum_{n=1}^{N_i}\exp\left(\left[\mathrm{avg}(P_i);\max(P_i)\right]P_i^n\right)}
\end{gathered}

where P_i^j and Q_i^j denote the input and output of the j-th pixel in the i-th level feature map, N_i denotes the total number of pixels, \omega_{qk} and \omega_{v} are the linear transformation matrices that project the feature maps, and \mathrm{avg}(\cdot) and \mathrm{max}(\cdot) are GAP and GMP (global average and max pooling)
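A rough PyTorch sketch of these two equations in the spirit of a GCNet-style global context block; treating ω_qk as a 1 × 1 scoring convolution, ω_v as a 1 × 1 value projection, and scoring pixels against the concatenated [avg(P); max(P)] descriptor through a linear layer are all my assumptions, not the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCAMSketch(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.w_qk = nn.Conv2d(c, 1, 1)        # ω_qk: one score per pixel
        self.w_v = nn.Conv2d(c, c, 1)         # ω_v: value projection
        self.pool_proj = nn.Linear(2 * c, c)  # projects [avg(P); max(P)] (assumed)

    def forward(self, p: torch.Tensor) -> torch.Tensor:
        b, c, h, w = p.shape
        n = h * w
        flat = p.view(b, c, n)                                 # P_i flattened over pixels
        # global context: softmax(ω_qk P) pools ω_v P over all N_i pixels
        attn = F.softmax(self.w_qk(p).view(b, 1, n), dim=-1)
        ctx = (self.w_v(p).view(b, c, n) * attn).sum(-1)       # (B, C) context vector
        # spatial weights a_i^j from the pooled [avg; max] descriptor
        desc = torch.cat([flat.mean(-1), flat.amax(-1)], dim=1)      # (B, 2C)
        scores = torch.bmm(self.pool_proj(desc).unsqueeze(1), flat)  # (B, 1, N)
        a = F.softmax(scores, dim=-1)
        out = flat + a * ctx.unsqueeze(-1)                     # Q = P + a * context
        return out.view(b, c, h, w)
```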

2.4.5. Lite-FFCA-YOLO (L-FFCA-YOLO)

        ①Frequent redundant memory accesses slow DWConv down in practice
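For context, here is what a standard depthwise-separable block looks like; it cuts FLOPs sharply, but the depthwise stage processes each channel in its own pass, and that per-channel memory traffic is exactly the overhead the authors cite.

```python
import torch.nn as nn

def depthwise_separable(c_in: int, c_out: int) -> nn.Sequential:
    """Depthwise conv (one filter per channel) followed by a pointwise 1x1
    conv that mixes channels. Low FLOPs, but the grouped depthwise stage
    makes many small memory accesses, which can hurt real latency."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in),  # depthwise
        nn.Conv2d(c_in, c_out, 1),                         # pointwise
    )
```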

        ②Structure of L-FFCA-YOLO:

        ③Parameters of FFCA-YOLO and L-FFCA-YOLO:

2.5. Experimental Results

        ①Small objects are defined as smaller than 32 × 32 pixels

        ②Baseline: YOLOv5m, for its balance between speed and accuracy

2.5.1. Experimental Dataset Description

(1)VEDAI

        ①Pixels: about 16,000 × 16,000, all captured from the same altitude

        ②Resolution: 12.5 cm × 12.5 cm per pixel

        ③Modality: RGB

        ④Data split: the official one, except that classes with fewer than 50 instances are excluded

(2)AI-TOD

        ①Average object size: 12.8 pixels

        ②Total images: 28,036

        ③Object instances: 700,621 across 8 classes

        ④Data split: 11,214 for training, 2,804 for validation, 14,018 for testing

(3)Unicorn small object dataset (USOD)

        ①Built on UNICORN 2008, using only the visible-light data

        ②Spatial resolution: 0.4m

        ③Manual filtering, segmentation, and added annotations:

(a) original annotation, (b) manual annotation, (c) manual annotation, (d) manual annotation

        ④Images: 3,000

        ⑤Vehicle instances: 43,378

        ⑥Data split: train:test = 7:3

        ⑦Proportion of size of objects:

        ⑧Data distribution of USOD:

photoelectric  adj. photoelectric; electro-optical

2.5.2. Model Training and Evaluation Metrics

        ①Optimizer: Stochastic gradient descent (SGD)

        ②Learning rate: 0.01

        ③Momentum: 0.937

        ④Weight decay: 0.0005

        ⑤Batch size: 32

        ⑥Loss: 0.5 × normalized Wasserstein distance (NWD) + 0.5 × CIoU loss

        ⑦Distance between bounding boxes: Wasserstein distance (sketched below)
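For reference, a sketch of the NWD term following Wang et al.'s normalized Gaussian Wasserstein distance, on which the 0.5 × NWD + 0.5 × CIoU loss above is built; each box (cx, cy, w, h) is modeled as a 2-D Gaussian N(center, diag(w²/4, h²/4)). The constant C is dataset-dependent; using 12.8 here is only an assumption matching AI-TOD's average object size, and ciou_loss is a hypothetical placeholder.

```python
import torch

def nwd(box1: torch.Tensor, box2: torch.Tensor, C: float = 12.8) -> torch.Tensor:
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h).
    The 2nd-order Wasserstein distance between the two box Gaussians reduces
    to a Euclidean distance on the vectors (cx, cy, w/2, h/2)."""
    g1 = torch.stack([box1[..., 0], box1[..., 1], box1[..., 2] / 2, box1[..., 3] / 2], dim=-1)
    g2 = torch.stack([box2[..., 0], box2[..., 1], box2[..., 2] / 2, box2[..., 3] / 2], dim=-1)
    w2 = ((g1 - g2) ** 2).sum(dim=-1)      # squared Wasserstein distance
    return torch.exp(-w2.sqrt() / C)       # normalize into (0, 1]

# loss = 0.5 * (1 - nwd(pred, target)) + 0.5 * ciou_loss(pred, target)  # hypothetical
```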

2.5.3. Comparisons With Previous Methods

        ①Visualized detection performance of FFCA-YOLO on (a) USOD, (b) VEDAI, (c) AI-TOD:

        ②Comparison table on VEDAI:

        ③Comparison table on AI-TOD:

        ④Comparison table on USOD:

        ⑤YOLOv5m, TPH-YOLO, and FFCA-YOLO in low-illumination and shadow-occlusion scenes:

2.5.4. Ablation Experimental Result

        ①Module ablation on USOD:

        ②How FEM and SCAM affect the feature maps:

        ③Comparison of the FEM module on USOD:

        ④SCAM versus other blocks at the same position:

2.5.5. Robustness Experiment

        ①Simulated degraded images in USOD:

where w is the blurring factor, \sigma ^2 denotes the variance of the Gaussian noise, r denotes the amplitude factor of the stripe noise, and A is the atmospheric light parameter

        ②Robustness experiments of FFCA-YOLO and YOLOv5m in USOD:

2.5.6. Lightweight Comparison Experiment

        ①L-FFCA-YOLO compared with others in USOD:

2.6. Conclusion

        ①Limitations: a) speed and memory still need further optimization; b) space-based RS scenarios remain to be targeted

3. Reference

Zhang, Y., et al. (2024). FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Transactions on Geoscience and Remote Sensing, 62. doi: 10.1109/TGRS.2024.3363057
