Remote Sensing for Natural Resources
Information extraction of roads from remote sensing images using CNN combined with Transformer
Received date: 2023-08-02
Revised date: 2024-05-09
Online published: 2026-06-03
Deep learning-based methods for extracting roads from high-resolution remote sensing images struggle to capture global context and edge details at the same time. This study proposed a cascaded neural network for road segmentation in remote sensing images that learns both types of information simultaneously. First, the input feature images were fed into two encoder branches, a CNN and a Transformer. Then, the features learned by the two branches were combined using the shuffle attention dual branch fusion (SA-DBF) module, thus fusing global and local information. Within the SA-DBF module, the features from both branches were modeled through fine-grained interaction, during which multiple attention mechanisms extracted channel and spatial information from the feature images and suppressed invalid noise. The proposed network was evaluated on the Massachusetts Road dataset, yielding an overall accuracy (OA) of 98.04%, an F1 score of 88.03%, and an intersection over union (IoU) of 65.13%. Compared with the mainstream methods U-Net and TransRoadNet, the proposed network increased the IoU by 2.01 and 1.42 percentage points, respectively. Experimental results indicate that the proposed method outperforms all the compared methods and can effectively improve the accuracy of road segmentation.
Key words: cascaded neural network; Transformer; feature fusion; attention mechanism
QU Haicheng, WANG Ying, LIU Lamei, HAO Ming. Information extraction of roads from remote sensing images using CNN combined with Transformer[J]. Remote Sensing for Natural Resources, 2025, 37(1): 38-45. DOI: 10.6046/zrzyyg.2023237
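The dual-branch fusion outlined in the abstract can be illustrated with a toy sketch. This is not the authors' SA-DBF module, whose internals are not given on this page. It assumes a simplified design: features from the two branches are concatenated along the channel axis, each channel is reweighted by a parameter-free gate (sigmoid of its mean, standing in for learned channel attention), and the channels are then shuffled so that CNN and Transformer channels interleave. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_avg_pool(feature_map):
    """Average each channel (a 2-D list of numbers) down to one value."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def channel_gate(feature_map):
    """Scale every channel by sigmoid(channel mean).

    Assumption: a parameter-free stand-in for learned channel attention.
    """
    weights = [sigmoid(m) for m in global_avg_pool(feature_map)]
    return [
        [[w * v for v in row] for row in channel]
        for w, channel in zip(weights, feature_map)
    ]


def channel_shuffle(channels, groups):
    """Interleave channels across groups: view as (groups, n/groups), transpose, flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


def fuse_branches(cnn_features, transformer_features):
    """Concatenate both branches along channels, gate, then shuffle (illustrative only)."""
    stacked = cnn_features + transformer_features
    return channel_shuffle(channel_gate(stacked), groups=2)


# The shuffle mixes the two branches channel by channel.
print(channel_shuffle(["cnn0", "cnn1", "trans0", "trans1"], groups=2))
# ['cnn0', 'trans0', 'cnn1', 'trans1']
```

In a real network the gate would be learned and a spatial-attention path would sit alongside it; the sketch only shows how shuffling lets later layers see channels from both encoders side by side.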
Tab.1 Comparison results of ablation experiments with different modules (%)

| Method | OA | F1 | IoU |
|---|---|---|---|
| U-Net | 96.39 | 84.12 | 63.12 |
| Transformer+U-Net | 97.08 | 85.97 | 64.01 |
| U-Net+SA-DBF | 96.27 | 86.36 | 64.37 |
| Transformer+U-Net+SA-DBF | 98.04 | 88.03 | 65.13 |
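For readers unfamiliar with the three metrics reported in Tab.1, the sketch below computes them for one binary road mask. It assumes the standard pixel-wise definitions (road as the positive class); this page does not state how the paper computes or averages them over the test set, and the function name `road_metrics` is hypothetical.

```python
def road_metrics(pred, truth):
    """Return (OA, F1, IoU) for two binary masks given as nested lists of 0/1.

    Assumes the standard definitions with road (1) as the positive class.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
            else:
                tn += 1

    total = tp + fp + fn + tn
    oa = (tp + tn) / total                      # share of correctly labeled pixels
    denom = tp + fp + fn
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 0.0
    iou = tp / denom if denom else 0.0          # overlap / union of road pixels
    return oa, f1, iou


truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 0, 0, 0],
        [1, 0, 0, 0]]
oa, f1, iou = road_metrics(pred, truth)
print(oa, f1, round(iou, 4))  # 0.75 0.5 0.3333
```

Because background pixels dominate road imagery, OA stays high even when the road overlap is poor, which is why IoU and F1 are the more discriminating columns.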
Tab.3 Influence of Transformer scale on the model (%)

| Transformer scale | OA | F1 | IoU |
|---|---|---|---|
| Large | 97.08 | 85.97 | 64.01 |
| Base | 96.82 | 84.87 | 63.85 |
Tab.4 Experimental comparison results of different models

| Method | OA/% | F1/% | IoU/% | Time/s | Parameters/10^6 MB |
|---|---|---|---|---|---|
| SegNet | 95.27 | 81.34 | 60.63 | 43.2 | 30.6 |
| DeeplabV3+ | 96.21 | 83.42 | 63.08 | 43.5 | 30.2 |
| U-Net | 96.39 | 84.12 | 63.12 | 42.6 | 25.3 |
| D-LinkNet | 97.32 | 85.98 | 63.29 | 41.5 | 30.9 |
| TransRoadNet | 97.49 | 85.26 | 63.71 | 40.3 | 31.4 |
| CoAtNet | 97.51 | 86.24 | 63.92 | 40.6 | 27.6 |
| Proposed method | 98.04 | 88.03 | 65.13 | 39.1 | 24.2 |
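The percentage-point gains quoted in the abstract can be reproduced from the IoU column of Tab.4. The short check below copies that column into a dictionary (the variable and function names are illustrative) and subtracts each baseline from the proposed method.

```python
# IoU/% values copied from Tab.4.
IOU_PERCENT = {
    "SegNet": 60.63,
    "DeeplabV3+": 63.08,
    "U-Net": 63.12,
    "D-LinkNet": 63.29,
    "TransRoadNet": 63.71,
    "CoAtNet": 63.92,
    "Proposed method": 65.13,
}


def iou_gain(baseline, proposed="Proposed method"):
    """IoU improvement of the proposed method over a baseline, in percentage points."""
    return round(IOU_PERCENT[proposed] - IOU_PERCENT[baseline], 2)


print(iou_gain("U-Net"))         # 2.01
print(iou_gain("TransRoadNet"))  # 1.42
```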
Tab.5 Experimental comparison results of different networks (images not available; five test samples, No. 1 to 5, each shown as: original image, DeepLabV3+, U-Net, SegNet, TransRoadNet, D-LinkNet, CoAt, proposed method)