## RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving

This repository is the official PyTorch implementation of the paper "RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving".

> [**RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving**](https://arxiv.org/abs/2508.06529)
>
> by [Jiayuan Wang](https://scholar.google.ca/citations?user=1z6x5_UAAAAJ&hl=zh-CN&oi=ao), [Q. M. Jonathan Wu](https://scholar.google.com/citations?user=BJSAsE8AAAAJ&hl=zh-CN) :email:, [Katsuya Suto](https://scholar.google.com/citations?user=x3oJXHwAAAAJ&hl=ja), and [Ning Zhang](https://scholar.google.ca/citations?hl=zh-CN&user=ZcYihtoAAAAJ)
>
> (:email:) corresponding author.

---

### The Illustration of RMT-PPAD

![RMT-PPAD](pictures/constructure.jpg)

### Contributions

* We design a real-time transformer-based multi-task model (RMT-PPAD), without bells and whistles, that jointly addresses object detection, drivable area segmentation, and lane line segmentation in a single network.
* We propose a lightweight GCA module that extracts task-specific features, retains shared representations, and adaptively fuses them to alleviate negative transfer between tasks.
* We design an adaptive segmentation decoder that automatically learns task-specific weights for multi-scale features, eliminating the need for manually designed task-specific structures while balancing fine details and global context.
* We identify an inconsistency between the lane line label widths used for training and testing in previous works. For a fair and faithful reflection of lane line segmentation performance, we propose a simple yet effective method that dilates the test label widths to match those of the training dataset.
* We conduct extensive experiments and ablation studies on the BDD100K dataset and real-world driving scenarios to validate the effectiveness of RMT-PPAD, which achieves state-of-the-art performance across all tasks compared with open-source MTL models for panoptic driving perception.

### Results

#### Quantitative results comparison of RMT-PPAD and open-source MTL models on BDD100K

| Model | FPS | Params (M) | Detection Recall (%) | Detection mAP50 (%) | Drivable Area mIoU (%) | Lane Line IoU (%) | Lane Line ACC (%) |
|------------|------|------------|----------------------|---------------------|------------------------|-------------------|-------------------|
| YOLOP | 64.5 | 7.9 | 88.5 | 76.4 | 89.0 | 44.0 | 79.8 |
| HybridNets | 17.2 | 12.8 | 93.5 | 77.2 | 91.0 | 52.0 | 82.7 |
| YOLOPX | 27.5 | 32.9 | 93.7 | 83.3 | 90.9 | 52.1 | 79.1 |
| A-YOLOM(n) | 52.9 | 4.4 | 85.3 | 78.0 | 90.5 | 45.6 | 77.2 |
| A-YOLOM(s) | 52.7 | 13.6 | 86.9 | 81.1 | 91.0 | 49.7 | 80.7 |
| RMT-PPAD | 32.6 | 34.3 | 95.4 | 84.9 | 92.6 | 56.8 | 84.7 |
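
The lane line IoU and ACC above are evaluated on test labels whose width has been dilated to match the training labels (see Contributions). For illustration only, the sketch below shows one way such a dilation can be applied with OpenCV; the kernel size, iteration count, and file names are placeholders rather than the exact settings used in the paper, so please use the provided dataset to reproduce the reported numbers.

```python
# Illustration only: thicken thin lane line test masks so their width matches the
# training labels. Kernel size, iterations, and file names are placeholders, not
# the paper's settings; the provided dataset already contains the dilated masks.
import cv2
import numpy as np

def dilate_lane_mask(mask: np.ndarray, kernel_size: int = 5, iterations: int = 1) -> np.ndarray:
    """Dilate a binary lane line mask of shape (H, W) with values in {0, 255}."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.dilate(mask, kernel, iterations=iterations)

# Hypothetical usage with the recommended directory layout.
mask = cv2.imread("dataset/mask/lane/val2017/example.png", cv2.IMREAD_GRAYSCALE)
if mask is not None:
    cv2.imwrite("example_dilated.png", dilate_lane_mask(mask))
```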
#### Ablation study on MTL and GCA

| Methods | Recall (%) | mAP50 (%) | mIoU (%) | IoU (%) | ACC (%) |
|----------------------------|------------|-----------|----------|---------|---------|
| Object only | 92.1 | 77.5 | – | – | – |
| Drivable area only | – | – | 91.0 | – | – |
| Lane line only | – | – | – | 53.2 | 85.3 |
| Segmentation only | – | – | 91.3 | 53.3 | 85.4 |
| Vanilla MTL | 92.4 | 76.9 | 91.0 | 52.4 | 83.6 |
| MTL with GCA (RMT-PPAD) | 92.1 | 78.3 | 91.3 | 52.7 | 84.1 |
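
For intuition about the "MTL with GCA" row, the sketch below shows one plausible form of gate-controlled fusion between shared and task-specific feature maps (a sigmoid gate predicted from their concatenation). It is a simplified illustration under our own assumptions, not the exact GCA module implemented in this repository.

```python
# A simplified, hypothetical sketch of adaptive gated fusion between shared and
# task-specific features. It is NOT the exact GCA module from the paper; channel
# sizes and the gating form are illustrative assumptions only.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel, per-channel gate from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Lightweight projection for the task-specific branch.
        self.task_proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, shared: torch.Tensor, task_specific: torch.Tensor) -> torch.Tensor:
        task_specific = self.task_proj(task_specific)
        g = self.gate(torch.cat([shared, task_specific], dim=1))
        # Keep shared representations where the gate is low, task features where it is high.
        return g * task_specific + (1.0 - g) * shared

# Example usage with dummy feature maps.
fuse = GatedFusion(channels=256)
shared = torch.randn(1, 256, 48, 80)
task = torch.randn(1, 256, 48, 80)
fused = fuse(shared, task)  # shape: (1, 256, 48, 80)
```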

#### Ablation study of segmentation performance at different confidence thresholds on toy and BDD100K

mIoU is reported for drivable area segmentation; IoU and ACC for lane line segmentation.

| Threshold | Toy mIoU (%) | Toy IoU (%) | Toy ACC (%) | BDD100K mIoU (%) | BDD100K IoU (%) | BDD100K ACC (%) |
|-----------|--------------|-------------|-------------|------------------|------------------|------------------|
| 0.40 | 91.3 | 48.8 | 88.9 | 92.6 | 53.7 | 89.4 |
| 0.45 | 91.3 | 49.2 | 88.7 | 92.6 | 54.0 | 89.1 |
| 0.50 | 91.1 | 49.6 | 88.4 | 92.4 | 54.3 | 88.9 |
| 0.55 | 90.9 | 50.0 | 88.2 | 92.1 | 54.6 | 88.7 |
| 0.60 | 90.4 | 50.3 | 87.9 | 91.7 | 55.0 | 88.4 |
| 0.65 | 89.8 | 50.6 | 87.5 | 91.0 | 55.2 | 88.1 |
| 0.70 | 89.0 | 51.0 | 87.2 | 90.3 | 55.5 | 87.7 |
| 0.75 | 88.1 | 51.4 | 86.7 | 89.5 | 55.9 | 87.3 |
| 0.80 | 87.1 | 51.8 | 86.2 | 88.5 | 56.3 | 86.8 |
| 0.85 | 85.9 | 52.3 | 85.4 | 87.4 | 56.6 | 86.0 |
| 0.90 | 84.2 | 52.7 | 84.1 | 85.9 | 56.8 | 84.7 |
| 0.95 | 80.9 | 52.1 | 81.0 | 83.4 | 55.8 | 81.5 |
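
For reference, the snippet below is a simplified illustration (not the repository's evaluation code) of the quantity swept in the table above: the confidence threshold binarizes a predicted probability map before IoU and pixel accuracy are computed. The paper's exact ACC definition may differ from the generic pixel accuracy used here.

```python
# Simplified illustration of thresholded segmentation metrics; not the
# repository's evaluation code.
import torch

def binary_iou_and_acc(prob: torch.Tensor, target: torch.Tensor, threshold: float = 0.5):
    """prob: (H, W) probabilities in [0, 1]; target: (H, W) binary ground truth."""
    pred = prob >= threshold
    target = target.bool()
    intersection = (pred & target).sum().float()
    union = (pred | target).sum().float().clamp(min=1)
    iou = intersection / union
    acc = (pred == target).float().mean()  # generic pixel accuracy
    return iou.item(), acc.item()

# Example with random predictions at two of the thresholds from the table.
prob = torch.rand(384, 640)
target = torch.rand(384, 640) > 0.9
print(binary_iou_and_acc(prob, target, threshold=0.40))
print(binary_iou_and_acc(prob, target, threshold=0.90))
```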
**Notes**:

- The works we have used for reference include `YOLOP` ([paper](https://link.springer.com/article/10.1007/s11633-022-1339-y), [code](https://github.com/hustvl/YOLOP)), `HybridNets` ([paper](https://arxiv.org/abs/2203.09035), [code](https://github.com/datvuthanh/HybridNets)), `YOLOPX` ([paper](https://www.sciencedirect.com/science/article/pii/S003132032300849X), [code](https://github.com/jiaoZ7688/YOLOPX)), and `A-YOLOM` ([paper](https://ieeexplore.ieee.org/document/10509552), [code](https://github.com/JiayuanWang-JW/YOLOv8-multi-task)). Thanks for their wonderful work.

---

### Visualization

#### Real Road

![Real Road](pictures/real_world.png)

---

### Requirements

This codebase has been developed with [**Python==3.8.19**](https://www.python.org/) and [**PyTorch==2.4.1**](https://pytorch.org/get-started/locally/).

```setup
cd RMT-PPAD
conda env create -f environment.yml
conda activate RMTPPAD
cd ultralytics
```

### Data preparation and pre-trained model

**Note**: Since we extended the lane line label width for testing, please use our provided dataset to reproduce the results reported in the paper. Further details are described in the paper.

#### Download

- Download the images from [images](https://bdd-data.berkeley.edu/).
- Download the pre-trained model: [RMT-PPAD](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EVvXPuqxXdRAkIuAVdth14gBYKuDJ6XqlA2ppRHsmeQN_w?e=hKcXJX).
- Download the detection annotations from [labels](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EV2FyiQg0llNpBL2F5hnEi0BwfEFTP3jckw7adfLSXPzrQ?e=jSaTOO).
- Download the lane line and drivable area segmentation annotations from [mask](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EXrUtDWQ5vlAgzaGopIC3foBZXbs5JNNJRgvR4XotO2cgg?e=CVLOHg).

We recommend the following dataset directory structure:

```
├─dataset root
│ ├─images
│ │ ├─train2017
│ │ ├─val2017
│ ├─labels
│ │ ├─train2017
│ │ ├─val2017
│ ├─mask
│ │ ├─lane
│ │ │ ├─train2017
│ │ │ ├─val2017
│ │ ├─drivable
│ │ │ ├─train2017
│ │ │ ├─val2017
```

Update your dataset path in `./ultralytics/datasets/bdd-multi.yaml`.

### Training

You can set the training configuration in `./ultralytics/yolo/cfg/default.yaml`.

```
python train.py
```

You can change the settings in `train.py`.

### Evaluation

You can set the evaluation configuration in `./ultralytics/yolo/cfg/default.yaml`.

```
python test.py
```

You can change the settings in `test.py`.

### Prediction

```
python predict.py
```

You can change the settings in `predict.py`.

## Citation

If you find our paper and code useful for your research, please consider giving a star :star: and citation :pencil::

```BibTeX
@ARTICLE{2025arXiv250806529W,
       author = {{Wang}, Jiayuan and {Wu}, Q.~M. Jonathan and {Suto}, Katsuya and {Zhang}, Ning},
        title = {RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving},
      journal = {arXiv e-prints},
     keywords = {Computer Vision and Pattern Recognition, Machine Learning},
         year = 2025,
        month = aug,
          eid = {arXiv:2508.06529},
        pages = {arXiv:2508.06529},
archivePrefix = {arXiv},
       eprint = {2508.06529}
}
```