## RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving

This repository is the official PyTorch implementation of the paper "RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving".

> [**RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving**](https://arxiv.org/abs/2508.06529)
>
> by [Jiayuan Wang](https://scholar.google.ca/citations?user=1z6x5_UAAAAJ&hl=zh-CN&oi=ao), [Q. M. Jonathan Wu](https://scholar.google.com/citations?user=BJSAsE8AAAAJ&hl=zh-CN) :email:, [Katsuya Suto](https://scholar.google.com/citations?user=x3oJXHwAAAAJ&hl=ja), and [Ning Zhang](https://scholar.google.ca/citations?hl=zh-CN&user=ZcYihtoAAAAJ)
>
> (:email:) corresponding author.

---

### The Illustration of RMT-PPAD

![RMT-PPAD](pictures/constructure.jpg)

### Contributions

* We design a real-time transformer-based multi-task model (RMT-PPAD), without bells and whistles, that jointly addresses object detection, drivable area segmentation, and lane line segmentation in a single network.
* We propose a lightweight GCA module that extracts task-specific features, retains shared representations, and adaptively fuses them to alleviate negative transfer between tasks.
* We design an adaptive segmentation decoder that automatically learns task-specific weights for multi-scale features, eliminating the need for manually designed task-specific structures while balancing fine details and global context.
* We identify an inconsistency between the lane line label widths used for training and testing in previous works. For a fair and faithful reflection of lane line segmentation performance, we propose a simple yet effective method that dilates the test label widths to match those of the training dataset.
* We conduct extensive experiments and ablation studies on the BDD100K dataset and real-world driving scenarios to validate the effectiveness of RMT-PPAD, which achieves state-of-the-art performance across all tasks compared with open-source MTL models for panoptic driving perception.

### Results

#### Quantitative results comparison of RMT-PPAD and open-source MTL models on BDD100K

| Model | FPS | Params (M) | Detection Recall (%) | Detection mAP50 (%) | Drivable Area mIoU (%) | Lane Line IoU (%) | Lane Line ACC (%) |
|------------|------|------------|----------------------|---------------------|------------------------|-------------------|-------------------|
| YOLOP | 64.5 | 7.9 | 88.5 | 76.4 | 89.0 | 44.0 | 79.8 |
| HybridNets | 17.2 | 12.8 | 93.5 | 77.2 | 91.0 | 52.0 | 82.7 |
| YOLOPX | 27.5 | 32.9 | 93.7 | 83.3 | 90.9 | 52.1 | 79.1 |
| A-YOLOM(n) | 52.9 | 4.4 | 85.3 | 78.0 | 90.5 | 45.6 | 77.2 |
| A-YOLOM(s) | 52.7 | 13.6 | 86.9 | 81.1 | 91.0 | 49.7 | 80.7 |
| RMT-PPAD | 32.6 | 34.3 | 95.4 | 84.9 | 92.6 | 56.8 | 84.7 |
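
The lane line IoU and ACC above are evaluated on test labels whose width has been dilated to match the training labels (see Contributions). For illustration only, the sketch below shows one way such a dilation can be applied with OpenCV; the kernel size, iteration count, and file names are placeholders rather than the exact settings used in the paper, so please use the provided dataset to reproduce the reported numbers.

```python
# Illustration only: thicken thin lane line test masks so their width matches the
# training labels. Kernel size, iterations, and file names are placeholders, not
# the paper's settings; the provided dataset already contains the dilated masks.
import cv2
import numpy as np

def dilate_lane_mask(mask: np.ndarray, kernel_size: int = 5, iterations: int = 1) -> np.ndarray:
    """Dilate a binary lane line mask of shape (H, W) with values in {0, 255}."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.dilate(mask, kernel, iterations=iterations)

# Hypothetical usage with the recommended directory layout.
mask = cv2.imread("dataset/mask/lane/val2017/example.png", cv2.IMREAD_GRAYSCALE)
if mask is not None:
    cv2.imwrite("example_dilated.png", dilate_lane_mask(mask))
```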
#### Ablation study on MTL and GCA

| Methods | Recall (%) | mAP50 (%) | mIoU (%) | IoU (%) | ACC (%) |
|----------------------------|------------|-----------|----------|---------|---------|
| Object only | 92.1 | 77.5 | – | – | – |
| Drivable area only | – | – | 91.0 | – | – |
| Lane line only | – | – | – | 53.2 | 85.3 |
| Segmentation only | – | – | 91.3 | 53.3 | 85.4 |
| Vanilla MTL | 92.4 | 76.9 | 91.0 | 52.4 | 83.6 |
| MTL with GCA (RMT-PPAD) | 92.1 | 78.3 | 91.3 | 52.7 | 84.1 |
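
For intuition about the "MTL with GCA" row, the sketch below shows one plausible form of gate-controlled fusion between shared and task-specific feature maps (a sigmoid gate predicted from their concatenation). It is a simplified illustration under our own assumptions, not the exact GCA module implemented in this repository.

```python
# A simplified, hypothetical sketch of adaptive gated fusion between shared and
# task-specific features. It is NOT the exact GCA module from the paper; channel
# sizes and the gating form are illustrative assumptions only.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict a per-pixel, per-channel gate from both feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Lightweight projection for the task-specific branch.
        self.task_proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, shared: torch.Tensor, task_specific: torch.Tensor) -> torch.Tensor:
        task_specific = self.task_proj(task_specific)
        g = self.gate(torch.cat([shared, task_specific], dim=1))
        # Keep shared representations where the gate is low, task features where it is high.
        return g * task_specific + (1.0 - g) * shared

# Example usage with dummy feature maps.
fuse = GatedFusion(channels=256)
shared = torch.randn(1, 256, 48, 80)
task = torch.randn(1, 256, 48, 80)
fused = fuse(shared, task)  # shape: (1, 256, 48, 80)
```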

#### Ablation study of segmentation performance at different confidence thresholds on toy and BDD100K

mIoU is reported for drivable area segmentation; IoU and ACC for lane line segmentation.

| Threshold | Toy mIoU (%) | Toy IoU (%) | Toy ACC (%) | BDD100K mIoU (%) | BDD100K IoU (%) | BDD100K ACC (%) |
|-----------|--------------|-------------|-------------|------------------|------------------|------------------|
| 0.40 | 91.3 | 48.8 | 88.9 | 92.6 | 53.7 | 89.4 |
| 0.45 | 91.3 | 49.2 | 88.7 | 92.6 | 54.0 | 89.1 |
| 0.50 | 91.1 | 49.6 | 88.4 | 92.4 | 54.3 | 88.9 |
| 0.55 | 90.9 | 50.0 | 88.2 | 92.1 | 54.6 | 88.7 |
| 0.60 | 90.4 | 50.3 | 87.9 | 91.7 | 55.0 | 88.4 |
| 0.65 | 89.8 | 50.6 | 87.5 | 91.0 | 55.2 | 88.1 |
| 0.70 | 89.0 | 51.0 | 87.2 | 90.3 | 55.5 | 87.7 |
| 0.75 | 88.1 | 51.4 | 86.7 | 89.5 | 55.9 | 87.3 |
| 0.80 | 87.1 | 51.8 | 86.2 | 88.5 | 56.3 | 86.8 |
| 0.85 | 85.9 | 52.3 | 85.4 | 87.4 | 56.6 | 86.0 |
| 0.90 | 84.2 | 52.7 | 84.1 | 85.9 | 56.8 | 84.7 |
| 0.95 | 80.9 | 52.1 | 81.0 | 83.4 | 55.8 | 81.5 |
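
For reference, the snippet below is a simplified illustration (not the repository's evaluation code) of the quantity swept in the table above: the confidence threshold binarizes a predicted probability map before IoU and pixel accuracy are computed. The paper's exact ACC definition may differ from the generic pixel accuracy used here.

```python
# Simplified illustration of thresholded segmentation metrics; not the
# repository's evaluation code.
import torch

def binary_iou_and_acc(prob: torch.Tensor, target: torch.Tensor, threshold: float = 0.5):
    """prob: (H, W) probabilities in [0, 1]; target: (H, W) binary ground truth."""
    pred = prob >= threshold
    target = target.bool()
    intersection = (pred & target).sum().float()
    union = (pred | target).sum().float().clamp(min=1)
    iou = intersection / union
    acc = (pred == target).float().mean()  # generic pixel accuracy
    return iou.item(), acc.item()

# Example with random predictions at two of the thresholds from the table.
prob = torch.rand(384, 640)
target = torch.rand(384, 640) > 0.9
print(binary_iou_and_acc(prob, target, threshold=0.40))
print(binary_iou_and_acc(prob, target, threshold=0.90))
```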
**Notes**:

- The works we have used for reference include `YOLOP` ([paper](https://link.springer.com/article/10.1007/s11633-022-1339-y), [code](https://github.com/hustvl/YOLOP)), `HybridNets` ([paper](https://arxiv.org/abs/2203.09035), [code](https://github.com/datvuthanh/HybridNets)), `YOLOPX` ([paper](https://www.sciencedirect.com/science/article/pii/S003132032300849X), [code](https://github.com/jiaoZ7688/YOLOPX)), and `A-YOLOM` ([paper](https://ieeexplore.ieee.org/document/10509552), [code](https://github.com/JiayuanWang-JW/YOLOv8-multi-task)). Thanks for their wonderful work.

---

### Visualization

#### Real Road

![Real Road](pictures/real_world.png)

---

### Requirements

This codebase has been developed with [**Python==3.8.19**](https://www.python.org/) and [**PyTorch==2.4.1**](https://pytorch.org/get-started/locally/).

```setup
cd RMT-PPAD
conda env create -f environment.yml
conda activate RMTPPAD
cd ultralytics
```

### Data preparation and pre-trained model

**Note**: Since we extended the lane line label width for testing, please use our provided dataset to reproduce the results reported in the paper. Further details are described in the paper.

#### Download

- Download the images from [images](https://bdd-data.berkeley.edu/).
- Download the pre-trained model: [RMT-PPAD](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EVvXPuqxXdRAkIuAVdth14gBYKuDJ6XqlA2ppRHsmeQN_w?e=hKcXJX).
- Download the detection annotations from [labels](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EV2FyiQg0llNpBL2F5hnEi0BwfEFTP3jckw7adfLSXPzrQ?e=jSaTOO).
- Download the lane line and drivable area segmentation annotations from [mask](https://uwin365-my.sharepoint.com/:u:/g/personal/wang621_uwindsor_ca/EXrUtDWQ5vlAgzaGopIC3foBZXbs5JNNJRgvR4XotO2cgg?e=CVLOHg).

We recommend the following dataset directory structure:

```
├─dataset root
│ ├─images
│ │ ├─train2017
│ │ ├─val2017
│ ├─labels
│ │ ├─train2017
│ │ ├─val2017
│ ├─mask
│ │ ├─lane
│ │ │ ├─train2017
│ │ │ ├─val2017
│ │ ├─drivable
│ │ │ ├─train2017
│ │ │ ├─val2017
```

Update your dataset path in `./ultralytics/datasets/bdd-multi.yaml`.

### Training

You can set the training configuration in `./ultralytics/yolo/cfg/default.yaml`.

```
python train.py
```

You can change the settings in `train.py`.

### Evaluation

You can set the evaluation configuration in `./ultralytics/yolo/cfg/default.yaml`.

```
python test.py
```

You can change the settings in `test.py`.

### Prediction

```
python predict.py
```

You can change the settings in `predict.py`.

## Citation

If you find our paper and code useful for your research, please consider giving a star :star: and citation :pencil::

```BibTeX
@ARTICLE{2025arXiv250806529W,
       author = {{Wang}, Jiayuan and {Wu}, Q.~M. Jonathan and {Suto}, Katsuya and {Zhang}, Ning},
        title = {RMT-PPAD: Real-time Multi-task Learning for Panoptic Perception in Autonomous Driving},
      journal = {arXiv e-prints},
     keywords = {Computer Vision and Pattern Recognition, Machine Learning},
         year = 2025,
        month = aug,
          eid = {arXiv:2508.06529},
        pages = {arXiv:2508.06529},
archivePrefix = {arXiv},
       eprint = {2508.06529}
}
```