🎯 Training Status:
- Current Epoch: 2/10 (13.3% complete)
- Segmentation Dice: 0.9594
- Detection IoU: 0.5742
- Training stable with 8 GPUs

🔧 Technical Achievements:
- ✅ RMT-PPAD Transformer segmentation decoder integrated
- ✅ Task-specific GCA architecture optimized
- ✅ Multi-scale feature fusion (180×180, 360×360, 600×600)
- ✅ Adaptive scale weight learning implemented (see the sketch below)
- ✅ BEVFusion multi-task framework enhanced

📊 Performance Highlights:
- Divider segmentation: 0.9793 Dice (excellent)
- Pedestrian crossing: 0.9812 Dice (excellent)
- Stop line: 0.9812 Dice (excellent)
- Carpark area: 0.9802 Dice (excellent)
- Walkway: 0.9401 Dice (good)
- Drivable area: 0.8959 Dice (good)

🛠️ Code Changes Included:
- Enhanced BEVFusion model (bevfusion.py)
- RMT-PPAD integration modules (rmtppad_integration.py)
- Transformer segmentation head (enhanced_transformer.py)
- GCA module optimizations (gca.py)
- Configuration updates (Phase 4B configs)
- Training scripts and automation tools
- Comprehensive documentation and analysis reports

📅 Snapshot Date: Fri Nov 14 09:06:09 UTC 2025
📍 Environment: Docker container
🎯 Phase: RMT-PPAD Integration Complete
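The adaptive scale weighting over the multi-scale BEV features listed above can be pictured with a minimal PyTorch sketch. The module name, channel count, output resolution, and softmax-weighted combination below are illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveScaleFusion(nn.Module):
    """Hypothetical sketch: fuse BEV feature maps from several grid
    resolutions (e.g. 180x180, 360x360, 600x600) with learnable weights."""

    def __init__(self, channels: int, num_scales: int = 3, out_size: int = 360):
        super().__init__()
        # One learnable logit per scale; softmax turns them into fusion weights.
        self.scale_logits = nn.Parameter(torch.zeros(num_scales))
        self.out_size = out_size
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of [B, C, H_i, W_i] BEV maps at different resolutions.
        weights = torch.softmax(self.scale_logits, dim=0)
        resized = [
            F.interpolate(f, size=(self.out_size, self.out_size),
                          mode="bilinear", align_corners=False)
            for f in feats
        ]
        fused = sum(w * f for w, f in zip(weights, resized))
        return self.proj(fused)

# Toy usage: three BEV maps at the resolutions quoted in the status report.
feats = [torch.randn(1, 80, s, s) for s in (180, 360, 600)]
fused = AdaptiveScaleFusion(channels=80)(feats)
print(fused.shape)  # torch.Size([1, 80, 360, 360])
```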
Documentation and analysis reports included in this snapshot:
- 3D标注详细指南.md
- 8卡训练快速参考.md
- BEVFUSION_ARCHITECTURE.md
- BEVFUSION_VERSIONS_COMPARISON.md
- BEVFormer_vs_BEVFusion分割技术对比.md
- BEVFusion_LSS方案对比与建议_最终版.md
- BEVFusion内存占用分析_20251101.md
- BEVFusion完整项目路线图.md
- BEVFusion实车部署完整计划.md
- BEVFusion技术分析.md
- BEVFusion项目总览_20251031.md
- BEVFusion项目计划.md
- BEV分辨率提升方案分析.md
- Backbone到BEV多尺度架构分析.md
- CHECKPOINT_MISMATCH_EXPLANATION.md
- CUSTOM_SENSOR_MIGRATION_GUIDE.md
- DOCS_CLEANUP_PLAN.md
- DOCS_INDEX.md
- ENVIRONMENT_CHANGE_DETECTED.md
- ENVIRONMENT_FIX_RECORD.md
- ENVIRONMENT_ISSUE_RECORD.md
- EPOCH23_创建完成总结.md
- EPOCH23_快速启动指南.md
- EPOCH23_文档索引.md
- EPOCH23_训练中的评估结果.md
- EPOCH23_评估与部署完整计划.md
- EVAL_DEPLOYMENT_ANALYSIS.md
- Epoch8-11_Loss分析与Phase4启动建议.md
- GPU_OPTIMIZATION_ANALYSIS.md
- GeneralizedLSSFPN详解.md
- INFERENCE_GUIDE.md
- LSS模块方案专业建议.md
- MAPTR_CODE_ANALYSIS.md
- MAPTR_INTEGRATION_PLAN.md
- MULTITASK_GUIDE.md
- MapTR代码研究报告.md
- MapTR集成实战指南.md
- NEW_DOCKER_EVAL_GUIDE.md
- ORIN_DEPLOYMENT_PLAN.md
- PHASE3_EPOCH23_BASELINE_PERFORMANCE.md
- PHASE4A_ANALYSIS.md
- PHASE4A_GPU_MEMORY_ISSUE.md
- PHASE4A_QUICK_START.md
- PHASE4A_STAGE1_LAUNCHED_SUCCESS.md
- PHASE4A_STAGE1_PROGRESS_20251111.md
- PHASE4A_STATUS_AND_ENVIRONMENT.md
- PHASE5_RESTART_WORKERS0.md
- PREPARATION_CHECKLIST.md
- PRETRAINED_MODELS_INFO.md
- PROGRESSIVE_ENHANCEMENT_PLAN.md
- PROJECT_MASTER_PLAN.md
- PROJECT_PROGRESS_REPORT_20251030.md
- PROJECT_STATUS_FULL_REPORT_20251030.md
- PROJECT_STATUS_UPDATE_20251030.md
- PROJECT_SUMMARY_20251030_FINAL.md
- Phase4A_Stage1_8GPU配置_20251101.md
- Phase4A_Stage1_训练进展_20251101.md
- Phase4A_模型结构分析.md
- QUICK_REFERENCE_CARD.md
- README.md
- README_转换为Word.md
- RESTART_AND_LAUNCH_PHASE4A.md
- SEGMENTATION_DIMENSIONS_ANALYSIS.md
- SEGMENTATION_HEAD_ARCHITECTURE_COMPARISON.md
- TRAINING_PROGRESS_UPDATE_20251021.md
- TRAINING_STATUS_REPORT_20251030_1515.md
- TRAINING_TIME_ANALYSIS.md
- TRANSFER_LEARNING_GUIDE.md
- UPDATED_PLAN_WITH_EVAL.md
- VISUALIZATION_GUIDE.md
- nuScenes数据格式与实车标注指南.md
- 全感知网络快速启动指南.md
- 剪枝工具使用指南.md
- 多机多卡训练配置指南.md
- 并行任务总结_20251030.md
- 并行任务计划_20251030.md
- 方案C立即实施评估报告.md
- 模型优化_快速开始.md
- 模型优化启动计划.md
- 模型分析结果与优化方案.md
- 自动驾驶全感知网络扩展方案.md
- 训练失败根因分析_20251031.md
- 训练异常停止报告_20251031.md
- 训练总结_一页纸版本.md
- 训练重启成功报告_20251031.md
- 项目状态总览_20251030.md
- 项目进展与问题解决总结_20251030.md
- 项目进度分析与准备清单.md
BEVFusion
website | paper | video
News
- (2024/5) BEVFusion is integrated into NVIDIA DeepStream for sensor fusion.
- (2023/5) NVIDIA provides a TensorRT deployment solution of BEVFusion, achieving 25 FPS on Jetson Orin.
- (2023/4) BEVFusion ranks first on Argoverse 3D object detection leaderboard among all solutions.
- (2023/1) BEVFusion is integrated into MMDetection3D.
- (2023/1) BEVFusion is accepted to ICRA 2023!
- (2022/8) BEVFusion ranks first on Waymo 3D object detection leaderboard among all solutions.
- (2022/6) BEVFusion ranks first on nuScenes 3D object detection leaderboard among all solutions.
Abstract
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on the nuScenes benchmark, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost.
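As a rough illustration of the shared BEV representation described above, the sketch below scatter-adds camera frustum features into a flat BEV grid. It is only a naive reference for the idea; the paper's optimized BEV pooling restructures this computation to achieve the reported latency reduction, and the function name and shapes here are assumptions:

```python
import torch

def naive_bev_pooling(feats, coords, grid=(180, 180)):
    """feats:  [N, C] frustum point features from the camera branch.
    coords: [N, 2] integer (x, y) BEV cells each point falls into.
    Returns a [C, H, W] BEV feature map (sum-pooled per cell)."""
    H, W = grid
    C = feats.shape[1]
    bev = feats.new_zeros(C, H * W)
    flat_idx = coords[:, 1] * W + coords[:, 0]   # [N] flattened cell index
    bev.index_add_(1, flat_idx, feats.t())       # scatter-add features per cell
    return bev.view(C, H, W)

# Toy example: 1000 frustum points with 80-dim features.
feats = torch.randn(1000, 80)
coords = torch.randint(0, 180, (1000, 2))
print(naive_bev_pooling(feats, coords).shape)  # torch.Size([80, 180, 180])
```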
Results
3D Object Detection (on Waymo test)
| Model | mAP-L1 | mAPH-L1 | mAP-L2 | mAPH-L2 |
|---|---|---|---|---|
| BEVFusion | 82.72 | 81.35 | 77.65 | 76.33 |
| BEVFusion-TTA | 86.04 | 84.76 | 81.22 | 79.97 |
Here, BEVFusion uses a single model without any test-time augmentation. BEVFusion-TTA uses a single model with test-time augmentation; no model ensembling is applied.
3D Object Detection (on nuScenes test)
| Model | Modality | mAP | NDS |
|---|---|---|---|
| BEVFusion-e | C+L | 74.99 | 76.09 |
| BEVFusion | C+L | 70.23 | 72.88 |
| BEVFusion-base* | C+L | 71.72 | 73.83 |
*: We scaled up MACs of the model to match the computation cost of concurrent work.
3D Object Detection (on nuScenes validation)
| Model | Modality | mAP | NDS | Checkpoint |
|---|---|---|---|---|
| BEVFusion | C+L | 68.52 | 71.38 | Link |
| Camera-Only Baseline | C | 35.56 | 41.21 | Link |
| LiDAR-Only Baseline | L | 64.68 | 69.28 | Link |
Note: The camera-only object detection baseline is a variant of BEVDet-Tiny with a much heavier view transformer and other differences in hyperparameters. Thanks to our efficient BEV pooling operator, this model runs fast and has higher mAP than BEVDet-Tiny under the same input resolution. Please refer to BEVDet repo for the original BEVDet-Tiny implementation. The LiDAR-only baseline is TransFusion-L.
BEV Map Segmentation (on nuScenes validation)
| Model | Modality | mIoU | Checkpoint |
|---|---|---|---|
| BEVFusion | C+L | 62.95 | Link |
| Camera-Only Baseline | C | 57.09 | Link |
| LiDAR-Only Baseline | L | 48.56 | Link |
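For context, the mIoU reported above (and the per-class Dice scores quoted in the training status at the top of this document) are simple overlap ratios between predicted and ground-truth BEV masks. A minimal sketch for one binary class mask, ignoring the benchmark's exact thresholding details:

```python
import torch

def iou_and_dice(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6):
    """pred, gt: boolean [H, W] BEV masks for one map class."""
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    return iou.item(), dice.item()

# Toy masks; replace with thresholded predictions and rasterized GT maps.
pred = torch.rand(360, 360) > 0.5
gt = torch.rand(360, 360) > 0.5
print(iou_and_dice(pred, gt))
```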
Usage
Prerequisites
The code is built with the following libraries:
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
After installing these dependencies, please run this command to install the codebase:
python setup.py develop
We also provide a Dockerfile to ease environment setup. To get started with docker, please make sure that nvidia-docker is installed on your machine. After that, please execute the following command to build the docker image:
cd docker && docker build . -t bevfusion
We can then run the docker with the following command:
nvidia-docker run -it -v `pwd`/../data:/dataset --shm-size 16g bevfusion /bin/bash
Alternatively, if you use podman instead of docker, the equivalent command is:
sudo podman run -it --rm \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  -v $(pwd)/data:/dataset \
  --shm-size 16g \
  bevfusion \
  /bin/bash
We recommend running data preparation (instructions are available in the next section) outside the docker if possible. Note that the dataset directory should be an absolute path. Within the docker, please run the following commands to clone our repo and install the custom CUDA extensions:
cd home && git clone https://github.com/mit-han-lab/bevfusion && cd bevfusion
python setup.py develop
You can then create a symbolic link named `data` pointing to the `/dataset` directory inside the docker (e.g., `ln -s /dataset data` from the repository root).
Data Preparation
nuScenes
Please follow the instructions from here to download and preprocess the nuScenes dataset. Please remember to download both the detection dataset and the map extension (for BEV map segmentation). After data preparation, you will be able to see the following directory structure (as indicated in mmdetection3d):
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl
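To sanity-check the generated info files, you can load one and inspect its top-level structure. The keys below are typical of older mmdetection3d-style info files but may differ by version, so treat them as assumptions:

```python
import pickle

# Path follows the directory layout shown above.
with open("data/nuscenes/nuscenes_infos_train.pkl", "rb") as f:
    infos = pickle.load(f)

# Older mmdetection3d info files are usually a dict with 'infos' and 'metadata';
# inspect before relying on specific keys.
print(type(infos))
if isinstance(infos, dict):
    print(infos.keys())
    samples = infos.get("infos", [])
    if samples:
        print(sorted(samples[0].keys()))
```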
Evaluation
We also provide instructions for evaluating our pretrained models. Please download the checkpoints using the following script:
./tools/download_pretrained.sh
Then, you will be able to run:
torchpack dist-run -np [number of gpus] python tools/test.py [config file path] pretrained/[checkpoint name].pth --eval [evaluation type]
For example, if you want to evaluate the detection variant of BEVFusion, you can try:
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox
While for the segmentation variant of BEVFusion, this command will be helpful:
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/seg/fusion-bev256d2-lss.yaml pretrained/bevfusion-seg.pth --eval map
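Before evaluating, it can help to confirm that a downloaded checkpoint loads and to glance at its contents. A small sketch, assuming the usual mmdetection3d-style checkpoint layout (a dict with 'state_dict' and 'meta'), which you should verify for your file:

```python
import torch

ckpt = torch.load("pretrained/bevfusion-det.pth", map_location="cpu")
print(ckpt.keys())  # typically dict_keys(['meta', 'state_dict', ...]); verify first

# Fall back to the raw object if the checkpoint is a bare state dict.
state = ckpt.get("state_dict", ckpt)
print(len(state), "parameters/buffers")
for name in list(state)[:5]:
    print(name, tuple(state[name].shape))
```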
Training
We provide instructions to reproduce our results on nuScenes.
For example, if you want to train the camera-only variant for object detection, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
For camera-only BEV segmentation model, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/camera-bev256d2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
For LiDAR-only detector, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml
For LiDAR-only BEV segmentation model, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/lidar-centerpoint-bev128.yaml
For BEVFusion detection model, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth
For BEVFusion segmentation model, please run:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/fusion-bev256d2-lss.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
Note: please run tools/test.py separately after training to get the final evaluation metrics.
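The `--model.encoders.camera.backbone.init_cfg.checkpoint ...` style arguments used in the commands above override nested fields of the YAML config at launch time. Conceptually (a simplified sketch of dotted-key overrides, not torchpack's actual parser):

```python
import copy

def apply_override(cfg: dict, dotted_key: str, value):
    """Set a nested config field addressed by a dotted key, e.g.
    'model.encoders.camera.backbone.init_cfg.checkpoint'."""
    cfg = copy.deepcopy(cfg)
    node = cfg
    keys = dotted_key.split(".")
    for k in keys[:-1]:
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return cfg

cfg = {"model": {"encoders": {"camera": {"backbone": {"init_cfg": {}}}}}}
cfg = apply_override(
    cfg,
    "model.encoders.camera.backbone.init_cfg.checkpoint",
    "pretrained/swint-nuimages-pretrained.pth",
)
print(cfg["model"]["encoders"]["camera"]["backbone"]["init_cfg"])
# {'checkpoint': 'pretrained/swint-nuimages-pretrained.pth'}
```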
Deployment on TensorRT
CUDA-BEVFusion: best practice for TensorRT deployment, providing INT8 acceleration and achieving 25 FPS on Jetson Orin.
FAQs
Q: Can we directly use the info files prepared by mmdetection3d?
A: We recommend re-generating the info files using this codebase since we forked mmdetection3d before their coordinate system refactoring.
Acknowledgements
BEVFusion is based on mmdetection3d. It is also greatly inspired by the following outstanding contributions to the open-source community: LSS, BEVDet, TransFusion, CenterPoint, MVP, FUTR3D, CVT and DETR3D.
Please also check out related papers in the camera-only 3D perception community such as BEVDet4D, BEVerse, BEVFormer, M2BEV, PETR and PETRv2, which might be interesting future extensions to BEVFusion.
Citation
If BEVFusion is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@inproceedings{liu2022bevfusion,
title={BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation},
author={Liu, Zhijian and Tang, Haotian and Amini, Alexander and Yang, Xinyu and Mao, Huizi and Rus, Daniela and Han, Song},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2023}
}
