
# BEVFusion

### [website](http://bevfusion.mit.edu/) | [paper](https://arxiv.org/abs/2205.13542) | [video](https://www.youtube.com/watch?v=uCAka90si9E)

![demo](assets/demo.gif)

## News

- **(2024/5)** BEVFusion is integrated into NVIDIA [DeepStream](https://developer.nvidia.com/blog/nvidia-deepstream-7-0-milestone-release-for-next-gen-vision-ai-development/) for sensor fusion.
- **(2023/5)** NVIDIA provides a [TensorRT deployment solution](https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-BEVFusion) of BEVFusion, achieving 25 FPS on Jetson Orin.
- **(2023/4)** BEVFusion ranks first on [Argoverse](https://eval.ai/web/challenges/challenge-page/1710/overview) 3D object detection leaderboard among all solutions.
- **(2023/1)** BEVFusion is integrated into [MMDetection3D](https://github.com/open-mmlab/mmdetection3d/tree/main/projects/BEVFusion).
- **(2023/1)** BEVFusion is accepted to ICRA 2023!
- **(2022/8)** BEVFusion ranks first on [Waymo](https://waymo.com/open/challenges/2020/3d-detection/) 3D object detection leaderboard among all solutions.
- **(2022/6)** BEVFusion ranks first on [nuScenes](https://nuscenes.org/tracking?externalData=all&mapData=all&modalities=Any) 3D object tracking leaderboard among all solutions.
- **(2022/6)** BEVFusion ranks first on [nuScenes](https://nuscenes.org/object-detection?externalData=all&mapData=all&modalities=Any) 3D object detection leaderboard among all solutions.

## Abstract

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than **40x**. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on the nuScenes benchmark, achieving **1.3%** higher mAP and NDS on 3D object detection and **13.6%** higher mIoU on BEV map segmentation, with **1.9x** lower computation cost.
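To make the pooling idea concrete, the following is a minimal pure-Python sketch of sum-pooling per-point features into a BEV grid. It only illustrates the general technique and is not the optimized CUDA operator shipped in this repository; the function name `bev_sum_pool`, its arguments, and the grid parameters are hypothetical and chosen for the example.

```python
import math


def bev_sum_pool(points_xy, feats, x_range, y_range, grid_size):
    """Sum the features of all points that fall into the same BEV cell.

    points_xy: list of (x, y) coordinates in metric space.
    feats:     list of per-point feature vectors (equal length).
    Returns a grid_size x grid_size x C nested list (hypothetical layout).
    """
    num_channels = len(feats[0]) if feats else 0
    bev = [[[0.0] * num_channels for _ in range(grid_size)]
           for _ in range(grid_size)]
    # Cells per metric unit along each axis.
    x_scale = grid_size / (x_range[1] - x_range[0])
    y_scale = grid_size / (y_range[1] - y_range[0])
    for (x, y), feat in zip(points_xy, feats):
        ix = math.floor((x - x_range[0]) * x_scale)
        iy = math.floor((y - y_range[0]) * y_scale)
        # Points outside the grid are dropped.
        if 0 <= ix < grid_size and 0 <= iy < grid_size:
            for c, value in enumerate(feat):
                bev[ix][iy][c] += value
    return bev
```

Features from any sensor that can be placed at an (x, y) location can be pooled into the same grid this way, which is what makes the BEV space a convenient place to fuse them.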

## Results

### 3D Object Detection (on Waymo test)

|   Model   | mAP-L1 | mAPH-L1  | mAP-L2  | mAPH-L2  |
| :-------: | :------: | :--: | :--: | :--: |
| [BEVFusion](https://waymo.com/open/challenges/entry/?challenge=DETECTION_3D&challengeId=DETECTION_3D&emailId=f58eed96-8bb3&timestamp=1658347965704580) |    82.72   |  81.35  | 77.65  |  76.33 |
| [BEVFusion-TTA](https://waymo.com/open/challenges/entry/?challenge=DETECTION_3D&challengeId=DETECTION_3D&emailId=94ddc185-d2ce&timestamp=1663562767759105) | 86.04    |  84.76 | 81.22  |  79.97 |

Here, BEVFusion uses a single model without any test-time augmentation. BEVFusion-TTA uses a single model with test-time augmentation; no model ensembling is applied.

### 3D Object Detection (on nuScenes test)

|   Model   | Modality | mAP  | NDS  |
| :-------: | :------: | :--: | :--: |
| BEVFusion-e |   C+L    | 74.99 | 76.09 |
| BEVFusion |   C+L    | 70.23 | 72.88 |
| BEVFusion-base* |   C+L    | 71.72 | 73.83 |

*: We scaled up MACs of the model to match the computation cost of concurrent work.

### 3D Object Detection (on nuScenes validation)

|        Model         | Modality | mAP  | NDS  | Checkpoint  |
| :------------------: | :------: | :--: | :--: | :---------: |
|    [BEVFusion](configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml)       |   C+L    | 68.52 | 71.38 | [Link](https://www.dropbox.com/scl/fi/ulaz9z4wdwtypjhx7xdi3/bevfusion-det.pth?rlkey=ovusfi2rchjub5oafogou255v&dl=1) |
| [Camera-Only Baseline](configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml) |    C     | 35.56 | 41.21 | [Link](https://www.dropbox.com/scl/fi/pxfaz1nc07qa2twlatzkz/camera-only-det.pth?rlkey=f5do81fawie0ssbg9uhrm6p30&dl=1) |
| [LiDAR-Only Baseline](configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml)  |    L     | 64.68 | 69.28 | [Link](https://www.dropbox.com/scl/fi/b1zvgrg9ucmv0wtx6pari/lidar-only-det.pth?rlkey=fw73bmdh57jxtudw6osloywah&dl=1) |

*Note*: The camera-only object detection baseline is a variant of BEVDet-Tiny with a much heavier view transformer and other differences in hyperparameters. Thanks to our [efficient BEV pooling](mmdet3d/ops/bev_pool) operator, this model runs fast and has higher mAP than BEVDet-Tiny under the same input resolution. Please refer to [BEVDet repo](https://github.com/HuangJunjie2017/BEVDet) for the original BEVDet-Tiny implementation. The LiDAR-only baseline is TransFusion-L.

### BEV Map Segmentation (on nuScenes validation)

|        Model         | Modality | mIoU | Checkpoint  |
| :------------------: | :------: | :--: | :---------: |
| [BEVFusion](configs/nuscenes/seg/fusion-bev256d2-lss.yaml)       |   C+L    | 62.95 | [Link](https://www.dropbox.com/scl/fi/8lgd1hkod2a15mwry0fvd/bevfusion-seg.pth?rlkey=2tmgw7mcrlwy9qoqeui63tay9&dl=1) |
| [Camera-Only Baseline](configs/nuscenes/seg/camera-bev256d2.yaml) |    C     | 57.09 | [Link](https://www.dropbox.com/scl/fi/cwpcu80n0shmwraegi6z4/camera-only-seg.pth?rlkey=l60kdaz19fq3gwocsjk09e60z&dl=1) |
| [LiDAR-Only Baseline](configs/nuscenes/seg/lidar-centerpoint-bev128.yaml)  |    L     | 48.56 | [Link](https://www.dropbox.com/scl/fi/mi3w6uxvytdre9i42r9k7/lidar-only-seg.pth?rlkey=rve7hx80u3en1gfoi7tjucl72&dl=1) |

## Usage

### Prerequisites

The code is built with the following libraries:

- Python >= 3.8, \<3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see [here](https://github.com/mit-han-lab/bevfusion/issues/63))
- [PyTorch](https://github.com/pytorch/pytorch) >= 1.9, \<= 1.10.2
- [tqdm](https://github.com/tqdm/tqdm)
- [torchpack](https://github.com/mit-han-lab/torchpack)
- [mmcv](https://github.com/open-mmlab/mmcv) = 1.4.0
- [mmdetection](http://github.com/open-mmlab/mmdetection) = 2.20.0
- [nuscenes-dev-kit](https://github.com/nutonomy/nuscenes-devkit)
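Because several of these versions are pinned exactly, a quick check of the active environment can save debugging time. The sketch below compares installed package versions with the pins listed above. The helper name `find_mismatches` is hypothetical, and the distribution names used as keys (`mmcv-full`, `mmdet`) are assumptions about how the packages were installed; adjust them to your setup.

```python
from importlib import metadata

# Exact pins copied from the list above.
# Keys are assumed PyPI distribution names, not import names.
PINNED = {
    "mpi4py": "3.0.3",
    "Pillow": "8.4.0",
    "mmcv-full": "1.4.0",  # assumption: may be "mmcv" in your environment
    "mmdet": "2.20.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (expected, found)} for packages that deviate from the pins.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

The Python and PyTorch requirements are ranges rather than exact pins, so they are left out of this check.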

After installing these dependencies, please run this command to install the codebase:

```bash
python setup.py develop
```

We also provide a [Dockerfile](docker/Dockerfile) to ease environment setup. To get started with docker, please make sure that `nvidia-docker` is installed on your machine. After that, please execute the following command to build the docker image:

```bash
cd docker && docker build . -t bevfusion
```

You can then start a container with the following command:

```bash
nvidia-docker run -it -v `pwd`/../data:/dataset --shm-size 16g bevfusion /bin/bash
```

We recommend that users run data preparation (instructions are available in the next section) outside the container if possible. Note that the dataset directory should be an absolute path. Inside the container, please run the following command to clone our repo and install the custom CUDA extensions:

```bash
cd home && git clone https://github.com/mit-han-lab/bevfusion && cd bevfusion
python setup.py develop
```

You can then create a symbolic link named `data` that points to the `/dataset` directory inside the container.

### Data Preparation

#### nuScenes

Please follow the instructions from [here](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/nuscenes_det.md) to download and preprocess the nuScenes dataset. Please remember to download both the detection dataset and the map extension (for BEV map segmentation). After data preparation, you should see the following directory structure (as indicated in mmdetection3d):

```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
│   │   ├── nuscenes_database
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_infos_test.pkl
│   │   ├── nuscenes_dbinfos_train.pkl

```
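As a quick sanity check after preprocessing, a short script can confirm that the entries from the tree above exist. This is only a sketch: the names `EXPECTED_ENTRIES` and `missing_entries` are hypothetical, and the default root `data/nuscenes` assumes you run it from the repository root.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to data/nuscenes.
EXPECTED_ENTRIES = [
    "maps",
    "samples",
    "sweeps",
    "v1.0-test",
    "v1.0-trainval",
    "nuscenes_database",
    "nuscenes_infos_train.pkl",
    "nuscenes_infos_val.pkl",
    "nuscenes_infos_test.pkl",
    "nuscenes_dbinfos_train.pkl",
]


def missing_entries(root="data/nuscenes", expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing under data/nuscenes:", ", ".join(missing))
    else:
        print("All expected nuScenes entries are present.")
```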

### Evaluation

We also provide instructions for evaluating our pretrained models. Please download the checkpoints using the following script: 

```bash
./tools/download_pretrained.sh
```

Then, you will be able to run:

```bash
torchpack dist-run -np [number of gpus] python tools/test.py [config file path] pretrained/[checkpoint name].pth --eval [evaluation type]
```

For example, if you want to evaluate the detection variant of BEVFusion, you can try:

```bash
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox
```

For the segmentation variant of BEVFusion, run:

```bash
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/seg/fusion-bev256d2-lss.yaml pretrained/bevfusion-seg.pth --eval map
```
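If you evaluate several checkpoints from a script, the command line above can be assembled programmatically. The helper below is a hypothetical convenience wrapper (the name `build_eval_command` is not part of this codebase); it only reproduces the argument order of the commands shown above.

```python
import shlex
import subprocess


def build_eval_command(num_gpus, config, checkpoint, eval_type):
    """Build the torchpack evaluation command as an argument list."""
    return [
        "torchpack", "dist-run", "-np", str(num_gpus),
        "python", "tools/test.py", config, checkpoint,
        "--eval", eval_type,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        8,
        "configs/nuscenes/seg/fusion-bev256d2-lss.yaml",
        "pretrained/bevfusion-seg.pth",
        "map",
    )
    print(shlex.join(cmd))  # show the command before launching it
    subprocess.run(cmd, check=True)  # requires torchpack and the checkpoint
```

Passing the arguments as a list avoids shell quoting issues with paths such as `camera+lidar`.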

### Training

We provide instructions to reproduce our results on nuScenes.

For example, if you want to train the camera-only variant for object detection, please run:

```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

For the camera-only BEV segmentation model, please run:

```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/camera-bev256d2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

For the LiDAR-only detector, please run:

```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml
```

For the LiDAR-only BEV segmentation model, please run:

```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/lidar-centerpoint-bev128.yaml
```

For the BEVFusion detection model, please run:
```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth 
```

For the BEVFusion segmentation model, please run:
```bash
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/fusion-bev256d2-lss.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
```

Note: please run `tools/test.py` separately after training to get the final evaluation metrics.

## Deployment on TensorRT
[CUDA-BEVFusion](https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-BEVFusion): a TensorRT best-practice implementation that provides INT8 acceleration and achieves 25 FPS on Jetson Orin.

## FAQs

Q: Can we directly use the info files prepared by mmdetection3d?

A: We recommend re-generating the info files using this codebase since we forked mmdetection3d before their [coordinate system refactoring](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/changelog.md).

## Acknowledgements

BEVFusion is based on [mmdetection3d](https://github.com/open-mmlab/mmdetection3d). It is also greatly inspired by the following outstanding contributions to the open-source community: [LSS](https://github.com/nv-tlabs/lift-splat-shoot), [BEVDet](https://github.com/HuangJunjie2017/BEVDet), [TransFusion](https://github.com/XuyangBai/TransFusion), [CenterPoint](https://github.com/tianweiy/CenterPoint), [MVP](https://github.com/tianweiy/MVP), [FUTR3D](https://arxiv.org/abs/2203.10642), [CVT](https://github.com/bradyz/cross_view_transformers) and [DETR3D](https://github.com/WangYueFt/detr3d). 

Please also check out related papers in the camera-only 3D perception community such as [BEVDet4D](https://arxiv.org/abs/2203.17054), [BEVerse](https://arxiv.org/abs/2205.09743), [BEVFormer](https://arxiv.org/abs/2203.17270), [M2BEV](https://arxiv.org/abs/2204.05088), [PETR](https://arxiv.org/abs/2203.05625) and [PETRv2](https://arxiv.org/abs/2206.01256), which might be interesting future extensions to BEVFusion.


## Citation

If BEVFusion is useful or relevant to your research, please kindly recognize our contributions by citing our paper:

```bibtex
@inproceedings{liu2022bevfusion,
  title={BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation},
  author={Liu, Zhijian and Tang, Haotian and Amini, Alexander and Yang, Xingyu and Mao, Huizi and Rus, Daniela and Han, Song},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2023}
}
```