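# BEVFusion: A Comparison of the Two Versions

## Overview of the Two BEVFusion Versions

### 1. MIT-BEVFusion (current project)
- **Source**: [MIT Han Lab](https://github.com/mit-han-lab/bevfusion)
- **Paper**: "BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation" (ICRA 2023)
- **Focus**: **fusion in BEV space** + **efficient BEV pooling**
- **Highlight**: a BEV pooling operator reported to be about 40x faster than the original LSS pooling

### 2. ADLab-BEVFusion
- **Source**: [ADLab-AutoDrive](https://github.com/ADLab-AutoDrive/BEVFusion)
- **Paper**: "BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework" (a different paper that shares the short name)
- **Focus**: **robustness** + **LiDAR failure scenarios**
- **Highlight**: evaluates performance when the LiDAR input is degraded or partly missing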

---

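## Core Differences

| Item | MIT-BEVFusion (current) | ADLab-BEVFusion |
|------|-------------------------|-----------------|
| **Paper focus** | Unified BEV representation + efficiency | Simple, robust fusion framework |
| **Main contribution** | Efficient BEV pooling (about 40x speedup) | Robust fusion strategy |
| **Fusion method** | Fusion in BEV space | Fusion in BEV space |
| **Codebase** | Based on an early mmdet3d version | Based on mmdet3d 0.11.0 |
| **mmdet version** | 2.20.0 | 2.11.0 |
| **PyTorch version** | 1.9 to 1.10.2 | 1.7.0 |
| **Distinctive feature** | Multi-task (detection + segmentation) | LiDAR failure tests |
| **BEV pooling** | Custom optimized CUDA operator | Standard implementation |
| **Training flow** | End to end | Staged (Camera → LiDAR → Fusion) |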

---

## Directory Layout Comparison

### MIT-BEVFusion (current project)

```
bevfusion/
├── configs/              Configuration files
│   ├── default.yaml
│   └── nuscenes/
│       ├── det/         Detection configs
│       └── seg/         Segmentation configs
├── mmdet3d/             Core code
│   ├── models/
│   │   ├── fusers/      Fusion modules
│   │   │   ├── conv.py  ConvFuser
│   │   │   └── add.py   AddFuser
│   │   ├── vtransforms/ View transforms
│   │   │   ├── lss.py
│   │   │   └── depth_lss.py
│   │   └── fusion_models/
│   │       └── bevfusion.py
│   ├── ops/             CUDA operators
│   │   ├── bev_pool/    ★ Efficient BEV pooling
│   │   ├── spconv/      Sparse convolution
│   │   └── voxel/       Voxelization
│   └── ...
├── tools/
│   ├── train.py         Training script
│   └── test.py          Test script
└── docker/              Docker setup
```

### ADLab-BEVFusion

```
BEVFusion/
├── configs/             Configuration files
│   └── bevfusion/       BEVFusion configs
│       ├── cam_stream/  Camera branch training
│       ├── lidar_stream/ LiDAR branch training
│       ├── drop_fov/    Restricted-FOV tests
│       └── drop_bbox/   Dropped-object tests
├── mmdet3d/             Core code
│   └── models/
│       ├── detectors/   Detectors
│       └── ...
├── mmcv_custom/         Custom mmcv components
├── mmdetection-2.11.0/  Vendored mmdetection
├── requirements/        Dependency lists
├── tests/               Test code
└── tools/
    ├── dist_train.sh    Distributed training script
    └── dist_test.sh     Distributed test script
```

---

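## Key Technical Differences

### 1. BEV Pooling Implementation

#### MIT version (current)
```
Efficient BEV pooling operator:
  Location: mmdet3d/ops/bev_pool/
  Implementation: custom CUDA kernel
  Performance: about 40x faster than the original LSS pooling

Code (upstream call shape; verify against your checkout):
  from mmdet3d.ops import bev_pool
  output = bev_pool(feats, coords, B, D, H, W)

Advantages:
  ✅ Heavily optimized CUDA implementation
  ✅ Efficient in both memory and speed
  ✅ Supports FP16
```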
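
The idea behind such a kernel can be illustrated without CUDA. The sketch below is a minimal pure-Python version, assuming the approach described in the MIT paper: compute a rank (flat BEV cell index) per lifted point, sort once, then reduce each contiguous interval of equal ranks. The function name `interval_sum_pool` and the list-based data layout are hypothetical and are not taken from the repository.

```python
from itertools import groupby


def interval_sum_pool(feats, ranks):
    """Sum the features of all points that share a BEV cell rank.

    feats: list of feature vectors (lists of floats), one per lifted point
    ranks: list of ints, the flat BEV cell index of each point
    Returns a dict mapping rank -> summed feature vector.
    """
    # Sort point indices by rank so each cell's points form one contiguous interval.
    order = sorted(range(len(ranks)), key=lambda i: ranks[i])
    pooled = {}
    for rank, group in groupby(order, key=lambda i: ranks[i]):
        members = list(group)
        # Reduce the interval: element-wise sum over every point in this cell.
        pooled[rank] = [sum(col) for col in zip(*(feats[i] for i in members))]
    return pooled


feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ranks = [7, 2, 7]  # points 0 and 2 fall into the same BEV cell
pooled = interval_sum_pool(feats, ranks)
print(pooled)
```

A GPU kernel can assign one thread per interval, which is where the speedup over a generic scatter would come from; the sketch only shows the data flow.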

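#### ADLab version
```
Standard BEV pooling:
  Built from standard PyTorch operations
  Performance: ordinary PyTorch speed

Advantages:
  ✅ Simple, readable code
  ✅ Easy to modify and extend
  ✅ Few dependencies
```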

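### 2. Training Strategy

#### MIT version (current)
```
End-to-end training:
  1. The camera and LiDAR encoders are trained together
  2. The model is initialized from a pretrained LiDAR-only checkpoint

Command:
  torchpack dist-run -np 8 python tools/train.py config.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint camera_pretrain.pth \
    --load_from lidar-only-det.pth

Characteristics:
  - A single training run
  - Simple and direct
```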

#### ADLab version
```
Staged training (recommended flow):
  Stage 1: train the camera stream (nuImages dataset)
    ./tools/dist_train.sh configs/bevfusion/cam_stream/mask_rcnn_*.py 8

  Stage 2: train the camera BEV branch
    ./tools/dist_train.sh configs/bevfusion/cam_stream/bevf_pp_*_cam.py 8

  Stage 3: train the LiDAR stream
    ./tools/dist_train.sh configs/bevfusion/lidar_stream/hv_pointpillars_*.py 8

  Stage 4: fusion training
    ./tools/dist_train.sh configs/bevfusion/bevf_pp_*.py 8

Characteristics:
  - More stable
  - Each stage can be debugged on its own
  - Well suited to industrial use
```

### 3. Configuration System

#### MIT version (current)
```yaml
# Uses the torchpack configuration system
# YAML format with variable substitution

model:
  encoders:
    camera: ${camera_config}
    lidar: ${lidar_config}
  fuser:
    type: ConvFuser
  heads:
    object: ${detection_config}
    map: ${segmentation_config}

# Variables are referenced with the ${} syntax
point_cloud_range: ${point_cloud_range}
```
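
To make the `${}` substitution concrete, here is a minimal sketch of how such references could be resolved once the YAML has been loaded into plain Python objects. It is not torchpack's implementation; the `resolve` helper and the sample variable values are assumptions made for illustration.

```python
import re

_PATTERN = re.compile(r"\$\{([^}]+)\}")


def resolve(value, variables):
    """Recursively replace ${name} references using `variables`."""
    if isinstance(value, dict):
        return {k: resolve(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, variables) for v in value]
    if isinstance(value, str):
        whole = _PATTERN.fullmatch(value)
        if whole:
            # A value that is exactly one reference keeps the referenced type.
            return variables[whole.group(1)]
        # Otherwise the reference is embedded in a longer string.
        return _PATTERN.sub(lambda m: str(variables[m.group(1)]), value)
    return value


# Hypothetical variables and config, shaped like the YAML above.
variables = {"voxel_size": [0.075, 0.075, 0.2], "out_channels": 256}
config = {
    "fuser": {"type": "ConvFuser", "out_channels": "${out_channels}"},
    "voxel_size": "${voxel_size}",
    "name": "fuser-${out_channels}",
}
resolved = resolve(config, variables)
print(resolved)
```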

#### ADLab version
```python
# Uses the mmdetection configuration system
# Config files are written in Python

_base_ = [
    '../_base_/models/bevfusion.py',
    '../_base_/datasets/nus-3d.py',
    '../_base_/schedules/schedule_2x.py',
]

# Settings are plain Python code (arguments elided here)
model = dict(
    type='BEVFusion',
    pts_voxel_layer=dict(...),
    pts_bbox_head=dict(...),
)
```
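
The `_base_` mechanism amounts to a recursive dictionary merge in which the child file overrides individual keys of the base. The sketch below illustrates that behaviour in plain Python; it is a simplification rather than the mmcv implementation, and the key names and channel values are hypothetical.

```python
def merge_config(base, override):
    """Return a new dict with `override` merged into `base`, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical base and child configs.
base_model = {
    "model": {
        "type": "BEVFusion",
        "pts_bbox_head": {"num_classes": 10, "in_channels": 384},
    }
}
child = {"model": {"pts_bbox_head": {"in_channels": 512}}}

final = merge_config(base_model, child)
print(final)
```

Only `in_channels` changes in the result; every key the child does not mention is inherited from the base.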

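### 4. Distinctive Features

#### MIT version (current)
```
1. Multi-task support
   - 3D detection and BEV segmentation in one model
   - Shared backbone

2. Efficient operators
   - Optimized BEV pooling (CUDA)
   - Optimized sparse convolution

3. Flexible configuration
   - Several backbones (ResNet, SwinTransformer, VoVNet)
   - Several view transforms (LSS, DepthLSS, BEVDepth)

4. Benchmark results (as reported by the authors)
   First place on the Waymo leaderboard
   First place on nuScenes for both detection and segmentation
```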

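#### ADLab version
```
1. Robustness tests
   - Restricted LiDAR FOV
   - Dropped LiDAR objects
   - Evaluates how robust the fusion framework is

2. Practical orientation
   - Simple training flow
   - Industry-grade implementation

3. Several configurations
   - PointPillars variant
   - CenterPoint variant
   - TransFusion variant

4. Experimental settings
   Restricted FOV: (-π/3, π/3) or (-π/2, π/2)
   Dropped objects: 50% of foreground objects dropped at random
```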
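
The restricted-FOV setting can be reproduced on a raw point cloud by filtering on azimuth. The following is a minimal sketch, assuming azimuth 0 points along +x (forward) in the LiDAR frame; the helper name `keep_points_in_fov` is hypothetical, and the actual pipeline in the ADLab repository may apply the restriction differently.

```python
import math


def keep_points_in_fov(points, half_angle):
    """Keep points whose azimuth lies in (-half_angle, half_angle).

    points: iterable of (x, y, z) tuples in the LiDAR frame
    half_angle: half of the field of view, in radians
    Assumes azimuth 0 points along +x; check this against your dataset.
    """
    return [p for p in points if abs(math.atan2(p[1], p[0])) < half_angle]


points = [
    (10.0, 0.0, 0.0),  # straight ahead
    (1.0, 1.0, 0.0),   # 45 degrees to the left
    (0.5, 5.0, 0.0),   # about 84 degrees to the left
    (-3.0, 0.1, 0.0),  # behind the vehicle
]
front_120 = keep_points_in_fov(points, math.pi / 3)  # (-π/3, π/3), 120 degrees total
front_180 = keep_points_in_fov(points, math.pi / 2)  # (-π/2, π/2), 180 degrees total
print(len(front_120), len(front_180))
```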

---

## Performance Comparison

### MIT-BEVFusion (results for the current project)

**nuScenes validation**:

| Model | Modality | mAP | NDS |
|-------|----------|-----|-----|
| BEVFusion | C+L | 68.52 | 71.38 |
| Camera-Only | C | 35.56 | 41.21 |
| LiDAR-Only | L | 64.68 | 69.28 |

**BEV segmentation**:

| Model | mIoU |
|-------|------|
| BEVFusion | 62.95% |

### ADLab-BEVFusion (figures taken from the project page)

**nuScenes validation**:

| Model | Head | 3D Backbone | 2D Backbone | mAP | NDS |
|-------|------|-------------|-------------|-----|-----|
| BEVFusion | PointPillars | PointPillars | Dual-Swin-T | 52.9 | 61.6 |
| BEVFusion | CenterPoint | VoxelNet | Dual-Swin-T | 60.9 | 67.5 |
| BEVFusion* | TransFusion-L | VoxelNet | Dual-Swin-T | **69.6** | **72.1** |

*With BEV-space data augmentation

**LiDAR failure scenarios**:

| Scenario | mAP | NDS |
|----------|-----|-----|
| Restricted FOV (-π/3, π/3) | 41.5 | 50.8 |
| Restricted FOV (-π/2, π/2) | 46.4 | 55.8 |
| 50% of objects dropped | 50.3 | 57.6 |
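
To put these numbers in perspective, each failure scenario can be expressed as a drop relative to the 69.6 mAP of the full-LiDAR TransFusion-L configuration. The short calculation below is only arithmetic on the table values; the helper name `relative_drop` is hypothetical.

```python
def relative_drop(baseline, degraded):
    """Relative performance drop in percent."""
    return 100.0 * (baseline - degraded) / baseline


baseline_map = 69.6  # BEVFusion* (TransFusion-L) with full LiDAR input
scenarios = {
    "FOV (-pi/3, pi/3)": 41.5,
    "FOV (-pi/2, pi/2)": 46.4,
    "50% objects dropped": 50.3,
}
for name, degraded_map in scenarios.items():
    print(f"{name}: -{relative_drop(baseline_map, degraded_map):.1f}% mAP")
```

The narrowest field of view costs roughly 40% of the baseline mAP, while dropping half of the foreground objects costs a little under 28%.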

---

## Code Implementation Differences

### 1. Location of Model Architecture Files

#### MIT version (current)
```
mmdet3d/models/
├── fusion_models/
│   ├── base.py
│   └── bevfusion.py          ← main model
├── fusers/
│   ├── conv.py               ← ConvFuser
│   └── add.py                ← AddFuser
├── vtransforms/
│   ├── lss.py
│   ├── depth_lss.py
│   └── aware_bevdepth.py
```

#### ADLab version
```
mmdet3d/models/
├── detectors/
│   └── bevfusion.py          ← inherits from mvx_two_stage
├── fusion_layers/
│   └── point_fusion.py       ← point cloud fusion
├── backbones/
│   └── dual_swin.py          ← Dual-Swin backbone
```

### 2. Config File Organization

#### MIT version (current)
```
configs/
├── default.yaml              Global config
└── nuscenes/
    ├── det/                  Detection configs
    │   ├── centerhead/
    │   └── transfusion/
    │       └── secfpn/
    │           ├── camera/   Camera only
    │           ├── lidar/    LiDAR only
    │           └── camera+lidar/  Fusion
    └── seg/                  Segmentation configs
        ├── camera-bev256d2.yaml
        └── fusion-bev256d2-lss.yaml

Characteristics:
  - YAML format
  - Variable substitution (${variable})
  - Organized by task
```

#### ADLab version
```
configs/
├── _base_/                   Base configs
│   ├── models/
│   ├── datasets/
│   └── schedules/
└── bevfusion/
    ├── cam_stream/           Camera training configs
    │   ├── mask_rcnn_*.py    2D detection
    │   └── bevf_pp_*_cam.py  BEV camera
    ├── lidar_stream/         LiDAR training configs
    │   └── hv_pointpillars_*.py
    ├── bevf_pp_*.py          Fusion config (PointPillars)
    ├── bevf_cp_*.py          Fusion config (CenterPoint)
    ├── bevf_tf_*.py          Fusion config (TransFusion)
    ├── drop_fov/             Restricted-FOV tests
    └── drop_bbox/            Dropped-object tests

Characteristics:
  - Python format
  - Standard mmdetection config inheritance
  - Organized by training stage
```

---

## Training Flow Comparison

### MIT version (current): end to end

```bash
# Everything in one step
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from pretrained/lidar-only-det.pth

# Pros:
#   ✅ Simple: a single command
#   ✅ Quick to get started
#
# Cons:
#   ⚠️ If the run fails, the whole pipeline has to be restarted
```

### ADLab version: progressive

```bash
# Step 1: train the 2D detection backbone (nuImages dataset)
./tools/dist_train.sh \
  configs/bevfusion/cam_stream/mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.py 8

# Step 2: train the camera BEV branch
./tools/dist_train.sh \
  configs/bevfusion/cam_stream/bevf_pp_4x8_2x_nusc_cam.py 8

# Step 3: train the LiDAR branch
./tools/dist_train.sh \
  configs/bevfusion/lidar_stream/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py 8

# Step 4: fusion training
./tools/dist_train.sh \
  configs/bevfusion/bevf_pp_2x8_1x_nusc.py 8

# Pros:
#   ✅ Each stage is independent and easy to debug
#   ✅ More stable
#   ✅ Each stage can be tuned separately
#
# Cons:
#   ⚠️ Requires several steps
#   ⚠️ Longer total training time
```
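
Because the stages are independent, the four commands can be wrapped in a small driver that stops at the first failure and can resume from a given stage. The sketch below is one possible wrapper, not part of the ADLab repository; the names `STAGES`, `build_commands` and `run_pipeline` are hypothetical, and it assumes each config already points at the checkpoint produced by the previous stage.

```python
import subprocess

# Stage configs copied from the commands above, in execution order.
STAGES = [
    "configs/bevfusion/cam_stream/mask_rcnn_dbswin-t_fpn_3x_nuim_cocopre.py",
    "configs/bevfusion/cam_stream/bevf_pp_4x8_2x_nusc_cam.py",
    "configs/bevfusion/lidar_stream/hv_pointpillars_secfpn_sbn-all_4x8_2x_nus-3d.py",
    "configs/bevfusion/bevf_pp_2x8_1x_nusc.py",
]


def build_commands(stages, gpus=8):
    """Return one dist_train.sh invocation (as an argument list) per stage."""
    return [["./tools/dist_train.sh", cfg, str(gpus)] for cfg in stages]


def run_pipeline(stages, gpus=8, start_at=0):
    """Run the stages in order; return the index of the first failing stage, or None."""
    for index, cmd in enumerate(build_commands(stages, gpus)):
        if index < start_at:
            continue  # stage already finished in an earlier run
        if subprocess.run(cmd).returncode != 0:
            return index  # rerun later with start_at=index
    return None


commands = build_commands(STAGES)
print(commands[-1])
```

Calling `run_pipeline(STAGES, start_at=2)` would skip the two camera stages and resume from LiDAR training.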

---

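## Key Code Differences

### BEV Pooling Operator

#### MIT version (current): heavily optimized
```python
# Simplified call-site sketch; verify names and signature against your checkout.
from mmdet3d.ops import bev_pool  # Python wrapper around the compiled CUDA extension


def pool_to_bev(feats, coords, B, D, H, W):
    # pool_to_bev is a hypothetical wrapper name used only for this illustration.
    # feats: per-point camera features; coords: per-point voxel and batch indices.
    # The call dispatches to the optimized CUDA kernel.
    return bev_pool(feats, coords, B, D, H, W)

# Performance:
#   - Speed: about 40x faster than the original LSS pooling
#   - Memory: optimized memory management
#   - Implementation: C++/CUDA
```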

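#### ADLab version: standard implementation
```python
# Illustrative pure-PyTorch sum pooling (a sketch, not copied from the repository)
import torch


def bev_pool(feat, bev_index, num_cells):
    # feat: (N, C) lifted features; bev_index: (N,) int64 flat BEV cell index per feature
    bev_feat = torch.zeros(num_cells, feat.shape[1], dtype=feat.dtype, device=feat.device)
    # Scatter-add every feature into its BEV cell using a standard PyTorch op
    bev_feat.index_add_(0, bev_index, feat)
    return bev_feat

# Performance:
#   - Speed: ordinary PyTorch speed
#   - Memory: ordinary
#   - Implementation: pure Python/PyTorch
```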

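### Fuser Implementation

#### MIT version (current): modular
```python
# mmdet3d/models/fusers/conv.py
import torch
from torch import nn

from mmdet3d.models.builder import FUSERS


@FUSERS.register_module()
class ConvFuser(nn.Sequential):
    def __init__(self, in_channels, out_channels):
        # in_channels is a list with one entry per modality (camera, LiDAR)
        super().__init__(
            nn.Conv2d(sum(in_channels), out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, inputs):
        # Concatenate the per-modality BEV features along the channel axis
        return super().forward(torch.cat(inputs, dim=1))

# Usage (YAML config):
#   model:
#     fuser:
#       type: ConvFuser  # or AddFuser
#       in_channels: [80, 256]
#       out_channels: 256
```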

#### ADLab version: integrated into the model
```python
# The fusion logic lives directly in the BEVFusion model class
class BEVFusion(MVXTwoStageDetector):
    def forward(self, ...):
        # camera features
        cam_bev = self.extract_cam_feat(...)
        # LiDAR features
        lidar_bev = self.extract_pts_feat(...)
        # fusion
        fused = self.fusion_layer(cam_bev, lidar_bev)
        ...

# Characteristics:
#   - Fusion logic is coupled to the model
#   - Switching fusion strategies is harder
```

---

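## Distinctive Features Compared

### MIT version (current): multi-task

```yaml
# Detection and segmentation can be trained together
model:
  heads:
    object:
      type: TransFusionHead
      # detection settings
    map:
      type: BEVSegmentationHead
      # segmentation settings

# A single model outputs:
#   - 3D detection boxes
#   - a BEV segmentation map
```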
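
With two heads attached to one model, the training loss is a combination of the per-head losses. The sketch below shows one plausible way to flatten and weight them; the helper `combine_head_losses`, the `loss/<head>/<name>` key format and the loss names are assumptions for illustration and should be checked against the actual model code.

```python
def combine_head_losses(head_losses, loss_scale=None):
    """Flatten per-head losses into one dict and compute the weighted total.

    head_losses: e.g. {"object": {"heatmap": 0.8}, "map": {"focal": 0.5}}
    loss_scale: optional per-head weights; heads not listed default to 1.0
    """
    loss_scale = loss_scale or {}
    flat = {}
    for head, losses in head_losses.items():
        scale = loss_scale.get(head, 1.0)
        for name, value in losses.items():
            flat[f"loss/{head}/{name}"] = scale * value
    return flat, sum(flat.values())


# Hypothetical loss values for one training step.
flat, total = combine_head_losses(
    {"object": {"heatmap": 1.0, "bbox": 0.5}, "map": {"focal": 2.0}},
    loss_scale={"map": 0.5},
)
print(flat, total)
```

Per-head weights give a simple knob for balancing detection against segmentation when one task dominates the gradient.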

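### ADLab version: robustness tests

```python
# drop_fov/: restricted LiDAR FOV tests
#   configs/bevfusion/drop_fov/fov60_bevf_tf_*.py
#   configs/bevfusion/drop_fov/fov90_bevf_tf_*.py
#
# drop_bbox/: dropped foreground object tests
#   configs/bevfusion/drop_bbox/halfbox_bevf_tf_*.py
#
# Test scenarios:
#   1. LiDAR FOV reduced from 360° to ±60° (120° total) or ±90° (180° total)
#   2. Points of a random 50% of foreground objects are dropped
#   3. The robustness of the fusion framework is evaluated
#
# Results:
#   - FOV (-π/3, π/3): mAP 41.5 (vs. 69.6 with full LiDAR input)
#   - 50% of objects dropped: mAP 50.3 (vs. 69.6 with full LiDAR input)
#
#   This indicates that the camera branch can partly compensate for LiDAR failures
```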
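
The dropped-object scenario can be simulated by removing every LiDAR point that falls inside a random half of the annotated boxes. The following is a minimal sketch under a strong simplification: boxes are axis-aligned, whereas real annotations are rotated 3D boxes. The function name `drop_foreground_objects` is hypothetical and does not come from the ADLab code.

```python
import random


def drop_foreground_objects(points, boxes, drop_ratio=0.5, seed=0):
    """Remove LiDAR points that fall inside a random subset of object boxes.

    points: list of (x, y, z)
    boxes: list of (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned for simplicity
    Returns (kept_points, dropped_boxes).
    """
    rng = random.Random(seed)
    n_drop = int(len(boxes) * drop_ratio)
    dropped = rng.sample(boxes, n_drop)

    def inside(p, b):
        return b[0] <= p[0] <= b[3] and b[1] <= p[1] <= b[4] and b[2] <= p[2] <= b[5]

    kept = [p for p in points if not any(inside(p, b) for b in dropped)]
    return kept, dropped


boxes = [(0, 0, 0, 2, 2, 2), (10, 10, 0, 12, 12, 2)]
points = [(1.0, 1.0, 1.0), (11.0, 11.0, 1.0), (50.0, 50.0, 0.0)]  # last one is background
kept, dropped = drop_foreground_objects(points, boxes)
print(len(kept), len(dropped))
```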

---

## Dependency Differences

### MIT version (current)
```
Python >= 3.8, < 3.9
PyTorch >= 1.9, <= 1.10.2
mmcv = 1.4.0
mmdet = 2.20.0
torchpack (required)
CUDA 11.3

Notes:
  - Newer PyTorch versions
  - Requires torchpack
```

### ADLab version
```
Python = 3.8.3
PyTorch = 1.7.0
mmcv = 1.4.0
mmdet = 2.11.0 (vendored in the project)
torchpack not required
CUDA 10.2/11.0

Notes:
  - Older but stable PyTorch version
  - mmdetection is vendored, which simplifies dependency management
```
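
Since the two stacks pin incompatible PyTorch versions, it can help to check an environment against a range before building any CUDA extension. The sketch below compares version strings as integer tuples using only the standard library; the helpers `parse_version` and `in_range` are hypothetical, and the ranges are the ones listed for the MIT version.

```python
import re


def parse_version(text):
    """Turn '1.10.2+cu113' into (1, 10, 2); local build suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version, low, high):
    """True if low <= version <= high, compared as integer tuples."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


# PyTorch range listed for the MIT version: >= 1.9, <= 1.10.2
mit_ok = in_range("1.10.2+cu113", "1.9", "1.10.2")
adlab_pin_ok = in_range("1.7.0", "1.9", "1.10.2")  # the ADLab pin, tested against the MIT range
print(mit_ok, adlab_pin_ok)
```

In a real environment the version string would come from `torch.__version__`; it is hard-coded here so the example runs without PyTorch installed.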

---

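## When to Use Which

### MIT-BEVFusion (current) is a good fit for:

1. **Research and paper reproduction**
   - Aiming for state-of-the-art accuracy
   - Needing multi-task support
   - Caring about inference speed

2. **Edge deployment**
   - Benefiting from the optimized BEV pooling
   - Needing real-time performance
   - Deploying with TensorRT

3. **Fast experimentation**
   - End-to-end training
   - Quick iteration
   - Simple configuration

### ADLab-BEVFusion is a good fit for:

1. **Industrial applications**
   - Needing stability and robustness
   - Preferring the tighter control of staged training
   - Needing to test sensor failure scenarios

2. **Sensor failure research**
   - Studying LiDAR failure scenarios
   - Evaluating fusion robustness
   - Safety-critical systems

3. **Teaching and learning**
   - More conventional code structure (follows mmdet3d conventions)
   - Easy to understand and modify
   - No torchpack dependency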

---

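## Directory Layout in Detail

### Directories found only in the current project (MIT)

```
mmdet3d/ops/bev_pool/         ★ Efficient BEV pooling CUDA operator
mmdet3d/models/vtransforms/   View transform modules (several variants)
mmdet3d/models/fusers/        Standalone fusion modules
mmdet3d/models/fusion_models/ Fusion model base classes
```

### Directories found only in the ADLab version

```
.dev_scripts/                 Development scripts
mmcv_custom/                  Custom mmcv components
mmdetection-2.11.0/           Vendored mmdetection
requirements/                 Detailed dependency lists
  ├── build.txt
  ├── optional.txt
  ├── runtime.txt
  └── tests.txt
tests/                        Full test suite
demo/                         Demo scripts
docs/                         Full documentation
configs/bevfusion/
  ├── drop_fov/               ★ Restricted-FOV tests
  └── drop_bbox/              ★ Dropped-object tests
```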

---

## Training Script Differences

### MIT version (current)
```bash
# Launch through torchpack
torchpack dist-run -np 8 python tools/train.py config.yaml

# Or run Python directly
python tools/train.py config.yaml
```

### ADLab version
```bash
# Launch through the standard bash script
./tools/dist_train.sh config.py 8

# Internally this runs roughly:
python -m torch.distributed.launch \
  --nproc_per_node=8 \
  tools/train.py config.py
```
