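# BEVFusion Full-Perception Network Quick Start Guide

**Goal**: Extend BEVFusion into a complete autonomous-driving perception, localization, and mapping system
**Current baseline**: Dual-task model (detection + segmentation)
**Planned extensions**: + vector map, + localization, + trajectory prediction

---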

## 🎯 Extension Goals

```
Current BEVFusion
├── 3D object detection ✅
└── BEV semantic segmentation ✅

Extended full system
├── 3D object detection ✅
├── BEV semantic segmentation ✅
├── Vector map prediction 🆕 (HD map)
├── Ego localization 🆕 (centimeter-level)
├── Trajectory prediction 🆕 (6-second horizon)
└── Occupancy grid 🆕 (3D scene understanding)
```

---
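
## 🚀 Recommended Plan: Core Four-Task System

### System Architecture
```
Detection + Segmentation + Vector Map + Localization
```

**Why these four**:
- ✅ Cover the core needs of autonomous driving
- ✅ Manageable timeline (3-4 weeks)
- ✅ Balance accuracy and efficiency
- ✅ Suitable for later deployment

---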

## 📅 4-Week Implementation Plan

### Week 1-2: Current Training (in progress) ✅
```
Status: Epoch 3/23
Expected completion: 2025-10-29
Deliverable: enhanced dual-task model
```

### Week 3: Vector Map Integration (11-01 ~ 11-07)

**Day 1-2: Preparation**
```bash
# 1. Clone the MapTR code
cd /workspace
git clone https://github.com/hustvl/MapTR.git

# 2. Extract the vector map data
python tools/data_converter/extract_vector_map_bevfusion.py
# Output: data/nuscenes/vector_maps.pkl (~500MB)

# 3. Visual check
python tools/visualize_vector_map.py --samples 10
```

**Day 3-4: Implementation**
```bash
# 1. Implement MapTRHead
#    File: mmdet3d/models/heads/vector_map/maptr_head.py
#    Reference: MAPTR_INTEGRATION_PLAN.md

# 2. Implement the LoadVectorMap pipeline step
#    File: mmdet3d/datasets/pipelines/loading.py

# 3. Modify the BEVFusion forward pass
#    Add support for the vector_map head
```

**Day 5: Testing**
```bash
# Small-scale test (100 samples)
python tools/train.py \
  configs/nuscenes/three_tasks/test_config.yaml \
  --cfg-options max_epochs=1
```

**Day 6-7: Training**
```bash
# Three-task training
bash scripts/train_three_tasks.sh
# Estimated time: 2 days
```

---

### Week 4: Localization Integration (11-08 ~ 11-14)

**Day 1-3: Build the map database**
```
# tools/build_bev_map_database.py

Tasks:
1. Extract BEV maps from the nuScenes map
2. Build the map tile database
3. Match each scene to its corresponding tiles

Output:
- data/nuscenes/bev_maps/
  ├── boston-seaport/
  ├── singapore-onenorth/
  └── ...
Total size: ~5GB
```

**Day 4-5: Implement the localization head**
```
# mmdet3d/models/heads/localization/bev_localization_head.py

Components:
1. BEV feature encoding
2. Map feature encoding
3. Feature matching
4. Pose regression
5. Uncertainty estimation
```
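
The feature-matching step listed above can be illustrated with a toy, translation-only search. This is a minimal sketch and not the planned `BEVLocalizationHead`: it assumes the BEV features and the map tile are plain 2D grids, and the function name `best_offset` is hypothetical. A learned head would compare feature embeddings and regress the pose rather than search exhaustively.

```python
def best_offset(bev, tile):
    """Return ((row, col), score): where `bev` best matches inside `tile`.

    Both inputs are 2D lists of numbers. The score is the sum of squared
    differences (lower is better), searched exhaustively over all offsets.
    """
    bh, bw = len(bev), len(bev[0])
    th, tw = len(tile), len(tile[0])
    best = (None, float("inf"))
    for dr in range(th - bh + 1):
        for dc in range(tw - bw + 1):
            ssd = sum(
                (bev[r][c] - tile[dr + r][dc + c]) ** 2
                for r in range(bh)
                for c in range(bw)
            )
            if ssd < best[1]:
                best = ((dr, dc), ssd)
    return best


# Toy example: a 2x2 observation located inside a 4x4 map tile.
tile = [[0, 0, 0, 0], [0, 0, 1, 2], [0, 0, 3, 4], [0, 0, 0, 0]]
bev = [[1, 2], [3, 4]]
offset, score = best_offset(bev, tile)
```

Multiplying the returned cell offset by the tile resolution converts it to a metric translation.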

**Day 6-7: Four-task training**
```bash
# Stage 1: train the localization head only (3 epochs)
# Note: --freeze-heads is assumed to be a custom option added to tools/train.py for this plan
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/four_tasks/bevfusion_full.yaml \
  --load_from runs/three_tasks/epoch_8.pth \
  --freeze-heads object,map,vector_map

# Stage 2: joint fine-tuning (5 epochs)
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/four_tasks/bevfusion_full.yaml \
  --load_from runs/four_tasks_stage1/epoch_3.pth
```
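
Stage 1 depends on keeping the already-trained heads fixed. One way this could be done is sketched below, assuming the heads are registered under a `heads.<name>.` parameter prefix (an assumption about the model layout); the helper name `freeze_heads` is hypothetical. It accepts any iterable of `(name, parameter)` pairs, such as the result of `model.named_parameters()` in PyTorch.

```python
from types import SimpleNamespace


def freeze_heads(named_parameters, frozen_heads, prefix="heads."):
    """Disable gradients for every parameter that belongs to a frozen head.

    Returns the names of the parameters that were frozen.
    """
    frozen_prefixes = tuple(f"{prefix}{head}." for head in frozen_heads)
    frozen = []
    for name, param in named_parameters:
        if name.startswith(frozen_prefixes):
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in parameters so the sketch runs without PyTorch installed.
params = {
    "encoders.camera.backbone.weight": SimpleNamespace(requires_grad=True),
    "heads.object.conv.weight": SimpleNamespace(requires_grad=True),
    "heads.vector_map.decoder.weight": SimpleNamespace(requires_grad=True),
    "heads.localization.pose.weight": SimpleNamespace(requires_grad=True),
}
frozen_names = freeze_heads(params.items(), ["object", "map", "vector_map"])
```

Only the parameters under the listed heads are touched, so the shared encoders and the localization head stay trainable.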

---

## 🔧 Code Implementation Framework

### 1. Three-Task Configuration File

```yaml
# configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml

model:
  type: BEVFusion

  encoders:
    camera: ${camera_encoder}
    lidar: ${lidar_encoder}

  fuser:
    type: ConvFuser

  decoder:
    backbone: ${decoder_backbone}
    neck: ${decoder_neck}

  heads:
    # Task 1: 3D detection
    object:
      type: TransFusionHead
      # ... configuration

    # Task 2: BEV segmentation
    map:
      type: EnhancedBEVSegmentationHead
      # ... configuration

    # Task 3: vector map (new)
    vector_map:
      type: MapTRHead
      in_channels: 256
      num_queries: 50
      num_points: 20
      num_classes: 3  # divider, boundary, crossing
      embed_dims: 256
      num_decoder_layers: 6

  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0

# Data pipeline
train_pipeline:
  - type: LoadMultiViewImageFromFiles
  - type: LoadPointsFromFile
  - type: LoadAnnotations3D
  - type: LoadVectorMap  # new
  # ...
```
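
The `num_points: 20` setting implies that every map element is represented by a fixed number of points. A minimal sketch of how a variable-length polyline could be resampled to that length is shown below; the function name `resample_polyline` is hypothetical, and the actual preprocessing in the planned `LoadVectorMap` step may differ.

```python
import math


def resample_polyline(points, num_points=20):
    """Resample a 2D polyline to `num_points` points equally spaced by arc length."""
    if num_points < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output points")
    # Cumulative arc length at each input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [tuple(points[0])] * num_points
    out = []
    seg = 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0.0 else (target - cum[seg]) / seg_len
        t = min(max(t, 0.0), 1.0)
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# A lane divider with three vertices becomes a 20-point polyline.
divider = resample_polyline([(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)])
```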

---

### 2. Four-Task Configuration File

```yaml
# configs/nuscenes/four_tasks/bevfusion_full.yaml

model:
  heads:
    object: ${object_head}
    map: ${map_head}
    vector_map: ${vector_map_head}

    # Task 4: localization (new)
    localization:
      type: BEVLocalizationHead
      in_channels: 256
      map_embedding_dim: 128
      pose_dims: 6  # x, y, z, roll, pitch, yaw

  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0
    localization: 2.0  # higher weight for localization

# Data pipeline
train_pipeline:
  # ... other pipeline steps
  - type: LoadBEVMapTile  # new
  - type: LoadEgoPose  # new
```

---

## 💾 Data Preparation Scripts

### Vector Map Extraction
```bash
# Script used in Week 3 (still needs to be created)
python tools/data_converter/extract_vector_map_bevfusion.py \
  --root data/nuscenes \
  --output data/nuscenes/vector_maps.pkl

# Estimated time: 30 minutes
# Output size: ~500MB
```

### BEV Map Database Construction
```bash
# Needs to be created
python tools/build_bev_map_database.py \
  --root data/nuscenes \
  --output data/nuscenes/bev_maps \
  --resolution 0.3 \
  --tile-size 100

# Estimated time: 1-2 days
# Output size: ~5GB
```
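
The tile database has to answer the question "which tiles cover this ego position?". The sketch below shows one possible indexing scheme. It assumes `--tile-size 100` means 100 m square tiles and `--resolution 0.3` means 0.3 m per pixel (both are assumptions about the script's units), and all function names are hypothetical.

```python
import math


def tile_index(x, y, tile_size_m=100.0):
    """Return the (column, row) of the tile containing map position (x, y) in metres."""
    return (math.floor(x / tile_size_m), math.floor(y / tile_size_m))


def pixel_in_tile(x, y, tile_size_m=100.0, resolution_m=0.3):
    """Return the (u, v) pixel of (x, y) inside its tile at the given resolution."""
    col, row = tile_index(x, y, tile_size_m)
    u = int((x - col * tile_size_m) / resolution_m)
    v = int((y - row * tile_size_m) / resolution_m)
    return (u, v)


def tiles_covering(x, y, half_range_m, tile_size_m=100.0):
    """List every tile that overlaps a square BEV window centred on (x, y)."""
    c0, r0 = tile_index(x - half_range_m, y - half_range_m, tile_size_m)
    c1, r1 = tile_index(x + half_range_m, y + half_range_m, tile_size_m)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# An ego vehicle near a tile border needs two tiles for a 10 m half-range window.
needed = tiles_covering(95.0, 50.0, 10.0)
```

Using `math.floor` rather than `int` keeps negative map coordinates in the correct tile.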

---

## 📊 Performance Estimates

### Four-Task System

| Task | Expected performance | Notes |
|------|---------|------|
| 3D detection | mAP 64-66% | Slight drop (multi-task competition) |
| BEV segmentation | mIoU 55-58% | Slight drop |
| Vector map | mAP 50-55% | New task |
| Localization | Error < 0.5 m | New task |
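
To check the localization target in the table above, a pose error metric is needed. The following is a minimal sketch, assuming poses are 6-tuples `(x, y, z, roll, pitch, yaw)` in metres and radians as in the `pose_dims: 6` setting; the function names are hypothetical, and the evaluation protocol actually used may differ.

```python
import math


def wrap_angle(angle_rad):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle_rad + math.pi) % (2.0 * math.pi) - math.pi


def pose_error(pred, gt):
    """Return (translation error in metres, absolute yaw error in radians).

    Each pose is a 6-tuple (x, y, z, roll, pitch, yaw).
    """
    trans_err = math.dist(pred[:3], gt[:3])
    yaw_err = abs(wrap_angle(pred[5] - gt[5]))
    return trans_err, yaw_err


# A prediction 0.3 m off in x and 0.4 m off in y is 0.5 m from the ground truth.
trans_err, yaw_err = pose_error((0.3, 0.4, 0.0, 0.0, 0.0, 0.1), (0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
```

Wrapping the yaw difference avoids reporting an error near 2π when the two headings sit on either side of ±π.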

**Inference performance**:
- Parameters: 130M
- Inference time: 120 ms (A100)
- Inference time: 600-800 ms (Orin, unoptimized)
- After optimization: <200 ms (Orin)

---

## 🎯 Preparation You Can Start Now

### This Week (while training runs)

**1. Study MapTR (4 hours)**
```bash
# Clone the code
git clone https://github.com/hustvl/MapTR.git

# Focus areas:
# - MapTRHead structure
# - Data format
# - Loss functions
```

**2. Design the localization approach (2 hours)**
```
- Choose the technical route (map matching vs. VIO)
- Design the data flow
- Specify the BEV map tile format
```

**3. Prepare the data extraction script (2 hours)**
```bash
# Based on MAPTR_INTEGRATION_PLAN.md
# Implement extract_vector_map_bevfusion.py
```

---

### Next Week (before training finishes)

**4. Implement MapTRHead (8 hours)**
```
- Port MapTR's Transformer decoder
- Adapt it to the BEVFusion interface
- Implement Hungarian matching
- Implement the loss functions
```
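
The matching step in the list above assigns each ground-truth map element to one prediction at minimum total cost. The sketch below illustrates the idea with a brute-force search instead of the Hungarian algorithm, so it is only usable for a handful of elements; in practice `scipy.optimize.linear_sum_assignment` is the usual choice. The cost function is a simplification that only treats the two traversal directions of an open polyline as equivalent, and the names `polyline_cost` and `match` are hypothetical.

```python
import itertools
import math


def polyline_cost(pred, gt):
    """Mean point-to-point L1 distance, taking the better of the two GT directions."""
    def l1(a, b):
        return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)
    return min(l1(pred, gt), l1(pred, gt[::-1]))


def match(preds, gts):
    """Return the minimum-cost one-to-one assignment as (pred_index, gt_index) pairs.

    Brute force over permutations, for illustration only.
    """
    if len(gts) > len(preds):
        raise ValueError("expected at least as many predictions as ground truths")
    best_cost, best_pairs = math.inf, []
    for pred_ids in itertools.permutations(range(len(preds)), len(gts)):
        cost = sum(polyline_cost(preds[p], gts[g]) for g, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((p, g) for g, p in enumerate(pred_ids))
    return best_pairs, best_cost


gts = [[(0, 0), (1, 0)], [(0, 5), (1, 5)]]
preds = [[(1, 5), (0, 5)], [(9, 9), (9, 9)], [(0, 0.1), (1, 0.1)]]
pairs, cost = match(preds, gts)
```

Prediction 0 matches ground truth 1 even though its points run in the opposite direction, and prediction 1 is left unmatched.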

**5. Build the BEV map database (16 hours)**
```
- Extract from the nuScenes map
- Render to a BEV representation
- Build the tile index
- Test query efficiency
```

---

## 💡 Technical Challenges and Solutions

### Challenge 1: Multi-Task Loss Balancing
**Problem**: Loss magnitudes differ widely across tasks
**Solution**:
```yaml
loss_scale:
  object: 1.0
  map: 1.0
  vector_map: 1.0
  localization: 2.0  # adjust dynamically

# Monitor each task's loss and adjust the weights promptly
```
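
The weighting and monitoring described above can be sketched as follows. This is a minimal illustration with plain floats, assuming each head reports one scalar loss per step; the function names `combine_losses` and `loss_shares` and the example loss values are hypothetical.

```python
def combine_losses(losses, loss_scale):
    """Scale each task loss and return (total, per-task weighted losses)."""
    missing = set(losses) - set(loss_scale)
    if missing:
        raise KeyError(f"no loss_scale entry for: {sorted(missing)}")
    weighted = {task: loss_scale[task] * value for task, value in losses.items()}
    return sum(weighted.values()), weighted


def loss_shares(weighted):
    """Return each task's fraction of the total, for spotting a dominating task."""
    total = sum(weighted.values())
    return {task: (value / total if total else 0.0) for task, value in weighted.items()}


loss_scale = {"object": 1.0, "map": 1.0, "vector_map": 1.0, "localization": 2.0}
step_losses = {"object": 2.0, "map": 1.0, "vector_map": 3.0, "localization": 1.0}
total, weighted = combine_losses(step_losses, loss_scale)
shares = loss_shares(weighted)
```

Logging `shares` each epoch makes it visible when one task starts to dominate the gradient, which is the signal for changing its weight.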

### Challenge 2: Localization Accuracy
**Problem**: GPS accuracy is insufficient
**Solution**:
- Use map matching to improve accuracy
- Fuse multiple frames over time
- Smooth with a Kalman filter
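
As an illustration of the Kalman smoothing step, here is a minimal scalar filter that could be applied to each pose coordinate separately. It assumes a random-walk motion model, which is a simplification; a constant-velocity model over the full pose would be more realistic for a moving vehicle. The class name and noise values are hypothetical.

```python
class ScalarKalmanFilter:
    """1D Kalman filter with a random-walk motion model."""

    def __init__(self, process_var, measurement_var, initial_value=0.0, initial_var=1.0):
        self.q = process_var      # how much the true value may drift per step
        self.r = measurement_var  # noise variance of each measurement
        self.x = initial_value    # current state estimate
        self.p = initial_var      # variance of the current estimate

    def update(self, measurement):
        # Predict: the state is assumed unchanged, uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x


# Smooth a noisy sequence of x-coordinates from per-frame localization.
kf = ScalarKalmanFilter(process_var=0.01, measurement_var=0.25, initial_value=10.0)
smoothed = [kf.update(z) for z in [10.2, 9.7, 10.4, 10.1, 9.9]]
```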

### Challenge 3: Real-Time Performance
**Problem**: Multi-task inference is slow
**Solution**:
- Share the backbone (saves computation)
- Prune the model (fewer parameters)
- Optimize with TensorRT
- Schedule tasks by priority

---

## 📋 Full Implementation Checklist

### MapTR Integration (Week 3)
- [ ] Study the MapTR code
- [ ] Extract vector map data
- [ ] Implement MapTRHead
- [ ] LoadVectorMap pipeline step
- [ ] Three-task configuration file
- [ ] Three-task training
- [ ] Performance evaluation

### Localization (Week 4)
- [ ] Build the BEV map database
- [ ] Implement the localization head
- [ ] LoadBEVMapTile pipeline step
- [ ] LoadEgoPose pipeline step
- [ ] Four-task configuration file
- [ ] Four-task training
- [ ] Localization accuracy evaluation

### Optional Extensions
- [ ] Trajectory prediction head
- [ ] Occupancy grid head
- [ ] Five-task / six-task training

---

## 🎓 References

### Vector Maps
- MapTR: https://github.com/hustvl/MapTR
- MapTRv2: https://arxiv.org/abs/2308.05736
- VectorMapNet: https://github.com/Mrmoore98/VectorMapNet

### Localization
- BEV localization paper: https://arxiv.org/abs/2307.00138
- OrienterNet: https://github.com/facebookresearch/OrienterNet
- Survey of map-matching algorithms

### Trajectory Prediction
- MTR: https://github.com/sshaoshuai/MTR
- Wayformer: https://arxiv.org/abs/2207.05844
- nuScenes Prediction: https://www.nuscenes.org/prediction

### Occupancy Grids
- MonoScene: https://github.com/astra-vision/MonoScene
- TPVFormer: https://github.com/wzzheng/TPVFormer
- OccNet: https://github.com/OpenDriveLab/OccNet

---
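
## 🎯 Suggested Actions

### Decisions to Make Now
**Question 1**: Integrate MapTR?
- ✅ Yes → adds 2 weeks, gains vector map capability
- ❌ No → saves time, focus on deployment

**Question 2**: Is localization needed?
- ✅ Yes → adds 1 week, gains precise localization
- ❌ No → rely on external GPS/RTK

**Question 3**: Is trajectory prediction needed?
- ✅ Yes → adds 1 week, useful for planning and decision-making
- ❌ No → perception only

### Recommended Configuration (core system)
```
✅ Detection (existing)
✅ Segmentation (existing)
✅ Vector map (recommended)
✅ Localization (recommended)
❌ Trajectory (optional, deferred)
❌ Occupancy (optional, deferred)

Total time: 3-4 weeks
Parameters: 130M
```

---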

## 🚀 Quick Start (after training completes)

### Step 1: Decide the Extension Scope
```
Fill in the decision table:
[ ] Vector map needed? → yes/no
[ ] Localization needed? → yes/no
[ ] Trajectory prediction needed? → yes/no
[ ] Occupancy grid needed? → yes/no

Choose the implementation path based on these decisions
```

### Step 2: Start Implementation
```bash
# If you choose three tasks
bash scripts/implement_three_tasks.sh

# If you choose four tasks
bash scripts/implement_four_tasks.sh
```

---

**Detailed technical plan**: see `自动驾驶全感知网络扩展方案.md`