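# BEVFusion Full-Perception Network Quick-Start Guide

**Goal**: Extend BEVFusion into a complete autonomous-driving perception + localization + mapping system
**Current baseline**: Dual-task model (detection + segmentation)
**Extension directions**: + vector map, + localization, + trajectory prediction

---
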
## 🎯 Extension Goals

```
Current BEVFusion
├── 3D object detection ✅
└── BEV semantic segmentation ✅

Extended full system
├── 3D object detection ✅
├── BEV semantic segmentation ✅
├── Vector map prediction 🆕 (HD map)
├── Ego localization 🆕 (centimeter-level localization)
├── Trajectory prediction 🆕 (6-second horizon)
└── Occupancy grid 🆕 (3D scene understanding)
```

---

## 🚀 Recommended Plan: Core Four-Task System

### System Architecture
```
Detection + Segmentation + Vector map + Localization
```

**Why these four**:
- ✅ Covers the core needs of autonomous driving
- ✅ Manageable schedule (3-4 weeks)
- ✅ Balances accuracy and efficiency
- ✅ Suitable for later deployment

---

## 📅 4-Week Implementation Plan

### Week 1-2: Current Training (in progress) ✅
```
Status: Epoch 3/23
Expected completion: 2025-10-29
Deliverable: enhanced dual-task model
```

### Week 3: Vector Map Integration (11-01 ~ 11-07)

**Day 1-2: Preparation**
```bash
# 1. Clone the MapTR code
cd /workspace
git clone https://github.com/hustvl/MapTR.git

# 2. Extract the vector map data
python tools/data_converter/extract_vector_map_bevfusion.py
# Output: data/nuscenes/vector_maps.pkl (~500MB)

# 3. Verify by visualization
python tools/visualize_vector_map.py --samples 10
```

**Day 3-4: Implementation**
```bash
# 1. Implement MapTRHead
#    File: mmdet3d/models/heads/vector_map/maptr_head.py
#    Reference: MAPTR_INTEGRATION_PLAN.md

# 2. Implement the LoadVectorMap pipeline step
#    File: mmdet3d/datasets/pipelines/loading.py

# 3. Modify the BEVFusion forward pass
#    so that it supports the vector_map head
```
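To make the `LoadVectorMap` step more concrete, here is a minimal sketch of what such a pipeline transform could look like. It is written in plain Python so it runs without mmdet3d. The pickle layout (a dict from sample token to a list of `{"label", "points"}` entries), the `token` key, and the output keys `gt_vector_labels` / `gt_vector_points` are all assumptions for illustration, not the actual format produced by `extract_vector_map_bevfusion.py`.

```python
import pickle
from pathlib import Path


class LoadVectorMap:
    """Attach pre-extracted vector-map annotations to a sample dict.

    Hypothetical sketch: the pickle layout and all dict keys are assumptions.
    """

    def __init__(self, ann_file, token_key="token"):
        self.ann_file = Path(ann_file)
        self.token_key = token_key
        self._maps = None  # loaded lazily, once per process

    def _load(self):
        if self._maps is None:
            with self.ann_file.open("rb") as f:
                self._maps = pickle.load(f)
        return self._maps

    def __call__(self, results):
        # Samples without map elements get empty lists instead of failing.
        elements = self._load().get(results[self.token_key], [])
        results["gt_vector_labels"] = [e["label"] for e in elements]
        results["gt_vector_points"] = [e["points"] for e in elements]
        return results
```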
**Day 5: Testing**
```bash
# Small-scale test (100 samples)
python tools/train.py \
  configs/nuscenes/three_tasks/test_config.yaml \
  --cfg-options max_epochs=1
```

**Day 6-7: Training**
```bash
# Three-task training
bash scripts/train_three_tasks.sh
# Estimated time: 2 days
```

---

### Week 4: Localization Integration (11-08 ~ 11-14)

**Day 1-3: Building the map database**
```python
# tools/build_bev_map_database.py
#
# Tasks:
#   1. Extract BEV maps from the nuScenes map
#   2. Build the map tile database
#   3. Match each scene to its corresponding tiles
#
# Output:
#   data/nuscenes/bev_maps/
#   ├── boston-seaport/
#   ├── singapore-onenorth/
#   └── ...
#   Total size: ~5GB
```

**Day 4-5: Localization head implementation**
```python
# mmdet3d/models/heads/localization/bev_localization_head.py
#
# Responsibilities:
#   1. BEV feature encoding
#   2. Map feature encoding
#   3. Feature matching
#   4. Pose regression
#   5. Uncertainty estimation
```
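The matching, pose, and uncertainty steps of the localization head can be pictured with a toy example. The sketch below is not the planned `BEVLocalizationHead`: it swaps learned features for small hand-written grids, searches translations only (no rotation), and uses a softmax over dot-product scores as a crude confidence value. The name `match_offset` and the grid inputs are assumptions made for this illustration.

```python
import math


def match_offset(bev, tile):
    """Slide `bev` over `tile` and score every offset by a dot product.

    Returns the best (dx, dy) offset in cells and a softmax confidence.
    """
    bh, bw = len(bev), len(bev[0])
    th, tw = len(tile), len(tile[0])
    scores = {}
    for dy in range(th - bh + 1):
        for dx in range(tw - bw + 1):
            scores[(dx, dy)] = sum(
                bev[y][x] * tile[y + dy][x + dx]
                for y in range(bh)
                for x in range(bw)
            )
    # Softmax over all offsets; subtracting the peak keeps exp() stable.
    peak = max(scores.values())
    weights = {k: math.exp(v - peak) for k, v in scores.items()}
    best = max(scores, key=scores.get)
    return best, weights[best] / sum(weights.values())


# A 2x2 diagonal pattern hidden in a 4x4 tile at offset (dx=2, dy=1).
tile = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
bev = [[1, 0], [0, 1]]
offset, confidence = match_offset(bev, tile)
```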
**Day 6-7: Four-task training**
```bash
# Stage 1: train the localization head (3 epochs)
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/four_tasks/bevfusion_full.yaml \
  --load_from runs/three_tasks/epoch_8.pth \
  --freeze-heads object,map,vector_map

# Stage 2: joint fine-tuning (5 epochs)
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/four_tasks/bevfusion_full.yaml \
  --load_from runs/four_tasks_stage1/epoch_3.pth
```

---

## 🔧 Code Implementation Framework

### 1. Three-Task Configuration File

```yaml
# configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml

model:
  type: BEVFusion

  encoders:
    camera: ${camera_encoder}
    lidar: ${lidar_encoder}

  fuser:
    type: ConvFuser

  decoder:
    backbone: ${decoder_backbone}
    neck: ${decoder_neck}

  heads:
    # Task 1: 3D detection
    object:
      type: TransFusionHead
      # ... settings

    # Task 2: BEV segmentation
    map:
      type: EnhancedBEVSegmentationHead
      # ... settings

    # Task 3: vector map 🆕
    vector_map:
      type: MapTRHead
      in_channels: 256
      num_queries: 50
      num_points: 20
      num_classes: 3  # divider, boundary, crossing
      embed_dims: 256
      num_decoder_layers: 6

  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0

# Data pipeline
train_pipeline:
  - type: LoadMultiViewImageFromFiles
  - type: LoadPointsFromFile
  - type: LoadAnnotations3D
  - type: LoadVectorMap  # 🆕
  # ...
```
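The `num_points: 20` setting implies that every map element is represented by a fixed number of points, so ground-truth polylines of arbitrary length have to be resampled before they can serve as targets. The following is a minimal sketch of arc-length resampling in plain Python. The function name `resample_polyline` is hypothetical, and the sketch does not reproduce MapTR's own preprocessing. Calling `resample_polyline([(0, 0), (2, 0), (2, 2)], num_points=5)` places one point per unit of arc length along the L-shaped line.

```python
import math


def resample_polyline(points, num_points=20):
    """Resample a 2D polyline to `num_points` points equally spaced by arc length."""
    if num_points < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output points")
    # Cumulative arc length at every input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```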

---

### 2. Four-Task Configuration File

```yaml
# configs/nuscenes/four_tasks/bevfusion_full.yaml

model:
  heads:
    object: ${object_head}
    map: ${map_head}
    vector_map: ${vector_map_head}

    # Task 4: localization 🆕
    localization:
      type: BEVLocalizationHead
      in_channels: 256
      map_embedding_dim: 128
      pose_dims: 6  # x,y,z,roll,pitch,yaw

  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0
    localization: 2.0  # higher weight for localization

# Data pipeline
train_pipeline:
  # ... other pipeline steps
  - type: LoadBEVMapTile  # 🆕
  - type: LoadEgoPose  # 🆕
```

---

## 💾 Data Preparation Scripts

### Vector Map Extraction
```bash
# Planned script (still needs to be created)
python tools/data_converter/extract_vector_map_bevfusion.py \
  --root data/nuscenes \
  --output data/nuscenes/vector_maps.pkl

# Estimated time: 30 minutes
# Output size: ~500MB
```

### Building the BEV Map Database
```bash
# Needs to be created
python tools/build_bev_map_database.py \
  --root data/nuscenes \
  --output data/nuscenes/bev_maps \
  --resolution 0.3 \
  --tile-size 100

# Estimated time: 1-2 days
# Output size: ~5GB
```
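The `--resolution` and `--tile-size` flags determine how a map position is turned into a tile lookup. The sketch below works through that arithmetic, assuming the resolution is in meters per pixel and the tile size is in meters (the guide does not state the units). The helper names are hypothetical. Under these assumptions one tile edge needs `ceil(100 / 0.3) = 334` pixels.

```python
import math


def tile_index(x, y, tile_size=100.0):
    """Return the (column, row) of the tile that contains map position (x, y)."""
    return math.floor(x / tile_size), math.floor(y / tile_size)


def tile_pixels(tile_size=100.0, resolution=0.3):
    """Pixels along one tile edge, rounded up so the whole tile is covered."""
    return math.ceil(tile_size / resolution)


def pixel_in_tile(x, y, tile_size=100.0, resolution=0.3):
    """Return the pixel coordinates of (x, y) inside its own tile."""
    col, row = tile_index(x, y, tile_size)
    px = math.floor((x - col * tile_size) / resolution)
    py = math.floor((y - row * tile_size) / resolution)
    return px, py
```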

---

## 📊 Performance Estimates

### Four-Task System

| Task | Expected performance | Notes |
|------|----------------------|-------|
| 3D detection | mAP 64-66% | Slight drop (multi-task competition) |
| BEV segmentation | mIoU 55-58% | Slight drop |
| Vector map | mAP 50-55% | New task |
| Localization | Error < 0.5 m | New task |

**Inference performance**:
- Parameters: 130M
- Inference time: 120 ms (A100)
- Inference time: 600-800 ms (Orin, unoptimized)
- After optimization: < 200 ms (Orin)

---

## 🎯 Preparation You Can Start Now

### This Week (while training runs)

**1. Study MapTR (4 hours)**
```bash
# Clone the code
git clone https://github.com/hustvl/MapTR.git

# Focus areas:
# - MapTRHead structure
# - Data format
# - Loss functions
```

**2. Design the localization approach (2 hours)**
```
- Decide on the technical route (map matching vs. VIO)
- Design the data flow
- Prepare the BEV map tile specification
```

**3. Prepare the data extraction script (2 hours)**
```bash
# Based on MAPTR_INTEGRATION_PLAN.md
# Implement extract_vector_map_bevfusion.py
```

---

### Next Week (before training completes)

**4. Implement MapTRHead (8 hours)**
```
- Port MapTR's Transformer decoder
- Adapt it to the BEVFusion interface
- Implement Hungarian matching
- Implement the loss functions
```
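The Hungarian matching step assigns each ground-truth map element to exactly one predicted element so that the total cost is minimal. As a rough illustration, the sketch below finds the same optimal assignment by exhaustive search over permutations, which is only practical for a handful of elements; a real implementation would more likely call `scipy.optimize.linear_sum_assignment`. The mean-L1 cost and the function names are assumptions, and the handling of equivalent point orderings described for MapTR is left out.

```python
from itertools import permutations


def polyline_cost(pred, gt):
    """Mean L1 distance between two polylines with the same number of points."""
    return sum(
        abs(px - gx) + abs(py - gy) for (px, py), (gx, gy) in zip(pred, gt)
    ) / len(gt)


def match_predictions(preds, gts):
    """Return ((pred_index, gt_index) pairs, total cost) with minimum total cost.

    Brute-force stand-in for the Hungarian algorithm.
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground truths")
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(polyline_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return best_pairs, best_cost


preds = [[(0, 0), (1, 0)], [(5, 5), (6, 5)], [(9, 9), (9, 9)]]
gts = [[(5, 5), (6, 5.5)], [(0, 0.5), (1, 0)]]
pairs, cost = match_predictions(preds, gts)
```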
**5. Build the BEV map database (16 hours)**
```
- Extract from the nuScenes map
- Render as a BEV representation
- Build the tile index
- Test query efficiency
```

---

## 💡 Technical Difficulties and Solutions

### Difficulty 1: Balancing the multi-task loss

**Problem**: Loss magnitudes differ widely across tasks.

**Solution**:
```yaml
loss_scale:
  object: 1.0
  map: 1.0
  vector_map: 1.0
  localization: 2.0  # adjust dynamically

# Monitor each task's loss and adjust the weights promptly
```
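How the `loss_scale` weights enter the objective, and how they might be adjusted when one task dominates, can be sketched as follows. Both helpers are hypothetical, and the per-task loss values are made-up numbers chosen only to show the arithmetic. `rebalanced_scales` is one simple heuristic (scale every task to the magnitude of a reference task), not a method prescribed by this guide.

```python
def total_loss(task_losses, loss_scale):
    """Sum the per-task losses after multiplying each by its configured weight."""
    missing = set(task_losses) - set(loss_scale)
    if missing:
        raise KeyError(f"no loss_scale entry for: {sorted(missing)}")
    return sum(loss_scale[name] * value for name, value in task_losses.items())


def rebalanced_scales(task_losses, reference="object"):
    """Suggest weights that bring every task's loss to the reference magnitude."""
    ref = task_losses[reference]
    return {name: ref / value for name, value in task_losses.items() if value > 0}


# Illustrative loss values only; real values come from the training logs.
losses = {"object": 2.0, "map": 0.5, "vector_map": 4.0, "localization": 0.25}
scales = {"object": 1.0, "map": 1.0, "vector_map": 1.0, "localization": 2.0}
combined = total_loss(losses, scales)
suggested = rebalanced_scales(losses)
```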
### Difficulty 2: Localization accuracy

**Problem**: GPS accuracy is insufficient.

**Solution**:
- Use map matching to improve accuracy
- Fuse multiple frames over time
- Smooth with a Kalman filter

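
As an illustration of the Kalman-filter item, here is a minimal one-dimensional filter that could be applied per pose axis to a sequence of noisy position estimates. It assumes a constant-position model, which lags behind a moving vehicle; a real system would more plausibly use a constant-velocity model with odometry input. The function name and the variance values are assumptions.

```python
def kalman_filter_1d(measurements, meas_var, process_var, init_var=1.0):
    """Filter a 1D sequence with a constant-position Kalman filter."""
    if meas_var <= 0:
        raise ValueError("meas_var must be positive")
    if not measurements:
        return []
    estimate, var = measurements[0], init_var
    out = [estimate]
    for z in measurements[1:]:
        var += process_var             # predict: uncertainty grows over time
        gain = var / (var + meas_var)  # Kalman gain, between 0 and 1
        estimate += gain * (z - estimate)  # update toward the new measurement
        var *= 1.0 - gain              # update: uncertainty shrinks
        out.append(estimate)
    return out


noisy_x = [10.0, 10.4, 9.6, 10.2]
smoothed_x = kalman_filter_1d(noisy_x, meas_var=1.0, process_var=0.01)
```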
### Difficulty 3: Real-time performance

**Problem**: Multi-task inference takes a long time.

**Solution**:
- Share the backbone (saves computation)
- Prune the model (fewer parameters)
- Optimize with TensorRT
- Schedule tasks by priority

---

## 📋 Full Implementation Checklist

### MapTR Integration (Week 3-4)
- [ ] Study the MapTR code
- [ ] Extract the vector map data
- [ ] Implement MapTRHead
- [ ] LoadVectorMap pipeline step
- [ ] Three-task configuration file
- [ ] Three-task training
- [ ] Performance evaluation

### Localization (Week 5)
- [ ] Build the BEV map database
- [ ] Implement the localization head
- [ ] LoadBEVMapTile pipeline step
- [ ] LoadEgoPose pipeline step
- [ ] Four-task configuration file
- [ ] Four-task training
- [ ] Localization accuracy evaluation

### Optional Extensions
- [ ] Trajectory prediction head
- [ ] Occupancy grid head
- [ ] Five-task / six-task training

---

## 🎓 References

### Vector Maps
- MapTR: https://github.com/hustvl/MapTR
- MapTRv2: https://arxiv.org/abs/2308.05736
- VectorMapNet: https://github.com/Mrmoore98/VectorMapNet

### Localization
- BEV localization paper: https://arxiv.org/abs/2307.00138
- OrienterNet: https://github.com/facebookresearch/OrienterNet
- Survey of map-matching algorithms

### Trajectory Prediction
- MTR: https://github.com/sshaoshuai/MTR
- Wayformer: https://arxiv.org/abs/2207.05844
- nuScenes Prediction: https://www.nuscenes.org/prediction

### Occupancy Grids
- MonoScene: https://github.com/astra-vision/MonoScene
- TPVFormer: https://github.com/wzzheng/TPVFormer
- OccNet: https://github.com/OpenDriveLab/OccNet

---

## 🎯 Suggested Actions

### Immediate Decisions
**Question 1**: Integrate MapTR?
- ✅ Yes → adds 2 weeks, gains vector map capability
- ❌ No → saves time, focus on deployment

**Question 2**: Is localization needed?
- ✅ Yes → adds 1 week, gains precise localization
- ❌ No → rely on external GPS/RTK

**Question 3**: Is trajectory prediction needed?
- ✅ Yes → adds 1 week, useful for planning and decision-making
- ❌ No → perception only

### Recommended Configuration (core system)
```
✅ Detection (existing)
✅ Segmentation (existing)
✅ Vector map (recommended)
✅ Localization (recommended)
❌ Trajectory (optional, not implemented for now)
❌ Occupancy (optional, not implemented for now)

Total time: 3-4 weeks
Parameters: 130M
```

---

## 🚀 Quick Start (after training completes)

### Step 1: Decide the Extension Scope
```
Fill in the decision table:
[ ] Vector map needed?            → yes/no
[ ] Localization needed?          → yes/no
[ ] Trajectory prediction needed? → yes/no
[ ] Occupancy grid needed?        → yes/no

Choose the implementation path based on these decisions
```
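The mapping from decisions to an implementation path can be written down as a small helper. This sketch covers only the two paths for which this guide names a config file. It assumes that localization requires the vector map head, because the four-task training starts from the three-task checkpoint. Trajectory prediction and occupancy are left out since they are marked optional. The function name is hypothetical.

```python
THREE_TASK_CONFIG = "configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml"
FOUR_TASK_CONFIG = "configs/nuscenes/four_tasks/bevfusion_full.yaml"


def plan_from_decisions(vector_map, localization):
    """Return (task list, config path) for the chosen extension scope.

    The config path is None when the existing dual-task model is kept.
    """
    tasks = ["object", "map"]
    if vector_map:
        tasks.append("vector_map")
    if localization:
        if not vector_map:
            # Assumption: stage 1 of four-task training loads the three-task checkpoint.
            raise ValueError("localization builds on the three-task model")
        tasks.append("localization")
    config = {2: None, 3: THREE_TASK_CONFIG, 4: FOUR_TASK_CONFIG}[len(tasks)]
    return tasks, config
```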
### Step 2: Start Implementation
```bash
# If you chose three tasks
bash scripts/implement_three_tasks.sh

# If you chose four tasks
bash scripts/implement_four_tasks.sh
```

---

**Detailed technical plan**: see `自动驾驶全感知网络扩展方案.md`