# BEVFusion Project Status Update

**Updated**: October 30, 2025
**Overall progress**: Phase 3 complete (100%); Phase 4A preparation complete (awaiting launch)

---

## 📊 Overall Progress

```
Project progress: ███████████████░░░░░ 75%

✅ Phase 1: Baseline training (100%) - complete
✅ Phase 2: Performance optimization (100%) - complete
✅ Phase 3: Enhanced training (100%) - complete ⭐
⏸️ Phase 4A: BEV resolution upgrade (95%) - configuration done, waiting for the environment to be restored
⏳ Phase 4B: Model compression (0%) - not started
🔄 Phase 5: Real-vehicle data preparation (30%) - running in parallel
⏳ Phase 6: Real-vehicle fine-tuning (0%) - not started
⏳ Phase 7: Deployment optimization (0%) - not started
```

---

## 🎉 Phase 3 Final Results (completed 2025-10-29)

### Training Overview

**Training period**: 2025-10-21 ~ 2025-10-29 (8 days)
**Total epochs**: 23 (exceeding the originally planned 20 epochs)
**Training configuration**:
- BEV resolution: 0.3 m
- Decoder: 2 layers [256, 128]
- GPUs: 6× Tesla V100S
- Batch size: 2 per GPU
- Learning rate: 5e-5 → 2.8e-7 (cosine)

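
The cosine learning-rate decay listed above can be sketched as follows. This is a minimal illustration rather than the project's actual scheduler: it assumes plain cosine annealing from 5e-5 down to a floor of 2.8e-7 with no warmup, and the function name `cosine_lr` and the 100-step horizon are made up for the example.

```python
import math

def cosine_lr(step, total_steps, lr_max=5e-5, lr_min=2.8e-7):
    """Cosine annealing from lr_max (step 0) to lr_min (step == total_steps)."""
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    # Hypothetical 100-step horizon, only to show the shape of the curve.
    for step in (0, 25, 50, 75, 100):
        print(f"step {step:3d}: lr = {cosine_lr(step, 100):.3e}")
```

The curve starts at the peak rate, falls slowly at first, fastest around the midpoint, and flattens out as it approaches the floor.
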
### Final Performance

**3D object detection**:
- **NDS**: 0.6941 (69.41%) ⭐ 97.2% of the paper's SOTA
- **mAP**: 0.6446 (64.46%) ⭐ 91.6% of the paper's SOTA

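
As a quick check of the "share of SOTA" figures, the sketch below divides the Phase 3 metrics by the BEVFusion paper numbers cited in this report's references (NDS 0.714, mAP 0.704). The helper name `fraction_of_reference` is illustrative only.

```python
def fraction_of_reference(value, reference):
    """Return value as a percentage of a reference score."""
    return 100.0 * value / reference

# Phase 3 results and the BEVFusion paper numbers quoted in this report.
phase3 = {"NDS": 0.6941, "mAP": 0.6446}
paper = {"NDS": 0.714, "mAP": 0.704}

ratios = {name: fraction_of_reference(phase3[name], paper[name]) for name in phase3}

if __name__ == "__main__":
    for name, pct in ratios.items():
        print(f"{name}: {pct:.1f}% of the paper result")
```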
**Per-class detection AP** (best @ 4 m):
```
Pedestrian:            85.79%  Excellent ✅
Car:                   90.39%  Excellent ✅
Bus:                   86.12%  Excellent ✅
Traffic Cone:          79.35%  Good ✅
Motorcycle:            76.87%  Good ✅
Barrier:               73.04%  Good ✅
Truck:                 71.01%  Good ✅
Trailer:               66.12%  Fair ⚠️
Bicycle:               60.18%  Fair ⚠️
Construction Vehicle:  44.39%  Needs improvement ⚠️
```

**BEV semantic segmentation**:
- **Overall mIoU**: 0.4130 (41.30%)

**Per-class IoU**:
```
Drivable Area:        70.63%  Excellent ✅
Walkway:              52.78%  Good ✅
Carpark Area:         39.48%  Fair ⚠️
Pedestrian Crossing:  39.31%  Fair ⚠️
Stop Line:            26.57%  Low ❌ needs improvement
Divider:              19.03%  Low ❌ needs improvement
```

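
The overall mIoU is consistent with an unweighted mean of the six per-class IoU values. The sketch below assumes that definition (every class counts equally); the dictionary keys are illustrative labels, not identifiers taken from the codebase.

```python
# Per-class IoU values (percent) from the Phase 3 evaluation above.
class_iou = {
    "drivable_area": 70.63,
    "walkway": 52.78,
    "carpark_area": 39.48,
    "ped_crossing": 39.31,
    "stop_line": 26.57,
    "divider": 19.03,
}

def mean_iou(per_class):
    """Unweighted mean over classes, so every class counts equally."""
    return sum(per_class.values()) / len(per_class)

miou = mean_iou(class_iou)

if __name__ == "__main__":
    print(f"mIoU = {miou:.2f}%")
```

Because the mean is unweighted, the two thin-line classes pull the overall score down as much as any other class, which is why improving them matters for mIoU.
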
### Key Findings

✅ **Strengths**:
1. NDS reaches 97.2% of the paper's SOTA, which is very close
2. Detection of the main classes (pedestrian, car, bus) is excellent (85%+)
3. Drivable-area segmentation IoU reaches 70.63%
4. Training was stable, with no anomalies

❌ **To improve**:
1. **Small-target segmentation bottleneck**: Stop Line and Divider IoU are very low
   - Root cause: a 0.3 m resolution cannot represent lines that are 0.1-0.15 m wide
   - Solution: Phase 4A, 2x BEV resolution
2. AP is low for rare classes (construction vehicle, trailer)

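
The root cause named above can be made concrete with a little arithmetic. The sketch below, using an illustrative helper `cells_covered`, divides the line width by the BEV cell size: at 0.3 m a 0.10-0.15 m line spans at most half a cell, while at 0.15 m it spans between two thirds of a cell and a full cell.

```python
def cells_covered(line_width_m, cell_size_m):
    """Number of BEV cells a line of the given width spans across its width."""
    return line_width_m / cell_size_m

if __name__ == "__main__":
    for cell in (0.3, 0.15):
        for width in (0.10, 0.15):
            print(f"cell {cell:.2f} m, line {width:.2f} m: "
                  f"{cells_covered(width, cell):.2f} cells wide")
```
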
### Checkpoint

**Best models**:
- `epoch_22.pth`: NDS 0.6948 (best NDS)
- `epoch_23.pth`: NDS 0.6941, mAP 0.6446 (final version, recommended)
- `epoch_21.pth`: mAP 0.6453 (best mAP)

**Location**: `runs/enhanced_from_epoch19/`

---

## 🚀 Phase 4A: 2x BEV Resolution (current phase)

### Goal

**Resolve the small-target segmentation bottleneck from Phase 3**

### Technical Approach

**Core changes**:
1. **Double the BEV resolution**: 0.3 m → 0.15 m
2. **Upgrade the decoder**: 2 layers → 4 layers
3. **Enable deep supervision**
4. **Enable Dice loss**

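
Dice loss is commonly chosen for classes that occupy very few cells, because it scores overlap with the ground truth rather than per-cell accuracy. The sketch below is the standard soft Dice formulation in plain Python, not the project's own implementation, whose details are not shown in this report; the function name and the `eps` smoothing term are assumptions.

```python
def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss for a single class.

    probs:   predicted foreground probabilities in [0, 1], flattened
    targets: ground-truth labels (0 or 1), same length as probs
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice

if __name__ == "__main__":
    # A thin line: only 2 of 10 cells are foreground.
    target = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
    all_background = [0.0] * 10
    perfect = [float(t) for t in target]
    print(dice_loss(all_background, target))  # close to 1: the line is missed
    print(dice_loss(perfect, target))         # close to 0: the line is found
```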
**Configuration details**:
```yaml
model:
  encoders:
    camera:
      vtransform:
        xbound: [-54.0, 54.0, 0.15]  # changed from 0.3 to 0.15
        ybound: [-54.0, 54.0, 0.15]
        # BEV: 720×720 (previously 360×360)

  heads:
    map:
      decoder_channels: [256, 256, 128, 128]  # upgraded from [256, 128]
      deep_supervision: true  # enabled
      use_dice_loss: true  # enabled
      grid_transform:
        output_scope: [[-50, 50, 0.25], [-50, 50, 0.25]]  # 400×400

train_pipeline:
  - type: LoadBEVSegmentation
    xbound: [-50.0, 50.0, 0.125]  # GT: 800×800
    ybound: [-50.0, 50.0, 0.125]
```

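
The grid sizes quoted in the configuration comments follow directly from each `[lower, upper, step]` bound. The sketch below recomputes them; `grid_cells` and the dictionary labels are illustrative names, and the 0.3 m entry is the Phase 3 cell size stated earlier in this report.

```python
def grid_cells(lower, upper, step):
    """Number of cells along one axis for a [lower, upper, step] bound."""
    # round() guards against floating-point noise such as 720.0000000000001.
    return round((upper - lower) / step)

# Bounds copied from the Phase 4A configuration; 0.3 m is the Phase 3 cell size.
bounds = {
    "camera BEV, Phase 3": (-54.0, 54.0, 0.3),
    "camera BEV, Phase 4A": (-54.0, 54.0, 0.15),
    "map head output": (-50.0, 50.0, 0.25),
    "segmentation GT": (-50.0, 50.0, 0.125),
}
sizes = {name: grid_cells(*b) for name, b in bounds.items()}

if __name__ == "__main__":
    for name, n in sizes.items():
        print(f"{name}: {n} x {n}")
    ratio = sizes["camera BEV, Phase 4A"] ** 2 / sizes["camera BEV, Phase 3"] ** 2
    print(f"BEV cell count grows by {ratio:.0f}x")
```

Halving the cell size quadruples the number of BEV cells, which is likely the main driver of the extra memory and compute in Phase 4A.
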
### Expected Performance

```
Detection (slight improvement):
  NDS: 0.6941 → 0.710 (+2.3%)
  mAP: 0.6446 → 0.670 (+3.9%)

Segmentation (major improvement):
  Overall mIoU: 0.4130 → 0.540 (+30.8%) 🎉

Small-target IoU (key gains):
  Stop Line: 0.2657 → 0.445 (+67.5%) 🚀
  Divider:   0.1903 → 0.365 (+91.8%) 🚀
```

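
The percentages in the expectations above are relative gains, that is, (after - before) / before. The sketch below recomputes them from the before and after values; the helper name `relative_gain` is illustrative.

```python
def relative_gain(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

# (Phase 3 result, Phase 4A expectation) pairs from the figures above.
expected = {
    "NDS": (0.6941, 0.710),
    "mAP": (0.6446, 0.670),
    "mIoU": (0.4130, 0.540),
    "Stop Line IoU": (0.2657, 0.445),
    "Divider IoU": (0.1903, 0.365),
}
gains = {name: relative_gain(*pair) for name, pair in expected.items()}

if __name__ == "__main__":
    for name, pct in gains.items():
        print(f"{name}: +{pct:.1f}%")
```

Relative gains look largest where the baseline is smallest, so the absolute IoU values remain the more reliable target to track.
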
### Current Status

✅ **Done**:
1. Configuration file created: `multitask_BEV2X_phase4a.yaml`
2. Launch script prepared: `start_phase4a_bev2x_fixed.sh`
3. Monitoring script created: `monitor_phase4a.sh`
4. Checkpoint verified: `epoch_23.pth` (516MB)

⏸️ **Current blocker**:
- Environment library issue: `ImportError: libtorch_cuda_cu.so`
- Training cannot start until this is resolved

📋 **To do**:
1. Resolve the environment issue
2. Launch the 20-epoch training run (~12.5 days)
3. Evaluate performance after every epoch
4. Select the best checkpoint

---

## 💻 Environment Record

### Working Training Environment (Phase 3)

**System**:
```
OS: Linux (Docker container)
GPU: 6× Tesla V100S-PCIE-32GB
CUDA: 11.3
```

**Python environment**:
```
Python: 3.8
PyTorch: 1.10.1
mmcv: 1.4.0
mmdet: 2.24.0
mmdet3d: 1.0.0rc2
torchpack: installed
```

**Working launch command**:
```bash
# Put the conda binaries first on PATH.
export PATH=/opt/conda/bin:$PATH

# Launch 6 training processes (one per GPU) and mirror the output to training.log.
/opt/conda/bin/torchpack dist-run -np 6 /opt/conda/bin/python tools/train.py \
  configs/.../config.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from checkpoint.pth \
  --data.samples_per_gpu 2 \
  --data.workers_per_gpu 0 \
  2>&1 | tee training.log
```

### Current Environment Problem (Phase 4A)

**Error**:
```
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
```

**Possible causes**:
1. The Docker environment may need a restart
2. The library path is misconfigured
3. The Conda environment needs to be re-activated

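
One way to narrow down the causes listed above is to check which interpreter is running and whether the library is actually present on disk. The sketch below is a generic diagnostic, not a procedure taken from the project's documentation. It assumes the usual pip/conda layout in which PyTorch ships its shared libraries under `torch/lib`, and the helper name `find_torch_cuda_libs` is made up. It deliberately avoids `import torch`, since that import is what fails.

```python
import glob
import importlib.util
import os
import sys

def find_torch_cuda_libs():
    """Locate torch's bundled CUDA shared libraries without importing torch."""
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None, []
    # Assumption: shared libraries live in <site-packages>/torch/lib.
    lib_dir = os.path.join(list(spec.submodule_search_locations)[0], "lib")
    libs = sorted(glob.glob(os.path.join(lib_dir, "libtorch_cuda*.so*")))
    return lib_dir, libs

if __name__ == "__main__":
    print("interpreter:", sys.executable)
    lib_dir, libs = find_torch_cuda_libs()
    if lib_dir is None:
        print("torch is not installed for this interpreter")
    else:
        print("torch lib directory:", lib_dir)
        print("CUDA libraries found:", [os.path.basename(p) for p in libs] or "none")
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(unset)"))
```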
**Suggested fix**: see `PHASE4A_STATUS_AND_ENVIRONMENT.md`

---

## 📁 Key File Index

### Configuration Files
```
configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/
├── multitask_BEV2X_phase4a.yaml            # Phase 4A config ⭐
├── multitask_enhanced_phase1_HIGHRES.yaml  # Phase 3 config
└── convfuser.yaml                          # Base config
```

### Launch Scripts
```
start_phase4a_bev2x_fixed.sh      # Phase 4A launch ⭐
start_enhanced_training_fixed.sh  # Phase 3 launch (reference)
monitor_phase4a.sh                # Monitoring script
```

### Checkpoint
```
runs/enhanced_from_epoch19/
├── epoch_23.pth  # Phase 3 final ⭐ (516MB)
├── epoch_22.pth  # Phase 3 best NDS (516MB)
└── epoch_21.pth  # Phase 3 best mAP (516MB)
```

### Documents
```
PHASE4A_STATUS_AND_ENVIRONMENT.md  # Phase 4A detailed notes ⭐
PROJECT_STATUS_UPDATE_20251030.md  # This document
PROGRESSIVE_ENHANCEMENT_PLAN.md    # Progressive enhancement plan
PROJECT_MASTER_PLAN.md             # Master plan
BEVFusion完整项目路线图.md           # Full project roadmap
```

---

## 📊 Performance History

| Stage | NDS | mAP | mIoU | Stop Line IoU | Divider IoU | Notes |
|-------|-----|-----|------|---------------|-------------|-------|
| **Epoch 1** | 0.6597 | 0.6597 | 0.38 | 0.20 | 0.15 | Initial |
| **Epoch 10** | 0.6968 | 0.6509 | 0.39 | 0.24 | 0.17 | Steady improvement |
| **Epoch 19** | 0.6926 | 0.6425 | 0.40 | 0.26 | 0.18 | Phase 3 starting point |
| **Epoch 22** | 0.6948 | 0.6447 | 0.41 | 0.27 | 0.19 | Best Phase 3 NDS ⭐ |
| **Epoch 23** | 0.6941 | 0.6446 | 0.41 | 0.27 | 0.19 | End of Phase 3 ✅ |
| **Phase 4A target** | 0.710 | 0.670 | **0.54** | **0.42+** | **0.35+** | BEV 2x 🎯 |

**Key findings**:
- Performance plateaued over epochs 10-23, and small-target IoU could not break through
- **This can only be resolved by increasing the BEV resolution**

---

## 🎯 Next Steps

### Short Term (1-2 weeks)

1. **Priority 1**: Resolve the Phase 4A environment issue
   - Check the Docker environment
   - Verify the library paths
   - Launch training successfully

2. **Priority 2**: Monitor Phase 4A training
   - Check GPUs and loss daily
   - Run a preliminary evaluation at epoch 5
   - Confirm that small-target IoU improves

### Medium Term (3-4 weeks)

3. **Complete Phase 4A**:
   - Finish the 20-epoch training run
   - Run a full performance evaluation
   - Select the best checkpoint

4. **Start Phase 4B** (optional):
   - Model compression and quantization
   - TensorRT optimization
   - Preparation for Orin deployment

### Long Term (1-3 months)

5. **Phase 5**: Real-vehicle data collection and annotation
6. **Phase 6**: Real-vehicle fine-tuning
7. **Phase 7**: Deployment optimization and in-vehicle rollout

---

## 📈 Project Milestones

| Milestone | Planned | Actual | Status |
|-----------|---------|--------|--------|
| M1: Baseline training complete | 2025-10-10 | 2025-10-10 | ✅ |
| M2: Enhanced training started | 2025-10-21 | 2025-10-21 | ✅ |
| M3: Enhanced training complete | 2025-10-29 | 2025-10-29 | ✅ |
| M4: BEV 2x configuration complete | 2025-10-30 | 2025-10-30 | ✅ |
| M5: BEV 2x training started | 2025-10-30 | **TBD** | ⏸️ |
| M6: BEV 2x training complete | 2025-11-12 | TBD | ⏳ |
| M7: Real-vehicle data collection | 2025-11-15 | TBD | ⏳ |
| M8: Model compression complete | 2025-11-20 | TBD | ⏳ |
| M9: Real-vehicle fine-tuning complete | 2025-12-20 | TBD | ⏳ |
| M10: In-vehicle deployment | 2025-12-31 | TBD | ⏳ |

---

## 💡 Lessons Learned

### What Worked in Phase 3

✅ **Went well**:
1. Excellent training stability: 23 epochs without interruption
2. `EnhancedBEVSegmentationHead` effectively improved performance
3. The 6-GPU setup balanced speed and stability
4. `workers_per_gpu=0` avoided data-loading problems

⚠️ **To improve**:
1. Small-target segmentation is limited by resolution
2. The BEV resolution bottleneck should have been identified earlier
3. BEV 2x could have been tried more aggressively

### To Be Verified in Phase 4A

🎯 **Key questions**:
1. Is there enough GPU memory for BEV 2x? (estimated 28 GB vs 32 GB available)
2. Is the training speed acceptable? (estimated 4.5-5 s/iter vs 2.7 s/iter currently)
3. Does the improvement meet expectations? (Stop Line IoU target > 0.42)

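
The first two questions reduce to simple ratios. The sketch below uses only the estimates quoted above (28 GB of 32 GB, and 4.5-5 s/iter against 2.7 s/iter); the helper names are illustrative, and the figures remain estimates until the run is actually measured.

```python
def headroom_fraction(estimated_gb, available_gb):
    """Share of GPU memory left free if the estimate holds."""
    return (available_gb - estimated_gb) / available_gb

def slowdown(new_s_per_iter, old_s_per_iter):
    """How many times slower one iteration becomes."""
    return new_s_per_iter / old_s_per_iter

if __name__ == "__main__":
    print(f"memory headroom: {headroom_fraction(28, 32):.1%}")
    low, high = slowdown(4.5, 2.7), slowdown(5.0, 2.7)
    print(f"per-iteration slowdown: {low:.2f}x to {high:.2f}x")
```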

---

## 🔗 Resources

### Papers
- BEVFusion (Liu et al., 2022): NDS 0.714, mAP 0.704
- BEVFormer (Li et al., 2022): high-resolution BEV representation

### Code Repository
- This project: `/workspace/bevfusion`
- Main branch: enhanced_training
- Configs: `configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/`

### Dataset
- nuScenes v1.0-trainval
- Location: `/workspace/bevfusion/data/nuscenes`
- Size: ~500GB

---

**Project owner**: [Your name]
**Last updated**: 2025-10-30
**Next update**: after Phase 4A training starts

**Status**: 🟡 Phase 4A configuration complete; waiting for the environment to be restored before training starts