# BEVFusion Project: Full Status Report

**Generated**: 2025-10-30 12:06
**Report type**: Project progress summary + analysis of Phase 4A startup difficulties

---

## 📊 Overall Project Progress

### ✅ Completed Phases

#### Phase 1-2: Base Training
- **Period**: Epoch 1-19
- **Configuration**: Base multi-task model (3D detection + BEV segmentation)
- **Checkpoint**: epoch_19.pth

#### Phase 3: Enhanced Segmentation Head ✅
- **Period**: 2025-10-21 to 10-29 (Epoch 20-23)
- **Key improvements**:
  - EnhancedBEVSegmentationHead
  - ASPP multi-scale features
  - Channel + Spatial Attention
  - GroupNorm in place of BatchNorm (fixes a distributed-training deadlock)

- **Final performance** (epoch_23.pth):
```
3D detection:
NDS: 0.6941 (+1.3% vs baseline)
mAP: 0.6446 (+0.9% vs baseline)

BEV segmentation @ 0.3 m resolution:
Overall mIoU: 0.41
Drivable Area: 0.83 ✅ excellent
Ped. Crossing: 0.57 ✅ good
Walkway: 0.49 ✅ good
Stop Line: 0.27 ⚠️ needs improvement
Carpark Area: 0.36 ⚠️ needs improvement
Divider: 0.19 ⚠️ needs improvement
```

**Phase 3 outcomes**:
- ✅ 3D detection performance remains in the lead (NDS 0.6941)
- ✅ Large-area classes (drivable area) perform very well
- ✅ Distributed training is stable
- ⚠️ Thin-line classes (stop lines, dividers) need higher resolution

---

## 🚀 Phase 4A: 2x BEV Resolution Increase

### Goals

Raise IoU on the thin-line classes by increasing resolution and deepening the decoder:
- Stop Line IoU: 0.27 → 0.42+ (about +56% relative)
- Divider IoU: 0.19 → 0.35+ (about +84% relative)
- Overall mIoU: 0.41 → 0.54+ (about +32% relative)

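
As a quick check of the percentages quoted for these targets, the relative gain can be recomputed from the before/after IoU values. This is a minimal sketch; the `targets` dictionary and the `relative_gain` helper are illustrative names, not part of the project code.

```python
# Before/after IoU values taken from the Phase 4A goals in this report.
targets = {
    "Stop Line IoU": (0.27, 0.42),
    "Divider IoU": (0.19, 0.35),
    "Overall mIoU": (0.41, 0.54),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


gains = {name: relative_gain(b, a) for name, (b, a) in targets.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
```
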
### Technical Approach

#### 1. BEV Resolution Increase (2x)
```yaml
Phase 3: 0.3m → 360×360 @ 0.6m + padding
Phase 4A: 0.15m → 720×720 (2x resolution)
```

#### 2. GT Label Resolution Increase (2x)
```yaml
Phase 3: 0.25m → 400×400
Phase 4A: 0.125m → 800×800 (2x resolution)
```

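
The grid sizes in sections 1 and 2 follow from dividing the covered range by the cell size. The sketch below assumes that the ±54 m BEV range and ±50 m GT range quoted for Stage 1 later in this report also apply to Phase 3 and Phase 4A (they do reproduce the 360, 720, 400 and 800 figures). `grid_cells` is a hypothetical helper, and the framework's own rounding may differ.

```python
def grid_cells(lower: float, upper: float, step: float) -> int:
    """Number of cells along one axis for a [lower, upper, step] bound triple."""
    return round((upper - lower) / step)


# BEV feature grid (assumed range: -54 m to 54 m).
bev_phase3 = grid_cells(-54.0, 54.0, 0.3)
bev_phase4a = grid_cells(-54.0, 54.0, 0.15)

# GT label grid (assumed range: -50 m to 50 m).
gt_phase3 = grid_cells(-50.0, 50.0, 0.25)
gt_phase4a = grid_cells(-50.0, 50.0, 0.125)

print(f"BEV: {bev_phase3}x{bev_phase3} -> {bev_phase4a}x{bev_phase4a}")
print(f"GT:  {gt_phase3}x{gt_phase3} -> {gt_phase4a}x{gt_phase4a}")
```

Halving the cell size doubles each side of the grid, so the number of cells (and roughly the activation memory tied to it) grows by a factor of four.
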
#### 3. Decoder Depth Increase (2x)
```yaml
Phase 3: [256, 128] (2 layers)
Phase 4A: [256, 256, 128, 128] (4 layers)
```

#### 4. Other Features
- Deep Supervision: ✅ enabled
- Dice Loss: ✅ enabled (weight 0.5)
- Class-specific weighting: ✅ enabled

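
For reference, a soft Dice loss scores a predicted mask by its overlap with the ground truth rather than pixel by pixel, which is why it is commonly paired with a per-pixel loss for sparse classes such as stop lines and dividers. The sketch below is plain Python over flattened masks. `soft_dice_loss` and `combined_loss` are hypothetical names, and adding the Dice term to a base loss with weight 0.5 is an assumption about how the configured weight is applied, not the project's confirmed implementation.

```python
def soft_dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss for one class: 1 - 2*sum(p*t) / (sum(p) + sum(t))."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def combined_loss(base_loss, probs, targets, dice_weight=0.5):
    """Assumed combination: per-pixel base loss plus a weighted Dice term."""
    return base_loss + dice_weight * soft_dice_loss(probs, targets)


# A thin-line class: one positive cell out of four.
gt_mask = [0.0, 1.0, 0.0, 0.0]
perfect = soft_dice_loss([0.0, 1.0, 0.0, 0.0], gt_mask)
all_background = soft_dice_loss([0.0, 0.0, 0.0, 0.0], gt_mask)
print(f"perfect prediction: {perfect:.4f}")
print(f"all-background prediction: {all_background:.4f}")
```

Predicting only background would score well on a per-pixel loss here, but it receives a Dice loss close to 1.
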
### Configuration Files

- ✅ `configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_BEV2X_phase4a.yaml`
- ✅ `START_PHASE4A_FIXED.sh`
- ✅ Checkpoint: `runs/enhanced_from_epoch19/epoch_23.pth` (516MB)

---

## ⚠️ Current Issue: Out of GPU Memory (CUDA OOM)

### Problem Description

Attempting to start Phase 4A training fails with an out-of-memory error:

```
RuntimeError: CUDA out of memory.
Tried to allocate 626.00 MiB
(GPU 0; 31.73 GiB total capacity;
18.04 GiB already allocated;
616.25 MiB free)
```

### Root-Cause Analysis

| Item | Phase 3 (400×400) | Phase 4A (800×800) | Growth factor |
|------|-------------------|---------------------|----------|
| BEV features | 512×400×400 = 81.92 M elements | 512×800×800 = 327.68 M elements | 4x |
| Decoder intermediate layers | ~300 MB | ~1.2 GB | 4x |
| Gradients + optimizer | ~600 MB | ~2.4 GB | 4x |
| **Per-sample total** | **~1 GB** | **~4 GB** | **4x** |

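
The 4x growth in every row follows from activation memory scaling with the number of grid cells, that is, with the square of the side length. The sketch below makes that arithmetic explicit. Note that 512×H×W is an element count, so the byte size depends on the dtype. The helper names and the dtype table are illustrative, and the ~1 GB Phase 3 baseline is this report's own estimate.

```python
BYTES_PER_ELEMENT = {"fp16": 2, "fp32": 4}


def feature_map_mb(channels: int, height: int, width: int, dtype: str = "fp32") -> float:
    """Size in MB (10^6 bytes) of a single C x H x W feature tensor."""
    elements = channels * height * width
    return elements * BYTES_PER_ELEMENT[dtype] / 1e6


def scaled_per_sample_gb(base_gb: float, base_side: int, new_side: int) -> float:
    """Scale a per-sample estimate quadratically with the grid side length."""
    return base_gb * (new_side / base_side) ** 2


print(f"512x400x400 fp32: {feature_map_mb(512, 400, 400):.2f} MB")
print(f"512x800x800 fp32: {feature_map_mb(512, 800, 800):.2f} MB")
print(f"800x800 per sample: {scaled_per_sample_gb(1.0, 400, 800):.2f} GB")
print(f"600x600 per sample: {scaled_per_sample_gb(1.0, 400, 600):.2f} GB")
```

Under these assumptions, scaling the ~1 GB baseline gives 4 GB per sample at 800×800 and 2.25 GB at 600×600.
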
### Test Results

| GPU setup | Batch size | Estimated memory | Result |
|---------|------------|---------|------|
| 6 GPU | 1/GPU (total 6) | ~6 GB/GPU | ❌ OOM |
| 4 GPU | 1/GPU (total 4) | ~8 GB/GPU | ❌ OOM |

**Conclusion**: At 800×800 resolution, the memory demand is too high for a 32GB GPU.

---

## 🔧 Proposed Solutions

### Option 1: Progressive Resolution Training (recommended)

**Raise the resolution in stages rather than in one large jump**

#### Stage 1: 600×600 Resolution
```yaml
xbound/ybound: [-54.0, 54.0, 0.2] → 540×540
GT: [-50.0, 50.0, 0.167] → 600×600
Estimated memory: ~2.25 GB/sample (feasible)
Training: 10 epochs
```

#### Stage 2: 800×800 Resolution (fine-tune)
```yaml
# Resume from the Stage 1 checkpoint
Training: 10 epochs
# May require 3 GPUs or gradient checkpointing
```

**Advantages**:
- ✅ Training can start immediately
- ✅ Staged convergence is more stable
- ✅ Intermediate checkpoints are usable

**Estimated time**:
- Stage 1: ~18 hours/epoch × 10 = 180 hours ≈ 7.5 days
- Stage 2: ~25 hours/epoch × 10 = 250 hours ≈ 10.4 days
- **Total**: ~18 days

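
The day counts above are plain conversions from hours per epoch, and the sketch below reproduces them. The hours-per-epoch figures are this report's estimates, and `stage_days` is an illustrative helper.

```python
HOURS_PER_DAY = 24


def stage_days(hours_per_epoch: float, epochs: int) -> float:
    """Wall-clock days for a stage, assuming training runs around the clock."""
    return hours_per_epoch * epochs / HOURS_PER_DAY


stage1_days = stage_days(18, 10)  # 180 hours
stage2_days = stage_days(25, 10)  # 250 hours
total_days = stage1_days + stage2_days

print(f"Stage 1: {stage1_days:.1f} days")
print(f"Stage 2: {stage2_days:.1f} days")
print(f"Total:   {total_days:.1f} days")
```
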

---

### Option 2: Simplify the Model Architecture

**Reduce model complexity to save memory**

```yaml
decoder_channels: [256, 128]  # back from 4 layers to 2
# or
# remove the ASPP module
```

**Advantage**: 800×800 can be trained directly
**Disadvantage**: Lower model capacity, which may hurt performance

---

### Option 3: Gradient Checkpointing

**Use PyTorch's gradient checkpointing**

```python
# Inside EnhancedBEVSegmentationHead.forward(): checkpointing wraps the call, not the module
from torch.utils.checkpoint import checkpoint

x = checkpoint(self.aspp, x)     # ASPP activations are recomputed during backward
x = checkpoint(self.decoder, x)  # decoder activations are recomputed during backward
```

**Advantage**: Saves ~40% memory
**Disadvantage**: Training is ~30% slower

---

### Option 4: Lower the GT Label Resolution

**Keep the BEV at 800×800 but reduce the GT to 600×600**

```yaml
GT: [-50.0, 50.0, 0.167] → 600×600
```

**Advantage**: Still yields a resolution increase
**Disadvantage**: The gain is limited

---

## 📋 Recommended Action Plan

### Immediate Action: Option 1 - Progressive Training

#### Step 1: Create the Phase 4A Stage 1 config (600×600)
```bash
# Copy the config file, then edit the copy
cp multitask_BEV2X_phase4a.yaml multitask_BEV2X_phase4a_stage1.yaml

# Resolution settings to change in the copy (YAML values, not shell commands):
#   xbound/ybound: [-54.0, 54.0, 0.2]
#   GT: [-50.0, 50.0, 0.167]
```

#### Step 2: Start Stage 1 training (10 epochs)
```bash
torchpack dist-run -np 4 python tools/train.py \
  configs/.../multitask_BEV2X_phase4a_stage1.yaml \
  --load_from runs/enhanced_from_epoch19/epoch_23.pth
```

#### Step 3: Prepare Stage 2 (decide based on Stage 1 results)
- If Stage 1 performs well: continue to 800×800
- If memory is still insufficient: use gradient checkpointing

---

## 📂 Completed Work

### Files Created
- ✅ `multitask_BEV2X_phase4a.yaml` - Phase 4A config (800×800)
- ✅ `START_PHASE4A_FIXED.sh` - launch script
- ✅ `monitor_phase4a.sh` - monitoring script
- ✅ `PHASE4A_STATUS_AND_ENVIRONMENT.md` - status document
- ✅ `PHASE4A_ANALYSIS.md` - technical analysis
- ✅ `PHASE4A_GPU_MEMORY_ISSUE.md` - GPU memory issue analysis
- ✅ `PROJECT_PROGRESS_REPORT_20251030.md` - progress report
- ✅ `ENVIRONMENT_FIX_RECORD.md` - environment fix record

### Code Changes
- ✅ `mmdet3d/models/heads/segm/enhanced.py` - added adaptive interpolation
- ✅ Docker environment fix - symlink fix for the mmcv loading problem

### Environment Status
- ✅ PyTorch 1.10.1+cu102
- ✅ mmcv-full 1.4.0
- ✅ 8× Tesla V100S-PCIE-32GB (32GB each)
- ✅ All dependencies working

---

## 🎯 Next Steps

### User Decision Point

**Please choose an implementation option for Phase 4A**:

1. **Option 1 (recommended)**: Progressive training
   - Stage 1: 600×600 resolution, 10 epochs (~7.5 days)
   - Stage 2: 800×800 resolution, 10 epochs (~10.4 days)
   - Total time: ~18 days

2. **Option 2**: Simplified model
   - Train 800×800 directly with a 2-layer decoder
   - Time: ~15 days

3. **Option 3**: Gradient Checkpointing
   - Train 800×800 directly with checkpointing enabled
   - Time: ~20 days (30% slower)

4. **Option 4**: Intermediate resolution
   - Train at 600×600 for 20 epochs
   - Time: ~15 days

---

**Current status**: Awaiting user decision
**All preparation work**: Complete
**Ready to start immediately**: Yes

---

**Document index**:
- Project status: `PROJECT_STATUS_UPDATE_20251030.md`
- Phase 4A technical details: `PHASE4A_STATUS_AND_ENVIRONMENT.md`
- Environment issue log: `ENVIRONMENT_FIX_RECORD.md`
- GPU memory analysis: `PHASE4A_GPU_MEMORY_ISSUE.md`
- Overview: `项目状态总览_20251030.md`