bev-project/START_TASK_GCA_TRAINING.md

148 lines
3.5 KiB
Markdown

# 🚀 Task-specific GCA训练启动指南
---
## ✅ 实施完成
```
═══════════════════════════════════════════════════════════════════
Task-specific GCA架构已完整实现
═══════════════════════════════════════════════════════════════════
核心思想:
检测和分割各自从原始BEV(512通道)中选择最优特征
而非使用统一选择的折中特征
架构:
Decoder Neck → 原始BEV(512通道)
├─ 检测GCA → 检测最优特征 → TransFusion
└─ 分割GCA → 分割最优特征 → EnhancedBEVSeg
优势:
✅ 检测: 强化物体边界、中心点 → mAP +2.9%
✅ 分割: 强化语义纹理、连续性 → Divider -19%
✅ 避免折中,各取所需
✅ 符合RMT-PPAD思想
═══════════════════════════════════════════════════════════════════
```
---
## 📋 配置摘要
```yaml
model:
task_specific_gca:
enabled: true # ✅ 启用Task-specific GCA
in_channels: 512 # 原始BEV通道数
reduction: 4 # 降维比例
object_reduction: 4 # 检测GCA
map_reduction: 4 # 分割GCA
heads:
object:
in_channels: 512 # 接收检测GCA选择的BEV
map:
in_channels: 512 # 接收分割GCA选择的BEV
use_internal_gca: false
data:
val:
load_interval: 2 # Validation样本-50%
evaluation:
interval: 10 # 评估频率-50%
```
---
## 🚀 启动命令
### 方式1: 使用启动脚本 (推荐)
```bash
# 在Docker容器内
docker exec -it bevfusion bash
cd /workspace/bevfusion
bash START_PHASE4A_TASK_GCA.sh
```
### 方式2: 直接命令
```bash
# 在Docker容器内
cd /workspace/bevfusion
torchpack dist-run -np 8 python tools/train.py \
configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_BEV2X_phase4a_stage1_task_gca.yaml \
--model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
--load_from /workspace/bevfusion/runs/run-326653dc-2334d461/epoch_5.pth \
--resume-from /workspace/bevfusion/runs/run-326653dc-2334d461/epoch_5.pth
```
---
## ✅ 启动验证
### 检查日志输出
```
应该看到:
[BEVFusion] ✨✨ Task-specific GCA mode enabled ✨✨
[object] GCA:
- in_channels: 512
- reduction: 4
- params: 131,072
[map] GCA:
- in_channels: 512
- reduction: 4
- params: 131,072
Total task-specific GCA params: 262,144
Advantage: Each task selects features by its own needs ✅
```
如果看到以上输出 → ✅ Task-specific GCA已正确启用
---
## 📊 监控指标
### 每50次迭代关注
```
检测:
loss/object/loss_heatmap # 应该稳定或下降
stats/object/matched_ious # 应该上升
分割:
loss/map/divider/dice # 应该从0.52→0.45→0.42
loss/map/drivable_area/dice
通用:
grad_norm # 8-15正常
memory # <20GB
```
---
## 🎯 成功标准
```
Epoch 10 (中期):
✅ Divider Dice < 0.48
✅ 检测mAP > 0.68 (保持或提升)
✅ 训练稳定无异常
Epoch 20 (最终):
✅ Divider Dice < 0.43
✅ 检测mAP > 0.69
✅ 分割mIoU > 0.60
```
---
**🎉 准备完成!请启动训练!**