# ✅ Task-specific GCA - Final Launch Instructions

---

## 🎯 Core Strategy

### Partial Checkpoint Loading ✅
```
Existing weights loaded from epoch_5.pth:
  ✅ Backbones (Swin Transformer + Sparse Encoder)
  ✅ Feature fusion (ConvFuser)
  ✅ BEV decoder (SECOND + SECONDFPN)
  ✅ Detection head (TransFusion)
  ✅ Segmentation head (EnhancedBEVSeg)

  Total: ~132M parameters keep their trained weights

New modules, randomly initialized:
  ✨ task_gca['object'] (detection GCA, 131K parameters)
  ✨ task_gca['map'] (segmentation GCA, 131K parameters)

  Total: ~0.26M parameters trained from scratch

Strategy: use --load_from (partial loading; keys that do not match are ignored)
```
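The parameter totals above can be sanity-checked with a little shell arithmetic. This is only a sketch: it assumes both GCA modules have exactly 131,072 parameters (the figure the startup log reports) and treats the ~132M retained parameters as a round number. The helper names `gca_total_params` and `new_param_share` are hypothetical and are not part of the project.

```bash
# Hypothetical helper: total parameters across N equally sized GCA modules.
gca_total_params() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: newly initialized parameters as a percentage of the
# whole model (new + retained).
new_param_share() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f%%\n", 100 * new / (new + old) }'
}

gca_total_params 131072 2          # object + map, i.e. ~0.26M
new_param_share 262144 132000000   # a small fraction of the model
```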
---

## 🚀 Launch Command

```bash
# Enter the Docker container, then launch from the repository root
docker exec -it bevfusion bash
cd /workspace/bevfusion
bash START_PHASE4A_TASK_GCA.sh
```
When prompted, enter `y` to confirm the launch.

---

## ✅ Fixed Issues
```
1. ✅ torchpack: command not found
   Fix: added the environment variable setup

2. ✅ pretrained/swint-nuimages-pretrained.pth not found
   Fix: weights are restored from the checkpoint, so the pretrained model is not needed

3. ✅ Model structure mismatch
   Fix: partial loading via --load_from
```
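The first two issues above (a missing `torchpack` command and a missing weights file) can be caught before launch with a short pre-flight check. The following is a minimal sketch, not part of `START_PHASE4A_TASK_GCA.sh`. The function names are hypothetical, and the checkpoint path must be supplied by you because the full path to `epoch_5.pth` is not given in this document.

```bash
# Succeeds if the given command is on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the given file exists and is not empty.
require_file() {
  [ -s "$1" ]
}

# Prints one line per failed check; returns non-zero if any check failed.
# $1: path to the checkpoint to load (an assumption, set it yourself).
preflight() {
  status=0
  if ! require_cmd torchpack; then
    echo "torchpack not on PATH"
    status=1
  fi
  if ! require_file "$1"; then
    echo "checkpoint missing or empty: $1"
    status=1
  fi
  return "$status"
}

# Example, with CKPT set to your own checkpoint path:
# preflight "$CKPT" && bash START_PHASE4A_TASK_GCA.sh
```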
---

## 📊 Post-launch Verification

### 1. Check that Task-specific GCA is enabled

```bash
tail -f /data/runs/phase4a_stage1_task_gca/*.log | grep "Task-specific"
```
The log should contain the following (the `grep` filter above shows only the first line):

```
[BEVFusion] ✨✨ Task-specific GCA mode enabled ✨✨
[object] GCA: params: 131,072
[map] GCA: params: 131,072
Total: 262,144
Advantage: Each task selects features by its own needs ✅
```
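`tail -f` prints only the last few lines of each file before following new output, so the startup banner can be missed if the check is run late. A one-shot alternative is to scan the log files from the beginning. This is a sketch: the helper name `gca_enabled` is hypothetical, and the banner text is taken from the expected output above.

```bash
# Succeeds if any of the given log files contains the GCA startup banner.
# grep reads each file from the start, so older log lines are found too.
gca_enabled() {
  grep -q "Task-specific GCA mode enabled" "$@"
}

# Example, using the log path from the command above:
# gca_enabled /data/runs/phase4a_stage1_task_gca/*.log && echo "GCA enabled"
```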
### 2. Check weight loading

```bash
tail -f /data/runs/phase4a_stage1_task_gca/*.log | grep "load checkpoint"
```
The log should contain the following (the `grep` filter above shows only the first line):

```
load checkpoint from /workspace/bevfusion/runs/.../epoch_5.pth

The following keys in model are not found in checkpoint:
task_gca.object.*
task_gca.map.*

✅ This is expected: the new modules are randomly initialized
```
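The important property of that output is that every missing key belongs to `task_gca`. A missing key anywhere else would mean trained weights were silently dropped. The sketch below assumes the missing keys have been copied out of the log into a list with one key per line. The exact log layout is not specified here, and the helper name and the sample key names in the comments are hypothetical.

```bash
# Reads parameter key names on stdin, one per line, and prints every key that
# is NOT under task_gca. Blank lines are ignored.
# Empty output means only the new modules were missing from the checkpoint.
unexpected_missing_keys() {
  grep -v -e '^[[:space:]]*task_gca\.' -e '^[[:space:]]*$' || true
}

# Example with made-up key names:
# printf 'task_gca.map.fc.weight\nheads.map.classifier.weight\n' | unexpected_missing_keys
# prints only the second key, which would need investigating.
```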
### 3. Monitor the training loss

```bash
tail -f /data/runs/phase4a_stage1_task_gca/*.log | grep "loss/map/divider"
```
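To follow the trend rather than read full log lines, the divider loss values can be pulled out on their own. This sketch assumes each log line contains entries of the form `loss/map/divider/<name>: <number>`. That layout is an assumption, so adjust the pattern to match your logs. The helper name `divider_losses` is hypothetical.

```bash
# Prints every "loss/map/divider...: <number>" entry found on stdin,
# one entry per line. Prints nothing if there is no match.
divider_losses() {
  grep -o 'loss/map/divider[^:,[:space:]]*: *[0-9][0-9.eE+-]*' || true
}

# Example, using the log path from the command above:
# cat /data/runs/phase4a_stage1_task_gca/*.log | divider_losses | tail -n 20
```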
---

## 📈 Expected Performance

```
Epoch 5-10 (early phase, task_gca learning):
  Divider Dice Loss: 0.525 → 0.50
  Detection mAP: flat or slightly lower (task_gca adaptation)

Epoch 10-15 (middle phase, performance improves):
  Divider Dice Loss: 0.50 → 0.45
  Detection mAP: starts to improve

Epoch 15-20 (late phase, best performance):
  Divider Dice Loss: 0.45 → 0.42 ✅
  Detection mAP: 0.68 → 0.70 ✅
  Segmentation mIoU: 0.55 → 0.61 ✅
```
---

## 🎯 Training Parameters

```
Start: Epoch 0 (starts from the beginning so task_gca can train fully)
Target: Epoch 20
Remaining: 20 epochs
Estimated duration: ~9 days (FP32)
Estimated completion: 2025-11-15
```
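The schedule above implies an average time per epoch, which is a useful number to compare against the timestamps of the first saved checkpoints. A small sketch of that arithmetic follows. The helper name `hours_per_epoch` is hypothetical, and the result is an average that ignores evaluation and checkpointing overhead.

```bash
# Average wall-clock hours per epoch for a run of $1 days over $2 epochs.
hours_per_epoch() {
  awk -v days="$1" -v epochs="$2" 'BEGIN { printf "%.1f\n", days * 24 / epochs }'
}

hours_per_epoch 9 20   # 9 days over 20 epochs is just under 11 hours each
```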
---

## 📁 Output Locations

```
Checkpoints:
  /data/runs/phase4a_stage1_task_gca/epoch_*.pth

Logs:
  /data/runs/phase4a_stage1_task_gca/*.log

Monitoring:
  tail -f /data/runs/phase4a_stage1_task_gca/*.log
```
---

## 📝 Related Documents

```
1. CHECKPOINT_LOADING_STRATEGY.md
   - Detailed description of the partial-loading strategy
   - How weights are matched
   - Parameter statistics

2. TASK_GCA_完成报告.md
   - Full implementation report
   - Architecture description

3. 启动训练_完整步骤.md
   - Detailed launch procedure
   - Troubleshooting
```
---

**🎉 All issues are resolved! The partial-loading strategy handles the change in model structure.**

**Launch now**: `bash START_PHASE4A_TASK_GCA.sh`