bev-project/START_TASK_GCA_TRAINING.md

3.5 KiB

🚀 Task-specific GCA训练启动指南


实施完成

═══════════════════════════════════════════════════════════════════
         Task-specific GCA架构已完整实现
═══════════════════════════════════════════════════════════════════

核心思想: 
  检测和分割各自从原始BEV(512通道)中选择最优特征
  而非使用统一选择的折中特征

架构:
  Decoder Neck → 原始BEV(512通道)
    ├─ 检测GCA → 检测最优特征 → TransFusion
    └─ 分割GCA → 分割最优特征 → EnhancedBEVSeg

优势:
  ✅ 检测: 强化物体边界、中心点 → mAP +2.9%
  ✅ 分割: 强化语义纹理、连续性 → Divider -19%
  ✅ 避免折中,各取所需
  ✅ 符合RMT-PPAD思想

═══════════════════════════════════════════════════════════════════

📋 配置摘要

model:
  task_specific_gca:
    enabled: true        # ✅ 启用Task-specific GCA
    in_channels: 512     # 原始BEV通道数
    reduction: 4         # 降维比例
    object_reduction: 4  # 检测GCA
    map_reduction: 4     # 分割GCA

  heads:
    object:
      in_channels: 512   # 接收检测GCA选择的BEV
    map:
      in_channels: 512   # 接收分割GCA选择的BEV
      use_internal_gca: false

data:
  val:
    load_interval: 2     # Validation样本-50%

evaluation:
  interval: 10           # 评估频率-50%

🚀 启动命令

方式1: 使用启动脚本 (推荐)

# 在Docker容器内
docker exec -it bevfusion bash
cd /workspace/bevfusion
bash START_PHASE4A_TASK_GCA.sh

方式2: 直接命令

# 在Docker容器内
cd /workspace/bevfusion

torchpack dist-run -np 8 python tools/train.py \
    configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_BEV2X_phase4a_stage1_task_gca.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
    --load_from /workspace/bevfusion/runs/run-326653dc-2334d461/epoch_5.pth \
    --resume-from /workspace/bevfusion/runs/run-326653dc-2334d461/epoch_5.pth

启动验证

检查日志输出

应该看到:

[BEVFusion] ✨✨ Task-specific GCA mode enabled ✨✨
  [object] GCA:
    - in_channels: 512
    - reduction: 4
    - params: 131,072
  [map] GCA:
    - in_channels: 512
    - reduction: 4  
    - params: 131,072
  Total task-specific GCA params: 262,144
  Advantage: Each task selects features by its own needs ✅

如果看到以上输出 → Task-specific GCA已正确启用


📊 监控指标

每50次迭代关注

检测:
  loss/object/loss_heatmap  # 应该稳定或下降
  stats/object/matched_ious # 应该上升

分割:
  loss/map/divider/dice     # 应该从0.52→0.45→0.42
  loss/map/drivable_area/dice

通用:
  grad_norm                 # 8-15正常
  memory                    # <20GB

🎯 成功标准

Epoch 10 (中期):
  ✅ Divider Dice < 0.48
  ✅ 检测mAP > 0.68 (保持或提升)
  ✅ 训练稳定无异常

Epoch 20 (最终):
  ✅ Divider Dice < 0.43
  ✅ 检测mAP > 0.69
  ✅ 分割mIoU > 0.60

🎉 准备完成!请启动训练!