# BEVFusion Training Quick Reference Card

**Updated**: 2025-10-30

**Purpose**: Quick reference manual for subsequent training runs

---

## 🚨 Required After a Docker Restart

```bash
cd /workspace/bevfusion
export PATH=/opt/conda/bin:$PATH

# 1. Create the required symlinks (critical!)
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so

# 2. Verify the environment
cd /workspace/bevfusion
python -c "import torch; from mmcv.ops import nms_match; print('✅ Environment OK')"

# 3. Check training status
bash monitor_phase4a_stage1.sh  # if Stage 1 is running
```

---

## ⚡ Quick Start Training

```bash
cd /workspace/bevfusion

# Stage 1 (600×600) - currently recommended
bash START_PHASE4A_STAGE1.sh

# Monitor
bash monitor_phase4a_stage1.sh
tail -f phase4a_stage1_*.log | grep "Epoch \["
```

---

## 🔧 Quick Fixes for Common Issues

### mmcv fails to load

```bash
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so
```

### Out of GPU memory

```bash
# Reduce the number of GPUs or lower the resolution
# 600×600: 4 GPUs works
# 800×800: 3 GPUs + gradient checkpointing
```

### Code changes not taking effect

```bash
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
```

### Training hangs

```bash
pkill -9 -f "torchpack|mpirun"  # pkill patterns are ERE: use |, not \|
nvidia-smi                      # check the GPUs
bash START_SCRIPT.sh            # relaunch
```

---

## 📊 Performance Baseline

```
Phase 3 (epoch_23):
  NDS: 0.6941
  mAP: 0.6446
  mIoU: 0.41
  Stop Line: 0.27
  Divider: 0.19

Stage 1 targets (10 epochs):
  Stop Line: 0.35+
  Divider: 0.28+
  mIoU: 0.48+
```

---

## 📂 Key File Locations

```
Checkpoints:
  Phase 3: runs/enhanced_from_epoch19/epoch_23.pth
  Stage 1: runs/run-326653dc-c038af2c/epoch_*.pth

Configs:
  Phase 3: configs/.../multitask_enhanced_phase1_HIGHRES.yaml
  Stage 1: configs/.../multitask_BEV2X_phase4a_stage1.yaml

Launch scripts:
  Stage 1: START_PHASE4A_STAGE1.sh
  Monitor: monitor_phase4a_stage1.sh

Code:
  Segmentation head: mmdet3d/models/heads/segm/enhanced.py
```

---

## 🎯 Training Configuration at a Glance

| Setting | Phase 3 | Stage 1 | Stage 2 (planned) |
|---------|---------|---------|-------------------|
| BEV resolution | 0.3 m (360×360) | 0.2 m (540×540) | 0.15 m (720×720) |
| GT resolution | 0.25 m (400×400) | 0.167 m (600×600) | 0.125 m (800×800) |
| Decoder | 2 layers | 4 layers | 4 layers |
| Deep supervision | ❌ | ✅ | ✅ |
| Dice loss | ❌ | ✅ | ✅ |
| GPUs | 8 | 4 | 3-4 |
| VRAM per GPU | ~8 GB | ~30 GB | ~32 GB |

---

**Full documentation**: `项目进展与问题解决总结_20251030.md` (project progress and issue-resolution summary)
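The post-restart steps at the top of this card can be kept in one sourceable helper so they don't have to be retyped after every container restart. Those steps are: recreate the three torch symlinks, then run the mmcv import check. Below is a minimal sketch. The file name `fix_torch_env.sh` and both function names are hypothetical. The default path is the conda / Python 3.8 location used on this card.

```bash
#!/usr/bin/env bash
# fix_torch_env.sh (hypothetical name): recreate the torch library aliases
# that mmcv needs after a Docker restart, then run the import check.
# Meant to be sourced, so it deliberately avoids `set -e`.

TORCH_LIB_DIR_DEFAULT=/opt/conda/lib/python3.8/site-packages/torch/lib

# Create libtorch_cuda_cu.so, libtorch_cuda_cpp.so and libtorch_cpu_cpp.so
# as relative symlinks inside the given lib dir (default: the conda path).
fix_torch_symlinks() {
  local libdir="${1:-$TORCH_LIB_DIR_DEFAULT}"
  local pair link target
  if [ ! -d "$libdir" ]; then
    echo "torch lib dir not found: $libdir" >&2
    return 1
  fi
  # Each entry is "link_name:target".
  for pair in libtorch_cuda_cu.so:libtorch_cuda.so \
              libtorch_cuda_cpp.so:libtorch_cuda.so \
              libtorch_cpu_cpp.so:libtorch_cpu.so; do
    link="${pair%%:*}"
    target="${pair#*:}"
    if [ ! -e "$libdir/$target" ]; then
      echo "cannot create $link: $libdir/$target not found" >&2
      return 1
    fi
    # A relative target resolves against the link's own directory,
    # so no `cd` is needed; -f replaces a link left from an earlier run.
    ln -sf "$target" "$libdir/$link" || return 1
  done
}

# Same check as step 2 of the checklist; assumes `python` on PATH is the
# conda interpreter (export PATH=/opt/conda/bin:$PATH first).
verify_mmcv_import() {
  python -c "import torch; from mmcv.ops import nms_match; print('✅ Environment OK')"
}
```

Usage after a restart: `source fix_torch_env.sh && fix_torch_symlinks && verify_mmcv_import`.

Because `ln -sf` overwrites existing links, rerunning the helper is harmless. It also refuses to create dangling links if a target library is missing.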
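Before launching, it can help to check which GPUs actually have enough free memory. This is relevant to the out-of-memory fixes, since the configuration table lists roughly 30-32 GB per GPU for Stage 1 and Stage 2. The sketch below filters the CSV output of `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`, which reports values in MiB. The function name and any threshold you pass are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "index, memory.free" CSV lines (MiB) on stdin,
# as produced by
#   nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits
# and print the indices with at least $1 MiB free as "0,2,3"
# (the format CUDA_VISIBLE_DEVICES expects). Prints an empty line if none.
gpus_with_free_mem() {
  local need_mib="$1"
  awk -F', *' -v need="$need_mib" '
    # Column 1 = GPU index, column 2 = free memory in MiB.
    $2 + 0 >= need + 0 { printf "%s%s", sep, $1; sep = "," }
    END { print "" }
  '
}
```

Example: `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | gpus_with_free_mem 32000` prints a comma-separated list such as `0,2`.

This card doesn't say how `START_PHASE4A_STAGE1.sh` selects GPUs. So whether it honors a `CUDA_VISIBLE_DEVICES` set from this list is an open question.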
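The monitoring pipeline `tail -f phase4a_stage1_*.log | grep "Epoch \["` streams every matching line. For a one-shot status check, the latest epoch marker is usually enough. This sketch assumes the log follows mmcv's text-logger format `Epoch [E][I/N]`, as the `grep "Epoch \["` filter suggests. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the most recent "Epoch [E][I/N]" marker in a
# training log. Assumes mmcv's text-logger line format, e.g.
#   ... - INFO - Epoch [3][50/100]  lr: ..., loss: ...
epoch_progress() {
  local log="$1" marker ep it total
  # -o prints only the matched marker; tail keeps the latest one.
  marker="$(grep -Eo 'Epoch \[[0-9]+]\[[0-9]+/[0-9]+]' "$log" | tail -n 1)"
  if [ -z "$marker" ]; then
    echo "no 'Epoch [..]' lines in $log yet" >&2
    return 1
  fi
  # Blank out every non-digit so "Epoch [3][50/100]" becomes "3 50 100".
  read -r ep it total <<<"$(printf '%s' "$marker" | tr -c '0-9' ' ')"
  [ "$total" -gt 0 ] || return 1
  # 10# forces base 10 in case a field ever has a leading zero.
  printf 'epoch %s: iter %s/%s (%d%%)\n' "$ep" "$it" "$total" \
    $((10#$it * 100 / 10#$total))
}
```

Example: `epoch_progress "$(ls -t phase4a_stage1_*.log | head -n 1)"` checks the newest Stage 1 log. It prints a line of the form `epoch E: iter I/N (P%)`, where P is the percentage of the current epoch completed.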
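The grid sizes in the configuration table follow from range ÷ cell size. The ranges are inferred from the table itself:

- BEV feature grid: 360 × 0.3 m = 108 m.
- GT grid: 400 × 0.25 m = 100 m.

The "0.167 m" entry is 1/6 m rounded. Plugging in 0.167 would give 599 instead of 600, so the sketch passes the unrounded value. The function name is illustrative.

```bash
#!/usr/bin/env bash
# Cells per side = range (m) / cell size (m), rounded to the nearest integer
# because cell sizes such as 0.3 or 0.15 are not exact in binary floating point.
bev_grid_size() {
  awk -v range="$1" -v res="$2" 'BEGIN { printf "%d\n", int(range / res + 0.5) }'
}

# BEV feature grid (108 m) and GT grid (100 m) for Phase 3 / Stage 1 / Stage 2.
for res in 0.3 0.2 0.15; do
  echo "BEV ${res} m -> $(bev_grid_size 108 "$res")"
done
for res in 0.25 0.1666667 0.125; do
  echo "GT  ${res} m -> $(bev_grid_size 100 "$res")"
done
```

Cell counts grow quadratically with resolution. Going from 400×400 to 600×600 GT means 360000 / 160000 = 2.25× as many cells per sample, and 800×800 means 4× the Phase 3 count.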