# BEVFusion Training Quick Reference Card

**Updated**: 2025-10-30
**Purpose**: Quick reference for subsequent training runs

---
## 🚨 Required Steps After a Docker Restart
```bash
cd /workspace/bevfusion
export PATH=/opt/conda/bin:$PATH
# 1. Create the required symlinks (critical!)
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so
# 2. Verify the environment
cd /workspace/bevfusion
python -c "import torch; from mmcv.ops import nms_match; print('✅ Environment OK')"
# 3. Check training status
bash monitor_phase4a_stage1.sh # only if Stage 1 is running
```
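
The symlink step above can also be wrapped in a small guard, sketched here under stated assumptions. The function name `link_torch_libs` is hypothetical, and the choice to skip a link name that already exists as a regular file is an added safety measure, not something this card prescribes. The default directory is the one used in this card.

```bash
# Sketch only: link_torch_libs is a hypothetical helper, not part of the project.
# It recreates the three compatibility symlinks, but refuses to run when the
# real library is missing and leaves any existing regular file untouched.
link_torch_libs() {
    local lib_dir="${1:-/opt/conda/lib/python3.8/site-packages/torch/lib}"
    local pair target link
    for pair in \
        "libtorch_cuda.so:libtorch_cuda_cu.so" \
        "libtorch_cuda.so:libtorch_cuda_cpp.so" \
        "libtorch_cpu.so:libtorch_cpu_cpp.so"; do
        target="${pair%%:*}"   # real library the link points to
        link="${pair##*:}"     # name the link is created under
        if [ ! -e "$lib_dir/$target" ]; then
            echo "missing $lib_dir/$target" >&2
            return 1
        fi
        if [ -e "$lib_dir/$link" ] && [ ! -L "$lib_dir/$link" ]; then
            # Assumption: a regular file with this name is a real library; keep it.
            echo "keeping existing file $lib_dir/$link" >&2
            continue
        fi
        ln -sf "$target" "$lib_dir/$link"
    done
}
# Usage: link_torch_libs            (default path from this card)
#        link_torch_libs /some/dir  (any other torch/lib directory)
```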
---
## ⚡ Quick-Start Training
```bash
cd /workspace/bevfusion
# Stage 1 (600×600), currently recommended
bash START_PHASE4A_STAGE1.sh
# Monitor progress
bash monitor_phase4a_stage1.sh
tail -f phase4a_stage1_*.log | grep "Epoch \["
```
---
## 🔧 Quick Fixes for Common Problems
### mmcv fails to load
```bash
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so
```
### Out of GPU memory
```bash
# Reduce the number of GPUs or lower the resolution
# 600×600: feasible on 4 GPUs
# 800×800: 3 GPUs + gradient checkpointing
```
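
Before changing the GPU count or resolution, it can help to see which GPUs are actually close to full. The sketch below is one way to do that. `gpus_above` is a hypothetical helper name, and the 90% default threshold is an arbitrary illustration, not a value from this project. It only parses the CSV that `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` prints.

```bash
# Sketch only: gpus_above is a hypothetical helper.
# Reads "index, used_MiB, total_MiB" lines on stdin and prints the index of
# every GPU whose memory usage exceeds the given percentage (default 90).
gpus_above() {
    local pct="${1:-90}"
    awk -F', *' -v pct="$pct" '$3 > 0 && ($2 / $3) * 100 > pct { print $1 }'
}
# Usage on a machine with NVIDIA drivers installed:
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#       --format=csv,noheader,nounits | gpus_above 90
```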
### Code changes not taking effect
```bash
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
```
### Training hangs
```bash
# pkill -f matches an extended regex, so alternation is a plain | (not \|)
pkill -9 -f "torchpack|mpirun"
nvidia-smi # check the GPUs
bash START_SCRIPT.sh # relaunch
```
---
## 📊 Performance Baseline
```
Phase 3 (epoch_23):
  NDS: 0.6941
  mAP: 0.6446
  mIoU: 0.41
  Stop Line: 0.27
  Divider: 0.19
Stage 1 targets (10 epochs):
  Stop Line: 0.35+
  Divider: 0.28+
  mIoU: 0.48+
```
---
## 📂 Key File Locations
```
Checkpoints:
  Phase 3: runs/enhanced_from_epoch19/epoch_23.pth
  Stage 1: runs/run-326653dc-c038af2c/epoch_*.pth
Configs:
  Phase 3: configs/.../multitask_enhanced_phase1_HIGHRES.yaml
  Stage 1: configs/.../multitask_BEV2X_phase4a_stage1.yaml
Launch script:
  Stage 1: START_PHASE4A_STAGE1.sh
Monitoring:
  monitor_phase4a_stage1.sh
Code:
  Segmentation head: mmdet3d/models/heads/segm/enhanced.py
```
---
## 🎯 Training Configuration at a Glance
| Setting | Phase 3 | Stage 1 | Stage 2 (planned) |
|---------|---------|---------|-------------------|
| BEV resolution | 0.3 m (360×360) | 0.2 m (540×540) | 0.15 m (720×720) |
| GT resolution | 0.25 m (400×400) | 0.167 m (600×600) | 0.125 m (800×800) |
| Decoder | 2 layers | 4 layers | 4 layers |
| Deep supervision | ❌ | ✅ | ✅ |
| Dice loss | ❌ | ✅ | ✅ |
| GPUs | 8 | 4 | 3-4 |
| Memory per GPU | ~8 GB | ~30 GB | ~32 GB |
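
The resolution and grid-size pairs in the table are linked by cell size = covered extent / cells per side. The sketch below checks that arithmetic. The extents of 108 m (BEV) and 100 m (GT) are not stated in this card. They are inferred by multiplying the table's own values (for example 0.3 m × 360), and `cell_size` is a hypothetical helper name.

```bash
# Sketch only: cell_size is a hypothetical helper.
# Prints the cell size in metres for a square grid: extent / cells per side.
cell_size() {
    awk -v extent="$1" -v cells="$2" 'BEGIN { printf "%.3f\n", extent / cells }'
}
# BEV grids, assuming a 108 m extent (inferred from the table)
cell_size 108 360   # Phase 3 -> 0.300
cell_size 108 540   # Stage 1 -> 0.200
cell_size 108 720   # Stage 2 -> 0.150
# GT grids, assuming a 100 m extent (inferred from the table)
cell_size 100 400   # Phase 3 -> 0.250
cell_size 100 600   # Stage 1 -> 0.167 (1/6 m, rounded)
cell_size 100 800   # Stage 2 -> 0.125
```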
---
**Full documentation**: `项目进展与问题解决总结_20251030.md`