# BEVFusion Training Quick Reference Card

**Updated**: 2025-10-30
**Purpose**: Quick reference for subsequent training runs

---

## 🚨 Required after a Docker restart

```bash
cd /workspace/bevfusion
export PATH=/opt/conda/bin:$PATH

# 1. Create the required symlinks (critical!)
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so

# 2. Verify the environment
cd /workspace/bevfusion
python -c "import torch; from mmcv.ops import nms_match; print('✅ Environment OK')"

# 3. Check training status
bash monitor_phase4a_stage1.sh  # if Stage 1 is running
```

---

## ⚡ Quick-start training

```bash
cd /workspace/bevfusion

# Stage 1 (600×600) - currently recommended
bash START_PHASE4A_STAGE1.sh

# Monitor
bash monitor_phase4a_stage1.sh
tail -f phase4a_stage1_*.log | grep "Epoch \["
```

---

## 🔧 Quick fixes for common problems

### mmcv fails to load
```bash
# Recreate the torch library symlinks that mmcv expects
cd /opt/conda/lib/python3.8/site-packages/torch/lib
ln -sf libtorch_cuda.so libtorch_cuda_cu.so
ln -sf libtorch_cuda.so libtorch_cuda_cpp.so
ln -sf libtorch_cpu.so libtorch_cpu_cpp.so
```

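The same three links can also be created from Python, which is convenient if the fix should run at the start of a launch script. This is a minimal sketch, not part of the BEVFusion codebase: the name `ensure_torch_aliases` is hypothetical, and unlike `ln -sf` it deliberately leaves an existing regular file untouched rather than overwriting it.

```python
import os
from pathlib import Path

# Alias -> target pairs, taken from the ln -sf commands above.
ALIASES = {
    "libtorch_cuda_cu.so": "libtorch_cuda.so",
    "libtorch_cuda_cpp.so": "libtorch_cuda.so",
    "libtorch_cpu_cpp.so": "libtorch_cpu.so",
}


def ensure_torch_aliases(lib_dir):
    """Create or refresh the alias symlinks in lib_dir; return the names (re)created."""
    lib_dir = Path(lib_dir)
    created = []
    for alias, target in ALIASES.items():
        link = lib_dir / alias
        if not (lib_dir / target).exists():
            continue  # target library is missing, nothing to link to
        if link.is_symlink():
            if os.readlink(link) == target:
                continue  # already points at the right library
            link.unlink()  # stale link: replace it
        elif link.exists():
            continue  # a real file is present; leave it alone (unlike ln -sf)
        link.symlink_to(target)  # relative link, as in the shell version
        created.append(alias)
    return created
```

Calling `ensure_torch_aliases("/opt/conda/lib/python3.8/site-packages/torch/lib")` is safe to repeat: a second call finds the links already in place and returns an empty list.
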
### Out of GPU memory
```bash
# Reduce the number of GPUs or lower the resolution
# 600×600: feasible with 4 GPUs
# 800×800: 3 GPUs + gradient checkpointing
```

### Code changes do not take effect
```bash
# Remove stale bytecode caches under the current directory
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
```

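Where `find` is unavailable, or the cleanup should happen inside a Python entry point, the cache removal can be sketched with the standard library alone. The function name `clear_pycache` is hypothetical and only mirrors the `find` command above.

```python
import shutil
from pathlib import Path


def clear_pycache(root="."):
    """Delete every __pycache__ directory under root; return how many were removed."""
    # Collect the directories first so deletion does not disturb the walk.
    cache_dirs = [p for p in Path(root).rglob("__pycache__") if p.is_dir()]
    removed = 0
    for cache_dir in cache_dirs:
        shutil.rmtree(cache_dir, ignore_errors=True)
        if not cache_dir.exists():
            removed += 1
    return removed
```

Running `clear_pycache("/workspace/bevfusion")` before relaunching training forces Python to recompile the modified sources.
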
### Training hangs
```bash
# pkill -f matches with an extended regex, so alternation is a plain |
pkill -9 -f "torchpack|mpirun"
nvidia-smi              # check the GPUs
bash START_SCRIPT.sh    # restart with the relevant launch script
```

---

## 📊 Performance baseline

```
Phase 3 (epoch_23):
  NDS: 0.6941
  mAP: 0.6446
  mIoU: 0.41
  Stop Line: 0.27
  Divider: 0.19

Stage 1 targets (10 epochs):
  Stop Line: 0.35+
  Divider: 0.28+
  mIoU: 0.48+
```

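To make the gap between the Phase 3 baseline and the Stage 1 targets concrete, the sketch below computes the absolute and relative gain each segmentation metric still needs, and lists which targets a given result misses. The numbers are copied from the block above. The function names are illustrative and not part of the BEVFusion codebase, and the evaluation output is assumed to be available as a plain dict.

```python
# Values copied from the baseline block above.
PHASE3_BASELINE = {"mIoU": 0.41, "Stop Line": 0.27, "Divider": 0.19}
STAGE1_TARGETS = {"mIoU": 0.48, "Stop Line": 0.35, "Divider": 0.28}


def required_gains(baseline, targets):
    """Map each metric to (absolute gain, relative gain in percent) needed to hit its target."""
    gains = {}
    for name, target in targets.items():
        delta = target - baseline[name]
        gains[name] = (round(delta, 3), round(100 * delta / baseline[name], 1))
    return gains


def unmet_targets(results, targets):
    """Return the metrics in results that are still below their target."""
    return [name for name, target in targets.items()
            if results.get(name, 0.0) < target]


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in required_gains(PHASE3_BASELINE, STAGE1_TARGETS).items():
        print(f"{name}: +{abs_gain:.2f} ({rel_gain:.1f}% over baseline)")
```
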
---

## 📂 Key file locations

```
Checkpoints:
  Phase 3: runs/enhanced_from_epoch19/epoch_23.pth
  Stage 1: runs/run-326653dc-c038af2c/epoch_*.pth

Configs:
  Phase 3: configs/.../multitask_enhanced_phase1_HIGHRES.yaml
  Stage 1: configs/.../multitask_BEV2X_phase4a_stage1.yaml

Launch script:
  Stage 1: START_PHASE4A_STAGE1.sh

Monitoring:
  monitor_phase4a_stage1.sh

Code:
  Segmentation head: mmdet3d/models/heads/segm/enhanced.py
```

---

## 🎯 Training configuration at a glance

| Setting | Phase 3 | Stage 1 | Stage 2 (planned) |
|------|---------|---------|-------------|
| BEV resolution | 0.3m (360×360) | 0.2m (540×540) | 0.15m (720×720) |
| GT resolution | 0.25m (400×400) | 0.167m (600×600) | 0.125m (800×800) |
| Decoder | 2 layers | 4 layers | 4 layers |
| Deep supervision | ❌ | ✅ | ✅ |
| Dice Loss | ❌ | ✅ | ✅ |
| GPUs | 8 | 4 | 3-4 |
| Memory per GPU | ~8GB | ~30GB | ~32GB |

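The resolution rows follow from dividing a fixed spatial extent by the grid size. The sketch below reproduces the per-cell sizes under the assumption that the BEV grid spans 108 m per axis (0.3 m × 360) and the GT grid spans 100 m per axis (0.25 m × 400). These extents are inferred from the table, not read from the config files, and `cell_size_m` is an illustrative helper.

```python
# Extents inferred from the table (assumption, not read from the configs):
# 0.3 m x 360 cells = 108 m for the BEV grid, 0.25 m x 400 cells = 100 m for the GT grid.
BEV_EXTENT_M = 108.0
GT_EXTENT_M = 100.0

# (stage, BEV cells per axis, GT cells per axis), as listed in the table.
STAGES = [("Phase 3", 360, 400), ("Stage 1", 540, 600), ("Stage 2", 720, 800)]


def cell_size_m(extent_m, cells):
    """Metres covered by one grid cell along one axis."""
    return extent_m / cells


if __name__ == "__main__":
    for stage, bev_cells, gt_cells in STAGES:
        bev = cell_size_m(BEV_EXTENT_M, bev_cells)
        gt = cell_size_m(GT_EXTENT_M, gt_cells)
        print(f"{stage}: BEV {bev:.3f} m/cell, GT {gt:.3f} m/cell")
```
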
---

**Full documentation**: `项目进展与问题解决总结_20251030.md`