# Phase 4A Stage 1 - 8-GPU Training Quick Reference
**Version**: v1.0
**Updated**: 2025-11-01

---
## 🚀 Quick Start
### Launch Training
```bash
cd /workspace/bevfusion
nohup bash START_FROM_EPOCH1.sh > /tmp/train_8gpu.log 2>&1 &
```
### Monitor Status
```bash
# GPU status
nvidia-smi
# Training progress (follows the most recent log)
tail -f "$(ls -t phase4a_stage1_new_*.log | head -n 1)" | grep "Epoch"
# Disk space
df -h | grep -E "/workspace|/data"
```
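If no `phase4a_stage1_new_*.log` exists yet (for example, right after launch), the progress command above has nothing useful to follow. A minimal sketch of a guard, assuming the logs sit in the directory the command is run from; `latest_log` is a hypothetical helper name, not part of the project:

```bash
# latest_log [DIR]: print the newest phase4a_stage1_new_*.log in DIR
# (default: current directory), or fail with a message if none exists.
# NOTE: hypothetical helper for illustration only.
latest_log() {
  local dir=${1:-.}
  local newest
  newest=$(ls -t "$dir"/phase4a_stage1_new_*.log 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no phase4a_stage1_new_*.log found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}

# Usage (not executed here, since tail -f never returns):
#   log=$(latest_log) && tail -f "$log" | grep "Epoch"
```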
---
## 📊 Key Configuration
```yaml
GPU: 8×Tesla V100S-32GB
Batch: 1/GPU (total batch = 8)
Resolution: 600×600 GT
Epochs: 10
Estimated duration: 9.5 days
Output: /data/runs/phase4a_stage1/
```
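The configuration implies a per-epoch time budget, which is handy for checking whether the run is on schedule. A quick sketch of the arithmetic, using only the values listed above (the variable names are illustrative):

```bash
# Values taken from the configuration block.
gpus=8
batch_per_gpu=1
epochs=10
total_days=9.5   # estimated duration of the whole run

# Effective batch size across all GPUs.
total_batch=$((gpus * batch_per_gpu))

# bash has no floating-point arithmetic, so awk does the division.
hours_per_epoch=$(awk -v d="$total_days" -v e="$epochs" \
  'BEGIN { printf "%.1f", d * 24 / e }')

echo "total batch:     $total_batch"
echo "hours per epoch: $hours_per_epoch"
```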
---
## 🎯 Expected Performance
**Epoch 1 (completes 11/2)**
- Stop Line: 0.27 → 0.30+
- Divider: 0.19 → 0.22+
- mIoU: 0.41 → 0.43+

**Epoch 10 (completes 11/10)**
- Stop Line: 0.27 → 0.35+
- Divider: 0.19 → 0.28+
- mIoU: 0.41 → 0.48+
---
## 🔧 Common Commands
```bash
# Stop training
pkill -f "train.py"
# Clean up evaluation files
bash cleanup_eval_hook.sh
# List checkpoints
ls -lh /data/runs/phase4a_stage1/
# Show running training processes
ps aux | grep "train.py" | grep -v grep
```
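`pkill` returns as soon as the signal is sent, so the training processes may still be shutting down, and still holding GPU memory, for a while afterwards. A minimal sketch of a wait loop, assuming the PIDs are captured before the signal is sent; `wait_for_exit` is a hypothetical helper, not an existing project script:

```bash
# wait_for_exit TIMEOUT_SECONDS PID...
# Poll once per second until none of the given PIDs is alive.
# Returns 0 when all have exited, 1 if any is still alive at the timeout.
# NOTE: hypothetical helper for illustration only.
wait_for_exit() {
  local timeout=$1 waited=0 pid alive
  shift
  while true; do
    alive=0
    for pid in "$@"; do
      # kill -0 sends no signal; it only tests whether the PID exists.
      if kill -0 "$pid" 2>/dev/null; then alive=1; fi
    done
    if [ "$alive" -eq 0 ]; then return 0; fi
    if [ "$waited" -ge "$timeout" ]; then return 1; fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage (not executed here):
#   pids=$(pgrep -f "train.py")
#   pkill -f "train.py"
#   wait_for_exit 60 $pids || echo "training still running" >&2
```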
---
## 📝 Full Documentation
See: `project/docs/Phase4A_Stage1_8GPU配置_20251101.md`