# Phase 4A Stage 1 - 8-GPU Training Quick Reference

**Version**: v1.0

**Updated**: 2025-11-01

---

## 🚀 Quick Start

### Launch training

```bash
cd /workspace/bevfusion
nohup bash START_FROM_EPOCH1.sh > /tmp/train_8gpu.log 2>&1 &
```

### Monitor status

```bash
# GPU status
nvidia-smi

# Training progress: follow the newest log and show only lines containing "Epoch"
tail -f "$(ls -t phase4a_stage1_new_*.log | head -1)" | grep "Epoch"

# Disk space
df -h | grep -E "/workspace|/data"
```

---

## 📊 Key Configuration

```yaml
GPU: 8×Tesla V100S-32GB
Batch: 1/GPU (total batch = 8)
Resolution: 600×600 (GT)
Epochs: 10
Estimated duration: 9.5 days
Output: /data/runs/phase4a_stage1/
```

---

## 🎯 Expected Performance (baseline → target)

**Epoch 1 (expected completion: 11/2)**

- Stop Line: 0.27 → 0.30+
- Divider: 0.19 → 0.22+
- mIoU: 0.41 → 0.43+

**Epoch 10 (expected completion: 11/10)**

- Stop Line: 0.27 → 0.35+
- Divider: 0.19 → 0.28+
- mIoU: 0.41 → 0.48+

---

## 🔧 Common Commands

```bash
# Stop training (kills every process whose command line matches "train.py")
pkill -f "train.py"

# Clean up evaluation files
bash cleanup_eval_hook.sh

# List checkpoints
ls -lh /data/runs/phase4a_stage1/

# List training processes
ps aux | grep "train.py" | grep -v grep
```

---

## 📝 Full Documentation

See: `project/docs/Phase4A_Stage1_8GPU配置_20251101.md`
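The monitoring steps in this quick reference can be wrapped in a small sourceable helper. This is a minimal sketch under stated assumptions, not part of the repo: the file name `check_train.sh` and the functions `latest_log`, `last_epoch_line`, and `hours_per_epoch` are hypothetical. It assumes logs are named `phase4a_stage1_new_*.log` in the directory you pass, and the per-epoch estimate assumes the 9.5-day total is spread evenly over the 10 epochs.

```bash
#!/usr/bin/env bash
# check_train.sh (hypothetical helper, not part of the repo).
# Assumptions: training logs are named phase4a_stage1_new_*.log and live in
# the directory passed as $1 (default: current directory).

# Print the path of the most recently modified training log, or nothing.
latest_log() {
  local dir="${1:-.}"
  # ls -t sorts newest first; the "no such file" error is silenced.
  ls -t "$dir"/phase4a_stage1_new_*.log 2>/dev/null | head -n 1
}

# Print the last line containing "Epoch" from the newest log.
# Returns 1 (with a message on stderr) if no log is found.
last_epoch_line() {
  local log
  log=$(latest_log "${1:-.}")
  if [[ -z "$log" ]]; then
    echo "no log found in ${1:-.}" >&2
    return 1
  fi
  grep "Epoch" "$log" | tail -n 1
}

# Average hours per epoch, assuming the total time is split evenly.
# Usage: hours_per_epoch <total_days> <epochs>
hours_per_epoch() {
  awk -v d="$1" -v e="$2" 'BEGIN { printf "%.1f\n", d * 24 / e }'
}
```

For example, after `source check_train.sh`, running `last_epoch_line /workspace/bevfusion` prints the most recent epoch line without following the log, and `hours_per_epoch 9.5 10` prints `22.8` (9.5 × 24 / 10), i.e. roughly one epoch per day.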