# BEVFusion Project Preparation Checklist

**Last updated**: 2025-10-22 14:20 UTC
**Current progress**: Epoch 3/23 (9.2%)
**Estimated training completion**: 2025-10-29
**Available preparation time**: 7 days

---

## ✅ Completed Preparation

### Scripts and Tools
- [x] evaluate_checkpoint.sh - checkpoint evaluation script
- [x] plot_training_curves.py - loss curve plotting
- [x] quick_status.sh - quick status check
- [x] 项目进度分析与准备清单.md - detailed analysis document (project progress analysis and preparation checklist)

---

## 📋 Preparation Checklist (by Priority)

### 🔥 P0 - Must Do This Week (Before Training Completes)

#### ✅ 1. Install and Learn the Pruning Tool
**Time required**: 2 hours
**Deadline**: 10-24
**Value**: ⭐⭐⭐⭐⭐

```bash
# Install torch-pruning
pip install torch-pruning

# What to learn
# - Read the official documentation
# - Understand how pruning works
# - Run the basic examples
# - Prepare a BEVFusion pruning script

# Goals
# - Learn how to prune SwinTransformer
# - Learn how to prune the FPN
# - Prepare a pruning config template
```
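
To make the "understand how pruning works" item concrete, here is a minimal pure-Python sketch of magnitude-based structured pruning: rank a layer's output channels by L2 norm and keep the strongest ones. It only illustrates the principle and does not use or mirror the torch-pruning API. The helper names (`channel_l2_norms`, `select_channels_to_keep`) and the toy weights are hypothetical. A real pruner also has to shrink the matching input channels of every downstream layer, which is the bookkeeping a dependency-aware tool is meant to handle.

```python
import math


def channel_l2_norms(weights):
    """Return the L2 norm of each output channel (one row per channel)."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def select_channels_to_keep(weights, pruning_ratio):
    """Drop the `pruning_ratio` fraction of channels with the smallest L2 norm."""
    if not 0.0 <= pruning_ratio < 1.0:
        raise ValueError("pruning_ratio must be in [0, 1)")
    norms = channel_l2_norms(weights)
    n_prune = int(len(weights) * pruning_ratio)
    # Sort channel indices from weakest to strongest, then drop the weakest.
    ranked = sorted(range(len(weights)), key=lambda i: norms[i])
    return sorted(ranked[n_prune:])


# Toy layer: 4 output channels with 3 weights each (illustrative values only).
toy_weights = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.05, -0.03, 0.02],
]
kept = select_channels_to_keep(toy_weights, pruning_ratio=0.5)
pruned_weights = [toy_weights[i] for i in kept]
print("kept channels:", kept)
```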

**Completion criteria**:
- [ ] torch-pruning installed successfully
- [ ] Basic example run
- [ ] Pruning script skeleton created

---

#### ✅ 2. Learn PyTorch Quantization
**Time required**: 2 hours
**Deadline**: 10-25
**Value**: ⭐⭐⭐⭐⭐

```bash
# What to learn
# - PyTorch Quantization documentation
# - QAT vs. PTQ comparison
# - INT8 calibration workflow
# - Sensitive-layer analysis

# Preparation
# - Create a QAT config template
# - Review quantization best practices
# - Prepare a calibration dataset
```
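
The INT8 calibration item comes down to choosing a scale and zero-point from values observed on a calibration set. The sketch below shows plain min/max calibration for unsigned 8-bit affine quantization in pure Python. It is an illustration under simple assumptions, not how PyTorch or TensorRT implement calibration (those toolchains also offer more robust histogram or entropy based methods). The function names and sample values are hypothetical.

```python
def calibrate_minmax(samples, qmin=0, qmax=255):
    """Derive (scale, zero_point) from observed values using min/max calibration."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer code, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


# Illustrative calibration values only, not real activations.
calibration = [-1.0, -0.5, 0.0, 0.25, 3.0]
scale, zero_point = calibrate_minmax(calibration)
for x in calibration:
    q = quantize(x, scale, zero_point)
    print(f"{x:+.3f} -> q={q:3d} -> {dequantize(q, scale, zero_point):+.4f}")
```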

**Completion criteria**:
- [ ] Understand the difference between QAT and PTQ
- [ ] Quantization config file created
- [ ] 100-sample calibration set prepared

---

#### ✅ 3. Prepare the Mid-Training Evaluation Plan
**Time required**: 1 hour
**Deadline**: 10-24 (before Epoch 10)
**Value**: ⭐⭐⭐⭐⭐

**Evaluation plan**:
```markdown
Epoch 10 mid-training evaluation (run on 10-25)
├─ Detection performance
│   └─ Target: mAP > 64%
├─ Segmentation performance
│   └─ Target: mIoU > 45%
├─ Comparison with Epoch 2
│   └─ Analyze the improvement
└─ Decide whether adjustments are needed
    └─ Loss weights, learning rate, etc.
```
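
The decision step in this plan can be captured in a small helper that compares an evaluation result against the two targets and against the Epoch 2 baseline. This is a minimal sketch: `EvalResult`, `midterm_report`, and the field names are hypothetical, and the numbers in the example are placeholders rather than measured results.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    epoch: int
    map_pct: float   # detection mAP, in percent
    miou_pct: float  # segmentation mIoU, in percent


# Targets taken from the Epoch 10 evaluation plan.
TARGETS = {"map_pct": 64.0, "miou_pct": 45.0}


def midterm_report(current, baseline, targets=TARGETS):
    """Compare the current result with the targets and with an earlier baseline."""
    report = {}
    for name, target in targets.items():
        now, before = getattr(current, name), getattr(baseline, name)
        report[name] = {
            "value": now,
            "delta_vs_baseline": round(now - before, 2),
            "meets_target": now > target,
        }
    # Flag a review of loss weights / learning rate if any target is missed.
    report["needs_adjustment"] = not all(
        report[name]["meets_target"] for name in targets
    )
    return report


# Placeholder numbers for illustration only, not measured results.
baseline = EvalResult(epoch=2, map_pct=50.0, miou_pct=30.0)
current = EvalResult(epoch=10, map_pct=65.0, miou_pct=44.0)
report = midterm_report(current, baseline)
print(report)
```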

**Items to prepare**:
- [x] Evaluation script created
- [ ] Results output path
- [ ] Comparison analysis template
- [ ] Decision criteria document

---

#### ✅ 4. Build a Performance Monitoring Dashboard
**Time required**: 2 hours
**Deadline**: 10-26
**Value**: ⭐⭐⭐⭐

```python
# Functional requirements
# 1. Real-time loss curves
# 2. Per-class IoU trends
# 3. Detection mAP trend
# 4. GPU utilization monitoring
# 5. Estimated completion time
```
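
For the estimated completion time requirement, a simple approach is linear extrapolation from the average time per completed epoch. The sketch below assumes the epoch time stays roughly constant. The function names and the example timestamps are hypothetical and are not taken from the actual training run.

```python
from datetime import datetime, timezone


def progress_percent(epochs_done, total_epochs):
    """Training progress as a percentage; epochs_done may be fractional."""
    return 100.0 * epochs_done / total_epochs


def estimate_completion(start, now, epochs_done, total_epochs):
    """Extrapolate the finish time from the average time per epoch so far."""
    if not 0 < epochs_done <= total_epochs:
        raise ValueError("epochs_done must be in (0, total_epochs]")
    time_per_epoch = (now - start) / epochs_done
    return now + time_per_epoch * (total_epochs - epochs_done)


# Hypothetical run: 2 of 10 epochs finished after 12 hours.
start = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
eta = estimate_completion(start, now, epochs_done=2, total_epochs=10)
print(f"{progress_percent(2, 10):.1f}% done, ETA {eta.isoformat()}")
```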

**Completion criteria**:
- [ ] Loss curve script finished
- [ ] Charts can be generated automatically
- [ ] Charts can be refreshed on a schedule

---

### 🟡 P1 - Recommended for Next Week (During Training)

#### ✅ 5. Learn TensorRT
**Time required**: 3-4 hours
**Deadline**: 10-28
**Value**: ⭐⭐⭐⭐

**Learning focus**:
```
1. ONNX export best practices
   - Dynamic shape handling
   - Custom operator handling

2. Building a TensorRT engine
   - Builder configuration
   - Optimization profiles
   - INT8 calibration

3. Orin-specific optimizations
   - Using the DLA
   - Unified memory optimization
   - Multiple CUDA streams
```

**Resources**:
- NVIDIA TensorRT Documentation
- NVIDIA Orin Developer Guide
- BEVFusion deployment case studies

**Completion criteria**:
- [ ] Understand the ONNX export workflow
- [ ] Understand core TensorRT concepts
- [ ] Understand how to use the Orin DLA

---

#### ✅ 6. Study the MapTR Code (If Leaning Toward Integration)
**Time required**: 4 hours
**Deadline**: 10-30
**Value**: ⭐⭐⭐

```bash
# 1. Clone the MapTR code
cd /workspace
git clone https://github.com/hustvl/MapTR.git

# 2. Key files to study
# MapTR/projects/mmdet3d_plugin/maptr/
# ├── dense_heads/map_head.py    # core head
# ├── modules/decoder.py         # Transformer decoder
# └── ...

# 3. Understand the data format
# - Vectorized map annotation format
# - How to extract it from nuScenes
# - Data augmentation strategy
```

**Completion criteria**:
- [ ] MapTR code downloaded
- [ ] Understand the MapTRHead structure
- [ ] Understand the data format

---

#### ✅ 7. Prepare the Test Dataset
**Time required**: 2 hours
**Deadline**: 10-28
**Value**: ⭐⭐⭐⭐

```bash
# Create a quick test set:
# 100 samples for quickly validating pruning/quantization results
#
# Goals:
# - Cover a variety of scenes
# - Include all classes
# - Keep the data balanced
# - Load quickly (< 1 minute)
```
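
One way to get a balanced 100-sample subset is to group samples by a scene attribute and draw from the groups round-robin. The sketch below assumes each sample already has a group label such as a scene type. `build_balanced_subset` and the synthetic pool are hypothetical, and covering all object classes would additionally need per-sample class annotations, which are not modelled here.

```python
import random
from collections import Counter, defaultdict


def build_balanced_subset(samples, size, seed=0):
    """Pick `size` sample IDs, drawing round-robin across groups.

    `samples` is a list of (sample_id, group) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)
    for ids in by_group.values():
        rng.shuffle(ids)

    subset = []
    groups = sorted(by_group)
    # Take one sample per group per round until the subset is full
    # or every group is exhausted.
    while len(subset) < size and any(by_group[g] for g in groups):
        for g in groups:
            if by_group[g] and len(subset) < size:
                subset.append(by_group[g].pop())
    return subset


# Synthetic, deliberately imbalanced pool (illustrative only).
pool = (
    [(f"day_{i}", "day") for i in range(300)]
    + [(f"night_{i}", "night") for i in range(60)]
    + [(f"rain_{i}", "rain") for i in range(40)]
)
subset = build_balanced_subset(pool, size=100)
counts = Counter(sample_id.split("_")[0] for sample_id in subset)
print(dict(counts))
```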

---

#### ✅ 8. Prepare the Benchmark Tool
**Time required**: 2 hours
**Deadline**: 10-29
**Value**: ⭐⭐⭐

```python
# tools/benchmark.py
# Features:
# 1. Inference speed test
# 2. Memory usage analysis
# 3. FLOPs calculation
# 4. Model complexity analysis
```
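
For the inference speed test, the core loop is a warm-up phase followed by timed iterations. The sketch below times any zero-argument callable using only the standard library. The `benchmark` helper and the dummy workload are hypothetical stand-ins for a real model forward pass. When timing GPU inference, the callable would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`), because CUDA kernels launch asynchronously.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20):
    """Time a zero-argument callable and return latency statistics in ms."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the statistics

    timings_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - t0) * 1000.0)

    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(round(0.95 * (len(timings_ms) - 1))))
    mean_ms = statistics.fmean(timings_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def dummy_inference():
    """Stand-in workload; replace with a real model call."""
    return sum(i * i for i in range(10_000))


stats = benchmark(dummy_inference)
print({name: round(value, 3) for name, value in stats.items()})
```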

---

### 🔵 P2 - Prepare After Training Completes

#### ✅ 9. Arrange Orin Hardware
**Time required**: depends on the procurement process
**Deadline**: 11-10
**Value**: ⭐⭐⭐⭐

**What to prepare**:
- [ ] NVIDIA AGX Orin 270T
- [ ] JetPack 5.1+ installation media
- [ ] Network environment setup
- [ ] Storage for test data

---

#### ✅ 10. Prepare the Deployment Documentation
**Time required**: 3 hours
**Deadline**: 11-15
**Value**: ⭐⭐⭐

```markdown
Deployment documentation contents:
1. Environment setup steps
2. Dependency installation
3. Model conversion workflow
4. Performance testing plan
5. Troubleshooting guide
```

---

## 🎯 This Week's Action Plan (10-22 to 10-27)

### Today (10-22, Tue) ✅
- [x] Create the evaluation script
- [x] Create the monitoring script
- [x] Create the preparation checklist document
- [ ] Test loss curve plotting

**Estimated time**: 1 hour

---

### Tomorrow (10-23, Wed)
- [ ] Plot the current loss curves
- [ ] Install torch-pruning
- [ ] Read the pruning documentation
- [ ] Create the pruning script skeleton

**Estimated time**: 3 hours

---

### Thursday (10-24)
- [ ] Learn PyTorch quantization
- [ ] Prepare the quantization config
- [ ] Prepare the calibration dataset
- [ ] Test the basic quantization workflow

**Estimated time**: 3 hours

---

### Friday (10-25) ⭐ Important
- [ ] **Epoch 10 mid-training evaluation**
- [ ] Analyze performance trends
- [ ] Decide whether adjustments are needed
- [ ] Write the mid-training report

**Estimated time**: 2-3 hours

---

### Weekend (10-26 to 10-27)
- [ ] Learn TensorRT basics
- [ ] Prepare the ONNX export script
- [ ] Study Orin DLA optimization
- [ ] Prepare the deployment documentation outline

**Estimated time**: 6-8 hours

---

## 📊 Preparation Time Allocation

```
Preparation time this week (15-20 hours budgeted; the items below total 16 hours):

Pruning prep       ████░░░░░░  4 h (25.0%)
Quantization prep  ███░░░░░░░  3 h (18.8%)
TensorRT           ████░░░░░░  4 h (25.0%)
Evaluation tools   ██░░░░░░░░  2 h (12.5%)
Monitoring tools   ██░░░░░░░░  2 h (12.5%)
Other              █░░░░░░░░░  1 h ( 6.3%)
```

---

## 🎓 Learning Resources

### Pruning
- Torch-Pruning GitHub: https://github.com/VainF/Torch-Pruning
- Paper: Learning Efficient Convolutional Networks through Network Slimming

### Quantization
- PyTorch Quantization: https://pytorch.org/docs/stable/quantization.html
- Paper: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- TensorRT INT8: https://docs.nvidia.com/deeplearning/tensorrt/

### TensorRT
- TensorRT Documentation: https://docs.nvidia.com/deeplearning/tensorrt/
- ONNX Runtime: https://github.com/microsoft/onnxruntime
- NVIDIA Orin Guide: https://developer.nvidia.com/embedded/

---

## 💡 Execution Tips

### Spend 1-2 Hours a Day on Preparation
```
Mon-Fri evenings: 2 hours/day × 5 = 10 hours
Weekend: 6 hours

Total: 16 hours of preparation
```

### Prioritization Principles
1. **P0 tasks first** - needed immediately after training completes
2. **Learn by doing** - combine theory with practice
3. **Build up gradually** - start simple, then move to complex
4. **Document as you go** - write notes while learning

---

## 🎯 Success Criteria

### Epoch 10 Evaluation (10-25)
- [ ] Detection mAP > 64%
- [ ] Segmentation mIoU > 45%
- [ ] Loss keeps decreasing
- [ ] Training is stable with no anomalies

### Training Complete (10-29)
- [ ] Detection mAP > 65%
- [ ] Segmentation mIoU > 60%
- [ ] All checkpoints saved intact
- [ ] Performance evaluation report finished

### Preparation Complete
- [ ] Pruning tool ready
- [ ] Quantization tool ready
- [ ] TensorRT background knowledge in place
- [ ] Evaluation scripts ready
- [ ] Deployment documentation outline ready

---

**Next step**: Start on the P0 tasks to prepare for the stages that follow!