# BEVFusion Model Analysis Results and Optimization Plan

**Analysis date**: 2025-10-30

**Checkpoint**: epoch_23.pth

**Baseline performance**: NDS 0.6941, mAP 0.6446, mIoU 0.4130

---

## 📊 Model Analysis Results

### Overall Size

```
Total parameters: 45,721,876 (45.72M)
Layers:           636

Model size (weights only, 1 MB = 2^20 bytes):
  FP32: 174.42 MB
  FP16:  87.21 MB (-50%)
  INT8:  43.60 MB (-75%)
```
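
The sizes above are consistent with 4, 2 and 1 bytes per parameter and 1 MB = 2^20 bytes. The sketch below reproduces them from the parameter count. It covers weight storage only and ignores buffers, quantization scales and checkpoint-format overhead.

```python
# Weight-only storage estimate; buffers, scales and file overhead are ignored.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1}


def model_size_mib(num_params: int, dtype: str) -> float:
    """Weight storage in MiB (2**20 bytes) for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


if __name__ == "__main__":
    total_params = 45_721_876  # from the checkpoint analysis above
    for dtype in ("FP32", "FP16", "INT8"):
        print(f"{dtype}: {model_size_mib(total_params, dtype):.2f} MiB")
```

For this parameter count it prints 174.42, 87.21 and 43.60 MiB.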

### Per-Module Parameter Distribution

| Module | Parameters | Share | Size (FP32) | Optimization priority |
|------|--------|------|-----------|-----------|
| **encoders** | **34.45M** | **75.3%** | **131.41MB** | 🔴 Highest |
| heads | 5.92M | 12.9% | 22.56MB | 🟡 Medium |
| decoder | 4.58M | 10.0% | 17.48MB | 🟡 Medium |
| fuser | 0.78M | 1.7% | 2.96MB | 🟢 Low |
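
A per-module breakdown like the table above can be produced by grouping checkpoint keys by their dotted-name prefix. The sketch below is a hypothetical helper that works on a plain `{key: shape}` mapping. With PyTorch you would typically build that mapping from `torch.load(path, map_location="cpu")["state_dict"]`, since mmcv-style checkpoints store weights under `state_dict`.

Counting every checkpoint entry also picks up non-trainable buffers, such as BatchNorm running statistics or Swin's `relative_position_index`. The helper therefore skips those suffixes by default.

```python
# Hypothetical helper, not part of the project's analysis script.
from collections import defaultdict
from math import prod
from typing import Dict, Iterable, Tuple

# Entries stored in checkpoints that are buffers, not trainable parameters.
DEFAULT_SKIP = ("running_mean", "running_var", "num_batches_tracked", "relative_position_index")


def params_by_prefix(
    shapes: Dict[str, Tuple[int, ...]],
    depth: int = 1,
    skip_suffixes: Iterable[str] = DEFAULT_SKIP,
) -> Dict[str, Tuple[int, float]]:
    """Sum element counts per name prefix; returns {prefix: (count, percent)}, largest first."""
    skip = tuple(skip_suffixes)
    totals: Dict[str, int] = defaultdict(int)
    for name, shape in shapes.items():
        if name.endswith(skip):
            continue
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape)  # prod(()) == 1 for scalar entries
    grand_total = sum(totals.values()) or 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {k: (n, 100.0 * n / grand_total) for k, n in ranked}


if __name__ == "__main__":
    toy_shapes = {  # made-up shapes, only to show the output format
        "encoders.camera.backbone.w": (10, 10),
        "encoders.lidar.backbone.w": (50,),
        "heads.map.aspp.w": (40,),
        "decoder.backbone.conv.w": (10,),
        "decoder.backbone.bn.running_mean": (8,),
        "decoder.backbone.bn.num_batches_tracked": (),
    }
    for prefix, (count, pct) in params_by_prefix(toy_shapes).items():
        print(f"{prefix:10s} {count:6d} {pct:5.1f}%")
```

Use `depth=2` or `depth=3` to reproduce the submodule tables below, such as `encoders.camera`.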

### Encoders Breakdown

| Submodule | Parameters | Share of total | Notes |
|--------|--------|------|------|
| **camera.backbone** | **27.55M** | **60.3%** | SwinTransformer (largest single contributor) 🔴 |
| lidar.backbone | 2.70M | 5.9% | Sparse 3D CNN |
| camera.vtransform | 2.61M | 5.7% | DepthLSS view transform |
| camera.neck | 1.59M | 3.5% | FPN |

### Heads Breakdown (largest submodules: 5.46M of 5.92M)

| Submodule | Parameters | Share of total | Notes |
|--------|--------|------|------|
| **map.aspp** | **4.13M** | **9.0%** | ASPP module (Enhanced segmentation head) 🟡 |
| object.shared_conv | 0.59M | 1.3% | TransFusion shared convolution |
| map.classifiers | 0.44M | 1.0% | 6 classifiers |
| map.decoder | 0.30M | 0.7% | 2-layer decoder |

---

## 🎯 Optimization Plan Design

### Plan A: Conservative Optimization (recommended for Orin)

#### Targets

```
Parameters:     45.72M → 32M (-30%)
Model size:     174MB → 30MB (INT8) (-83%)
Inference time: est. 90ms → 50ms (-44%)
Accuracy loss:  <2%
```

#### Pruning Strategy

```
1. Camera Backbone (SwinTransformer)
   Current:         27.55M
   Pruning:         20% of channels
   Target:          22M
   Expected impact: <1% accuracy loss

2. ASPP module
   Current:         4.13M
   Pruning:         25% of channels
   Target:          3.1M
   Expected impact: <0.5% accuracy loss

3. Decoder
   Current:         4.58M
   Pruning:         15% of channels
   Target:          3.9M
   Expected impact: <0.3% accuracy loss

4. Camera VTransform
   Current:         2.61M
   Pruning:         10% of channels
   Target:          2.35M
   Expected impact: <0.2% accuracy loss

Total: the per-item targets remove 5.55 + 1.03 + 0.68 + 0.26 = 7.52M,
       i.e. 45.72M → ~38.2M (-16.4%), not 31.35M (-31.4%).
       A total near 32M is reached only if channel pruning shrinks
       parameters roughly quadratically (20% of channels → ~36% of
       parameters), in which case the per-item targets above are conservative.
```
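
Channel pruning does not shrink parameters one-for-one:

- A layer whose input and output widths both shrink by a fraction r keeps about (1 - r)^2 of its weights.
- A layer pruned on one side only keeps about (1 - r) of its weights.

The rough sketch below brackets Plan A between these two cases. The module sizes and ratios come from the plan above; the two scaling rules are simplifying assumptions.

```python
# Rough bracket for Plan A's parameter count. Module sizes and channel ratios are
# taken from the plan above; the two scaling rules are simplifying assumptions.
TOTAL_PARAMS_M = 45.72

PLAN_A = [  # (module, current params in M, fraction of channels pruned)
    ("camera.backbone", 27.55, 0.20),
    ("map.aspp", 4.13, 0.25),
    ("decoder", 4.58, 0.15),
    ("camera.vtransform", 2.61, 0.10),
]


def pruned_total_m(plan, total_m=TOTAL_PARAMS_M, quadratic=False):
    """Remaining parameters (M) after pruning.

    quadratic=False: one side of each layer shrinks, params scale with (1 - r).
    quadratic=True:  both sides shrink, params scale with (1 - r) ** 2.
    """
    removed = 0.0
    for _name, params_m, ratio in plan:
        keep = (1.0 - ratio) ** 2 if quadratic else (1.0 - ratio)
        removed += params_m * (1.0 - keep)
    return total_m - removed


if __name__ == "__main__":
    print(f"linear scaling:    {pruned_total_m(PLAN_A):.2f}M")
    print(f"quadratic scaling: {pruned_total_m(PLAN_A, quadratic=True):.2f}M")
```

This prints about 38.23M and 32.23M. Real layers fall between the two bounds, because the stem's input channels and each head's output channels are fixed by the data and task. The measured parameter count of the pruned model is the number to trust.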

#### Quantization Strategy

```
Hybrid PTQ + QAT:
  - Sensitive layers: FP16 (attention, classifiers)
  - All other layers: INT8

Expected model size:
  After pruning (31.35M params), FP32: 119.6MB
  Mixed-precision INT8:                ~30MB
```
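
The ~30MB figure is roughly what a 32M-parameter model takes when every weight is INT8 (32M bytes ≈ 30.5 MiB). Each parameter kept in FP16 costs one extra byte. The quick estimate below treats the FP16 share as a free parameter; the shares in the demo are illustrative, not measured.

```python
def mixed_precision_mib(num_params: int, fp16_fraction: float) -> float:
    """Weight storage (MiB) when `fp16_fraction` of the parameters stay FP16 and the rest are INT8."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be in [0, 1]")
    bytes_total = num_params * (2 * fp16_fraction + 1 * (1.0 - fp16_fraction))
    return bytes_total / 2**20


if __name__ == "__main__":
    pruned_params = 32_000_000  # Plan A target
    for share in (0.0, 0.05, 0.10, 0.20):  # illustrative FP16 shares
        print(f"FP16 share {share:.0%}: {mixed_precision_mib(pruned_params, share):.1f} MiB")
```

With 10% of the weights in FP16, the estimate grows from 30.5 MiB to about 33.6 MiB.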

---

### Plan B: Aggressive Optimization (if Plan A is still too slow)

#### Targets

```
Parameters:     45.72M → 22M (-52%)
Model size:     174MB → 22MB (INT8) (-87%)
Inference time: est. 90ms → 35ms (-61%)
Accuracy loss:  <4%
```

#### Pruning Strategy

```
1. Camera Backbone
   Pruning: 40% of channels + fewer layers
   → 16.5M (-11M)

2. Simplified ASPP
   Pruning: 50% of channels
   → 2.1M (-2M)

3. Simplified Decoder
   Pruning: 30% of channels
   → 3.2M (-1.4M)

Total: the listed cuts remove 11 + 2 + 1.4 = 14.4M, i.e. 45.72M → ~31.3M (-31.5%).
       Reaching the 21.8M (-52%) target needs roughly another 9.5M of
       reductions beyond the items listed here.
```

---

## 🚀 Implementation Plan

### Phase 1: Model Pruning (3-5 days)

#### Day 1: Prepare Tools

```bash
# Install torch-pruning
pip install torch-pruning

# Create the pruning scripts (planned layout):
# tools/pruning/
# ├── prune_swin_backbone.py   # prune the SwinTransformer
# ├── prune_aspp.py            # prune the ASPP module
# ├── prune_decoder.py         # prune the decoder
# └── prune_bevfusion.py       # end-to-end pruning script
mkdir -p tools/pruning
```

#### Day 2-3: Run Pruning

```bash
# Run pruning
python tools/pruning/prune_bevfusion.py \
    --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
    --config configs/.../multitask_enhanced_phase1_HIGHRES.yaml \
    --target-params 32000000 \
    --output bevfusion_pruned_32M.pth
```
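
The pruning script itself is project-specific. Its core selection step is simple in magnitude-based structured pruning, the idea behind torch-pruning's `MagnitudeImportance`: rank each output channel by the L2 norm of its weights and keep the strongest.

Below is a dependency-free sketch for a single layer. A real pruner must also remove the matching input channels of the next layer and the corresponding BatchNorm entries; torch-pruning's `DependencyGraph` handles that step.

```python
import math
from typing import List, Sequence


def channels_to_keep(weight: Sequence[Sequence[float]], prune_ratio: float) -> List[int]:
    """Indices of output channels to keep, ranked by L2 norm.

    `weight[c]` holds the flattened weights of output channel c (a conv kernel
    reshaped to [out_channels, in_channels * kh * kw]).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    norms = [math.sqrt(sum(w * w for w in row)) for row in weight]
    n_keep = max(1, round(len(norms) * (1.0 - prune_ratio)))
    ranked = sorted(range(len(norms)), key=lambda c: norms[c], reverse=True)
    return sorted(ranked[:n_keep])  # preserve the original channel order


if __name__ == "__main__":
    toy_weight = [
        [0.9, -0.8, 0.7],   # strong
        [0.01, 0.02, 0.0],  # weak
        [0.5, 0.5, -0.5],   # medium
        [0.0, 0.03, 0.01],  # weak
        [-1.2, 0.1, 0.3],   # strong
    ]
    print(channels_to_keep(toy_weight, prune_ratio=0.4))  # [0, 2, 4]
```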

#### Day 4-5: Fine-Tuning

```bash
# Fine-tune for 3 epochs to recover accuracy
torchpack dist-run -np 8 python tools/train.py \
    configs/.../multitask_enhanced_phase1_HIGHRES_pruned.yaml \
    --load_from bevfusion_pruned_32M.pth \
    --cfg-options \
    max_epochs=3 \
    optimizer.lr=5.0e-6
```

---

### Phase 2: INT8 Quantization (4-6 days)

#### Day 1: PTQ Validation

```bash
# Quick PTQ test
python tools/quantization/ptq_test.py \
    --model bevfusion_pruned_32M_finetuned.pth \
    --calibration-samples 200

# Expected result:
#   accuracy loss: 1.5-2.5%
#   if > 3%: QAT is required
```
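
In its simplest form, PTQ calibration picks a scale per tensor from the calibration data and rounds values to the INT8 grid. Below is a minimal sketch of symmetric per-tensor max-abs calibration with quantize/dequantize. Toolchains commonly also offer entropy or percentile calibrators, which clip outliers; those are not shown.

```python
from typing import Iterable, List


def calibrate_scale(calib_batches: Iterable[Iterable[float]], num_bits: int = 8) -> float:
    """Symmetric max-abs calibration: map the largest |x| seen to the largest code (127)."""
    qmax = 2 ** (num_bits - 1) - 1
    amax = 0.0
    for batch in calib_batches:
        amax = max(amax, max((abs(x) for x in batch), default=0.0))
    return amax / qmax if amax > 0 else 1.0


def quantize(xs: Iterable[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round to the nearest code and clamp to the symmetric range [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(x / scale))) for x in xs]


def dequantize(qs: Iterable[int], scale: float) -> List[float]:
    return [q * scale for q in qs]


if __name__ == "__main__":
    calib = [[0.1, -0.5, 2.54], [1.0, -1.27, 0.0]]  # stand-in for the calibration samples
    s = calibrate_scale(calib)
    q = quantize([0.3, -2.6, 1.0], s)  # -2.6 lies outside the calibrated range and is clipped
    print(s, q, dequantize(q, s))
```

Values inside the calibrated range come back within half a quantization step. Values outside it are clipped, which is why the choice of calibration data matters.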

#### Day 2-5: QAT Training

```bash
# QAT training for 5 epochs
torchpack dist-run -np 8 python tools/train.py \
    configs/.../multitask_enhanced_phase1_HIGHRES_qat.yaml \
    --load_from bevfusion_pruned_32M_finetuned.pth \
    --cfg-options \
    max_epochs=5 \
    optimizer.lr=1.0e-6 \
    quantization.enabled=true
```
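
QAT inserts fake quantization (quantize, then immediately dequantize) into the forward pass, so the network trains against INT8 rounding. In the backward pass it uses a straight-through estimator (STE): the gradient passes unchanged where the input lies inside the representable range and is zeroed where it was clipped. PyTorch's fake-quantize ops behave the same way.

Below is a scalar sketch of both directions. The fixed `scale` is an assumption; learned-scale schemes such as LSQ also update it.

```python
def fake_quant_forward(x: float, scale: float, qmin: int = -127, qmax: int = 127) -> float:
    """Quantize to the INT8 grid and dequantize right away (the result stays a float)."""
    q = min(qmax, max(qmin, round(x / scale)))
    return q * scale


def fake_quant_backward(x: float, grad_out: float, scale: float,
                        qmin: int = -127, qmax: int = 127) -> float:
    """Straight-through estimator: identity gradient inside the range, zero where clipped."""
    inside = qmin <= round(x / scale) <= qmax
    return grad_out if inside else 0.0


if __name__ == "__main__":
    scale = 0.05  # assumed fixed per-tensor scale
    for x in (0.12, 6.0, -7.0):
        y = fake_quant_forward(x, scale)
        g = fake_quant_backward(x, grad_out=1.0, scale=scale)
        print(f"x={x:+.2f} -> fake-quant {y:+.4f}, grad {g}")
```

Under this scale, -7.0 is clipped to -6.35 and receives zero gradient, while 0.12 rounds to 0.10 and keeps its gradient.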

#### Day 6: Evaluation

```bash
# Evaluate the INT8 (QAT) model
python tools/test.py \
    configs/.../multitask_enhanced_phase1_HIGHRES_qat.yaml \
    bevfusion_pruned_32M_qat.pth \
    --eval bbox map
```

---

## 📊 Expected Performance Comparison

| Model version | Params | Size | Inference time (A100) | Inference time (Orin) | NDS | mAP | mIoU |
|---------|--------|------|---------------|---------------|-----|-----|------|
| **Epoch 23 (FP32)** | 45.72M | 174MB | ~90ms | ~450-900ms | 0.6941 | 0.6446 | 0.4130 |
| **Pruned (FP32)** | 32M | 122MB | ~60ms | ~300-600ms | 0.685+ | 0.635+ | 0.405+ |
| **Pruned + INT8** | 32M | 30MB | ~35ms | ~150-300ms | 0.680+ | 0.630+ | 0.400+ |
| **+TensorRT** | 32M | 30MB | ~25ms | **60-80ms** ✅ | 0.680+ | 0.630+ | 0.400+ |

**Conclusion**: Pruning + quantization + TensorRT is estimated to bring Orin inference down to **60-80ms**, which would meet the deployment requirement. Only the Epoch 23 accuracy figures are measured; all latencies and the remaining accuracy values are projections to be confirmed on the target hardware.
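
The latency figures in the table are estimates. The minimal timing harness below is for checking them on the target device. `infer` and `sync` are placeholders for your model call and a device synchronization, such as `torch.cuda.synchronize` on GPU. Without synchronization, asynchronous CUDA launches make the timings look too fast.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def benchmark(infer: Callable[[], object], warmup: int = 10, iters: int = 50,
              sync: Optional[Callable[[], None]] = None) -> Dict[str, float]:
    """Median / approximate p90 / mean latency of `infer` in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):  # let allocators, autotuning and caches settle
        infer()
    if sync:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        if sync:
            sync()  # wait for queued device work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with something like `lambda: model(batch)` under torch.no_grad().
    stats = benchmark(lambda: sum(i * i for i in range(100_000)), warmup=2, iters=20)
    print({k: round(v, 2) for k, v in stats.items()})
```

Reporting the median alongside p90 makes it easier to see whether the 60-80ms target holds consistently, not only on average.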

---

## 🎯 Key Optimization Targets

### 1. Camera Backbone (27.55M → 20M)

**Optimization strategy**:

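```python
# Option 1: channel (width) pruning
original_channels = [96, 192, 384, 768]
pruned_channels = [80, 160, 320, 640]  # -17% per stage (80 / 96 ≈ 0.83)
# Caveat: standard Swin-T uses num_heads = [3, 6, 12, 24], and each stage's
# channel count must be divisible by its head count. 80 is not divisible by 3,
# so num_heads has to change as well.

# Option 2: depth (layer) pruning
original_depths = [2, 2, 6, 2]
pruned_depths = [2, 2, 4, 2]  # drop 2 blocks from stage 3

expected_param_reduction = "5-7M"
expected_accuracy_impact = "<1%"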
```
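
A Swin-T parameter estimator shows how far each option moves the parameter count. The sketch assumes the standard Swin-T layout:

- window 7 and MLP ratio 4;
- biases in qkv;
- a 4×4 patch-embedding conv;
- one LayerNorm per output stage for out_indices 1-3, as in mmdet-style backbones.

Under these assumptions it gives 27.52M for the baseline, close to the 27.55M measured above. The ≈0.03M gap would be consistent with the checkpoint analysis also counting the `relative_position_index` buffers (49 × 49 per block × 12 blocks). The head counts `[2, 4, 8, 16]` are a hypothetical choice that keeps 40 channels per head at width 80.

```python
from typing import Sequence, Tuple


def swin_block_params(dim: int, heads: int, window: int = 7, mlp_ratio: int = 4) -> int:
    """Parameters of one Swin block (assumed standard layout)."""
    attn = 3 * dim * dim + 3 * dim           # qkv Linear with bias
    attn += dim * dim + dim                  # attention output projection
    attn += (2 * window - 1) ** 2 * heads    # relative position bias table
    hidden = mlp_ratio * dim
    mlp = dim * hidden + hidden + hidden * dim + dim  # fc1 + fc2 with biases
    norms = 2 * (2 * dim)                    # two LayerNorms (weight + bias)
    return attn + mlp + norms


def swin_backbone_params(embed_dim: int, depths: Sequence[int], num_heads: Sequence[int],
                         window: int = 7, patch: int = 4,
                         out_indices: Tuple[int, ...] = (1, 2, 3)) -> int:
    """Total backbone parameters; raises if a stage width is not divisible by its heads."""
    total = 3 * patch * patch * embed_dim + embed_dim  # patch-embedding conv (RGB input)
    total += 2 * embed_dim                             # patch-embedding LayerNorm
    for i, (depth, heads) in enumerate(zip(depths, num_heads)):
        dim = embed_dim * 2 ** i
        if dim % heads:
            raise ValueError(f"stage {i}: {dim} channels not divisible by {heads} heads")
        total += depth * swin_block_params(dim, heads, window)
        if i < len(depths) - 1:
            # PatchMerging: LayerNorm(4d) + Linear(4d -> 2d, no bias)
            total += 2 * (4 * dim) + (4 * dim) * (2 * dim)
    total += sum(2 * embed_dim * 2 ** i for i in out_indices)  # one LayerNorm per output stage
    return total


if __name__ == "__main__":
    candidates = {
        "Swin-T baseline": (96, [2, 2, 6, 2], [3, 6, 12, 24]),
        "depth pruned [2, 2, 4, 2]": (96, [2, 2, 4, 2], [3, 6, 12, 24]),
        "width 80, heads [2, 4, 8, 16] (hypothetical)": (80, [2, 2, 6, 2], [2, 4, 8, 16]),
    }
    for name, (width, depths, heads) in candidates.items():
        print(f"{name}: {swin_backbone_params(width, depths, heads) / 1e6:.2f}M")
    try:
        swin_backbone_params(80, [2, 2, 6, 2], [3, 6, 12, 24])
    except ValueError as err:
        print("invalid config:", err)
```

Under these assumptions, the two options remove different amounts, and the 5-7M estimate lies between them:

- Dropping two stage-3 blocks removes about 3.6M (→ 23.97M).
- Width 80 removes about 8.4M (→ 19.12M).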

### 2. ASPP Module (4.13M → 3M)

```python
# Reduce ASPP channels
aspp_channels_original = 512
aspp_channels_pruned = 384  # -25%

expected_param_reduction = "~1M"
expected_accuracy_impact = "<0.5% (mainly affects segmentation)"
```
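
Whether a 25% cut saves ~1M depends on which side of the convolutions shrinks. The sketch assumes a DeepLabV3-style ASPP like torchvision's `ASPP`: one 1x1 branch, three 3x3 dilated branches, an image-pooling branch and a 1x1 projection, all bias-free with BatchNorm. Under that assumption, 512 input and 256 branch channels give ≈4.13M, the same as the measured `map.aspp` count. The actual Enhanced head may still differ.

```python
def aspp_params(in_ch: int, out_ch: int, num_dilated: int = 3, batchnorm: bool = True) -> int:
    """Parameter count of an assumed DeepLabV3-style ASPP (bias-free convs + BatchNorm)."""
    branches = in_ch * out_ch                       # 1x1 conv branch
    branches += num_dilated * 9 * in_ch * out_ch    # 3x3 dilated conv branches
    branches += in_ch * out_ch                      # image-pooling branch (1x1 conv)
    project = (num_dilated + 2) * out_ch * out_ch   # 1x1 projection over the concatenation
    bn = 2 * out_ch * (num_dilated + 3) if batchnorm else 0  # one BN per branch + projection
    return branches + project + bn


if __name__ == "__main__":
    for in_ch, out_ch in [(512, 256), (512, 192), (384, 256), (384, 192)]:
        print(f"in={in_ch:3d} out={out_ch:3d}: {aspp_params(in_ch, out_ch) / 1e6:.2f}M")
```

Under this assumption:

- Cutting the 256 branch channels to 192 removes ≈1.09M.
- Cutting the 512 input channels to 384 removes ≈0.95M, but the input width is set by the upstream BEV features.
- Cutting both removes ≈1.81M.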

### 3. Decoder (4.58M → 3.9M)

```python
# Slim down the decoder channels
decoder_channels_original = [128, 256]
decoder_channels_pruned = [96, 192]  # -25%

expected_param_reduction = "0.7M"
expected_accuracy_impact = "<0.3%"
```
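
The decoder estimate can be checked the same way. The sketch assumes a SECOND backbone plus a SECONDFPN neck, with bias-free convolutions and BatchNorm. Each SECOND stage is a strided 3×3 conv followed by `layer_nums` extra 3×3 convs. It also assumes the reference BEVFusion values:

- in_channels 256, out_channels [128, 256], layer_nums [5, 5];
- neck outputs [256, 256], upsample strides [1, 2].

These are assumptions to verify against the project's YAML, although they do reproduce the 4.58M measured above.

```python
from typing import Sequence


def second_decoder_params(in_ch: int, out_chs: Sequence[int], layer_nums: Sequence[int],
                          fpn_out: Sequence[int], up_strides: Sequence[int]) -> int:
    """Assumed SECOND backbone + SECONDFPN neck, bias-free convs followed by BatchNorm."""
    total, prev = 0, in_ch
    for c, n in zip(out_chs, layer_nums):
        total += 9 * prev * c + 2 * c        # strided 3x3 conv + BN
        total += n * (9 * c * c + 2 * c)     # n extra 3x3 convs + BN
        prev = c
    for c, o, s in zip(out_chs, fpn_out, up_strides):
        k = s * s                            # 1x1 (stride 1) or s x s transposed conv
        total += k * c * o + 2 * o           # upsampling layer + BN
    return total


if __name__ == "__main__":
    base = second_decoder_params(256, [128, 256], [5, 5], [256, 256], [1, 2])
    slim = second_decoder_params(256, [96, 192], [5, 5], [256, 256], [1, 2])
    print(f"baseline: {base / 1e6:.2f}M, [96, 192]: {slim / 1e6:.2f}M, "
          f"removed: {(base - slim) / 1e6:.2f}M")
```

Under these assumptions, narrowing the stages to [96, 192] with the neck outputs unchanged leaves ≈2.69M. That is a cut of ≈1.9M rather than 0.7M, so the channel list and the 0.7M estimate above cannot both hold.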

---

## 📋 Immediate Action Checklist

### To Do Today

- [x] Model analysis ✅
- [ ] Review the analysis results
- [ ] Choose a pruning strategy (Plan A or B)
- [ ] Prepare the pruning tools

### Starting Tomorrow

- [ ] Install torch-pruning
- [ ] Create the pruning scripts
- [ ] Start the pruning work

---

## 🚀 Next Commands

### View the Full Analysis Results

```bash
# Show the most recent analysis
ANALYSIS_FILE=$(ls -t analysis_results/checkpoint_analysis_*.txt | head -1)
cat "$ANALYSIS_FILE"

# Per-module parameter distribution (the pattern matches the section header in the analysis file)
grep -A 30 "各模块参数分布" "$ANALYSIS_FILE"

# Optimization suggestions
grep -A 20 "优化建议" "$ANALYSIS_FILE"
```

### Prepare the Pruning Tools

```bash
# Check whether torch-pruning is installed
python -c "import torch_pruning" 2>/dev/null && echo "installed" || echo "not installed"

# Install if needed
pip install torch-pruning
```

---

## 💡 Key Findings

### The Actual Parameter Count Is Smaller Than Expected

**Cause**:

- The earlier 110M estimate was the total size including optimizer state
- **Actual model parameters**: 45.72M ✅
- The optimized model will therefore be smaller and better suited to Orin deployment

### Greater Optimization Headroom

```
Original:    45.72M params, 174MB (FP32)
Optimized:   32M params, 30MB (INT8) ← better than expected!
Compression: 83%

Estimated Orin inference:
  Original FP32: 450-900ms ❌ too slow
  Pruned INT8:   150-300ms ⚠️ still slow
  +TensorRT:     60-80ms   ✅ meets target
```

---

## 🎯 Revised Optimization Targets

### New Targets (based on the actual parameter count)

```
Phase 1, pruning:      45.72M → 32M (-30%)
Phase 2, quantization: 122MB (FP32) → 30MB (INT8)
Overall compression:   83% (174MB → 30MB)

Accuracy targets (drop vs. the epoch_23 baseline):
  NDS:  0.6941 → >0.680 (≤1.41 pts, ≈2.0% relative)
  mAP:  0.6446 → >0.630 (≤1.46 pts, ≈2.3% relative)
  mIoU: 0.4130 → >0.400 (≤1.30 pts, ≈3.1% relative)
```
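
Whether a target counts as a "<2%" loss depends on whether the drop is measured in absolute or relative terms. The small helper below compares the revised floors with the epoch_23 baseline; both sets of numbers come from this document.

```python
from typing import Dict, Tuple

BASELINE = {"NDS": 0.6941, "mAP": 0.6446, "mIoU": 0.4130}   # epoch_23.pth
TARGET_FLOOR = {"NDS": 0.680, "mAP": 0.630, "mIoU": 0.400}   # revised targets above


def accuracy_drops(baseline: Dict[str, float],
                   candidate: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Per metric: (absolute drop in points, relative drop in %), rounded to 2 decimals."""
    result = {}
    for metric, base in baseline.items():
        delta = base - candidate[metric]
        result[metric] = (round(delta * 100, 2), round(delta / base * 100, 2))
    return result


if __name__ == "__main__":
    for metric, (pts, pct) in accuracy_drops(BASELINE, TARGET_FLOOR).items():
        print(f"{metric}: -{pts} pts absolute, -{pct}% relative")
```

It reports drops of 1.41, 1.46 and 1.3 points (about 2.03%, 2.26% and 3.15% relative) for NDS, mAP and mIoU.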

---

## 📅 Updated Timeline

```
Today (Day 1):
  ✅ Model analysis complete
  → Choose the pruning strategy
  → Prepare tools

Tomorrow (Day 2):
  → Install torch-pruning
  → Create the pruning scripts
  → Start pruning

Day 3-4:
  → Post-pruning fine-tuning (3 epochs, ~12 hours)

Day 5:
  → Evaluate the pruning result
  → PTQ quantization test

Day 6-9:
  → QAT training (5 epochs, ~20 hours)

Day 10:
  → Final evaluation
  → Prepare the TensorRT conversion
```

**Total time**: about 10 days to complete the optimization

---

## 🎉 Good News!

### Findings Better Than Expected

1. **Smaller model**: 45.72M vs. the estimated 110M
   - More room for optimization
   - The final model will be smaller

2. **Targets are achievable**:
   - 32M after pruning is very reasonable
   - Only 30MB after INT8
   - More confidence in Orin deployment

3. **Shorter schedule**:
   - A smaller model makes pruning and fine-tuning faster
   - Expected to finish in 10 days (vs. 14 days previously)

---

## 📂 Analysis Result Files

```
analysis_results/
└── checkpoint_analysis_<timestamp>.txt
    ├── Overall statistics
    ├── Module distribution
    ├── Encoders breakdown
    ├── Heads breakdown
    └── Optimization suggestions
```

---

## 🚀 Act Now

### Confirm the Optimization Plan

**Recommended: Plan A (conservative optimization)**

- 30% pruning: 45.72M → 32M
- INT8 quantization: 122MB → 30MB
- Accuracy loss: <2%
- Orin inference: 60-80ms ✅ (estimated)

**Adopt this plan?**

### Next Steps

```bash
# 1. View the full analysis
cat analysis_results/checkpoint_analysis_*.txt

# 2. Prepare the pruning tools
pip install torch-pruning

# 3. Create the pruning config and scripts
#    (starting tomorrow)
```

---

**Status**: ✅ Analysis complete

**Finding**: The model is smaller than expected, leaving more headroom for optimization

**Recommendation**: Adopt Plan A (30% pruning + INT8)

**Next step**: Prepare the pruning tools