
BEVFusion Pruning Tool User Guide

Last updated: 2025-10-30
Tool type: PyTorch built-in pruning (PyTorch 1.10+)

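📚 Pruning in a Nutshell

What is pruning?

In a trained model, some neurons/channels contribute little
Pruning = removing these unimportant parts
Result = a smaller, faster model with a slight drop in accuracy

Does pruning require retraining?

❌ Not full retraining
✅ Only a short fine-tune (3-5 epochs)

Comparison:
  Full training:        20-24 epochs, 7-8 days
  Pruning + fine-tune:  3-5 epochs, 12-20 hours  ← roughly 8-16x faster

Why is fine-tuning needed?

After pruning → accuracy drops by 3-5%
After fine-tuning → accuracy recovers; final loss <2%

Fine-tuning is short because:
  ✓ It starts from a strong point (epoch_23 is already well trained)
  ✓ It only adjusts existing features instead of learning new ones
  ✓ It uses a very small learning rate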

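🛠️ Prepared Tools

1. Pruning script

File: tools/pruning/prune_bevfusion_builtin.py
Features:

Automatic pruning based on the L1 norm
Assigns a pruning ratio to each module automatically
Keeps the most important channels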
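The L1-norm criterion above can be illustrated with PyTorch's built-in torch.nn.utils.prune module. This is a minimal sketch on one toy Conv2d layer with arbitrary sizes, not the code inside prune_bevfusion_builtin.py. ln_structured with n=1 and dim=0 zeroes the output channels whose weights have the smallest L1 norm. Note that it only masks them: the weight tensor keeps its original shape, so parameter count, file size, and dense inference speed stay the same unless the zeroed channels are later removed physically.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)

# Zero the 20% of output channels (dim=0) with the smallest L1 norm (n=1).
# 20% of 32 channels rounds to 6 channels.
prune.ln_structured(conv, name="weight", amount=0.2, n=1, dim=0)

# Count output channels whose weights are now all zero.
zeroed = int((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())

# Make the mask permanent: drops weight_orig / weight_mask, keeps the zeros.
prune.remove(conv, "weight")

print(tuple(conv.weight.shape), zeroed)  # shape is unchanged: (32, 16, 3, 3) 6
```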

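2. One-click script

File: 一键剪枝和微调.sh ("one-click prune and fine-tune")
Features:

Runs pruning automatically
Optionally launches fine-tuning
Writes complete logs

3. Analysis tool

File: tools/analysis/analyze_checkpoint.py
Features: analyzes the model before and after pruning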
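One way to compare a model before and after pruning is to group tensor sizes by module-name prefix. The sketch below is a hypothetical helper (params_by_module is not part of analyze_checkpoint.py). It counts every floating-point tensor in a state dict, so BatchNorm running statistics are included and totals can be slightly above a model.parameters() count; if torch.nn.utils.prune masks are still attached, the *_mask tensors are counted too. It also reports non-zero elements, which is the number that changes when pruning only masks weights.

```python
from collections import defaultdict

import torch


def params_by_module(state_dict, depth=3):
    """Sum tensor sizes and non-zero counts per module-name prefix.

    depth=3 groups e.g. 'encoders.camera.backbone.stages.0.weight'
    under 'encoders.camera.backbone'.
    """
    total, nonzero = defaultdict(int), defaultdict(int)
    for name, tensor in state_dict.items():
        # Skip non-tensors and integer buffers such as BatchNorm num_batches_tracked.
        if not torch.is_tensor(tensor) or not tensor.is_floating_point():
            continue
        prefix = ".".join(name.split(".")[:depth])
        total[prefix] += tensor.numel()
        nonzero[prefix] += int(torch.count_nonzero(tensor))
    return dict(total), dict(nonzero)


# Toy state dict standing in for a real checkpoint.
sd = {
    "encoders.camera.backbone.conv.weight": torch.ones(8, 4, 3, 3),
    "encoders.camera.backbone.conv.bias": torch.zeros(8),
    "encoders.camera.backbone.bn.num_batches_tracked": torch.tensor(10),
    "heads.map.aspp.fc.weight": torch.ones(2, 8),
}
total, nonzero = params_by_module(sd)
print(total)    # {'encoders.camera.backbone': 296, 'heads.map.aspp': 16}
print(nonzero)  # {'encoders.camera.backbone': 288, 'heads.map.aspp': 16}
```

For real files, pass the object returned by torch.load(path, map_location="cpu"), or its "state_dict" entry for mmcv-style checkpoints, and compare the two results prefix by prefix.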

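⚡ Quick Start (3 Options)

Option 1: One-click run (recommended)

cd /workspace/bevfusion

# One-click pruning + fine-tuning
chmod +x 一键剪枝和微调.sh
bash 一键剪枝和微调.sh

# Choose at the prompt:
#   [1] Fine-tune now (about 12 hours, in the background)
#   [2] Fine-tune later
#   [3] Inspect the results first

Suitable for: users who want to get started quickly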

Option 2: Step by step

Step 1: Prune only (15 minutes)

cd /workspace/bevfusion

# Run pruning
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/bevfusion_pruned_32M.pth \
  --target-ratio 0.70

# Output: bevfusion_pruned_32M.pth (~32M parameters)

Step 2: Inspect the pruning results

# Analyze the pruned model
python tools/analysis/analyze_checkpoint.py \
  pruning_results/bevfusion_pruned_32M.pth

Step 3: Fine-tune (12 hours)

# Run only after you are satisfied with the pruning results
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_enhanced_phase1_HIGHRES.yaml \
  --load_from pruning_results/bevfusion_pruned_32M.pth \
  --run-dir runs/pruned_finetune \
  --cfg-options \
    max_epochs=3 \
    optimizer.lr=5.0e-6 \
    data.samples_per_gpu=2 \
    data.workers_per_gpu=0

Suitable for: users who want control over each step
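If the pruning script physically removes channels (rather than only zeroing them), the tensors in bevfusion_pruned_32M.pth are smaller than the layers the original YAML config builds, and loading them with --load_from can either fail or, with non-strict loading, skip the mismatched weights. Comparing shapes before starting a 12-hour run catches this early. The helper below is a hypothetical sketch shown on toy layers; keys that exist only in the checkpoint are ignored.

```python
import torch.nn as nn


def find_shape_mismatches(model, state_dict):
    """List (name, checkpoint_shape, model_shape) for tensors whose shapes differ."""
    model_sd = model.state_dict()
    mismatches = []
    for name, tensor in state_dict.items():
        if name in model_sd and model_sd[name].shape != tensor.shape:
            mismatches.append((name, tuple(tensor.shape), tuple(model_sd[name].shape)))
    return mismatches


# Toy example: the config builds 32 output channels, the "pruned" checkpoint has 24.
model_from_config = nn.Conv2d(16, 32, kernel_size=3)
pruned_layer = nn.Conv2d(16, 24, kernel_size=3)
print(find_shape_mismatches(model_from_config, pruned_layer.state_dict()))
# [('weight', (24, 16, 3, 3), (32, 16, 3, 3)), ('bias', (24,), (32,))]
```

An empty list means the checkpoint matches the model built from the config; a non-empty list means the config's layer widths would have to be changed to match the pruned checkpoint.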

Option 3: Prune only, no fine-tuning (for testing)

# Quick pruning test
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/test_pruned.pth \
  --target-ratio 0.80  # conservative test: prune only 20%

# Inspect the results
python tools/analysis/analyze_checkpoint.py \
  pruning_results/test_pruned.pth

Suitable for: users who want to try it out first

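📊 Pruning Parameters

The --target-ratio parameter

--target-ratio is the fraction of parameters to keep, not the fraction to prune (baseline: 45.72M).

Value	Meaning	Params kept	Pruned	Use case
0.80	Keep 80%	36.6M	20%	Conservative test
0.70	Keep 70%	32M	30%	Recommended ✅
0.60	Keep 60%	27.4M	40%	Aggressive optimization
0.50	Keep 50%	22.9M	50%	Extreme compression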
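The table follows from the 45.72M-parameter baseline: kept parameters = 45.72M × target ratio. Multiplying by bytes per weight (4 for FP32, 1 for INT8) and dividing by 2^20 reproduces the model sizes quoted later in this guide (174 → 122 for FP32, about 30 for INT8), so those figures are MiB of raw weights. A minimal sketch of the arithmetic; real checkpoint files also carry metadata, and INT8 models additionally store quantization scales, so actual files will be somewhat larger.

```python
BASELINE_M = 45.72  # parameters in epoch_23.pth, in millions


def summarize(target_ratio):
    """Kept parameters (M) and raw weight size (MiB) in FP32 and INT8."""
    kept_m = BASELINE_M * target_ratio
    fp32_mib = kept_m * 1e6 * 4 / 2**20  # 4 bytes per FP32 weight
    int8_mib = kept_m * 1e6 * 1 / 2**20  # 1 byte per INT8 weight
    return kept_m, fp32_mib, int8_mib


for ratio in (1.0, 0.80, 0.70, 0.60, 0.50):
    kept, fp32, int8 = summarize(ratio)
    print(f"{ratio:.2f}: {kept:5.1f}M params, {fp32:6.1f} MiB FP32, {int8:5.1f} MiB INT8")
```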

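🎯 Pruning Strategy in Detail

Automatically assigned pruning ratios

The pruning script applies a different ratio to each module:

Pruning plan:
  encoders.camera.backbone:   prune 20%  # largest module, gentle pruning
  heads.map.aspp:             prune 25%  # ASPP module, moderate pruning
  decoder:                    prune 15%  # decoder, conservative pruning
  encoders.camera.vtransform: prune 10%  # VTransform, light pruning

Other modules: not pruned (e.g. the LiDAR backbone and the detection head)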

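Why this allocation?

Camera Backbone (27.55M, 60%)
- Largest module, so pruning it gives the biggest savings
- Also important, so only 20% is pruned
ASPP (4.13M, 9%)
- Can be pruned moderately (25%)
- Mainly affects segmentation; detection is not directly affected (ASPP sits in the map head)
Decoder (4.58M, 10%)
- Conservative pruning (15%)
- Affects the quality of the final features
Modules not pruned
- LiDAR Backbone: already small (2.7M)
- Detection Head: directly affects detection accuracy
- Fuser: very small (0.78M)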
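The per-module ratios and the 0.70 overall target only line up under some assumption about what the percentages mean. The sketch below is back-of-envelope arithmetic, not the script's logic: it uses the module sizes listed above, leaves out the VTransform (its size is not given here), and compares two readings. If a ratio removes that fraction of a module's parameters, the plan removes about 16% of the model. If it removes that fraction of channels, and adjacent layers inside a module are pruned at the same ratio, an inner layer keeps roughly (1 - r)^2 of its weights and the plan removes about 28%, close to the 30% target. The (1 - r)^2 rule is an approximation and does not hold for layers where only one side is pruned.

```python
TOTAL_M = 45.72  # baseline parameters, in millions

# (module size in M params, pruning ratio) from the pruning plan above;
# the VTransform is omitted because its size is not listed.
PLAN = {
    "encoders.camera.backbone": (27.55, 0.20),
    "heads.map.aspp": (4.13, 0.25),
    "decoder": (4.58, 0.15),
}


def removed_m(plan, ratios_are_channels):
    """Parameters removed, in millions, under one reading of the ratios."""
    removed = 0.0
    for size_m, r in plan.values():
        # Channel pruning shrinks both input and output dims of inner layers,
        # so such a layer keeps about (1 - r)**2 of its weights.
        kept = (1 - r) ** 2 if ratios_are_channels else 1 - r
        removed += size_m * (1 - kept)
    return removed


for as_channels in (False, True):
    r = removed_m(PLAN, as_channels)
    print(f"ratios as channels={as_channels}: -{r:.2f}M ({r / TOTAL_M:.1%})")
```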

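⏱️ Full Timeline

Today (preparation)

✅ Create pruning tools    (done)
⏳ Test pruning            (5 minutes)
⏳ Run the actual pruning  (15 minutes)
⏳ Analyze the results     (5 minutes)

Total: ~30 minutes

Tomorrow (fine-tuning; only on GPUs not used by Stage 1)

Launch fine-tuning:
  - 3 epochs
  - ~4 hours per epoch
  - Total: 12 hours (in the background)

Monitoring and evaluation:
  - Watch the loss go down
  - Evaluate after epoch 3

Day after tomorrow (evaluation and quantization)

Evaluate the pruning results:
  - Compare against the epoch_23 baseline
  - Confirm accuracy loss <2%

If satisfied:
  → move on to INT8 quantization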

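📋 Execution Checklist

Before pruning

Checkpoint exists: epoch_23.pth ✅
Pruning tools created ✅
Output directory created ✅
Stage 1 training status checked (avoid conflicts)

During pruning

Pruning progresses normally
No error messages
Output file generated

After pruning

Parameter count as expected (~32M)
Model file intact
Model loads correctly

Before fine-tuning

GPUs available (8, or 4)
Config file correct
Learning rate set sensibly (5e-6)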
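The "after pruning" items can be scripted. The function below is a hypothetical sketch (not one of the provided tools) that takes a loaded state dict and reports problems: a parameter count outside a tolerance around the expected ~32M, non-finite values, and leftover torch.nn.utils.prune masks (*_orig / *_mask keys), which mean the pruning was not made permanent with prune.remove before saving. The 5% tolerance is an arbitrary choice.

```python
import math

import torch


def check_pruned_state_dict(state_dict, expected_m=32.0, rel_tol=0.05):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    floats = {k: v for k, v in state_dict.items()
              if torch.is_tensor(v) and v.is_floating_point()}

    n_m = sum(v.numel() for v in floats.values()) / 1e6
    if not math.isclose(n_m, expected_m, rel_tol=rel_tol):
        problems.append(f"{n_m:.2f}M parameters, expected ~{expected_m}M")

    for name, v in floats.items():
        if not torch.isfinite(v).all():
            problems.append(f"non-finite values in {name}")

    if any(k.endswith(("_orig", "_mask")) for k in state_dict):
        problems.append("pruning masks still attached (prune.remove() not called)")
    return problems


# Toy check with a tiny expected size (1000 parameters = 0.001M).
good = {"conv.weight": torch.ones(1000)}
bad = {"conv.weight_orig": torch.full((1000,), float("nan")),
       "conv.weight_mask": torch.ones(1000)}
print(check_pruned_state_dict(good, expected_m=0.001))
print(check_pruned_state_dict(bad, expected_m=0.001))
```

For the real file, load pruning_results/bevfusion_pruned_32M.pth with torch.load and call check_pruned_state_dict on its state dict; the fact that it loads at all already covers the "file intact" item.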

🚀 Run Now

Quick test (5 minutes)

cd /workspace/bevfusion

# Small-scale pruning test (prune 20%)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/test_pruned_20percent.pth \
  --target-ratio 0.80

# Inspect the results
python tools/analysis/analyze_checkpoint.py \
  pruning_results/test_pruned_20percent.pth

Full pruning (15 minutes)

# Prune 30% (recommended)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/bevfusion_pruned_32M.pth \
  --target-ratio 0.70

One-click full workflow

# Prune automatically, then ask whether to fine-tune
bash 一键剪枝和微调.sh

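📊 Expected Results

Right after pruning

Parameters: 45.72M → ~32M (-30%)
Model size: 174 MiB → ~122 MiB (FP32)
Accuracy: unknown (needs evaluation or fine-tuning)
Expected accuracy loss when evaluated directly after pruning: 3-5%

After fine-tuning (12 hours later)

Parameters: 32M (unchanged)
Model size: ~122 MiB (FP32)
Accuracy: expected loss <2% (the ranges below are at most ~1.5 points below the baseline)
  NDS: 0.6941 → 0.680-0.690
  mAP: 0.6446 → 0.630-0.640
  mIoU: 0.4130 → 0.400-0.410

After INT8 quantization (5 days later)

Parameters: 32M (unchanged)
Model size: 122 MiB → ~30 MiB (INT8)
Accuracy: less than 1% further loss on top of the fine-tuned model
Total accuracy loss: <3%
Inference speedup: 2-3x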

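🔍 Monitoring Pruning and Fine-tuning

Monitoring pruning

# Follow pruning progress in real time
tail -f pruning_results/pruning_log_*.txt

Monitoring fine-tuning

# Follow fine-tuning (epoch progress lines only)
tail -f runs/pruned_finetune_*/finetune.log | grep "Epoch \["

# Check GPU status every 10 seconds
watch -n 10 nvidia-smi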

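⚠️ Notes

1. Relationship to Stage 1 training

Stage 1 training: uses GPUs 0-3, about 9 more days to go
Pruning: runs on the CPU, 15 minutes
Fine-tuning: preferably start it after Stage 1 finishes
- or run it only on GPUs that Stage 1 is not using (e.g. via CUDA_VISIBLE_DEVICES, with -np matching the number of free GPUs; see Q3)

2. Pruning is irreversible

Pruned channels cannot be restored directly
Keep the original checkpoint (epoch_23.pth)
The pruned model should be fine-tuned before use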
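Keeping epoch_23.pth only helps if pruning never writes to it. A simple safeguard is to record a checksum of the file before pruning and compare it afterwards. A minimal sketch using Python's standard hashlib; the demo uses a temporary file in place of the real checkpoint.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so the whole file is never held in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Demo: a temporary file stands in for runs/enhanced_from_epoch19/epoch_23.pth.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"checkpoint bytes")
    demo_path = f.name

before = sha256_of(demo_path)
# ... run prune_bevfusion_builtin.py here; it should only read the source file ...
after = sha256_of(demo_path)
print("unchanged" if before == after else "MODIFIED")
os.remove(demo_path)
```

From the shell, running sha256sum on the checkpoint before and after pruning gives the same check.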

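3. Why fine-tuning matters

Pruning without fine-tuning: 3-5% accuracy loss ❌
Pruning + fine-tuning: <2% accuracy loss ✅

Fine-tuning time: only 12 hours
Benefit: recovers 2-3% of accuracy

Conclusion: fine-tuning is essential!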

📂 Tool Files

Tools and scripts:
├── tools/pruning/
│   ├── prune_bevfusion_builtin.py    # main pruning script
│   └── test_pruning.py               # checks that the tools work
├── tools/analysis/
│   └── analyze_checkpoint.py         # model analysis
├── 一键剪枝和微调.sh                  # one-click run
└── 剪枝工具使用指南.md                # this document

Result directories:
├── pruning_results/
│   ├── bevfusion_pruned_32M.pth      # pruned model (to be generated)
│   ├── pruning_log_*.txt             # pruning logs
│   └── pruning_plan.md               # pruning plan
└── runs/pruned_finetune_*/
    ├── epoch_1.pth                   # fine-tuning checkpoints
    ├── epoch_2.pth
    ├── epoch_3.pth
    └── finetune.log                  # fine-tuning log

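🎯 Three Usage Scenarios

Scenario 1: I want to quickly test the effect of pruning

# Small-scale pruning test (prune 20%)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/test_20percent.pth \
  --target-ratio 0.80

# Inspect the results right away
python tools/analysis/analyze_checkpoint.py \
  pruning_results/test_20percent.pth

# Time: 5 minutes
# Risk: none

Scenario 2: I want to prune for real but control fine-tuning myself

# Step 1: Prune (15 minutes)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/bevfusion_pruned_32M.pth \
  --target-ratio 0.70

# Step 2: Inspect the results and decide whether they are acceptable
python tools/analysis/analyze_checkpoint.py \
  pruning_results/bevfusion_pruned_32M.pth

# Step 3: If satisfied, start fine-tuning after Stage 1 finishes
# (or start now on different GPUs)

Scenario 3: I want to do everything in one go

# One-click run with prompts
bash 一键剪枝和微调.sh

# Choose at the prompt whether to fine-tune immediately
# Choosing [1] launches fine-tuning in the background automatically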

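📊 Expected Pruning Effect

Immediate effect (after pruning, before fine-tuning)

Parameters: 45.72M → 32M ✅
Model size: 174 MiB → 122 MiB ✅
Inference speed: expected 20-30% faster (only if channels are physically removed, not just masked) ✅
Accuracy: expected to drop 3-5% ⚠️

The model at this point:
  - runs
  - but is not accurate enough
  - needs fine-tuning to recover

After fine-tuning (12 hours later)

Parameters: 32M ✅
Model size: 122 MiB ✅
Inference speed: expected 20-30% faster ✅
Accuracy loss: <2% ✅

Expected performance:
  NDS: 0.680-0.690 (vs 0.6941)
  mAP: 0.630-0.640 (vs 0.6446)
  mIoU: 0.400-0.410 (vs 0.4130)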

🚀 Recommended Workflow

Today (30 minutes)

# 1. Quick test (5 minutes)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/test_pruned.pth \
  --target-ratio 0.80

# 2. Inspect the test results (2 minutes)
python tools/analysis/analyze_checkpoint.py \
  pruning_results/test_pruned.pth

# 3. If satisfied, run the full pruning (15 minutes)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/bevfusion_pruned_32M.pth \
  --target-ratio 0.70

# 4. Analyze the pruning results (2 minutes)
python tools/analysis/analyze_checkpoint.py \
  pruning_results/bevfusion_pruned_32M.pth

After Stage 1 finishes (~9 days from now)

# Launch fine-tuning (12 hours)
torchpack dist-run -np 8 python tools/train.py \
  configs/.../multitask_enhanced_phase1_HIGHRES.yaml \
  --load_from pruning_results/bevfusion_pruned_32M.pth \
  --run-dir runs/pruned_finetune \
  --cfg-options \
    max_epochs=3 \
    optimizer.lr=5.0e-6

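💡 FAQ

Q1: Will pruning break the model?

A: No. Pruning only removes some channels; the model structure stays complete and can still run inference. Accuracy does drop, though, and fine-tuning is needed to recover it.

Q2: Can I use the pruned model directly without fine-tuning?

A: Yes, but it is not recommended. Accuracy drops by 3-5% after pruning; 12 hours of fine-tuning recovers most of it (final loss <2%), which is well worth it.

Q3: Can fine-tuning use fewer GPUs?

A: Yes. You can use 4 or 6 GPUs; training takes correspondingly longer.

# Fine-tune on 4 GPUs
torchpack dist-run -np 4 python tools/train.py ...
# Time: ~18-24 hours (24 hours if throughput scales linearly with GPU count)

Q4: What if pruning fails?

The original checkpoint (epoch_23.pth) is not modified
You can retry with a different pruning ratio
Or skip pruning and go straight to INT8 quantization

Q5: What if fine-tuning fails?

The pruned model is saved, so you can fine-tune it again
Try a smaller learning rate (1e-6)
Or start over and prune less (a higher --target-ratio, e.g. 0.80)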

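🎉 Summary

Does pruning require training?

✅ Yes, but only a short fine-tune, not full retraining

Time comparison:
  Full training:        7-8 days
  Pruning + fine-tune:  12 hours  ← about 14-16x faster

Conclusion: pruning + fine-tuning is the most cost-effective optimization in this plan

Ready to run

# You can start pruning now (pruning runs on the CPU and does not affect Stage 1;
# choose [2] at the prompt while Stage 1 still occupies the GPUs)
cd /workspace/bevfusion
bash 一键剪枝和微调.sh

# Or run it step by step (more control)
python tools/pruning/prune_bevfusion_builtin.py \
  --checkpoint runs/enhanced_from_epoch19/epoch_23.pth \
  --output pruning_results/bevfusion_pruned_32M.pth \
  --target-ratio 0.70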
