# BEVFusion Model Optimization Kickoff Plan

**Start date**: 2025-10-30

**Baseline**: Epoch 23 (NDS 0.6941, mAP 0.6446, mIoU 0.4130)

**Goal**: an optimized model ready for Orin deployment

---
## 🎯 Optimization Goals

### Final Deployment Targets
```
Hardware:        NVIDIA Orin 270T
Inference time:  <80 ms (ideally <60 ms)
Throughput:      >12 FPS (ideally >16 FPS)
Power draw:      <60 W (ideally <45 W)
Accuracy loss:   <3%
```
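The latency and throughput targets above are two views of the same budget. A quick sanity check of the conversion:

```python
def fps(latency_ms: float) -> float:
    """Convert a per-frame latency budget (ms) to throughput (FPS)."""
    return 1000.0 / latency_ms

# The <80 ms budget corresponds to >12.5 FPS, and the ideal
# <60 ms budget to roughly 16.7 FPS, matching the FPS targets above.
print(f"{fps(80):.1f} FPS at 80 ms")  # 12.5 FPS at 80 ms
print(f"{fps(60):.1f} FPS at 60 ms")  # 16.7 FPS at 60 ms
```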
### Optimization Roadmap
```
Original model:  110M params, 450 GFLOPs, 90 ms @ A100
        ↓
Pruned model:    60M params, 250 GFLOPs, 50 ms @ A100 (-45%)
        ↓
INT8 model:      15M params, 62 GFLOPs, 40 ms @ A100 (-56%)
        ↓
TensorRT:        15M params, optimized kernels, 30 ms @ A100 (-67%)
        ↓
Orin deployment: 50-60 ms inference, 16+ FPS, <50 W (targets met ✅)
```
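The size reductions in the roadmap follow directly from parameter count and bit width. A minimal sketch of that arithmetic, assuming 4 bytes per FP32 weight, 1 byte per INT8 weight, and 1 MB = 1e6 bytes (the exact totals depend on the real parameter count and MB convention):

```python
def model_size_mb(num_params: int, bytes_per_param: float) -> float:
    """Approximate on-disk weight size in MB (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6

fp32 = model_size_mb(110_000_000, 4)    # original model, FP32 (~440 MB)
pruned = model_size_mb(60_000_000, 4)   # after channel pruning
int8 = model_size_mb(60_000_000, 1)     # pruned weights quantized to INT8

print(f"FP32:   {fp32:.0f} MB")
print(f"Pruned: {pruned:.0f} MB ({1 - pruned/fp32:.0%} smaller)")
print(f"INT8:   {int8:.0f} MB ({1 - int8/fp32:.0%} smaller)")
```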
---
## 📋 Three-Phase Optimization Plan

### Phase 1: Model Analysis (1-2 days, start immediately)

#### Task Checklist
- [ ] Analyze parameter count and FLOPs
- [ ] Profile inference performance bottlenecks
- [ ] Sensitivity analysis (which layers can be pruned)
- [ ] Decide on a pruning strategy

#### Required Tools
```
tools/analysis/
├── model_complexity.py      # model complexity analysis
├── profile_inference.py     # inference performance profiling
├── sensitivity_analysis.py  # sensitivity analysis
└── layer_statistics.py      # per-layer statistics
```
---
### Phase 2: Model Pruning (3-5 days)

#### Goals
```
Parameters:    110M → 60M (-45%)
FLOPs:         450G → 250G (-44%)
Accuracy loss: <1.5%
```
#### Pruning Strategy
```
1. SwinTransformer Backbone
   - Channel pruning: remove 20-30% of channels
   - Depth pruning: optionally drop attention layers

2. FPN Neck
   - Channel pruning: remove 25-30% of channels

3. Decoder
   - Channel pruning: remove 20% of channels

4. Detection/Segmentation Heads
   - Prune conservatively: remove 10-15% (accuracy-sensitive)
```
#### Pruning Tools
- Torch-Pruning (recommended)
- torch.nn.utils.prune (built-in)
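Structured channel pruning typically ranks a layer's output channels by an importance score and keeps the top fraction. A minimal, framework-free sketch of L1-norm ranking (the criterion behind magnitude-based pruners such as Torch-Pruning's; the real library API differs and must also rewire downstream layers):

```python
def channels_to_keep(filter_weights, keep_ratio):
    """Rank output channels of a conv layer by L1 norm and return
    the sorted indices of the channels to keep.

    filter_weights: one flat list of weights per output channel.
    """
    # Importance score per channel: sum of absolute weights (L1 norm)
    scores = [sum(abs(w) for w in filt) for filt in filter_weights]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(filter_weights) * keep_ratio))
    return sorted(order[:n_keep])

# Toy layer with 4 output channels; keep the strongest 50%.
weights = [[0.1, -0.2], [1.0, 0.9], [0.0, 0.05], [-0.6, 0.4]]
print(channels_to_keep(weights, 0.5))  # → [1, 3]
```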
---
### Phase 3: Quantization Training (4-6 days)

#### Goals
```
Model size:      441 MB (FP32) → 110 MB (INT8) (-75%)
Inference speed: 2-3× faster
Accuracy loss:   <2% (cumulative <3%)
```
#### Quantization Strategy
```
1. PTQ (Post-Training Quantization)
   - Quick feasibility check
   - Expected accuracy loss: 2-3%

2. QAT (Quantization-Aware Training)
   - Fine-tune to recover accuracy
   - 5 epochs, lr=1e-6
   - Expected accuracy recovery: 1-2%
```
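At its core, INT8 quantization maps each float tensor onto 255 integer levels via a scale factor; PTQ calibrates that scale from data, and QAT simulates the round trip during training. A minimal symmetric per-tensor sketch of the quantize/dequantize arithmetic (illustrative only, not the PyTorch quantization API):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map INT8 codes back to floats; the gap to the input is the quantization error."""
    return [c * scale for c in codes]

x = [0.5, -1.27, 0.003, 1.27]
codes, scale = quantize_int8(x)
x_hat = dequantize(codes, scale)
print(codes)  # [50, -127, 0, 127]
# Worst-case round-trip error is bounded by scale / 2
print(max(abs(a - b) for a, b in zip(x, x_hat)))
```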
---
## 🚀 Immediate Action: Kick Off Phase 1

### Step 1: Model Complexity Analysis

Create the analysis script:
```python
# tools/analysis/model_complexity.py

import torch
import torch.nn as nn
from thop import profile, clever_format
from mmcv import Config
from mmdet3d.models import build_model


def analyze_model_complexity(config_file, checkpoint_file=None):
    """Analyze model complexity."""

    # Load the config
    cfg = Config.fromfile(config_file)

    # Build the model
    model = build_model(cfg.model)
    model.eval()

    if checkpoint_file:
        checkpoint = torch.load(checkpoint_file, map_location='cpu')
        model.load_state_dict(checkpoint['state_dict'])

    # Parameter counts
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

    print("=" * 80)
    print("Model parameter statistics")
    print("=" * 80)
    print(f"Total parameters:     {total_params:,} ({total_params/1e6:.2f}M)")
    print(f"Trainable parameters: {trainable_params:,} ({trainable_params/1e6:.2f}M)")
    print(f"Model size (FP32):    {total_params * 4 / 1024 / 1024:.2f} MB")
    print()

    # Per-module breakdown
    print("=" * 80)
    print("Per-module parameter statistics")
    print("=" * 80)

    module_params = {}
    for name, module in model.named_children():
        params = sum(p.numel() for p in module.parameters())
        module_params[name] = params
        print(f"{name:30s}: {params:12,} ({params/total_params*100:5.2f}%)")

    print()

    # FLOPs (requires dummy inputs)
    print("=" * 80)
    print("FLOPs statistics (using dummy inputs)")
    print("=" * 80)

    # Create dummy inputs
    batch_size = 1
    dummy_images = torch.randn(batch_size, 6, 3, 256, 704)  # 6 camera views
    dummy_points = torch.randn(batch_size, 40000, 5)        # LiDAR point cloud

    try:
        flops, params = profile(model, inputs=(dummy_images, dummy_points))
        flops, params = clever_format([flops, params], "%.3f")
        print(f"FLOPs:  {flops}")
        print(f"Params: {params}")
    except Exception as e:
        print(f"FLOPs computation failed: {e}")
        print("The model forward may need adapting before it can be profiled")

    return model, total_params, module_params


if __name__ == '__main__':
    import sys

    if len(sys.argv) < 2:
        print("Usage: python model_complexity.py <config_file> [checkpoint_file]")
        sys.exit(1)

    config_file = sys.argv[1]
    checkpoint_file = sys.argv[2] if len(sys.argv) > 2 else None

    model, total_params, module_params = analyze_model_complexity(
        config_file,
        checkpoint_file
    )

    print("\nAnalysis complete!")
```
### Step 2: Inference Performance Profiling
```python
# tools/analysis/profile_inference.py

import time

import numpy as np
import torch
from mmcv import Config
from mmdet3d.models import build_model
from mmdet3d.datasets import build_dataloader, build_dataset


def profile_inference(config_file, checkpoint_file, num_samples=100):
    """Profile inference performance."""

    # Load the config and model
    cfg = Config.fromfile(config_file)
    model = build_model(cfg.model).cuda()
    checkpoint = torch.load(checkpoint_file)
    model.load_state_dict(checkpoint['state_dict'])
    model.eval()

    # Build the validation dataset
    dataset = build_dataset(cfg.data.val)
    data_loader = build_dataloader(
        dataset,
        samples_per_gpu=1,
        workers_per_gpu=0,
        dist=False,
        shuffle=False
    )

    # Warm-up
    print("Warming up GPU...")
    with torch.no_grad():
        for i, data in enumerate(data_loader):
            if i >= 10:
                break
            _ = model(return_loss=False, rescale=True, **data)

    # Timed runs
    print(f"\nStarting profiling ({num_samples} samples)...")
    times = []

    with torch.no_grad():
        for i, data in enumerate(data_loader):
            if i >= num_samples:
                break

            torch.cuda.synchronize()
            start = time.time()

            _ = model(return_loss=False, rescale=True, **data)

            torch.cuda.synchronize()
            end = time.time()

            times.append((end - start) * 1000)  # ms

            if (i + 1) % 10 == 0:
                print(f"  Processed: {i+1}/{num_samples}")

    # Statistics
    times = np.array(times)

    print("\n" + "=" * 80)
    print("Inference performance statistics")
    print("=" * 80)
    print(f"Mean latency: {np.mean(times):.2f} ms")
    print(f"Median:       {np.median(times):.2f} ms")
    print(f"Min:          {np.min(times):.2f} ms")
    print(f"Max:          {np.max(times):.2f} ms")
    print(f"Std dev:      {np.std(times):.2f} ms")
    print(f"P95:          {np.percentile(times, 95):.2f} ms")
    print(f"P99:          {np.percentile(times, 99):.2f} ms")
    print(f"\nThroughput: {1000/np.mean(times):.2f} FPS")
    print("=" * 80)

    return times


if __name__ == '__main__':
    import sys

    if len(sys.argv) < 3:
        print("Usage: python profile_inference.py <config> <checkpoint> [num_samples]")
        sys.exit(1)

    config_file = sys.argv[1]
    checkpoint_file = sys.argv[2]
    num_samples = int(sys.argv[3]) if len(sys.argv) > 3 else 100

    times = profile_inference(config_file, checkpoint_file, num_samples)

    print("\nProfiling complete!")
```
### Step 3: Sensitivity Analysis
```python
# tools/analysis/sensitivity_analysis.py

import copy

import torch
import torch.nn as nn
from tqdm import tqdm
from mmcv import Config
from mmdet3d.models import build_model
from mmdet3d.datasets import build_dataloader, build_dataset


def prune_layer_channels(model, layer_name, ratio=0.5):
    """Temporarily prune the channels of one layer.

    Simplified placeholder: a real implementation must rebuild the layer
    (and its downstream consumers) with the reduced channel count.
    """
    pruned_model = copy.deepcopy(model)

    # Locate the target layer and prune it
    for name, module in pruned_model.named_modules():
        if name == layer_name:
            if isinstance(module, nn.Conv2d):
                out_channels = module.out_channels
                keep_channels = int(out_channels * (1 - ratio))
                # TODO: actual channel pruning goes here
                pass

    return pruned_model


def evaluate_model(model, data_loader):
    """Quickly evaluate a model."""
    model.eval()
    results = []

    with torch.no_grad():
        for data in tqdm(data_loader, desc="Evaluating"):
            result = model(return_loss=False, rescale=True, **data)
            results.extend(result)

    # Simplified placeholder score: a real implementation computes mAP/NDS
    return len(results)


def analyze_sensitivity(config_file, checkpoint_file, prune_ratio=0.5):
    """Measure each layer's sensitivity to pruning."""

    print("Loading model...")
    cfg = Config.fromfile(config_file)
    model = build_model(cfg.model).cuda()
    checkpoint = torch.load(checkpoint_file)
    model.load_state_dict(checkpoint['state_dict'])

    # Build the dataset (use a small sample for quick testing)
    print("Building dataset...")
    # Optionally point cfg.data.val.ann_file at a mini val split here
    dataset = build_dataset(cfg.data.val)
    data_loader = build_dataloader(
        dataset,
        samples_per_gpu=1,
        workers_per_gpu=0,
        dist=False,
        shuffle=False
    )

    # Baseline performance
    print("\nEvaluating baseline...")
    baseline_score = evaluate_model(model, data_loader)
    print(f"Baseline score: {baseline_score}")

    # Per-layer sensitivity
    sensitivities = {}

    print(f"\nStarting sensitivity analysis (prune ratio: {prune_ratio})...")
    for name, module in tqdm(model.named_modules()):
        # Only analyze Conv2d layers
        if not isinstance(module, nn.Conv2d):
            continue

        if module.out_channels < 64:  # skip small layers
            continue

        print(f"\nTesting layer: {name}")

        # Temporarily prune this layer
        pruned_model = prune_layer_channels(model, name, prune_ratio)

        # Evaluate
        pruned_score = evaluate_model(pruned_model, data_loader)

        # Sensitivity = drop from baseline
        sensitivity = baseline_score - pruned_score
        sensitivities[name] = sensitivity

        print(f"  Pruned score: {pruned_score}")
        print(f"  Sensitivity:  {sensitivity:.4f}")

        del pruned_model

    # Sort and report
    sorted_sens = sorted(sensitivities.items(), key=lambda x: x[1])

    print("\n" + "=" * 80)
    print("Sensitivity ranking (low to high; low sensitivity = safe to prune)")
    print("=" * 80)
    for name, sens in sorted_sens[:20]:  # show the top 20
        print(f"{name:60s}: {sens:.4f}")

    return sensitivities


if __name__ == '__main__':
    import sys

    if len(sys.argv) < 3:
        print("Usage: python sensitivity_analysis.py <config> <checkpoint>")
        sys.exit(1)

    config_file = sys.argv[1]
    checkpoint_file = sys.argv[2]

    sensitivities = analyze_sensitivity(config_file, checkpoint_file)

    # Save the results
    import json
    with open('sensitivity_results.json', 'w') as f:
        json.dump(sensitivities, f, indent=2)

    print("\nSensitivity analysis complete! Results saved to sensitivity_results.json")
```
---

## 📊 Commands to Run Now

### 1. Model Complexity Analysis (~5 minutes)
```bash
cd /workspace/bevfusion

# Create the analysis and output directories
mkdir -p tools/analysis
mkdir -p analysis_results

# Create the analysis script (see above), then run it
python tools/analysis/model_complexity.py \
    configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_enhanced_phase1_HIGHRES.yaml \
    runs/enhanced_from_epoch19/epoch_23.pth \
    > analysis_results/model_complexity.txt

cat analysis_results/model_complexity.txt
```
### 2. Inference Performance Profiling (~15 minutes)
```bash
# Profile inference performance
python tools/analysis/profile_inference.py \
    configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_enhanced_phase1_HIGHRES.yaml \
    runs/enhanced_from_epoch19/epoch_23.pth \
    100 \
    > analysis_results/inference_profile.txt

cat analysis_results/inference_profile.txt
```
### 3. Sensitivity Analysis (1-2 hours, optional)
```bash
# Sensitivity analysis (use a mini val set for a quick test)
python tools/analysis/sensitivity_analysis.py \
    configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_enhanced_phase1_HIGHRES.yaml \
    runs/enhanced_from_epoch19/epoch_23.pth \
    > analysis_results/sensitivity_analysis.txt
```
---

## 📋 Expected Analysis Results

Based on the BEVFusion architecture, we expect roughly:

### Model Complexity
```
Total parameters: ~110M
- Camera Encoder (SwinT): ~47M (43%) ← largest module
- LiDAR Encoder:          ~19M (17%)
- Fuser:                   ~2M  (2%)
- Decoder:                ~16M (14%)
- Detection Head:         ~18M (16%)
- Segmentation Head:       ~8M  (7%)

FLOPs:      ~450 GFLOPs
Model size: ~441 MB (FP32)
```
### Inference Performance (A100)
```
Mean inference time: ~90 ms
- Camera branch:    ~40 ms (44%) ← biggest bottleneck
- LiDAR branch:     ~17 ms (19%)
- Fusion + Decoder: ~15 ms (17%)
- Heads:            ~18 ms (20%)

Throughput: ~11 FPS
```
### Optimization Potential
```
1. Camera Encoder pruning
   - Potential: remove 40-50% of parameters
   - Speedup: 20-30%
   - Sensitivity: medium

2. Decoder simplification
   - Potential: remove 30-40% of parameters
   - Speedup: 10-15%
   - Sensitivity: low

3. INT8 quantization
   - Speedup: 2-3×
   - Accuracy loss: <2%
```
---

## 🎯 Today's Goals

### Must Do
- [ ] Create the analysis tool scripts
- [ ] Run the model complexity analysis
- [ ] Run the inference performance profiling
- [ ] Produce the analysis report

### Optional
- [ ] Sensitivity analysis (time permitting)
- [ ] Decide on a pruning strategy
- [ ] Prepare the pruning tooling
---

## 📅 Plan for the Next 7 Days
```
Day 1 (today):
  ✓ Model analysis
  ✓ Profiling
  ✓ Decide on the optimization strategy

Day 2-3:
  → Implement pruning
  → Fine-tune the pruned model (3 epochs)

Day 4:
  → Evaluate the pruned model
  → PTQ quantization test

Day 5-6:
  → QAT quantization training (5 epochs)

Day 7:
  → Evaluate the quantized model
  → Write the optimization report
  → Prepare the TensorRT conversion
```
---

## 🚀 Getting Started Now

**Stage 1 training is currently running** (GPUs 0-3); **model analysis can proceed in parallel** (GPUs 4-7 or CPU).

### Create the Analysis Tools
```bash
cd /workspace/bevfusion
mkdir -p tools/analysis
mkdir -p analysis_results

# Create the analysis scripts (Python code above),
# then run the analyses
```
---

**Status**: 🚀 ready to start model optimization

**Approach**: analyze first, then optimize

**In parallel**: does not interfere with Stage 1 training