# Evaluation Optimization Strategy - Reducing the Evaluation Set to 3,000 Samples

## Current Status

- **Total validation samples**: 6,019
- **Target samples**: 3,000
- **Reduction**: ~50%

---

## Strategy 1: Use the load_interval Parameter (Recommended ✅)

### How It Works

The `load_interval` parameter subsamples the dataset uniformly:

- `load_interval=1`: uses all samples (6,019)
- `load_interval=2`: keeps every 2nd sample, giving 3,010 samples
- `load_interval=3`: keeps every 3rd sample, giving 2,007 samples

### Implementation

Add `load_interval: 2` under `data.val` in the config file:

```yaml
data:
  val:
    type: ${dataset_type}
    dataset_root: ${dataset_root}
    ann_file: ${dataset_root + "nuscenes_infos_val.pkl"}
    load_interval: 2  # ⬅️ add this line to sample 50%
    pipeline: ${test_pipeline}
    object_classes: ${object_classes}
    map_classes: ${map_classes}
    modality: ${input_modality}
    test_mode: false
    box_type_3d: LiDAR
```
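
As a rough illustration of why `load_interval: 2` yields 3,010 samples rather than 3,009, the sketch below assumes the dataset class applies the interval as a strided slice (`infos[::load_interval]`), as mmdet3d-style nuScenes datasets commonly do. The `subsample` helper and the integer stand-in list are hypothetical and only serve to reproduce the counts.

```python
def subsample(infos, load_interval=1):
    """Keep every `load_interval`-th entry, starting with the first one."""
    if load_interval < 1:
        raise ValueError("load_interval must be a positive integer")
    return infos[::load_interval]


# Stand-in for the 6,019 validation entries (the real entries are dicts).
infos = list(range(6019))

# The count is ceil(6019 / k) because the first entry is always kept.
counts = {k: len(subsample(infos, k)) for k in (1, 2, 3)}
print(counts)
```

The printed counts are 6,019, 3,010 and 2,007 for intervals 1, 2 and 3, matching the figures listed for this strategy.
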
### Pros

- ✅ Quick and simple: a one-line config change
- ✅ Uniform sampling that covers all scenes
- ✅ No new files to create
- ✅ Easy to adjust (interval=2/3/4)

### Cons

- ⚠️ The sampling ratio is restricted to 1/2, 1/3, 1/4, ...
- ⚠️ The sample count cannot be set to exactly 3,000

### Expected Results

- Samples: ⌈6,019 ÷ 2⌉ = **3,010** ✅
- .eval_hook size: 75GB ÷ 2 = **~37.5GB**
- Evaluation time: reduced by ~50%

---

## Strategy 2: Create a Subset PKL File (Exact Control)

### How It Works

Extract the first 3,000 samples from the original validation pkl and save them as a new index file.

### Steps

#### Step 1: Create the 3,000-sample subset

```python
import pickle

# Load the original validation index
with open('data/nuscenes/nuscenes_infos_val.pkl', 'rb') as f:
    val_infos = pickle.load(f)

# Keep the first 3,000 samples
val_infos_3k = {
    'infos': val_infos['infos'][:3000],
    'metadata': val_infos['metadata']
}

# Save the new index
with open('data/nuscenes/nuscenes_infos_val_3k.pkl', 'wb') as f:
    pickle.dump(val_infos_3k, f)
```
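
Slicing with `[:3000]` keeps only the first 3,000 entries in file order. If a more even spread is preferred while still getting exactly 3,000 samples, one possible variant picks evenly spaced indices instead. This is only a sketch: `evenly_spaced_indices` and `uniform_subset` are hypothetical helpers, and the dict layout (`infos`, `metadata`) is assumed to match the Step 1 snippet.

```python
def evenly_spaced_indices(total, k):
    """Return k distinct, increasing indices spread across range(total)."""
    if not 0 < k <= total:
        raise ValueError("k must satisfy 0 < k <= total")
    # Integer arithmetic avoids float rounding; the step is always >= 1.
    return [i * total // k for i in range(k)]


def uniform_subset(val_infos, k):
    """Build a new index dict holding k evenly spaced entries."""
    infos = val_infos["infos"]
    picked = [infos[i] for i in evenly_spaced_indices(len(infos), k)]
    return {"infos": picked, "metadata": val_infos["metadata"]}


# Toy stand-in for the loaded pkl (real entries are per-sample dicts).
toy = {"infos": list(range(6019)), "metadata": {"version": "toy"}}
subset = uniform_subset(toy, 3000)
print(len(subset["infos"]))
```

The result can be written out with `pickle.dump` exactly as in Step 1. Whether an even spread over file order also means an even spread over scenes depends on how the pkl was generated.
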
#### Step 2: Point the config at the new file

```yaml
data:
  val:
    type: ${dataset_type}
    dataset_root: ${dataset_root}
    ann_file: ${dataset_root + "nuscenes_infos_val_3k.pkl"}  # ⬅️ use the 3k subset
    pipeline: ${test_pipeline}
    object_classes: ${object_classes}
    map_classes: ${map_classes}
    modality: ${input_modality}
    test_mode: false
    box_type_3d: LiDAR
```
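
Before running evaluation against the new `ann_file`, it may be worth confirming that the subset file loads and holds the expected number of entries. A minimal check, assuming the file keeps the `infos`/`metadata` layout written in Step 1 (`count_infos` is a hypothetical helper, not part of the codebase):

```python
import pickle


def count_infos(path):
    """Return the number of entries under the 'infos' key of a pkl index."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return len(data["infos"])


# Example, run from the repo root after Step 1:
# assert count_infos("data/nuscenes/nuscenes_infos_val_3k.pkl") == 3000
```
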
### Pros

- ✅ Exact control over the sample count (exactly 3,000)
- ✅ Can target specific scenes (e.g. day/night)
- ✅ Reusable

### Cons

- ⚠️ Requires creating a new file (~128MB)
- ⚠️ Takes only the first 3,000 samples, so coverage may be uneven
- ⚠️ One more file to maintain

### Expected Results

- Samples: **exactly 3,000** ✅
- .eval_hook size: **~37GB**
- Evaluation time: reduced by ~50%

---

## Recommended Approach: Strategy 1 + load_interval=2

### Rationale

1. **Simple**: a one-line config change
2. **Uniform**: preserves the distribution of all 6,019 samples
3. **Flexible**: the interval can be changed at any time
4. **Sufficient**: 3,010 samples meet the evaluation needs

### Implementation Plan

```yaml
# multitask_BEV2X_phase4a_stage1.yaml

# Override the data.val settings from default.yaml
data:
  val:
    load_interval: 2  # cut samples by 50%

evaluation:
  interval: 10  # changed from 5 to 10 to evaluate less often
```

**Combined effect**:

- Original: 20 epochs ÷ 5 (interval) = 4 evaluations × 6,019 samples = 24,076 sample evaluations
- New: 20 epochs ÷ 10 (interval) = 2 evaluations × 3,010 samples = 6,020 sample evaluations
- **~75% less evaluation overhead** ✅

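
The combined-effect arithmetic can be reproduced with a few lines. This sketch assumes evaluation runs once every `eval_interval` epochs and that cost scales linearly with the number of samples; `total_sample_evals` is a hypothetical helper used only for this calculation.

```python
def total_sample_evals(epochs, eval_interval, num_samples):
    """Samples processed by evaluation over a whole training run."""
    # Assumes one evaluation pass every `eval_interval` epochs.
    num_evals = epochs // eval_interval
    return num_evals * num_samples


baseline = total_sample_evals(epochs=20, eval_interval=5, num_samples=6019)
reduced = total_sample_evals(epochs=20, eval_interval=10, num_samples=3010)
saving = 1 - reduced / baseline

print(baseline, reduced, f"{saving:.1%}")
```
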
---

## Comparison

| Metric | Original config | Strategy 1 (load_interval=2) | Strategy 2 (3k pkl) |
|------|---------|------------------------|----------------|
| **Samples** | 6,019 | 3,010 | 3,000 |
| **Sampling** | All | Uniform | First 3,000 (contiguous) |
| **Change required** | - | One config line | Create file + edit config |
| **.eval_hook size** | 75GB | 37.5GB | 37GB |
| **Evaluation time** | 100% | 50% | 50% |
| **Flexibility** | - | High (adjustable interval) | Low |
| **Recommendation** | - | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |

---

## Implementation Commands

### Using Strategy 1 (Recommended)

```bash
# The config file will be modified automatically
```

### Using Strategy 2 (Optional)

```bash
# Run the script that creates the 3k subset
python tools/create_val_subset.py --samples 3000
```