# Phase 4A Resolution Issue Analysis

**Date**: 2025-10-30
**Status**: Issue identified

---

## Problem Description

Training fails at startup with a shape mismatch error:

```
ValueError: Target size (torch.Size([1, 800, 800])) must be the same as input size (torch.Size([1, 400, 400]))
```

---

## Root Cause

### BEV Transform Pipeline

1. **Input**: FPN output features (B, 512, 360, 360) @ 0.15 m resolution
2. **BEVGridTransform**: with `output_scope: [[-50, 50, 0.125], [-50, 50, 0.125]]` → (B, 512, 800, 800)
3. **ASPP + Attention**: preserves the spatial dimensions → (B, 256, 800, 800)
4. **Decoder**: the current configuration downsamples!
5. **Final output**: (B, num_classes, 400, 400) ❌ expected 800×800

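The 800×800 and 400×400 sizes follow directly from the grid bounds: the number of cells along one axis is the extent divided by the cell size. A minimal sketch of that arithmetic, where `grid_cells` is a hypothetical helper and not part of the codebase:

```python
def grid_cells(lower: float, upper: float, step: float) -> int:
    """Number of BEV cells along one axis for a (lower, upper, step) bound."""
    return round((upper - lower) / step)


# output_scope from the BEVGridTransform config: 0.125 m cells over [-50, 50]
feature_cells = grid_cells(-50.0, 50.0, 0.125)
# a 0.25 m grid over the same range, matching the 400x400 size in the error
coarse_cells = grid_cells(-50.0, 50.0, 0.25)

print(feature_cells, coarse_cells)  # 800 400
```

A prediction on the 400-cell grid cannot be compared element-wise against an 800-cell label map, which is what the `ValueError` reports.
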
### Decoder Issue

Current decoder configuration:

```python
import torch.nn as nn

decoder_channels = [256, 256, 128, 128]  # 4 layers

decoder_layers = []
for i in range(len(decoder_channels)):
    in_ch = decoder_channels[i - 1] if i > 0 else decoder_channels[0]
    out_ch = decoder_channels[i]

    decoder_layers.append(nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1, stride=1, bias=False),  # stride=1 keeps the spatial size
        nn.GroupNorm(32, out_ch),
        nn.ReLU(True),
    ))
```

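As a quick check on the configuration above, the standard convolution output-size formula (the one given in the PyTorch `Conv2d` documentation) shows that 3×3 convolutions with `padding=1, stride=1` preserve the spatial size, while a single stride-2 operation would halve 800 to 400. This is a sketch only: `conv_out_size` is a hypothetical helper name, and the stride-2 case is illustrative, not a layer confirmed to exist in the model.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 1,
                  padding: int = 1, dilation: int = 1) -> int:
    """Output length along one spatial axis of a 2D convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


size = 800
for _ in range(4):  # four decoder layers, all 3x3 / padding=1 / stride=1
    size = conv_out_size(size)
print(size)  # 800

# hypothetical: one stride-2 3x3 convolution anywhere in the path
print(conv_out_size(800, stride=2))  # 400
```
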
**Problem**: The decoder has no explicit upsampling, so if the transformed features are downsampled anywhere along the way, the output cannot be restored to 800×800.

---

## Solutions

### Option A: Match the output size directly (recommended)

**Strategy**: the decoder and everything after it should leave the spatial dimensions unchanged.

1. Make sure every decoder layer uses `stride=1`
2. Do not use pooling layers
3. The output then matches the 800×800 GT labels directly

**Change**: add a final upsampling step to `EnhancedBEVSegmentationHead`:

```python
import torch
import torch.nn.functional as F

# target: GT labels, assumed to be passed to forward() during training
def forward(self, x, target=None):
    # ...existing code...
    x = self.decoder(x)

    # Make sure the output size matches the target
    if target is not None and x.shape[-2:] != target.shape[-2:]:
        x = F.interpolate(x, size=target.shape[-2:], mode='bilinear', align_corners=False)

    # classification
    outputs = []
    for classifier in self.classifiers:
        outputs.append(classifier(x))
    pred = torch.cat(outputs, dim=1)
```

### Option B: Lower the GT label resolution (temporary)

Drop the GT label resolution back to 400×400:

```yaml
train_pipeline:
  LoadBEVSegmentation:
    xbound: [-50.0, 50.0, 0.25]  # 400×400
    ybound: [-50.0, 50.0, 0.25]
```

**Drawback**: this gives up the benefit of the higher resolution.

---

## Recommended Actions

1. Use Option A: add adaptive interpolation after the decoder
2. Keep the GT labels at the high 800×800 resolution
3. Let the model output match the target size automatically

---

## Next Steps

Modify the `forward` method in `mmdet3d/models/heads/segm/enhanced.py` to add the adaptive interpolation.