Phase 4A Resolution Issue Analysis

Date: 2025-10-30
Status: Issue identified

Problem Description

Training fails at startup with a shape mismatch error:

ValueError: Target size (torch.Size([1, 800, 800])) must be the same as input size (torch.Size([1, 400, 400]))

Root Cause

BEV Transform Pipeline

Input: FPN output features (B, 512, 360, 360) @ 0.15 m resolution
BEVGridTransform: uses output_scope: [[-50, 50, 0.125], [-50, 50, 0.125]] → (B, 512, 800, 800)
ASPP + Attention: preserves spatial dimensions → (B, 256, 800, 800)
Decoder: the current configuration downsamples!
Final output: (B, num_classes, 400, 400) ❌ expected 800×800
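
The grid sizes in this pipeline follow from the range and cell size of each axis. A minimal sketch of that arithmetic, assuming each bound is given as [min, max, step]; the helper name `grid_size` is hypothetical and not part of the codebase:

```python
def grid_size(bound):
    """Number of cells along one axis for a [min, max, step] bound."""
    lo, hi, step = bound
    return round((hi - lo) / step)

# 100 m of range at 0.125 m per cell
print(grid_size([-50, 50, 0.125]))     # 800

# 100 m of range at 0.25 m per cell
print(grid_size([-50.0, 50.0, 0.25]))  # 400
```

Halving the cell size from 0.25 m to 0.125 m doubles the grid along each axis, which is the 400 versus 800 gap reported in the error.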

Decoder Issue

Current decoder configuration:

import torch.nn as nn

decoder_channels = [256, 256, 128, 128]  # 4 layers

decoder_layers = []
for i in range(len(decoder_channels)):
    # The first layer maps decoder_channels[0] -> decoder_channels[0]
    in_ch = decoder_channels[i - 1] if i > 0 else decoder_channels[0]
    out_ch = decoder_channels[i]

    decoder_layers.append(nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1, stride=1, bias=False),  # stride=1 keeps the spatial size
        nn.GroupNorm(32, out_ch),
        nn.ReLU(True),
    ))

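Problem: every decoder layer uses stride=1, so the decoder preserves the spatial size but has no explicit upsampling. If the transformed features are downsampled anywhere along the way, the output cannot be restored to 800×800.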
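
Whether a convolution keeps the spatial size can be checked with the standard output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1 (for dilation 1). The sketch below applies it to four 3×3 layers like the ones above. `conv_out_size` is a hypothetical helper, and the stride=2 case is only an illustrative assumption of how a 400×400 output could arise, not something confirmed in the code:

```python
def conv_out_size(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Four stacked 3x3, padding=1, stride=1 layers leave the size unchanged.
size = 800
for _ in range(4):
    size = conv_out_size(size)
print(size)  # 800

# A single stride-2 layer anywhere in the path would halve it.
print(conv_out_size(800, stride=2))  # 400
```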

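Solutions

Option A: Match the output size directly (recommended)

Strategy: nothing from the decoder onward should change the spatial dimensions

Make sure every decoder layer uses stride=1
Do not use pooling layers
The output directly matches the 800×800 GT labels

Change: add a final upsampling layer to EnhancedBEVSegmentationHead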

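import torch
import torch.nn.functional as F

def forward(self, x, target=None):
    # ...existing code...
    x = self.decoder(x)

    # Make sure the output size matches the target
    # (assumes target is passed to forward; it was undefined in the original snippet)
    if target is not None and x.shape[-2:] != target.shape[-2:]:
        x = F.interpolate(x, size=target.shape[-2:], mode='bilinear', align_corners=False)

    # Classification
    outputs = []
    for classifier in self.classifiers:
        outputs.append(classifier(x))
    pred = torch.cat(outputs, dim=1)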
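
The effect of the interpolation step can be checked in isolation. The following is a minimal sketch with dummy tensors, assuming the loss is a binary cross-entropy on logits (the error message above has the format that `F.binary_cross_entropy_with_logits` produces on mismatched shapes). The tensor sizes are scaled-down stand-ins and the variable names are illustrative, not taken from the training code:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(1, 2, 40, 40)    # stand-in for a (B, C, 400, 400) prediction
target = torch.zeros(1, 2, 80, 80)  # stand-in for (B, C, 800, 800) labels

# Without resizing, the loss rejects the mismatched shapes.
try:
    F.binary_cross_entropy_with_logits(pred, target)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Resize the logits to the label size, then the loss can be computed.
resized = F.interpolate(pred, size=target.shape[-2:], mode="bilinear", align_corners=False)
loss = F.binary_cross_entropy_with_logits(resized, target)

print(mismatch_raised, tuple(resized.shape))
```

Interpolation only makes the shapes agree; it does not add detail that a lower-resolution feature map lacks.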

Option B: Lower the GT label resolution (temporary)

Reduce the GT label resolution back to 400×400:

train_pipeline:
  LoadBEVSegmentation:
    xbound: [-50.0, 50.0, 0.25]  # 400×400
    ybound: [-50.0, 50.0, 0.25]

Drawback: loses the benefit of the higher resolution

Next Steps

Modify the forward method in mmdet3d/models/heads/segm/enhanced.py to add adaptive interpolation.
