# MapTR Code Deep-Dive Report

**Study date**: 2025-10-22
**MapTR version**: main branch + maptrv2 branch
**Code location**: /workspace/MapTR
**Core code**: 3540 lines of Python

---

## 📊 MapTR Project Overview

### Project Information
- **Paper**: ICLR 2023 Spotlight
- **Extended version**: MapTRv2 (IJCV 2024)
- **Function**: online vectorized HD map construction
- **Performance**: 50-73% mAP on nuScenes (depending on configuration)
- **Speed**: 14-35 FPS (RTX 3090)

### Core Innovations
1. **Unified point-set modeling**: map elements are modeled as point sets
2. **Hierarchical query embedding**: flexibly encodes structured map information
3. **End-to-end learning**: predicts the vector map directly from images

---

## 📁 Code Structure

### Directory Layout
```
MapTR/
├── projects/mmdet3d_plugin/maptr/          # core code
│   ├── dense_heads/
│   │   └── maptr_head.py                   ★ 35KB  core head implementation
│   ├── detectors/
│   │   └── maptr.py                        ★ 19KB  main model
│   ├── modules/
│   │   ├── decoder.py                      ★ 3KB   Transformer decoder
│   │   ├── encoder.py                      ★ 12KB  BEV encoder
│   │   ├── transformer.py                  ★ 15KB  overall Transformer
│   │   └── geometry_kernel_attention.py    ★ 23KB  geometry kernel attention
│   ├── losses/
│   │   └── map_loss.py                     ★ 26KB  loss functions
│   └── assigners/
│       └── maptr_assigner.py               ★ 9KB   Hungarian matching
│
├── mmdetection3d/                          # mmdet3d dependency
├── tools/                                  # training and testing tools
│   └── maptr/
│       ├── test.py
│       └── vis_pred.py                     # visualization tool
└── configs/                                # configuration files
    └── maptr/
```

**Total code size**: 3540 lines of Python (core part)

---

## 🔍 Core Components in Detail

### 1. MapTRHead (★ the most important component)

**File**: `projects/mmdet3d_plugin/maptr/dense_heads/maptr_head.py` (35KB)

#### Key Parameters
```python
class MapTRHead(DETRHead):
    def __init__(
        self,
        num_vec=20,                  # number of predicted vectors
        num_pts_per_vec=20,          # points per predicted vector
        num_pts_per_gt_vec=2,        # fixed number of points per GT vector
        query_embed_type='all_pts',  # query embedding type
        bev_h=30,                    # BEV height (grid cells)
        bev_w=30,                    # BEV width (grid cells)
        loss_pts=dict(               # point loss (Chamfer distance)
            type='ChamferDistance',
            loss_src_weight=1.0,
            loss_dst_weight=1.0
        ),
        loss_dir=dict(               # direction loss
            type='PtsDirCosLoss',
            loss_weight=2.0
        ),
        # ... remaining arguments elided in this excerpt
    ):
        ...
```

#### Core Method Analysis

**1.1 The forward method**
```python
def forward(self, mlvl_feats, lidar_feat, img_metas, prev_bev=None):
    """
    Inputs:
        mlvl_feats: multi-level image features (B, N, C, H, W)
        lidar_feat: LiDAR features (optional)
        img_metas: metadata
        prev_bev: BEV from the previous frame (temporal)

    Transformer outputs (intermediate):
        bev_embed: BEV feature embedding
        hs: hidden states (num_layers, num_query, bs, embed_dims)
        init_reference: initial reference points
        inter_references: intermediate reference points

    Flow:
        1. Build the query embedding
        2. Encode the BEV features with the Transformer
        3. Decode the vector predictions with the Transformer
        4. Emit one prediction per decoder layer (for deep supervision)
    """
    # Query embedding
    object_query_embeds = self.query_embedding.weight
    # num_query = num_vec × num_pts_per_vec
    # e.g. 20 vectors × 20 points = 400 queries

    # Transformer (construction of bev_queries is elided in this excerpt)
    outputs = self.transformer(
        mlvl_feats,           # image features
        lidar_feat,           # LiDAR features
        bev_queries,          # BEV queries
        object_query_embeds,  # vector queries
        ...
    )

    # Unpack the outputs
    bev_embed, hs, init_reference, inter_references = outputs

    # Classification and regression, once per decoder layer
    for lvl in range(hs.shape[0]):
        outputs_class = self.cls_branches[lvl](hs[lvl])  # class scores
        outputs_coord = self.reg_branches[lvl](hs[lvl])  # point coordinates
        # (excerpt: the per-layer results are collected into
        #  all_cls_scores / all_pts_preds)

    return all_cls_scores, all_pts_preds
```

**1.2 The loss method**
```python
def loss(self, gt_bboxes_list, gt_labels_list, ...):
    """
    Core loss function.

    Loss terms:
        1. Classification loss (FocalLoss)
        2. Point-coordinate loss (Chamfer distance)
        3. Direction loss (cosine loss)
    Targets for all three come from Hungarian matching.

    Key steps:
        1. Match predictions to GT with the Hungarian algorithm
        2. Compute the losses on matched pairs
        3. Treat unmatched predictions as background
    """
    # Hungarian matching
    cls_reg_targets = self.get_targets(
        gt_bboxes_list, gt_labels_list, ...)

    # Classification loss
    loss_cls = self.loss_cls(
        cls_scores, labels, ...)

    # Point loss (Chamfer distance)
    loss_pts = self.loss_pts(
        pts_preds, pts_targets, ...)

    # Direction loss
    loss_dir = self.loss_dir(
        pts_preds, pts_targets, ...)

    return loss_dict
```

---

### 2. Transformer Structure

**Files**: `projects/mmdet3d_plugin/maptr/modules/`

#### 2.1 BEV Encoder
```python
# encoder.py
class BEVFormerEncoder(BaseModule):
    """
    Converts multi-view image features into BEV features.

    Several view transforms are supported:
    - GKT (Geometry Kernel Attention)
    - BEVFormer
    - BEVPool (the BEVFusion approach)
    """
```

#### 2.2 Decoder
```python
# decoder.py
# inverse_sigmoid is the logit helper (mmdet ships one in mmdet.models.utils.transformer)
class MapTRDecoder(TransformerLayerSequence):
    """
    DETR-style decoder.

    Features:
    - Query-based detection
    - Iterative refinement
    - Per-layer predictions (deep supervision)

    Core:
    - Input: query embedding
    - Output: updated queries (carrying the vector information)
    """

    def forward(self, query, reference_points, reg_branches=None):
        output = query
        # Stacked Transformer decoder layers
        for lid, layer in enumerate(self.layers):
            output = layer(output, reference_points=reference_points)

            # Iterative refinement: add the offset in logit space,
            # then squash back to [0, 1]
            if reg_branches is not None:
                tmp = reg_branches[lid](output)
                new_reference_points = tmp + inverse_sigmoid(reference_points)
                reference_points = new_reference_points.sigmoid().detach()

        return output, reference_points
```
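
The refinement step is easier to follow in isolation. Below is a minimal NumPy sketch, assuming the decoder follows the Deformable DETR convention of adding the regression offset in logit space; `sigmoid`, `inverse_sigmoid` and `refine` are illustrative helpers written for this note, not MapTR functions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inverse_sigmoid(p, eps=1e-5):
    """Logit of p, clipped away from 0 and 1 for numerical safety."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def refine(reference_points, offsets):
    """One refinement step: add the predicted offset in logit space."""
    return sigmoid(inverse_sigmoid(reference_points) + offsets)

ref = np.array([[0.2, 0.5], [0.9, 0.1]])          # normalized (x, y) reference points
unchanged = refine(ref, np.zeros_like(ref))        # zero offset keeps the points
moved = refine(ref, np.full_like(ref, 10.0))       # a large offset still stays below 1
```

Working in logit space means a zero offset leaves the reference points where they are, and even a large offset cannot push a point outside the normalized BEV range.
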

---

### 3. Loss Functions

**File**: `projects/mmdet3d_plugin/maptr/losses/map_loss.py` (26KB)

#### 3.1 Chamfer Distance Loss
```python
@LOSSES.register_module()
class ChamferDistance(nn.Module):
    """
    Chamfer distance: bidirectional nearest-neighbor distance between two point sets.

    Formula (sum form):
        CD(P, Q) = sum_{p in P} min_{q in Q} ||p - q||
                 + sum_{q in Q} min_{p in P} ||q - p||
    The code below averages each term instead of summing it.

    Used to:
    - Measure how similar a predicted vector is to a GT vector
    - Tolerate differing point orders
    """
    def __init__(self, loss_src_weight=1.0, loss_dst_weight=1.0):
        super().__init__()
        self.loss_src_weight = loss_src_weight
        self.loss_dst_weight = loss_dst_weight

    def forward(self, src, tgt):
        # Pairwise distance matrix
        dist = torch.cdist(src, tgt)  # (N, M)

        # One-directional Chamfer terms
        loss_src = dist.min(dim=1)[0].mean()  # prediction -> GT
        loss_tgt = dist.min(dim=0)[0].mean()  # GT -> prediction

        # Weighted total
        loss = loss_src * self.loss_src_weight + loss_tgt * self.loss_dst_weight
        return loss
```

#### 3.2 Direction Loss
```python
@LOSSES.register_module()
class PtsDirCosLoss(nn.Module):
    """
    Point-direction cosine loss.

    Purpose:
    - Make sure the predicted vector points in the right direction
    - Uses cosine similarity
    """
    def forward(self, pts_pred, pts_gt):
        # Direction vectors between consecutive points
        dir_pred = pts_pred[:, 1:] - pts_pred[:, :-1]
        dir_gt = pts_gt[:, 1:] - pts_gt[:, :-1]

        # Cosine similarity
        cos_sim = F.cosine_similarity(dir_pred, dir_gt, dim=-1)
        loss = 1 - cos_sim
        return loss.mean()
```
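
To see what values this loss takes, here is a small NumPy sketch of the same idea for a single polyline. `dir_cos_loss` is an illustrative stand-in written for this note, not the `PtsDirCosLoss` implementation.

```python
import numpy as np

def dir_cos_loss(pts_pred, pts_gt, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive-point direction vectors."""
    dir_pred = pts_pred[1:] - pts_pred[:-1]
    dir_gt = pts_gt[1:] - pts_gt[:-1]
    dot = (dir_pred * dir_gt).sum(axis=-1)
    norm = np.linalg.norm(dir_pred, axis=-1) * np.linalg.norm(dir_gt, axis=-1)
    return float((1.0 - dot / np.maximum(norm, eps)).mean())

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])          # runs along +x
upward = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])        # runs along +y

same = dir_cos_loss(line, line)              # identical direction
perpendicular = dir_cos_loss(upward, line)   # 90 degrees off
reversed_dir = dir_cos_loss(line[::-1], line)  # same points, opposite order
```

The loss is 0 for matching directions, 1 for perpendicular segments and 2 for a reversed polyline, so it penalizes exactly the ordering errors that a purely set-based distance cannot see.
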

---

### 4. Hungarian Matching

**File**: `projects/mmdet3d_plugin/maptr/assigners/maptr_assigner.py` (9KB)

```python
from scipy.optimize import linear_sum_assignment

class MapTRAssigner:
    """
    Hungarian matching.
    Finds the best-matching prediction for each GT.

    Cost matrix = classification cost + point-coordinate cost + direction cost
    """
    def assign(self, bbox_pred, cls_pred, gt_bboxes, gt_labels):
        # Build the cost matrix
        cls_cost = self.cls_cost(cls_pred, gt_labels)
        pts_cost = self.pts_cost(bbox_pred, gt_bboxes)
        dir_cost = self.dir_cost(bbox_pred, gt_bboxes)

        # Total cost
        cost = cls_cost + pts_cost + dir_cost

        # Hungarian algorithm (SciPy works on CPU arrays)
        matched_row_inds, matched_col_inds = linear_sum_assignment(
            cost.detach().cpu().numpy())

        return matched_row_inds, matched_col_inds
```

---

## 🔧 Plan for Integrating into BEVFusion

### Key Findings

**Advantages of MapTR**:
1. ✅ **BEVPool is already supported**: BEVFusion's BEV features can be used directly
2. ✅ **Modular design**: the head can be used on its own
3. ✅ **Multiple BEV encoders**: GKT, BEVFormer and BEVPool are all supported

**Integration strategy**:
```
# The full MapTR model is not needed!
# Only the MapTRHead part has to be extracted

Extract from MapTR:
├── MapTRHead (dense_heads/maptr_head.py)
├── MapTRDecoder (modules/decoder.py)
├── ChamferDistance Loss (losses/map_loss.py)
├── PtsDirCosLoss (losses/map_loss.py)
└── MapTRAssigner (assigners/maptr_assigner.py)

Reuse from BEVFusion:
├── Camera Encoder ✅
├── LiDAR Encoder ✅
├── ConvFuser ✅
└── BEV Decoder ✅ → feeds MapTRHead directly
```

---

## 💡 Core Technical Points

### 1. Query Embedding Design

MapTR uses a distinctive query design:
```python
# Option 1: all_pts (every point is an independent query)
num_query = num_vec * num_pts_per_vec
# e.g. 20 vectors × 20 points = 400 queries

# Option 2: instance_pts (combined vector + point embedding)
pts_embeds = self.pts_embedding.weight            # (num_pts, dim)
instance_embeds = self.instance_embedding.weight  # (num_vec, dim)
# Broadcast sum over both axes, then flatten to (num_vec * num_pts, dim)
query_embeds = (pts_embeds.unsqueeze(0) + instance_embeds.unsqueeze(1)).flatten(0, 1)
```

**Advantages**:
- Flexibly represents a variable number of vectors
- Point-set modeling (permutation-equivalent)
- Supports vectors of varying length
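
The broadcast sum behind the `instance_pts` option can be checked with plain NumPy. This is a shape-level sketch: the random arrays stand in for the learned embedding tables, and the sizes are the example values used above.

```python
import numpy as np

num_vec, num_pts_per_vec, embed_dims = 20, 20, 256
rng = np.random.default_rng(0)

# Stand-ins for the learned tables (random here, learned in the real model).
instance_embeds = rng.normal(size=(num_vec, embed_dims))      # one row per vector
pts_embeds = rng.normal(size=(num_pts_per_vec, embed_dims))   # one row per point slot

# (num_vec, 1, D) + (1, num_pts, D) -> (num_vec, num_pts, D)
hierarchical = instance_embeds[:, None, :] + pts_embeds[None, :, :]

# Flatten to the (num_query, D) layout a decoder consumes.
query_embeds = hierarchical.reshape(num_vec * num_pts_per_vec, embed_dims)
```

Query `v * num_pts_per_vec + p` is the sum of instance embedding `v` and point embedding `p`, so all points of one vector share an instance component while only 40 embedding rows are learned instead of 400.
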

---

### 2. Point-Set Representation

**Normalized coordinates**:
```python
def normalize_2d_pts(pts, pc_range):
    """
    Normalize metric coordinates to [0, 1].

    pc_range: [-50, -50, -5, 50, 50, 3]  # BEV range
    """
    patch_h = pc_range[4] - pc_range[1]  # 100 m
    patch_w = pc_range[3] - pc_range[0]  # 100 m

    normalized_pts = pts.clone()
    normalized_pts[..., 0] = (pts[..., 0] - pc_range[0]) / patch_w
    normalized_pts[..., 1] = (pts[..., 1] - pc_range[1]) / patch_h

    return normalized_pts  # in the [0, 1] range
```

**Denormalization**:
```python
def denormalize_2d_pts(pts, pc_range):
    """
    [0, 1] → metric coordinates (meters)
    """
    new_pts = pts.clone()
    new_pts[..., 0] = pts[..., 0] * (pc_range[3] - pc_range[0]) + pc_range[0]
    new_pts[..., 1] = pts[..., 1] * (pc_range[4] - pc_range[1]) + pc_range[1]
    return new_pts
```
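
A quick round trip confirms that the two functions are inverses and that x and y use separate extents. The sketch below is a NumPy port written for this note; the non-square `pc_range` is an illustrative value chosen so that a swapped axis would show up immediately.

```python
import numpy as np

def normalize_2d_pts(pts, pc_range):
    """Map metric (x, y) into [0, 1] using the x and y extents of pc_range."""
    out = np.array(pts, dtype=float)
    out[..., 0] = (out[..., 0] - pc_range[0]) / (pc_range[3] - pc_range[0])
    out[..., 1] = (out[..., 1] - pc_range[1]) / (pc_range[4] - pc_range[1])
    return out

def denormalize_2d_pts(pts, pc_range):
    """Inverse of normalize_2d_pts: [0, 1] back to meters."""
    out = np.array(pts, dtype=float)
    out[..., 0] = out[..., 0] * (pc_range[3] - pc_range[0]) + pc_range[0]
    out[..., 1] = out[..., 1] * (pc_range[4] - pc_range[1]) + pc_range[1]
    return out

# Illustrative range: 30 m wide in x, 60 m long in y.
pc_range = [-15.0, -30.0, -2.0, 15.0, 30.0, 2.0]
pts = np.array([[-15.0, -30.0], [0.0, 0.0], [15.0, 30.0], [7.5, -15.0]])

normalized = normalize_2d_pts(pts, pc_range)
restored = denormalize_2d_pts(normalized, pc_range)
```

The last point lands at `(0.75, 0.25)`: 7.5 m is three quarters of the way across the 30 m x extent, and -15 m is one quarter of the way along the 60 m y extent.
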

---

### 3. Data Formats

**GT vector map format**:
```python
gt_vecs_list = [
    {
        'vectors': [
            {
                'pts': torch.Tensor([[x1, y1], [x2, y2], ...]),  # N points
                'pts_num': N,
                'type': 0,  # 0: divider, 1: boundary, 2: ped_crossing
            },
            ...
        ]
    },
    ...  # one entry per sample in the batch
]
```

**Prediction output format**:
```python
# Values are tensor shapes
predictions = {
    'all_cls_scores': (num_layers, bs, num_vec, num_classes),
    'all_pts_preds': (num_layers, bs, num_vec * num_pts, 2),  # (x, y) coordinates
}
```

---

## 🎯 Key Points for Adapting to BEVFusion

### Change 1: Input Interface

**Original MapTR**:
```python
class MapTR(MVXTwoStageDetector):
    # Inherits from mmdet3d's two-stage detector
    # Has its own backbone, neck, etc.
    ...
```

**Adapted for BEVFusion**:
```python
# Only MapTRHead is needed, not the full MapTR model
# BEVFusion already has a backbone and BEV features

class BEVFusion:
    def __init__(self):
        # ... existing encoder, fuser, decoder

        # New: MapTRHead
        self.heads['vector_map'] = MapTRHead(
            in_channels=256,     # BEV decoder output channels
            num_vec=50,          # tuned for BEVFusion
            num_pts_per_vec=20,
            ...
        )

    def forward(self, ...):
        # Get the BEV features
        bev_features = self.decoder(fused_features)

        # MapTRHead forward
        # Note: MapTRHead expects mlvl_feats as input,
        # so the BEV features have to be adapted
        outputs = self.heads['vector_map'](
            bev_features,  # adapted input
            img_metas=metas
        )
```

---

### Change 2: BEV Feature Adaptation

**What MapTR expects**:
```python
# Multi-level features
mlvl_feats = [feat1, feat2, feat3, ...]
```

**What BEVFusion provides**:
```python
# A single-level BEV feature map
# bev_features.shape == (B, 256, 180, 180)
```

**Adaptation options**:
```python
# Option 1: wrap it in a list
mlvl_feats = [bev_features]

# Option 2: modify MapTRHead to accept BEV features directly
class MapTRHeadForBEVFusion(MapTRHead):
    def forward(self, bev_features, img_metas):
        # Use the BEV features directly and skip the image-encoding part
        mlvl_feats = [bev_features]
        return super().forward(mlvl_feats, None, img_metas)
```
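
Whichever option is chosen, the 2D BEV map eventually has to become a token sequence for decoder cross-attention. The NumPy sketch below shows that reshaping, assuming the sequence-first `(H*W, B, C)` layout that matches the `hs` shape `(num_layers, num_query, bs, embed_dims)` listed earlier; the exact layout the head expects should be confirmed against the MapTR source.

```python
import numpy as np

B, C, H, W = 1, 256, 180, 180
bev_features = np.zeros((B, C, H, W), dtype=np.float32)
bev_features[0, 5, 3, 7] = 1.0   # marker to trace one cell through the reshape

# (B, C, H, W) -> (B, C, H*W) -> (H*W, B, C)
bev_tokens = bev_features.reshape(B, C, H * W).transpose(2, 0, 1)

# The cell at row h, column w becomes token index h * W + w.
marker_token = 3 * W + 7
```

A 180 × 180 map yields 32,400 tokens, far more than the 30 × 30 grid implied by the head's default `bev_h` and `bev_w`, so those two settings are worth checking when the head is configured.
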

---

### Change 3: Data Pipeline

**To be added**:
```python
# mmdet3d/datasets/pipelines/loading.py

@PIPELINES.register_module()
class LoadVectorMapAnnotation:
    """
    Loads vector map annotations.

    Extracts vector elements through the nuScenes map API.
    """
    def __call__(self, results):
        # Extract lane dividers, boundaries, etc.
        # (extract_vectors_from_nuscenes is a helper that still has to be written)
        vectors = extract_vectors_from_nuscenes(
            results['sample_token'],
            x_range=[-50, 50],
            y_range=[-50, 50]
        )

        results['gt_vectors'] = vectors
        return results
```
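
Map polylines come with arbitrary vertex counts, while the head works with a fixed number of points per GT vector (`num_pts_per_gt_vec`). One way to bridge the two is to resample each polyline at equal arc-length steps. The sketch below is an illustration written for this note using linear interpolation; it is not taken from MapTR's own preprocessing, and `resample_polyline` is a hypothetical helper name.

```python
import numpy as np

def resample_polyline(pts, num_pts):
    """Resample an (N, 2) polyline to num_pts points evenly spaced by arc length."""
    pts = np.asarray(pts, dtype=float)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # (N-1,) segment lengths
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])    # (N,) distance from start
    targets = np.linspace(0.0, cum_len[-1], num_pts)         # equally spaced distances
    x = np.interp(targets, cum_len, pts[:, 0])
    y = np.interp(targets, cum_len, pts[:, 1])
    return np.stack([x, y], axis=1)

# An L-shaped polyline of total length 8 m with only three vertices.
polyline = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 4.0]])
resampled = resample_polyline(polyline, 5)
```

With five output points the samples fall every 2 m along the line, including the corner at `(4, 0)`. Start and end points are always preserved.
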

---

## 📝 Integration Steps

### Step 1: Copy the MapTR core code (1 day)

```bash
# Create the directories
mkdir -p /workspace/bevfusion/mmdet3d/models/heads/vector_map
mkdir -p /workspace/bevfusion/mmdet3d/models/losses

# Copy the files
cp /workspace/MapTR/projects/mmdet3d_plugin/maptr/dense_heads/maptr_head.py \
   /workspace/bevfusion/mmdet3d/models/heads/vector_map/

cp /workspace/MapTR/projects/mmdet3d_plugin/maptr/losses/map_loss.py \
   /workspace/bevfusion/mmdet3d/models/losses/

cp /workspace/MapTR/projects/mmdet3d_plugin/maptr/assigners/maptr_assigner.py \
   /workspace/bevfusion/mmdet3d/core/bbox/assigners/

cp /workspace/MapTR/projects/mmdet3d_plugin/maptr/modules/decoder.py \
   /workspace/bevfusion/mmdet3d/models/utils/
```

### Step 2: Adapt the code to BEVFusion (2 days)

**2.1 Modify MapTRHead**
```python
# Simplified input interface
class MapTRHeadForBEVFusion(MapTRHead):
    def forward(self, bev_features, img_metas):
        """
        Simplified forward that accepts BEV features directly.

        Args:
            bev_features: (B, 256, 180, 180) BEV features
            img_metas: metadata
        """
        # Skip image encoding and use the BEV features directly
        # ... (the multi-view processing is omitted)

        # Build the queries
        query_embeds = self.query_embedding.weight

        # Decoder (unchanged)
        hs, references = self.decoder(query_embeds, bev_features)

        # Classification and regression (unchanged; attribute names are schematic)
        cls_scores = self.cls_head(hs)
        pts_preds = self.reg_head(hs)

        return cls_scores, pts_preds
```

**2.2 Register with BEVFusion**
```python
# mmdet3d/models/heads/__init__.py
from .vector_map import MapTRHeadForBEVFusion

__all__ = [
    ...,
    'MapTRHeadForBEVFusion',
]
```

---

### Step 3: Data preparation (1 day)

```bash
# Extract the vector maps
python tools/data_converter/extract_vector_map.py \
    --root data/nuscenes \
    --version v1.0-trainval \
    --output data/nuscenes/vector_maps.pkl

# Validate the data
python tools/visualize_vector_map.py --samples 10
```

---

### Step 4: Configuration file (0.5 day)

```yaml
# configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml

model:
  type: BEVFusion

  # ... existing encoder, fuser, decoder

  heads:
    object: ${object_head}  # existing
    map: ${map_head}        # existing

    # New MapTR vector-map head
    vector_map:
      type: MapTRHeadForBEVFusion
      in_channels: 256
      num_vec: 50
      num_pts_per_vec: 20
      num_classes: 3  # divider, boundary, ped_crossing
      embed_dims: 256
      num_decoder_layers: 6
      loss_pts:
        type: ChamferDistance
        loss_src_weight: 1.0
        loss_dst_weight: 1.0
      loss_dir:
        type: PtsDirCosLoss
        loss_weight: 2.0

  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0

# Pipeline
train_pipeline:
  - type: LoadMultiViewImageFromFiles
  - type: LoadPointsFromFile
  - type: LoadAnnotations3D
  - type: LoadVectorMapAnnotation  # new
  # ...
```

---

### Step 5: Training (5-7 days)

```bash
# Stage 1: freeze the first two tasks and train MapTRHead (3 epochs)
torchpack dist-run -np 8 python tools/train.py \
    configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml \
    --load_from runs/enhanced_from_epoch19/epoch_23.pth \
    --freeze-heads object,map \
    --cfg-options max_epochs=3

# Stage 2: joint fine-tuning of all three tasks (5 epochs)
torchpack dist-run -np 8 python tools/train.py \
    configs/nuscenes/three_tasks/bevfusion_det_seg_vec.yaml \
    --load_from runs/three_tasks_stage1/epoch_3.pth \
    --cfg-options max_epochs=5
```

---

## 📊 Key Code Snippets

### MapTRHead Core Logic

```python
# Core flow distilled from maptr_head.py (simplified; not the verbatim source)
import torch.nn as nn

class MapTRHead(nn.Module):
    def __init__(self, num_vec=20, num_pts_per_vec=20, embed_dims=256,
                 num_classes=3, num_decoder_layers=6):
        super().__init__()
        self.num_vec = num_vec
        self.num_pts_per_vec = num_pts_per_vec
        # Total queries = vectors × points per vector
        self.num_query = num_vec * num_pts_per_vec  # 400

        # Query embedding
        self.query_embedding = nn.Embedding(self.num_query, embed_dims)

        # Classification branches (predict the vector class)
        self.cls_branches = nn.ModuleList([
            nn.Linear(embed_dims, num_classes)
            for _ in range(num_decoder_layers)
        ])

        # Regression branches (predict point coordinates)
        self.reg_branches = nn.ModuleList([
            nn.Linear(embed_dims, 2)  # (x, y)
            for _ in range(num_decoder_layers)
        ])
        # self.decoder, self.assigner and the loss modules are built from config (elided)

    def forward(self, bev_features, img_metas):
        # 1. Queries
        query = self.query_embedding.weight  # (400, 256)

        # 2. Decoder
        hs = self.decoder(query, bev_features)  # (6, 400, B, 256)
        hs = hs.permute(0, 2, 1, 3)             # (6, B, 400, 256)
        B = hs.shape[1]

        # 3. Predictions (per layer)
        all_cls_scores = []
        all_pts_preds = []
        for layer_idx in range(hs.shape[0]):
            # Classification: (B, 400, 256) → (B, 20, 3)
            # One class per vector (400 points → 20 vectors)
            cls = self.cls_branches[layer_idx](
                hs[layer_idx].reshape(B, self.num_vec, self.num_pts_per_vec, -1).mean(dim=2)
            )

            # Regression: (B, 400, 256) → (B, 400, 2)
            pts = self.reg_branches[layer_idx](hs[layer_idx])
            pts = pts.sigmoid()  # normalize to [0, 1]

            all_cls_scores.append(cls)
            all_pts_preds.append(pts)

        return all_cls_scores, all_pts_preds

    def loss(self, cls_scores, pts_preds, gt_vectors):
        # gt_labels / gt_pts are unpacked from gt_vectors (elided)
        # 1. Hungarian matching
        indices = self.assigner.assign(pts_preds, cls_scores, gt_vectors)

        # 2. Classification loss
        loss_cls = self.loss_cls(cls_scores, gt_labels, indices)

        # 3. Chamfer distance
        loss_pts = self.loss_pts(pts_preds, gt_pts, indices)

        # 4. Direction loss
        loss_dir = self.loss_dir(pts_preds, gt_pts, indices)

        return {
            'loss_cls': loss_cls,
            'loss_pts': loss_pts,
            'loss_dir': loss_dir,
        }
```

---

## 🔍 Key Technical Details

### Computing the Chamfer Distance

```python
import torch

def chamfer_distance(pred_pts, gt_pts):
    """
    pred_pts: (B, N, num_pts_pred, 2)  # predicted points
    gt_pts:   (B, N, num_pts_gt, 2)    # GT points

    Returns: the sum of the two mean nearest-point distances
    """
    # Distance matrix (B, N, num_pts_pred, num_pts_gt)
    dist = torch.cdist(pred_pts, gt_pts)

    # Nearest distance prediction → GT
    dist_pred_to_gt = dist.min(dim=-1)[0]  # (B, N, num_pts_pred)
    loss_forward = dist_pred_to_gt.mean()

    # Nearest distance GT → prediction
    dist_gt_to_pred = dist.min(dim=-2)[0]  # (B, N, num_pts_gt)
    loss_backward = dist_gt_to_pred.mean()

    # Bidirectional Chamfer
    cd_loss = loss_forward + loss_backward
    return cd_loss
```

**Advantages**:
- ✅ Tolerates different numbers of points
- ✅ Tolerates different point orders
- ✅ A natural fit for point-set modeling
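
Both tolerances can be verified on a toy example. The sketch below is a NumPy version for a single pair of point sets, written for this note; it uses the same mean-of-minimums form as the function above.

```python
import numpy as np

def chamfer_distance_np(pred, gt):
    """Symmetric Chamfer distance between point sets of shape (P, 2) and (G, 2)."""
    diff = pred[:, None, :] - gt[None, :, :]    # (P, G, 2)
    dist = np.linalg.norm(diff, axis=-1)        # (P, G) pairwise distances
    return float(dist.min(axis=1).mean() + dist.min(axis=0).mean())

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])

# Same points in reverse order.
cd_reversed = chamfer_distance_np(gt[::-1], gt)

# Five predicted points against three GT points.
pred_dense = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0]])
cd_dense = chamfer_distance_np(pred_dense, gt)
```

The reversed set scores exactly 0, and the denser prediction scores 0.2 because two of its five points sit 0.5 m from the nearest GT point. The zero score for a reversed line also shows why a separate direction term is useful: Chamfer distance alone cannot tell the two orderings apart.
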

---

### Hungarian Matching Algorithm

```python
from scipy.optimize import linear_sum_assignment

def hungarian_matching(cls_cost, pts_cost, dir_cost):
    """
    Each cost tensor has shape (num_pred, num_gt).

    Returns: (matched_pred_idx, matched_gt_idx)
    """
    # Combine the costs:
    # Cost = α·cls_cost + β·pts_cost + γ·dir_cost
    cost = (
        2.0 * cls_cost +  # classification cost
        5.0 * pts_cost +  # point-coordinate cost
        1.0 * dir_cost    # direction cost
    )

    # The Hungarian algorithm finds the minimum-cost matching
    pred_idx, gt_idx = linear_sum_assignment(cost.cpu().numpy())

    return pred_idx, gt_idx
```
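
For intuition about what the matcher returns, the minimum-cost one-to-one assignment on a tiny matrix can be found by exhaustive search. The sketch below is deliberately not the Hungarian algorithm: it enumerates every assignment, which is only feasible for a handful of predictions, but it reaches the same minimum that `linear_sum_assignment` computes efficiently. `brute_force_assignment` and the cost values are made up for illustration.

```python
from itertools import permutations

def brute_force_assignment(cost):
    """Return (pred_idx, gt_idx) minimizing total cost; cost[i][j] is pred i vs GT j.

    Assumes num_pred >= num_gt. Exponential time, for illustration only.
    """
    num_pred, num_gt = len(cost), len(cost[0])
    best = min(
        permutations(range(num_pred), num_gt),   # best[g] = prediction given to GT g
        key=lambda preds: sum(cost[p][g] for g, p in enumerate(preds)),
    )
    return list(best), list(range(num_gt))

# 3 predictions, 2 GT vectors.
cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
pred_idx, gt_idx = brute_force_assignment(cost)
unmatched = sorted(set(range(len(cost))) - set(pred_idx))
```

Prediction 1 is paired with GT 0 and prediction 0 with GT 1, for a total cost of 0.3. Prediction 2 stays unmatched, which is the case the loss treats as background.
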

---

## 💾 Data Extraction Tools

MapTR ships data extraction tools that can serve as a reference:

```bash
# MapTR's data preparation
cd /workspace/MapTR
ls tools/maptrv2/

# Core tools:
#   gen_ann.py            generates annotations
#   generate_*_info.py    generates info files
```

**What we need**:
```python
# /workspace/bevfusion/tools/data_converter/extract_vector_map.py

import numpy as np
from nuscenes.nuscenes import NuScenes
from nuscenes.map_expansion.map_api import NuScenesMap

def extract_vector_map(sample_token, nusc, nusc_map):
    """
    Extracts the vector map for a single sample.

    Returns:
        vectors: list of vector elements
    """
    # Get the ego pose
    sample = nusc.get('sample', sample_token)
    sd_token = sample['data']['LIDAR_TOP']
    sd_rec = nusc.get('sample_data', sd_token)
    pose_rec = nusc.get('ego_pose', sd_rec['ego_pose_token'])

    # Extract vectors around the ego vehicle
    vectors = []

    # 1. Lane dividers (get_records_in_patch returns a dict keyed by layer name)
    x, y = pose_rec['translation'][:2]
    records = nusc_map.get_records_in_patch(
        (x - 50, y - 50, x + 50, y + 50),
        layer_names=['lane_divider'],
        mode='intersect'
    )

    for divider_token in records['lane_divider']:
        # extract_line takes a line token, so look it up on the divider record
        line_token = nusc_map.get('lane_divider', divider_token)['line_token']
        line = nusc_map.extract_line(line_token)
        # Convert to the ego frame (transform_to_ego still has to be implemented)
        pts_global = np.array(line.coords)
        pts_ego = transform_to_ego(pts_global, pose_rec)

        vectors.append({
            'pts': pts_ego,
            'type': 0,  # divider
        })

    # 2. Road boundaries ...
    # 3. Pedestrian crossings ...

    return vectors
```
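
The `transform_to_ego` helper is not defined anywhere in this report. A minimal planar sketch is given below, under two assumptions: the ego pose record carries a `translation` in meters and a `rotation` quaternion in `(w, x, y, z)` order, as nuScenes ego poses do, and only the yaw component matters for a 2D map. Roll and pitch are ignored.

```python
import math
import numpy as np

def transform_to_ego(pts_global, pose_rec):
    """Map (N, 2+) global xy points into the ego frame (yaw-only, planar)."""
    w, x, y, z = pose_rec['rotation']
    # Heading angle extracted from the quaternion.
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Inverse rotation (rotate by -yaw) undoes the ego heading.
    rot_inv = np.array([[cos_y, sin_y], [-sin_y, cos_y]])
    shifted = np.asarray(pts_global, dtype=float)[:, :2] - np.asarray(pose_rec['translation'][:2])
    return shifted @ rot_inv.T

# Ego at (10, 5), heading along global +y (yaw = 90 degrees).
pose_rec = {
    'translation': [10.0, 5.0, 0.0],
    'rotation': [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)],
}
pts_global = np.array([[10.0, 7.0], [9.0, 5.0]])
pts_ego = transform_to_ego(pts_global, pose_rec)
```

The point 2 m ahead of the vehicle in the global frame maps to `(2, 0)` in the ego frame, and the point 1 m to its left maps to `(0, 1)`, matching an x-forward, y-left convention. Whether the head should consume ego-frame or LiDAR-frame coordinates depends on how the BEV grid is defined and should be checked before training.
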

---

## 🎯 Complete Flow After Integration

```python
# Training flow after integrating BEVFusion with MapTR

class BEVFusionWithVectorMap(BEVFusion):
    def forward_train(self, img, points, gt_bboxes_3d, gt_labels_3d,
                      gt_masks_bev, gt_vectors, img_metas):
        """
        Adds the gt_vectors argument.
        """
        # 1. Feature extraction (unchanged)
        camera_feat = self.extract_camera_features(img, img_metas)
        lidar_feat = self.extract_lidar_features(points)

        # 2. Fusion (unchanged)
        fused_feat = self.fuser([camera_feat, lidar_feat])

        # 3. Decoding (unchanged)
        bev_feat = self.decoder(fused_feat)

        # 4. Multi-task heads
        losses = {}

        # Task 1: detection
        if 'object' in self.heads:
            det_pred = self.heads['object'](bev_feat, img_metas)
            det_loss = self.heads['object'].loss(det_pred, gt_bboxes_3d, gt_labels_3d)
            for k, v in det_loss.items():
                losses[f'loss/object/{k}'] = v

        # Task 2: segmentation
        if 'map' in self.heads:
            seg_loss = self.heads['map'](bev_feat, gt_masks_bev)
            for k, v in seg_loss.items():
                losses[f'loss/map/{k}'] = v

        # Task 3: vector map 🆕
        if 'vector_map' in self.heads:
            vec_cls, vec_pts = self.heads['vector_map'](bev_feat, img_metas)
            vec_loss = self.heads['vector_map'].loss(vec_cls, vec_pts, gt_vectors)
            for k, v in vec_loss.items():
                losses[f'loss/vector_map/{k}'] = v

        return losses
```
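
The flow above collects the per-task losses but does not apply the `loss_scale` weights from the Step 4 config. A minimal sketch of that weighting is shown below; `merge_task_losses` is a hypothetical helper, plain floats stand in for loss tensors, and all numbers are made up for illustration.

```python
def merge_task_losses(task_losses, loss_scale):
    """Flatten per-task loss dicts into 'loss/<task>/<name>' keys, scaled per task."""
    merged = {}
    for task, losses in task_losses.items():
        scale = loss_scale.get(task, 1.0)   # tasks without an entry keep weight 1.0
        for name, value in losses.items():
            merged[f"loss/{task}/{name}"] = value * scale
    return merged

# Made-up loss values, one dict per head.
task_losses = {
    "object": {"heatmap": 0.8, "bbox": 0.4},
    "map": {"focal": 0.3},
    "vector_map": {"loss_cls": 0.5, "loss_pts": 1.2, "loss_dir": 0.1},
}
loss_scale = {"object": 1.0, "map": 1.0, "vector_map": 0.5}

merged = merge_task_losses(task_losses, loss_scale)
total = sum(merged.values())
```

Halving the `vector_map` weight here brings its contribution from 1.8 down to 0.9. Starting a newly added head with a lower weight is one way to limit its effect on the already-trained detection and segmentation heads during joint fine-tuning.
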

---

## 📚 Learning Resources

### Papers
- MapTR (ICLR 2023): https://arxiv.org/abs/2208.14437
- MapTRv2 (IJCV 2024): https://arxiv.org/abs/2308.05736

### Code
- GitHub: https://github.com/hustvl/MapTR
- Local path: /workspace/MapTR

### Key Files
- maptr_head.py: 35KB, core head implementation
- map_loss.py: 26KB, loss functions
- decoder.py: 3KB, Transformer decoder

---

## 🎯 Summary

### Core Value of MapTR
1. ✅ **Query-based vector prediction**: suits a variable number of map elements
2. ✅ **Point-set modeling**: flexibly represents arbitrary shapes
3. ✅ **End-to-end**: goes directly from images to vectors
4. ✅ **High performance**: 50-73% mAP

### Advantages of Integrating with BEVFusion
1. ✅ **Reuses the BEV features**: no need to retrain the backbone
2. ✅ **Modular**: only the MapTRHead part is needed
3. ✅ **Shared data**: uses the same nuScenes data
4. ✅ **Fast training**: only 15M new parameters are trained

### Expected Results
- Joint three-task performance
  - Detection mAP: 64-66%
  - Segmentation mIoU: 55-58%
  - Vector map mAP: 50-55%
- Training time: 2-3 days

---

**Next step**: decide whether to start the MapTR integration.