# Extending BEVFusion into a Full Perception Network for Autonomous Driving

**Design date**: 2025-10-22
**Goal**: Extend the model from individual perception tasks to a complete perception + localization + mapping system
**Basis**: The current dual-task BEVFusion model (detection + segmentation)

---

## 🎯 Expansion Goals

### Current Capabilities ✅
```
BEVFusion dual-task system
├── 3D object detection
│   └── Output: 3D bounding boxes, classes, velocities
└── BEV map segmentation (semantic segmentation)
    └── Output: 6-class semantic segmentation masks
```

### Capabilities After Expansion 🚀
```
Full perception network for autonomous driving
├── 1. 3D object detection        ✅ existing
├── 2. BEV semantic segmentation  ✅ existing
├── 3. Vector map prediction      🆕 new
├── 4. Ego-vehicle localization   🆕 new
├── 5. Trajectory prediction      🆕 new (optional)
└── 6. Occupancy grid             🆕 new (optional)
```

---

## 🏗️ Full Architecture Design

### System Architecture Diagram

```
┌────────────────────────────────────────────────────────────────┐
│                    Multi-sensor input layer                    │
├────────────────────────────────────────────────────────────────┤
│ Camera (6 views) │ LiDAR point cloud │ Radar │  IMU  │ GPS/RTK │
└────────┬───────────────────┬─────────────┬───────┬────────┬────┘
         │                   │             │       │        │
         ▼                   ▼             ▼       │        │
┌──────────────────────────────────────────────┐   │        │
│     BEVFusion unified feature extraction     │   │        │
├──────────────────────────────────────────────┤   │        │
│ Camera encoder │ LiDAR encoder │    Radar    │   │        │
│       ↓               ↓              ↓       │   │        │
│    BEV features (unified bird's-eye view)    │   │        │
│              (B, 256, 180, 180)              │   │        │
└────────┬─────────────────────────────────────┘   │        │
         │                                         │        │
         ▼                                         ▼        ▼
┌────────────────────────────────────────────────────────────────┐
│                  Multi-task perception heads                   │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Task 1         │  │ Task 2          │  │ Task 3 (new)    │  │
│  │ 3D detection   │  │ BEV segmentation│  │ Vector map      │  │
│  │ TransFusion    │  │ Enhanced        │  │ MapTR head      │  │
│  └───────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│          │                    │                    │           │
│          ▼                    ▼                    ▼           │
│      3D boxes          Semantic masks         Vector map       │
│                                                                │
│  ┌────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Task 4 (new)   │  │ Task 5 (new)    │  │ Task 6 (new)    │  │
│  │ Ego-vehicle    │  │ Trajectory      │  │ Occupancy       │  │
│  │ localization   │  │ prediction      │  │ grid            │  │
│  └───────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│          │                    │                    │           │
│          ▼                    ▼                    ▼           │
│      Ego pose         Future trajectory      Occupancy map     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
┌────────────────────────────────────────────────────────────────┐
│                   Fusion and decision layer                    │
├────────────────────────────────────────────────────────────────┤
│  Combine all task outputs into a complete scene understanding  │
│  → Used for path planning, decision-making and control         │
└────────────────────────────────────────────────────────────────┘
```

---

## 📋 Expansion Tasks in Detail

### Task 3: Vector Map Prediction 🆕

**Goal**: Predict high-precision vector map elements

**Outputs**:
- Lane lines
- Road boundaries
- Pedestrian crossings
- Stop lines
- Dividers

**Approach**: MapTRv2 integration
```yaml
# Add a MapTRHead
model:
  heads:
    vector_map:
      type: MapTRHead
      num_queries: 50
      num_points: 20
      embed_dims: 256
```

**Data requirements**:
- nuScenes vector map annotations
- Lane line point sets
- Topology

**Expected performance**:
- Vector map mAP: 50-55%

**Implementation time**: 2 weeks

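With `num_points: 20`, every map element has to be expressed as a fixed-length point sequence, whereas annotated lane lines usually come with a varying number of vertices. The sketch below shows one way the ground truth could be brought into that shape, assuming evenly spaced sampling by arc length. `resample_polyline` is a hypothetical helper written for this document, not a function from MapTR or nuScenes.

```python
import math


def resample_polyline(points, num_points=20):
    """Resample a 2D polyline to `num_points` points evenly spaced by arc length."""
    if num_points < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output points")
    # Cumulative arc length at every input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```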

---

### Task 4: Ego Localization 🆕

**Goal**: Estimate the ego vehicle's precise position and orientation in the global coordinate frame

**Outputs**:
```python
ego_pose = {
    'position': [x, y, z],            # global coordinates (m)
    'orientation': [qw, qx, qy, qz],  # quaternion
    'velocity': [vx, vy, vz],         # velocity (m/s)
    'uncertainty': covariance         # 6×6 covariance matrix
}
```

**Candidate approaches**:

#### Option A: Map-Matching Localization (recommended) ⭐⭐⭐⭐⭐
```python
import torch.nn as nn


# Localize using BEV features and a prior map.
# MapEncoder, FeatureMatcher and PoseRegressor are placeholders to be implemented.
class BEVLocalizationHead(nn.Module):
    """
    Localize by matching BEV features against a pre-built map.
    """
    def __init__(self):
        super().__init__()
        # 1. Map encoder
        self.map_encoder = MapEncoder()

        # 2. Feature-matching network
        self.matcher = FeatureMatcher()

        # 3. Pose regression
        self.pose_regressor = PoseRegressor()

    def forward(self, bev_features, prior_map):
        # BEV features + prior map → matching → pose
        map_features = self.map_encoder(prior_map)
        similarity = self.matcher(bev_features, map_features)
        pose = self.pose_regressor(similarity)
        return pose
```

**Advantages**:
- ✅ A natural fit for the BEV representation
- ✅ High accuracy (centimeter-level)
- ✅ Robust

**Data requirements**:
- Pre-built BEV map
- Ground-truth poses from GPS/IMU fusion
- Map tile management system

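The matching step of Option A can be pictured as sliding the current BEV observation over the prior map tile and keeping the offset with the highest similarity. The sketch below is a deliberately simplified stand-in: a brute-force, translation-only search using a raw dot product on small nested lists. A learned matcher would replace the dot product and would also handle rotation. `best_offset` is a hypothetical name.

```python
def best_offset(bev, tile):
    """Return (dx, dy, score) of the position where `bev` best matches inside `tile`.

    `bev` and `tile` are 2D lists of numbers, with `tile` at least as large as `bev`.
    """
    bh, bw = len(bev), len(bev[0])
    th, tw = len(tile), len(tile[0])
    best = None
    for dy in range(th - bh + 1):
        for dx in range(tw - bw + 1):
            # Unnormalized cross-correlation at this offset.
            score = sum(
                bev[r][c] * tile[dy + r][dx + c]
                for r in range(bh)
                for c in range(bw)
            )
            if best is None or score > best[2]:
                best = (dx, dy, score)
    return best
```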

---

#### Option B: Visual-Inertial Odometry (VIO) ⭐⭐⭐⭐
```python
import torch.nn as nn


# VisualOdometry, IMUIntegrator and ExtendedKalmanFilter are placeholders to be implemented.
class VisualInertialOdometry(nn.Module):
    """
    Relative localization by fusing vision and IMU.
    """
    def __init__(self):
        super().__init__()
        # 1. Visual odometry
        self.visual_odometry = VisualOdometry()

        # 2. IMU pre-integration
        self.imu_integrator = IMUIntegrator()

        # 3. Kalman-filter fusion
        self.ekf = ExtendedKalmanFilter()

    def forward(self, images_seq, imu_data):
        # Relative pose estimation
        relative_pose = self.visual_odometry(images_seq)
        imu_pose = self.imu_integrator(imu_data)
        fused_pose = self.ekf.update(relative_pose, imu_pose)
        return fused_pose
```

**Advantages**:
- ✅ Does not depend on a pre-built map
- ✅ Good real-time performance
- ⚠️ Accumulates drift over time

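The fusion step in Option B combines two estimates of the same quantity according to how much each is trusted. As a minimal illustration, the sketch below performs a single scalar Kalman update, which is equivalent to inverse-variance weighting. A real EKF tracks a full state vector and covariance matrix; `fuse` and its arguments are hypothetical.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Fuse two independent scalar estimates of the same quantity.

    Returns the fused estimate and its variance, which is never larger than
    either input variance.
    """
    gain = var_a / (var_a + var_b)        # Kalman gain applied to estimate b
    fused = est_a + gain * (est_b - est_a)
    fused_var = (1.0 - gain) * var_a
    return fused, fused_var
```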

---

### Task 5: Object Trajectory Prediction 🆕

**Goal**: Predict the future trajectories of surrounding vehicles and pedestrians

**Outputs**:
```python
trajectory = {
    'object_id': 123,
    'current_state': [x, y, vx, vy],
    'future_trajectory': [
        [x1, y1, t1],  # after 1 s
        [x2, y2, t2],  # after 2 s
        [x3, y3, t3],  # after 3 s
    ],
    'probability': [0.8, 0.7, 0.6],  # confidence
    'mode': 'turn_left',             # motion mode
}
```

**Approach**:
```python
import torch.nn as nn


# LSTMEncoder, SceneEncoder, InteractionGraph and MultiModalDecoder are
# placeholders to be implemented.
class TrajectoryPredictionHead(nn.Module):
    """
    Predict future trajectories from detection results.
    """
    def __init__(self):
        super().__init__()
        # 1. History trajectory encoder
        self.history_encoder = LSTMEncoder()

        # 2. Scene context encoder
        self.scene_encoder = SceneEncoder()  # uses BEV features

        # 3. Interaction modeling
        self.interaction_module = InteractionGraph()

        # 4. Multi-modal trajectory decoder
        self.trajectory_decoder = MultiModalDecoder(num_modes=6)

    def forward(self, objects, bev_features, history):
        # History + scene + interaction → future trajectories
        hist_feat = self.history_encoder(history)
        scene_feat = self.scene_encoder(bev_features)
        interaction = self.interaction_module(objects)
        trajectories = self.trajectory_decoder(hist_feat, scene_feat, interaction)
        return trajectories
```

**Data requirements**:
- Historical trajectory data (provided by nuScenes)
- Ground truth for the next 3 seconds
- Annotations of interaction scenarios

**Reference models**:
- MTR (Motion Transformer)
- Wayformer
- MultiPath++

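Before a learned head exists, a constant-velocity extrapolation gives a simple baseline in the same output format (`current_state` in, `[x, y, t]` waypoints out). The sketch below is that baseline only, not the proposed model; `constant_velocity_forecast` is a hypothetical name and the 1/2/3 s horizons mirror the example output above.

```python
def constant_velocity_forecast(state, horizons=(1.0, 2.0, 3.0)):
    """Extrapolate a state [x, y, vx, vy] to future [x, y, t] waypoints.

    Assumes the object keeps its current velocity for the whole horizon.
    """
    x, y, vx, vy = state
    return [[x + vx * t, y + vy * t, t] for t in horizons]
```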

---

### Task 6: Occupancy Grid Prediction 🆕

**Goal**: Predict the occupancy state of 3D space

**Outputs**:
```python
occupancy_grid = {
    'grid': (B, X, Y, Z),   # 3D grid
    'values': [0, 1],       # 0: free, 1: occupied
    'semantics': '0..N',    # semantic class ids (optional)
    'resolution': 0.4,      # grid resolution (m)
}
```

**Approach**:
```python
import torch.nn as nn


# BEV3DUpsampler and OccupancyPredictor are placeholders to be implemented.
class OccupancyHead(nn.Module):
    """
    3D occupancy grid prediction.
    """
    def __init__(self):
        super().__init__()
        # BEV → 3D occupancy
        self.bev_to_3d = BEV3DUpsampler()
        self.occupancy_predictor = OccupancyPredictor()

    def forward(self, bev_features):
        # BEV features → 3D occupancy grid
        features_3d = self.bev_to_3d(bev_features)
        occupancy = self.occupancy_predictor(features_3d)
        return occupancy
```

**Reference models**:
- MonoScene
- TPVFormer
- OccNet

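Training an occupancy head needs ground-truth grids, which are typically derived by binning LiDAR points into voxels. The sketch below shows that binning at the 0.4 m resolution quoted above. The grid origin and extents are assumptions made for the example (the document does not state them), and `voxelize` is a hypothetical helper.

```python
def voxelize(points, origin, resolution, grid_size):
    """Return the set of (ix, iy, iz) voxels that contain at least one point.

    points:     iterable of (x, y, z) in metres
    origin:     (x, y, z) of the grid's minimum corner (assumed, not from the source)
    resolution: voxel edge length in metres
    grid_size:  (X, Y, Z) number of voxels along each axis
    """
    occupied = set()
    for p in points:
        idx = tuple(int((p[k] - origin[k]) // resolution) for k in range(3))
        # Drop points that fall outside the grid.
        if all(0 <= idx[k] < grid_size[k] for k in range(3)):
            occupied.add(idx)
    return occupied
```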

---

## 🔧 Full Expansion Plan

### Phase 1: Three-Task System (2 weeks)

**Task combination**:
```
Detection + segmentation + vector map
```

**Implementation steps**:
1. Integrate MapTRHead (based on the existing MAPTR_INTEGRATION_PLAN.md)
2. Extract vector map data
3. Jointly train the three tasks
4. Evaluate performance

**Expected performance**:
- Detection mAP: 65-68%
- Segmentation mIoU: 55-58%
- Vector map mAP: 50-55%


---

### Phase 2: Adding Localization (1-2 weeks)

**Chosen approach**: Map-matching localization (recommended)

**Implementation steps**:

#### Step 1: Build the BEV Map Database (3 days)
```python
# tools/build_bev_map_database.py
from nuscenes.map_expansion.map_api import NuScenesMap


def build_bev_map_database(nusc_root, output_dir):
    """
    Build BEV map tiles for every map region.

    create_map_tiles, render_bev_map and save_tile are project helpers
    that still need to be written.
    """
    maps = ['boston-seaport', 'singapore-onenorth',
            'singapore-hollandvillage', 'singapore-queenstown']

    for map_name in maps:
        nusc_map = NuScenesMap(dataroot=nusc_root, map_name=map_name)

        # Split the map into 100 m × 100 m tiles
        tiles = create_map_tiles(nusc_map, tile_size=100)

        for tile in tiles:
            # Render the tile as a BEV representation
            bev_map = render_bev_map(tile, resolution=0.3)
            # Save it
            save_tile(bev_map, output_dir, map_name, tile.id)

    print("✅ BEV map database built")
```

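The tile database implies a lookup from a global position to the tile or tiles that cover it. The sketch below shows that indexing arithmetic under the assumption that tiles are laid out on a regular grid anchored at the map origin; all three function names are hypothetical. It also makes one detail visible: a 100 m tile at 0.3 m per pixel is not a whole number of pixels (333.33), so the raster size has to be rounded up or the resolution adjusted.

```python
import math


def tile_id(x, y, tile_size=100.0):
    """Column/row index of the tile that contains the global point (x, y)."""
    return math.floor(x / tile_size), math.floor(y / tile_size)


def tile_pixels(tile_size=100.0, resolution=0.3):
    """Raster side length in pixels, rounded up so the whole tile is covered."""
    return math.ceil(tile_size / resolution)


def neighbor_tiles(x, y, radius, tile_size=100.0):
    """All tile ids overlapped by a square search window around (x, y)."""
    c0, r0 = tile_id(x - radius, y - radius, tile_size)
    c1, r1 = tile_id(x + radius, y + radius, tile_size)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```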
#### Step 2: Implement the Localization Head (2 days)
```python
# mmdet3d/models/heads/localization/bev_localization_head.py
import torch.nn as nn
import torch.nn.functional as F

from mmdet3d.models.builder import HEADS


@HEADS.register_module()
class BEVLocalizationHead(nn.Module):
    """
    Localization head based on BEV feature matching.

    SpatialCorrelation, reconstruct_covariance and nll_loss are not defined
    here and still need to be implemented.
    """
    def __init__(
        self,
        in_channels=256,
        map_embedding_dim=128,
        pose_dims=6,  # x, y, z, roll, pitch, yaw
    ):
        super().__init__()
        self.pose_dims = pose_dims

        # 1. BEV feature encoder
        self.bev_encoder = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, map_embedding_dim, 3, padding=1),
        )

        # 2. Map encoder (pre-trained)
        self.map_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, map_embedding_dim, 3, padding=1),
        )

        # 3. Correlation; it must output map_embedding_dim * 4 channels
        #    to match the first Linear layer below
        self.correlator = SpatialCorrelation()

        # 4. Pose regression
        self.pose_head = nn.Sequential(
            nn.Linear(map_embedding_dim * 4, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, pose_dims + pose_dims * pose_dims),  # pose + covariance
        )

    def forward(self, bev_features, prior_map_tile, gps_hint=None):
        """
        Args:
            bev_features: (B, 256, H, W) current BEV features
            prior_map_tile: (B, 3, H, W) prior map tile
            gps_hint: (B, 2) coarse GPS position (optional, currently unused)

        Returns:
            pose: (B, 6) [x, y, z, roll, pitch, yaw]
            uncertainty: (B, 6, 6) covariance matrix
        """
        # Encode the BEV features and the map
        bev_emb = self.bev_encoder(bev_features)
        map_emb = self.map_encoder(prior_map_tile)

        # Compute the correlation (matching)
        correlation = self.correlator(bev_emb, map_emb)

        # Global pooling
        pooled = F.adaptive_avg_pool2d(correlation, 1).flatten(1)

        # Regress the pose
        output = self.pose_head(pooled)
        pose = output[:, :self.pose_dims]
        covariance_vec = output[:, self.pose_dims:]

        # Rebuild the covariance matrix
        uncertainty = self.reconstruct_covariance(covariance_vec)

        return pose, uncertainty

    def loss(self, pred_pose, gt_pose, pred_cov):
        """
        Localization loss.
        """
        # 1. L1 pose loss
        loss_pose = F.l1_loss(pred_pose, gt_pose)

        # 2. Uncertainty loss (negative log-likelihood)
        loss_uncertainty = self.nll_loss(pred_pose, gt_pose, pred_cov)

        return {
            'loss_localization': loss_pose,
            'loss_uncertainty': loss_uncertainty,
        }
```

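The head above relies on two helpers it does not define: rebuilding a covariance matrix from raw network outputs, and a Gaussian negative log-likelihood. One common way to keep the covariance valid is to predict a lower-triangular Cholesky factor `L` with a positive diagonal and use `L @ L.T`. The sketch below illustrates that idea in plain Python for a single sample. It is an assumption about how the helpers could work, not the design's actual implementation: it uses `n * (n + 1) / 2` raw values (21 for a 6-DoF pose) rather than the 36 the head emits, and a real version would be batched in PyTorch.

```python
import math


def cholesky_factor(vec, n):
    """Build a lower-triangular L with positive diagonal from n*(n+1)/2 raw values."""
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # exp() keeps the diagonal positive, so L @ L.T is positive definite.
            L[i][j] = math.exp(vec[k]) if i == j else vec[k]
            k += 1
    return L


def gaussian_nll(pred, gt, L):
    """Negative log-likelihood of `gt` under a Gaussian N(pred, L @ L.T)."""
    n = len(pred)
    r = [g - p for g, p in zip(gt, pred)]
    # Forward substitution solves L z = r; z.z is the squared Mahalanobis distance.
    z = []
    for i in range(n):
        z.append((r[i] - sum(L[i][j] * z[j] for j in range(i))) / L[i][i])
    half_log_det = sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * sum(v * v for v in z) + half_log_det + 0.5 * n * math.log(2 * math.pi)
```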
#### Step 3: Data Preparation (1 day)
```
For each sample, prepare:
1. BEV map tile (queried from the database)
2. Ground-truth pose (from nuScenes)
3. Coarse GPS position (for initialization)
```

#### Step 4: Training Strategy (5-7 days)
```yaml
# Stage 1: freeze the other heads and train only the localization head (3 epochs)
model:
  freeze_heads: ['object', 'map', 'vector_map']

---
# Stage 2: joint fine-tuning (5 epochs)
model:
  freeze_heads: []
  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0
    localization: 2.0  # slightly higher weight for localization
```

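How `freeze_heads` is interpreted is not spelled out above. One plausible reading is that every parameter under `heads.<name>.` has gradients disabled for the listed heads. The sketch below selects the parameters that stay trainable under that reading; the `heads.<name>.` naming scheme and `trainable_params` are assumptions. Under this reading the shared encoders remain trainable in stage 1, and whether they should also be frozen is left open by the configuration.

```python
def trainable_params(param_names, freeze_heads):
    """Return the parameter names that remain trainable for a `freeze_heads` list.

    Assumes head parameters are named "heads.<head_name>.<rest>".
    """
    frozen_prefixes = tuple(f"heads.{name}." for name in freeze_heads)
    # str.startswith with an empty tuple is always False, so nothing is frozen.
    return [n for n in param_names if not n.startswith(frozen_prefixes)]
```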

---

### Task 5: Trajectory Prediction (optional) 🆕

**Goal**: Predict the future trajectories of other traffic participants

**Approach**:
```python
import torch.nn as nn
import torch.nn.functional as F

from mmdet3d.models.builder import HEADS


@HEADS.register_module()
class TrajectoryPredictionHead(nn.Module):
    """
    Multi-modal trajectory prediction
    based on detection results and scene context.

    AgentROIExtractor, VectorizedSceneEncoder, MultiAgentInteraction and
    TrajectoryDecoder are placeholders that still need to be implemented.
    """
    def __init__(
        self,
        history_steps=10,  # 10 past frames (2 s)
        future_steps=30,   # 30 future frames (6 s)
        num_modes=6,       # 6 candidate modes
        hidden_dim=256,
    ):
        super().__init__()

        # 1. History encoder; input is (N, history_steps, 4) = (x, y, vx, vy)
        self.history_encoder = nn.LSTM(4, hidden_dim, 2, batch_first=True)

        # 2. Agent feature extraction (from BEV)
        self.agent_encoder = AgentROIExtractor()

        # 3. Scene context (road graph)
        self.scene_encoder = VectorizedSceneEncoder()

        # 4. Interaction modeling (agent-agent)
        self.interaction = MultiAgentInteraction(hidden_dim)

        # 5. Multi-modal decoders, one per mode
        self.decoder = nn.ModuleList([
            TrajectoryDecoder(hidden_dim, future_steps)
            for _ in range(num_modes)
        ])

        # 6. Mode classifier
        self.mode_classifier = nn.Linear(hidden_dim, num_modes)

    def forward(self, detected_objects, bev_features, history_tracks):
        """
        Predict the future trajectories of all detected objects.
        """
        # Encode the history and keep the last time step: (N, hidden_dim)
        hist_out, _ = self.history_encoder(history_tracks)
        hist_features = hist_out[:, -1]

        # Extract agent features
        agent_features = self.agent_encoder(bev_features, detected_objects)

        # Encode the scene (lanes, intersections, etc.)
        scene_features = self.scene_encoder(bev_features)

        # Model interactions
        interaction_features = self.interaction(agent_features)

        # Fuse all features (each is assumed to have shape (N, hidden_dim))
        fused = hist_features + agent_features + scene_features + interaction_features

        # Multi-modal prediction
        trajectories = []
        for decoder in self.decoder:
            traj = decoder(fused)
            trajectories.append(traj)

        # Mode probabilities
        mode_probs = F.softmax(self.mode_classifier(fused), dim=-1)

        return trajectories, mode_probs
```

**Data requirements**:
- Historical trajectories (obtained from tracking)
- Future trajectory ground truth
- Scene graph (road topology)

**Reference datasets**:
- nuScenes Prediction Challenge
- Waymo Motion Dataset

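The head returns several candidate trajectories plus a probability per mode, so downstream code needs a way to turn logits into probabilities and to score the candidates against ground truth. The sketch below shows a numerically stable softmax and a minimum-ADE selection over modes, a metric commonly reported for multi-modal predictors. The function names are hypothetical and the inputs are plain Python lists rather than tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of mode logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ade(pred, gt):
    """Average displacement error between two equally long lists of (x, y)."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def min_ade(modes, gt):
    """Return (index, ADE) of the candidate trajectory closest to the ground truth."""
    errors = [ade(m, gt) for m in modes]
    best = min(range(len(modes)), key=errors.__getitem__)
    return best, errors[best]
```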

---

### Task 6: Occupancy Grid Prediction (optional) 🆕

**Goal**: Predict the occupancy state of 3D space

```python
import torch.nn as nn
import torch.nn.functional as F

from mmdet3d.models.builder import HEADS


@HEADS.register_module()
class OccupancyHead(nn.Module):
    """
    3D occupancy grid prediction.
    Output: (B, num_classes, Z, X, Y) logits over the 3D grid.
    """
    def __init__(
        self,
        in_channels=256,
        grid_size=(200, 200, 16),  # X, Y, Z
        num_classes=2,             # free/occupied, or multi-class semantics
        mid_channels=8,            # channels per height slice (assumed value)
    ):
        super().__init__()
        self.grid_size = tuple(grid_size)
        self.mid_channels = mid_channels

        # BEV → 3D features: predict mid_channels features for each height slice
        self.bev_to_3d = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, mid_channels * self.grid_size[2], 1),
        )

        # 3D convolutional prediction
        self.occupancy_3d = nn.Sequential(
            nn.Conv3d(mid_channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv3d(64, num_classes, 1),
        )

    def forward(self, bev_features):
        # BEV → 3D occupancy
        x_size, y_size, z_size = self.grid_size
        feats = self.bev_to_3d(bev_features)  # (B, C*Z, H, W)
        # Resize the BEV plane to the occupancy grid resolution
        feats = F.interpolate(feats, size=(x_size, y_size),
                              mode='bilinear', align_corners=False)
        # Lift to 3D by splitting channels into (C, Z)
        features_3d = feats.reshape(
            feats.shape[0], self.mid_channels, z_size, x_size, y_size)
        occupancy = self.occupancy_3d(features_3d)
        return occupancy
```

**Applications**:
- Collision detection
- Path planning
- Safe-distance computation


---

## 📊 Full System Comparison

### Feature Comparison

| Feature | Current BEVFusion | Three-task system | Full perception system |
|---------|-------------------|-------------------|------------------------|
| 3D detection | ✅ | ✅ | ✅ |
| BEV segmentation | ✅ | ✅ | ✅ |
| Vector map | ❌ | ✅ | ✅ |
| Ego localization | ❌ | ❌ | ✅ |
| Trajectory prediction | ❌ | ❌ | ✅ |
| Occupancy grid | ❌ | ❌ | ✅ |

### Performance Comparison

| System | Parameters | Inference time | Outputs | Use case |
|--------|------------|----------------|---------|----------|
| Dual-task | 110M | 90 ms | Detection + segmentation | Basic perception |
| Three-task | 125M | 110 ms | + vector map | HD mapping |
| Five-task | 138M | 140 ms | + localization + trajectory | Planning and decision-making |
| Six-task | 145M | 160 ms | + occupancy grid | Full perception |


---

## 🚀 Implementation Roadmap

### Option A: Incremental Expansion (recommended) ⭐⭐⭐⭐⭐

```
Current: dual-task (detection + segmentation) ✅
    ↓ +2 weeks
Phase 1: three tasks (+ vector map)
    ↓ +1 week
Phase 2: four tasks (+ localization)
    ↓ +1 week (optional)
Phase 3: five tasks (+ trajectory prediction)
    ↓ +1 week (optional)
Phase 4: six tasks (+ occupancy grid)

Total: 5-7 weeks
```

**Advantages**:
- ✅ Controlled risk
- ✅ Each phase is validated independently
- ✅ Expansion can stop as soon as requirements are met

---

### Option B: Core Three Tasks + Localization (pragmatic) ⭐⭐⭐⭐

```
Current: dual-task ✅
    ↓ +2 weeks
Three tasks: detection + segmentation + vector map
    ↓ +1 week
Four tasks: + localization

Total: 3 weeks
Output: core perception + localization system
```

**Advantages**:
- ✅ Predictable timeline
- ✅ Covers the core requirements
- ✅ Suitable for deployment

---

## 📅 Detailed Schedule

### Week 1-2: Current Training (in progress)
- 🔄 Finish the enhanced training run (Epoch 23)
- ⏳ Expected to complete on 10-29
- 🎯 Target mIoU: 60-65%

### Week 3-4: MapTR Integration
**11-01 ~ 11-14**

| Days | Task | Duration |
|------|------|----------|
| Day 1-2 | Study the MapTR code | 2 days |
| Day 3 | Extract vector map data | 1 day |
| Day 4-5 | Implement MapTRHead | 2 days |
| Day 6 | Integration testing | 1 day |
| Day 7-9 | Three-task training | 3 days |
| Day 10 | Evaluation | 1 day |

**Deliverables**:
- Three-task model (detection + segmentation + vector map)
- Vector map mAP: 50-55%

---

### Week 5: Localization Development
**11-15 ~ 11-21**

| Days | Task | Duration |
|------|------|----------|
| Day 1-3 | Build the BEV map database | 3 days |
| Day 4-5 | Implement the localization head | 2 days |
| Day 6 | Data preparation and testing | 1 day |
| Day 7 | System integration | 1 day |

**Deliverables**:
- BEV map database
- Localization head implementation

---

### Week 6: Four-Task Training
**11-22 ~ 11-28**

| Days | Task | Duration |
|------|------|----------|
| Day 1-2 | Prepare training data | 2 days |
| Day 3-7 | Joint four-task training | 5 days |

**Training strategy**:
```yaml
# Stage 1: freeze the first three tasks and train localization (3 epochs)
model:
  freeze_heads: ['object', 'map', 'vector_map']

---
# Stage 2: joint four-task fine-tuning (5 epochs)
model:
  freeze_heads: []
  loss_scale:
    object: 1.0
    map: 1.0
    vector_map: 1.0
    localization: 2.0
```

**Deliverables**:
- Four-task model
- Localization accuracy: <0.5 m (target)

---

### Week 7 (optional): Trajectory Prediction
**11-29 ~ 12-05**

- Implement TrajectoryHead
- Prepare trajectory data
- Five-task training

---

### Week 8 (optional): Occupancy Grid
**12-06 ~ 12-12**

- Implement OccupancyHead
- Prepare occupancy data
- Six-task training

---

## 💾 Data Requirements

### Three Tasks (detection + segmentation + vector map)
- ✅ nuScenes raw data (already available)
- 🆕 Vector map annotations (to be extracted)
  - Source: nuScenes map API
  - Size: ~500 MB
  - Time: about 30 minutes to extract

### Four Tasks (+ localization)
- 🆕 BEV map database
  - 4 map regions
  - 100-200 tiles per region
  - Total size: ~5 GB
  - Build time: 1-2 days

- 🆕 Pose ground truth
  - Source: nuScenes ego_pose
  - Accuracy: GPS/IMU fusion (~1 m)
  - Target: improve to <0.5 m

### Five Tasks (+ trajectory)
- 🆕 Historical trajectory data
  - Source: nuScenes tracking
  - History window: past 2 seconds
  - Prediction horizon: 3 seconds

---

## 🎯 Final System Capabilities

### Complete Scene Understanding as Output

```python
# Output of a single forward pass
# (illustrative schema: x, y, ... stand for predicted values)
perception_output = model(images, points, imu, gps)

outputs = {
    # Task 1: 3D detection
    'objects': [
        {
            'bbox_3d': [x, y, z, w, h, l, yaw],
            'class': 'car',
            'score': 0.95,
            'velocity': [vx, vy],
            'track_id': 123,
        },
        # ...
    ],

    # Task 2: BEV segmentation
    'segmentation': {
        'mask': (200, 200),  # semantic segmentation
        'classes': ['drivable', 'walkway', ...],
    },

    # Task 3: vector map
    'vector_map': [
        {
            'type': 'lane_line',
            'points': [[x1, y1], [x2, y2], ...],
            'confidence': 0.9,
        },
        # ...
    ],

    # Task 4: ego localization
    'ego_pose': {
        'position': [x, y, z],
        'orientation': [qw, qx, qy, qz],
        'uncertainty': covariance_matrix,
    },

    # Task 5: trajectory prediction (optional)
    'trajectories': {
        123: {  # track_id
            'future': [[x1, y1, t1], [x2, y2, t2], ...],
            'modes': ['turn_left', 'go_straight', ...],
            'probabilities': [0.6, 0.3, 0.1],
        },
        # ...
    },

    # Task 6: occupancy grid (optional)
    'occupancy': {
        'grid_3d': (200, 200, 16),  # X, Y, Z
        'resolution': 0.4,          # metres
    },
}
```

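The output schema reports `orientation` as a quaternion `[qw, qx, qy, qz]`, while the localization head regresses `roll, pitch, yaw`. A conversion is therefore needed somewhere between the two. The sketch below uses the standard ZYX (yaw-pitch-roll) Euler convention with angles in radians; the choice of convention is an assumption, since the document does not state one.

```python
import math


def euler_to_quaternion(roll, pitch, yaw):
    """Convert roll/pitch/yaw (radians, ZYX convention) to [qw, qx, qy, qz]."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return [
        cr * cp * cy + sr * sp * sy,  # qw
        sr * cp * cy - cr * sp * sy,  # qx
        cr * sp * cy + sr * cp * sy,  # qy
        cr * cp * sy - sr * sp * cy,  # qz
    ]
```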

---

## 🔧 Technical Implementation Details

### Shared Backbone Strategy

```
Shared by all tasks:
├── Camera encoder (60M parameters)
├── LiDAR encoder (10M parameters)
├── Fuser (2M parameters)
└── BEV decoder (20M parameters)

Task-specific heads:
├── Object head (8M)        ← existing
├── Map head (10M)          ← existing
├── VectorMap head (15M)    ← new
├── Localization head (5M)  ← new
├── Trajectory head (8M)    ← new (optional)
└── Occupancy head (7M)     ← new (optional)

Total parameters:
- Three tasks: 125M
- Four tasks: 130M
- Five tasks: 138M
- Six tasks: 145M
```

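The totals above are simply the shared components plus the heads that are switched on. The short sketch below reproduces that arithmetic from the per-component figures listed in the block, which makes it easy to re-check the budget when a head's size changes. The dictionary keys and `total_params_m` are hypothetical names.

```python
# Parameter counts in millions, copied from the breakdown above.
SHARED_M = {"camera_encoder": 60, "lidar_encoder": 10, "fuser": 2, "bev_decoder": 20}
HEADS_M = {
    "object": 8,
    "map": 10,
    "vector_map": 15,
    "localization": 5,
    "trajectory": 8,
    "occupancy": 7,
}


def total_params_m(active_heads):
    """Total parameters (in millions) for the shared backbone plus the given heads."""
    return sum(SHARED_M.values()) + sum(HEADS_M[h] for h in active_heads)
```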

---

## 💡 Key Technical Points

### 1. Multi-Task Loss Balancing
```yaml
loss_scale:
  object: 1.0        # detection
  map: 1.0           # segmentation
  vector_map: 1.0    # vector map
  localization: 2.0  # localization (slightly higher weight)
  trajectory: 0.5    # trajectory (lower weight)
  occupancy: 0.8     # occupancy
```

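These weights enter training as a weighted sum of the per-task losses. The sketch below shows that combination on plain floats, assuming one scalar loss per task. In the real model each head may emit several loss terms, so treat `total_loss` as a simplified, hypothetical illustration.

```python
def total_loss(losses, loss_scale):
    """Weighted sum of per-task losses.

    Raises KeyError if a task has no configured weight, so a missing
    entry in `loss_scale` fails loudly instead of being silently ignored.
    """
    return sum(loss_scale[task] * value for task, value in losses.items())
```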
### 2. Training Strategy
```
Stage 1: freeze the first N tasks and train the new task (3-5 epochs)
Stage 2: jointly fine-tune all tasks (5-8 epochs)
```

### 3. Data Augmentation
```yaml
# Preserve spatial consistency
augmentation:
  - RandomFlip3D: flip the GT of every task together
  - GlobalRotScaleTrans: transform every task together
  - Keep poses, trajectories and maps consistent
```

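Keeping augmentation consistent means every task's ground truth goes through the same geometric transform. The sketch below applies one mirror flip (y to -y in the ego frame) to boxes, map polylines, trajectories and the ego heading together. The sample layout and `flip_y` are assumptions made for illustration; in practice the sensor data and the prior-map tile used for localization would need the same flip.

```python
def flip_y(sample):
    """Mirror a sample across the x-axis (y -> -y) for every task's ground truth.

    Expected keys (hypothetical layout):
      boxes:        list of [x, y, z, w, l, h, yaw]
      polylines:    list of lists of (x, y)
      trajectories: list of lists of (x, y, t)
      ego_yaw:      heading in radians
    """
    return {
        "boxes": [[x, -y, z, w, l, h, -yaw] for x, y, z, w, l, h, yaw in sample["boxes"]],
        "polylines": [[(x, -y) for x, y in line] for line in sample["polylines"]],
        "trajectories": [[(x, -y, t) for x, y, t in traj] for traj in sample["trajectories"]],
        "ego_yaw": -sample["ego_yaw"],
    }
```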

---

## 📊 Performance Targets

### Minimum Targets
- Detection mAP: >63%
- Segmentation mIoU: >55%
- Vector map mAP: >48%
- Localization error: <1.0 m
- Trajectory ADE: <1.5 m (3 s)

### Ideal Targets
- Detection mAP: >65%
- Segmentation mIoU: >60%
- Vector map mAP: >52%
- Localization error: <0.5 m
- Trajectory ADE: <1.0 m (3 s)

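To check the localization targets, the error metric has to be pinned down. The sketch below assumes "localization error" means the mean planar distance between predicted and ground-truth positions, and adds a wrapped heading error for completeness. Both definitions, and all function names, are assumptions rather than something the targets above specify.

```python
import math


def translation_error(pred_xy, gt_xy):
    """Planar Euclidean distance between predicted and ground-truth position (m)."""
    return math.hypot(pred_xy[0] - gt_xy[0], pred_xy[1] - gt_xy[1])


def yaw_error(pred_yaw, gt_yaw):
    """Absolute heading error in radians, wrapped to [0, pi]."""
    diff = (pred_yaw - gt_yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def meets_target(errors, threshold):
    """True when the mean error is below the threshold (e.g. 1.0 m or 0.5 m)."""
    return sum(errors) / len(errors) < threshold
```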

---

## 🚀 Quick-Start Plan

### Minimum Viable Product (MVP)

**Recommended configuration**: three tasks + localization
```
Core capabilities:
├── 3D detection ✅
├── BEV segmentation ✅
├── Vector map 🆕
└── Ego localization 🆕

Time: 3-4 weeks
Parameters: 130M
Inference: 120 ms
```

**Use cases**:
- ✅ Complete perception
- ✅ Accurate localization
- ✅ HD map construction
- ✅ Meets L2+ autonomous driving requirements

---

## 📋 Preparation Work That Can Start Now

### 1. Study the MapTR Code (4 hours)
```bash
cd /workspace
git clone https://github.com/hustvl/MapTR.git
# Study the MapTRHead implementation
```

### 2. Design the Localization Approach (2 hours)
```
- Choose the localization approach (map matching vs. VIO)
- Design the data flow
- Prepare the BEV map tile scheme
```

### 3. Analyze Data Requirements (1 hour)
```
- Vector map annotations
- BEV map database
- Pose ground truth
- Storage requirements
```

---

## 🎯 Recommended Implementation Paths

### Path 1: Full-Featured System (7-8 weeks) ⭐⭐⭐⭐⭐
```
Week 1-2: finish current training ✅
Week 3-4: three tasks (+ vector map)
Week 5:   four tasks (+ localization)
Week 6:   five tasks (+ trajectory), optional
Week 7:   optimization and deployment
```

### Path 2: Core System (4-5 weeks) ⭐⭐⭐⭐⭐
```
Week 1-2: finish current training ✅
Week 3-4: three tasks (+ vector map)
Week 5:   four tasks (+ localization)
    ↓
Move straight to deployment and optimization
```

### Path 3: Quick Validation (3 weeks) ⭐⭐⭐
```
Week 1-2: finish current training ✅
Week 3:   skip MapTR and implement a simplified localization directly
    ↓
Deploy quickly for validation
```

---

**Next decision**: The expansion scope and time budget need to be settled!