# BEVFusion 多任务多头支持指南 ## ✅ 答案:完全支持! BEVFusion **完全支持同时进行3D目标检测和BEV地图分割**,这是该框架的核心设计特点之一。 ## 架构设计 ### 多头结构 ```python BEVFusion ├── Encoders (多模态编码器) │ ├── Camera Encoder │ └── LiDAR Encoder ├── Fuser (特征融合) ├── Decoder (BEV解码器) └── Heads (多任务头) ★ ├── object: 3D目标检测头 (TransFusion/CenterPoint) └── map: BEV地图分割头 (BEVSegmentationHead) ``` ### 代码实现(bevfusion.py) ```python class BEVFusion(Base3DFusionModel): def __init__(self, encoders, fuser, decoder, heads, **kwargs): # 初始化多个任务头 self.heads = nn.ModuleDict() for name in heads: if heads[name] is not None: self.heads[name] = build_head(heads[name]) # 为每个任务设置损失权重 self.loss_scale = dict() for name in heads: if heads[name] is not None: self.loss_scale[name] = 1.0 def forward_single(self, ...): # 1. 多模态特征提取和融合 features = [] for sensor in self.encoders: feature = self.extract_features(...) features.append(feature) # 2. 特征融合 x = self.fuser(features) # 3. BEV解码 x = self.decoder["backbone"](x) x = self.decoder["neck"](x) # 4. 多任务头处理 if self.training: outputs = {} for type, head in self.heads.items(): if type == "object": # 3D目标检测 pred_dict = head(x, metas) losses = head.loss(gt_bboxes_3d, gt_labels_3d, pred_dict) elif type == "map": # BEV地图分割 losses = head(x, gt_masks_bev) # 收集损失 for name, val in losses.items(): outputs[f"loss/{type}/{name}"] = val * self.loss_scale[type] return outputs else: # 推理模式:同时输出检测和分割结果 outputs = [{} for _ in range(batch_size)] for type, head in self.heads.items(): if type == "object": pred_dict = head(x, metas) bboxes = head.get_bboxes(pred_dict, metas) for k, (boxes, scores, labels) in enumerate(bboxes): outputs[k].update({ "boxes_3d": boxes.to("cpu"), "scores_3d": scores.cpu(), "labels_3d": labels.cpu(), }) elif type == "map": logits = head(x) for k in range(batch_size): outputs[k].update({ "masks_bev": logits[k].cpu(), "gt_masks_bev": gt_masks_bev[k].cpu(), }) return outputs ``` ## 配置文件示例 ### 方案1:仅检测(configs/nuscenes/det/default.yaml) ```yaml model: type: BEVFusion heads: object: # 启用检测头 type: TransFusionHead # ... 检测头配置 map: null # 禁用分割头 ``` ### 方案2:仅分割(configs/nuscenes/seg/default.yaml) ```yaml model: type: BEVFusion heads: object: null # 禁用检测头 map: # 启用分割头 type: BEVSegmentationHead # ... 分割头配置 ``` ### 方案3:多任务(检测 + 分割)✨ ```yaml model: type: BEVFusion encoders: camera: backbone: type: SwinTransformer # ... camera配置 neck: type: GeneralizedLSSFPN # ... neck配置 vtransform: type: LSSTransform # ... vtransform配置 lidar: voxelize: # ... 体素化配置 backbone: type: SparseEncoder # ... lidar backbone配置 fuser: type: ConvFuser in_channels: [80, 256] out_channels: 256 decoder: backbone: type: SECOND in_channels: 256 out_channels: [128, 256] # ... decoder配置 neck: type: SECONDFPN in_channels: [128, 256] out_channels: [256, 256] # ... neck配置 heads: # 任务1:3D目标检测 object: type: TransFusionHead num_proposals: 200 auxiliary: true in_channels: 512 num_classes: 10 num_heads: 8 nms_kernel_size: 3 ffn_channel: 256 dropout: 0.1 common_heads: center: [2, 2] height: [1, 2] dim: [3, 2] rot: [2, 2] vel: [2, 2] bbox_coder: type: TransFusionBBoxCoder pc_range: [-54.0, -54.0] post_center_range: [-61.2, -61.2, -10.0, 61.2, 61.2, 10.0] voxel_size: [0.075, 0.075] loss_cls: type: FocalLoss use_sigmoid: true gamma: 2.0 alpha: 0.25 reduction: mean loss_bbox: type: L1Loss reduction: mean loss_weight: 0.25 loss_iou: type: GIoULoss reduction: mean loss_weight: 0.0 # 任务2:BEV地图分割 map: type: BEVSegmentationHead in_channels: 512 grid_transform: input_scope: [[-54.0, 54.0, 0.8], [-54.0, 54.0, 0.8]] output_scope: [[-50, 50, 0.5], [-50, 50, 0.5]] classes: ['drivable_area', 'ped_crossing', 'walkway', 'stop_line', 'carpark_area', 'divider'] loss: type: FocalLoss # 或 CrossEntropyLoss use_sigmoid: true gamma: 2.0 alpha: 0.25 # 可选:为不同任务设置不同的损失权重 loss_scale: object: 1.0 # 检测损失权重 map: 1.0 # 分割损失权重 ``` ## 创建多任务配置 ### 步骤1:创建配置文件 创建 `configs/nuscenes/multitask/fusion-det-seg.yaml`: ```yaml # 继承基础配置 _base_: - ../default.yaml # 模型配置 model: type: BEVFusion # 编码器(复用检测的配置) encoders: camera: backbone: type: SwinTransformer embed_dims: 96 depths: [2, 2, 6, 2] num_heads: [3, 6, 12, 24] window_size: 7 mlp_ratio: 4 qkv_bias: true qk_scale: null drop_rate: 0. attn_drop_rate: 0. drop_path_rate: 0.2 patch_norm: true out_indices: [1, 2, 3] with_cp: false convert_weights: true init_cfg: type: Pretrained checkpoint: pretrained/swint-nuimages-pretrained.pth neck: type: GeneralizedLSSFPN in_channels: [192, 384, 768] out_channels: 256 start_level: 0 num_outs: 3 vtransform: type: LSSTransform in_channels: 256 out_channels: 80 image_size: [256, 704] feature_size: [32, 88] xbound: [-54.0, 54.0, 0.3] ybound: [-54.0, 54.0, 0.3] zbound: [-10.0, 10.0, 20.0] dbound: [1.0, 60.0, 0.5] downsample: 2 lidar: voxelize: max_num_points: 10 point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0] voxel_size: [0.075, 0.075, 0.2] max_voxels: [120000, 160000] backbone: type: SparseEncoder in_channels: 5 sparse_shape: [1440, 1440, 41] output_channels: 128 order: [conv, norm, act] encoder_channels: - [16, 16, 32] - [32, 32, 64] - [64, 64, 128] - [128, 128] encoder_paddings: - [0, 0, 1] - [0, 0, 1] - [0, 0, [1, 1, 0]] - [0, 0] block_type: basicblock # 融合器 fuser: type: ConvFuser in_channels: [80, 256] out_channels: 256 # 解码器 decoder: backbone: type: SECOND in_channels: 256 out_channels: [128, 256] layer_nums: [5, 5] layer_strides: [1, 2] neck: type: SECONDFPN in_channels: [128, 256] out_channels: [256, 256] upsample_strides: [1, 2] # 多任务头 heads: # 3D目标检测 object: type: TransFusionHead in_channels: 512 num_proposals: 200 auxiliary: true num_classes: 10 num_heads: 8 nms_kernel_size: 3 ffn_channel: 256 dropout: 0.1 common_heads: center: [2, 2] height: [1, 2] dim: [3, 2] rot: [2, 2] vel: [2, 2] loss_cls: type: FocalLoss use_sigmoid: true gamma: 2.0 alpha: 0.25 loss_bbox: type: L1Loss loss_weight: 0.25 # BEV地图分割 map: type: BEVSegmentationHead in_channels: 512 classes: ['drivable_area', 'ped_crossing', 'walkway', 'stop_line', 'carpark_area', 'divider'] loss: focal # 损失权重(可选) loss_scale: object: 1.0 map: 1.0 # 训练配置 optimizer: type: AdamW lr: 2.0e-4 # 多任务可能需要调整学习率 weight_decay: 0.01 lr_config: policy: CosineAnnealing warmup: linear warmup_iters: 500 warmup_ratio: 0.33333333 min_lr_ratio: 1.0e-3 runner: type: EpochBasedRunner max_epochs: 20 # 评估配置 evaluation: interval: 1 pipeline: # 同时评估检测和分割 - type: DetEval metric: bbox - type: SegEval metric: map ``` ### 步骤2:训练命令 ```bash # 多任务训练 torchpack dist-run -np 8 python tools/train.py \ configs/nuscenes/multitask/fusion-det-seg.yaml \ --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \ --load_from pretrained/lidar-only-det.pth ``` ### 步骤3:测试/推理 ```bash # 多任务测试(同时评估检测和分割) torchpack dist-run -np 8 python tools/test.py \ configs/nuscenes/multitask/fusion-det-seg.yaml \ runs/multitask/latest.pth \ --eval bbox map ``` ## 输出结果格式 ### 训练时输出(损失) ```python { 'loss/object/heatmap': 0.234, 'loss/object/bbox': 0.456, 'loss/object/iou': 0.123, 'loss/map/seg': 0.345, 'loss/depth': 0.089, # 如果使用BEVDepth 'stats/object/...': ..., 'stats/map/...': ... } ``` ### 推理时输出(预测结果) ```python # 每个样本的输出 [ { # 3D检测结果 'boxes_3d': LiDARInstance3DBoxes(...), # 形状: (N, 9) 'scores_3d': tensor([...]), # 形状: (N,) 'labels_3d': tensor([...]), # 形状: (N,) # BEV分割结果 'masks_bev': tensor([[...]]), # 形状: (C, H, W) 'gt_masks_bev': tensor([[...]]) # 形状: (C, H, W) - 如果有GT }, # ... 更多样本 ] ``` ## 可视化多任务结果 ```python import torch import matplotlib.pyplot as plt from mmdet3d.core.bbox import LiDARInstance3DBoxes def visualize_multitask_results(data, prediction): """可视化多任务输出""" # 1. 可视化3D检测框(BEV视图) boxes_3d = prediction['boxes_3d'] scores_3d = prediction['scores_3d'] labels_3d = prediction['labels_3d'] # 2. 可视化BEV分割 masks_bev = prediction['masks_bev'] # (C, H, W) fig, axes = plt.subplots(1, 2, figsize=(15, 7)) # 左图:3D检测 ax = axes[0] # 绘制BEV平面和检测框 for box, score, label in zip(boxes_3d.tensor, scores_3d, labels_3d): # 绘制框 (简化示例) corners = boxes_3d.corners[[i]] # ... 绘制逻辑 ax.set_title('3D Object Detection') # 右图:BEV分割 ax = axes[1] seg_map = torch.argmax(masks_bev, dim=0) # (H, W) im = ax.imshow(seg_map.cpu().numpy()) ax.set_title('BEV Map Segmentation') plt.colorbar(im, ax=ax) plt.tight_layout() plt.savefig('multitask_result.png') ``` ## 性能和资源消耗 ### 单任务 vs 多任务对比 | 配置 | 显存/GPU | 训练时间 | 性能 | |------|----------|----------|------| | 仅检测 | ~18GB | 20-24h | mAP: 68-70% | | 仅分割 | ~14GB | 12-15h | mIoU: 62-63% | | **多任务** | **~22GB** | **28-32h** | **mAP: 67-69%
mIoU: 61-62%** | 注意事项: - 多任务训练显存消耗略高(增加约4GB) - 训练时间约为两个单任务之和 - 性能可能略低于单独训练,但共享特征提取带来效率提升 - 推理时可以同时输出两种结果,无需多次forward ### 优化建议 1. **调整损失权重** ```yaml loss_scale: object: 1.0 # 可以调整为 0.5-2.0 map: 1.0 # 可以调整为 0.5-2.0 ``` 2. **渐进式训练策略** ```bash # 阶段1:先训练检测(冻结分割头) # 阶段2:再训练分割(冻结检测头) # 阶段3:联合fine-tuning ``` 3. **使用更大的batch size** ```yaml data: samples_per_gpu: 2 # 如果显存允许 ``` ## 实际应用场景 ### 1. 自动驾驶完整感知 ``` 多任务输出: ├── 3D目标检测 → 车辆、行人、障碍物 └── BEV分割 → 可行驶区域、人行横道、停车区域 优势: - 统一的BEV表示 - 共享特征提取 - 一次推理获得完整场景理解 ``` ### 2. 实时系统部署 ``` 检测 + 分割 (多任务) vs 两个单独模型 ├── 推理时间:1x vs 1.8x ├── 显存占用:1x vs 1.6x └── 参数量:1x vs 1.7x ``` ### 3. 端到端训练 ``` 优势: - 两个任务互相促进 - 分割帮助检测理解场景结构 - 检测帮助分割关注重要区域 ``` ## 常见问题 ### Q1: 多任务训练会影响单个任务的性能吗? **A**: 可能会有轻微影响(1-2%),但: - 共享特征提取带来的效率提升 - 两个任务可以互相促进 - 实际应用中往往需要同时获得两种结果 ### Q2: 可以只推理其中一个任务吗? **A**: 可以!在配置文件中设置: ```yaml heads: object: {...} # 保留 map: null # 禁用 ``` ### Q3: 如何平衡两个任务的损失? **A**: 调整 `loss_scale`: ```yaml loss_scale: object: 2.0 # 更关注检测 map: 1.0 ``` ### Q4: 多任务训练需要什么数据? **A**: 需要同时包含: - 3D检测标注 (gt_bboxes_3d, gt_labels_3d) - BEV分割标注 (gt_masks_bev) nuScenes数据集同时提供这两种标注。 ### Q5: 可以添加更多任务头吗? **A**: 完全可以!例如添加速度预测、轨迹预测等: ```python heads: object: {...} map: {...} velocity: {...} # 自定义任务头 trajectory: {...} # 自定义任务头 ``` ## 总结 ✅ **BEVFusion完全支持多任务多头输出** - ✅ 同时进行3D检测和BEV分割 - ✅ 共享特征提取和BEV表示 - ✅ 统一的训练和推理流程 - ✅ 灵活的配置系统 - ✅ 可扩展到更多任务 🚀 **推荐使用多任务配置** - 提高推理效率 - 任务间互相促进 - 更完整的场景理解 - 适合实际应用部署 --- 生成时间: 2025-10-16