bev-project/project/docs/CUSTOM_SENSOR_MIGRATION_GUI...


BEVFusion Migration Guide for a Custom Sensor Configuration

🎯 Target Configuration

nuScenes Configuration (Original)

LiDAR: 1x 32-beam spinning LiDAR
Cameras: 6 surround-view cameras
  - Front
  - Front Left
  - Front Right
  - Back
  - Back Left
  - Back Right

Your Configuration (Target)

LiDAR: 1x 80-beam 360° LiDAR ✅ higher resolution
Cameras: 4 cameras
  - Front Wide
  - Front Tele ← new
  - Front Left
  - Front Right

Key differences:

  • LiDAR upgrade: 32 beams → 80 beams, denser point clouds
  • ⚠️ Fewer cameras: 6 → 4, reduced rear coverage
  • ⚠️ New telephoto camera: requires special handling

📋 Migration Steps Overview

Step 1: Data format conversion → convert to mmdet3d format
Step 2: Calibration handling → camera intrinsics/extrinsics, LiDAR calibration
Step 3: Dataset class definition → custom Dataset class
Step 4: Config file changes → adapt to 4 cameras + 80-beam LiDAR
Step 5: Pipeline adjustments → data augmentation and preprocessing
Step 6: Training and tuning → start training

Step 1: Data Format Conversion

1.1 Raw Data Organization

Recommended directory structure:

data/custom_dataset/
├── lidar/                    LiDAR point clouds
│   ├── scene_001/
│   │   ├── 000000.bin       (N, 4) format: x, y, z, intensity
│   │   ├── 000001.bin
│   │   └── ...
│   └── scene_002/
│       └── ...
├── camera/                   camera images
│   ├── scene_001/
│   │   ├── front_wide/
│   │   │   ├── 000000.jpg
│   │   │   └── ...
│   │   ├── front_tele/
│   │   │   └── ...
│   │   ├── front_left/
│   │   │   └── ...
│   │   └── front_right/
│   │       └── ...
├── calibration/              calibration data
│   ├── scene_001_calib.json
│   └── ...
├── annotations/              annotation data
│   ├── scene_001_anno.json  3D box annotations
│   └── scene_001_seg.png    BEV segmentation labels
└── splits/                   dataset splits
    ├── train.txt
    ├── val.txt
    └── test.txt
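
Before writing any converter, it is worth checking that every scene actually follows this layout. A minimal sketch under the folder names above (check_dataset_layout.py and its helper are hypothetical, not part of BEVFusion):

# tools/check_dataset_layout.py (hypothetical helper)

from pathlib import Path

CAM_DIRS = ['front_wide', 'front_tele', 'front_left', 'front_right']

def check_scene(root: Path, scene: str) -> list:
    """Return a list of layout problems found for one scene."""
    problems = []
    lidar_files = sorted((root / 'lidar' / scene).glob('*.bin'))
    if not lidar_files:
        problems.append(f'{scene}: no LiDAR frames')
    if not (root / 'calibration' / f'{scene}_calib.json').exists():
        problems.append(f'{scene}: missing calibration file')
    # Every LiDAR frame should have one image per camera
    for cam in CAM_DIRS:
        for f in lidar_files:
            img = root / 'camera' / scene / cam / f'{f.stem}.jpg'
            if not img.exists():
                problems.append(f'{scene}: missing {cam}/{f.stem}.jpg')
    return problems

if __name__ == '__main__':
    root = Path('data/custom_dataset')
    scenes = [d.name for d in sorted((root / 'lidar').glob('*/'))]
    issues = [p for s in scenes for p in check_scene(root, s)]
    print('\n'.join(issues) if issues else 'Layout OK')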

1.2 Calibration File Format

{
    "scene_id": "scene_001",
    "timestamp": 1634567890,
    
    "lidar_to_ego": {
        "translation": [0.0, 0.0, 1.8],
        "rotation": [1.0, 0.0, 0.0, 0.0]
    },
    
    "cameras": {
        "front_wide": {
            "intrinsic": [
                [fx, 0, cx],
                [0, fy, cy],
                [0, 0, 1]
            ],
            "extrinsic": {
                "translation": [1.5, 0.0, 1.5],
                "rotation": [1.0, 0.0, 0.0, 0.0]
            },
            "distortion": [k1, k2, p1, p2, k3],
            "image_size": [1920, 1080]
        },
        "front_tele": {
            "intrinsic": [
                [fx_tele, 0, cx_tele],
                [0, fy_tele, cy_tele],
                [0, 0, 1]
            ],
            "extrinsic": {
                "translation": [1.5, 0.0, 1.5],
                "rotation": [1.0, 0.0, 0.0, 0.0]
            },
            "distortion": [...],
            "image_size": [1920, 1080],
            "fov": 30.0
        },
        "front_left": {...},
        "front_right": {...}
    },
    
    "annotations": {
        "boxes_3d": [
            {
                "center": [x, y, z],
                "size": [w, l, h],
                "rotation": yaw,
                "velocity": [vx, vy],
                "class": "car",
                "track_id": 1
            },
            ...
        ],
        "segmentation": {
            "file": "annotations/scene_001_seg.png",
            "classes": {
                "0": "background",
                "1": "drivable_area",
                "2": "lane",
                ...
            }
        }
    }
}
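
The "rotation" fields above are easiest to handle as [w, x, y, z] unit quaternions, the same convention nuScenes uses. A minimal sketch of consuming one entry (the [w, x, y, z] assumption and the file path are mine; pyquaternion is the dependency the nuScenes tooling already uses):

import json
import numpy as np
from pyquaternion import Quaternion

def sensor_to_ego(entry):
    """entry: dict with 'translation' [x, y, z] and 'rotation' [w, x, y, z]."""
    T = np.eye(4)
    T[:3, :3] = Quaternion(entry['rotation']).rotation_matrix
    T[:3, 3] = entry['translation']
    return T

with open('data/custom_dataset/calibration/scene_001_calib.json') as f:
    calib = json.load(f)

lidar2ego = sensor_to_ego(calib['lidar_to_ego'])
cam2ego = sensor_to_ego(calib['cameras']['front_wide']['extrinsic'])
# LiDAR -> camera: up into the ego frame, then down into the camera frame
lidar2cam = np.linalg.inv(cam2ego) @ lidar2ego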

1.3 Data Conversion Script

# tools/data_converter/custom_to_mmdet3d.py

import argparse
import json
import pickle
from pathlib import Path

import numpy as np


def convert_custom_to_mmdet3d(data_root, output_file, split='train'):
    """
    Convert the custom dataset into mmdet3d info format.
    """

    data_infos = []

    # Collect scene directories
    scenes = sorted(Path(data_root).glob('*/'))

    # Restrict to the scenes listed in splits/<split>.txt if present
    split_file = Path(data_root) / 'splits' / f'{split}.txt'
    if split_file.exists():
        wanted = set(split_file.read_text().split())
        scenes = [s for s in scenes if s.name in wanted]

    for scene_dir in scenes:
        # Load calibration
        calib = load_calibration(scene_dir / 'calibration.json')

        # Iterate over frames
        lidar_files = sorted((scene_dir / 'lidar').glob('*.bin'))

        for frame_idx, lidar_file in enumerate(lidar_files):
            timestamp = int(lidar_file.stem)

            # Load annotations first; several fields below depend on them
            gt_boxes = load_annotations(scene_dir / f'annotations/{timestamp:06d}.json')

            # Build the info dict
            info = {
                'lidar_path': str(lidar_file),
                'token': f"{scene_dir.name}_{timestamp}",
                'timestamp': timestamp,

                # Camera info (4 cameras)
                'cams': {
                    'FRONT_WIDE': {
                        'data_path': str(scene_dir / f'camera/front_wide/{timestamp:06d}.jpg'),
                        'type': 'camera',
                        'sample_data_token': f'cam_front_wide_{timestamp}',
                        'sensor2ego_translation': calib['cameras']['front_wide']['translation'],
                        'sensor2ego_rotation': calib['cameras']['front_wide']['rotation'],
                        'ego2global_translation': [0, 0, 0],
                        'ego2global_rotation': [1, 0, 0, 0],
                        'timestamp': timestamp,
                        'camera_intrinsic': calib['cameras']['front_wide']['intrinsic'],
                        'width': 1920,
                        'height': 1080,
                    },
                    'FRONT_TELE': {
                        'data_path': str(scene_dir / f'camera/front_tele/{timestamp:06d}.jpg'),
                        'type': 'camera',
                        'sample_data_token': f'cam_front_tele_{timestamp}',
                        'sensor2ego_translation': calib['cameras']['front_tele']['translation'],
                        'sensor2ego_rotation': calib['cameras']['front_tele']['rotation'],
                        'ego2global_translation': [0, 0, 0],
                        'ego2global_rotation': [1, 0, 0, 0],
                        'timestamp': timestamp,
                        'camera_intrinsic': calib['cameras']['front_tele']['intrinsic'],
                        'width': 1920,
                        'height': 1080,
                        'is_tele': True,  # mark as a telephoto camera
                    },
                    'FRONT_LEFT': {...},
                    'FRONT_RIGHT': {...},
                },

                # LiDAR info
                'lidar2ego_translation': calib['lidar_to_ego']['translation'],
                'lidar2ego_rotation': calib['lidar_to_ego']['rotation'],
                'ego2global_translation': [0, 0, 0],
                'ego2global_rotation': [1, 0, 0, 0],

                # Annotation info
                'gt_boxes': gt_boxes,
                'gt_names': [...],
                'gt_velocity': [...],
                'num_lidar_pts': [...],
                'num_radar_pts': [0] * len(gt_boxes),  # no radar
                'valid_flag': [True] * len(gt_boxes),
            }

            data_infos.append(info)

    # Save as a pkl file
    output_file = Path(output_file)
    with open(output_file, 'wb') as f:
        pickle.dump(data_infos, f)

    print(f"Conversion done! Generated {len(data_infos)} samples")
    print(f"Saved to: {output_file}")

    return data_infos


def load_calibration(calib_file):
    """Load a calibration file."""
    with open(calib_file, 'r') as f:
        calib = json.load(f)
    return calib


def load_annotations(anno_file):
    """Load 3D box annotations."""
    with open(anno_file, 'r') as f:
        anno = json.load(f)

    boxes = []
    for obj in anno['objects']:
        box = np.array([
            obj['center'][0],
            obj['center'][1],
            obj['center'][2],
            obj['size'][0],  # w
            obj['size'][1],  # l
            obj['size'][2],  # h
            obj['rotation'],
        ])
        boxes.append(box)

    return np.array(boxes)


# Usage (the flags match tools/convert_custom_data.sh in step 9.1)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataroot', default='data/custom_dataset')
    parser.add_argument('--split', default='train')
    parser.add_argument('--output', default='data/custom_dataset/custom_infos_train.pkl')
    args = parser.parse_args()
    convert_custom_to_mmdet3d(args.dataroot, args.output, args.split)

Step 2: Custom Dataset Class

# mmdet3d/datasets/custom_dataset.py

import numpy as np

from .nuscenes_dataset import NuScenesDataset
from mmdet.datasets import DATASETS

@DATASETS.register_module()
class CustomDataset(NuScenesDataset):
    """Custom dataset: 4 cameras + 80-beam LiDAR."""

    # Class names (match your annotations)
    CLASSES = (
        'car', 'truck', 'bus', 'motorcycle', 'bicycle',
        'pedestrian', 'traffic_cone', 'barrier'
    )

    # Camera names (4 cameras)
    CAM_SENSORS = [
        'FRONT_WIDE',   # front wide-angle
        'FRONT_TELE',   # front telephoto
        'FRONT_LEFT',   # front left
        'FRONT_RIGHT',  # front right
    ]

    def __init__(
        self,
        ann_file,
        pipeline=None,
        dataset_root=None,
        object_classes=None,
        map_classes=None,
        modality=None,
        box_type_3d='LiDAR',
        filter_empty_gt=True,
        test_mode=False,
        **kwargs
    ):
        # Number of cameras
        self.num_cams = 4  # changed from 6 to 4

        super().__init__(
            ann_file=ann_file,
            pipeline=pipeline,
            dataset_root=dataset_root,
            object_classes=object_classes,
            map_classes=map_classes,
            modality=modality,
            box_type_3d=box_type_3d,
            filter_empty_gt=filter_empty_gt,
            test_mode=test_mode,
            **kwargs
        )
    
    def get_data_info(self, index):
        """Fetch the data info for one sample."""
        info = self.data_infos[index]

        # Prepare camera data (4 cameras)
        image_paths = []
        lidar2img_rts = []
        lidar2cam_rts = []
        cam_intrinsics = []

        for cam_name in self.CAM_SENSORS:
            cam_info = info['cams'][cam_name]

            # Image path
            image_paths.append(cam_info['data_path'])

            # Transformation matrices
            lidar2cam_r, lidar2cam_t = self.get_lidar2cam(info, cam_name)
            lidar2cam_rt = np.eye(4)
            lidar2cam_rt[:3, :3] = lidar2cam_r
            lidar2cam_rt[:3, 3] = lidar2cam_t

            # Camera intrinsics
            intrinsic = np.array(cam_info['camera_intrinsic'])
            viewpad = np.eye(4)
            viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic

            # lidar2img transform
            lidar2img_rt = viewpad @ lidar2cam_rt

            lidar2img_rts.append(lidar2img_rt)
            lidar2cam_rts.append(lidar2cam_rt)
            cam_intrinsics.append(viewpad)

        # Build the input dict
        input_dict = {
            'sample_idx': index,
            'pts_filename': info['lidar_path'],
            'sweeps': [],  # multi-frame point cloud sweeps, if available
            'timestamp': info['timestamp'],
            'img_filename': image_paths,
            'lidar2img': lidar2img_rts,
            'cam_intrinsic': cam_intrinsics,
            'lidar2cam': lidar2cam_rts,
        }

        # Attach annotations (unless in test mode)
        if not self.test_mode:
            annos = self.get_ann_info(index)
            input_dict['ann_info'] = annos

        return input_dict
    
    def handle_tele_camera(self, data):
        """
        Special handling for the telephoto camera.

        Telephoto characteristics:
          - narrow FOV (e.g. 30° vs 120° for the wide-angle camera)
          - high angular resolution
          - suited to long-range detection

        Handling options:
          1. a separate resize strategy
          2. a different crop range
          3. possibly a dedicated backbone branch
        """
        # Detect the telephoto camera by name
        for i, cam_name in enumerate(self.CAM_SENSORS):
            if 'TELE' in cam_name:
                # Telephoto-specific handling, e.g. a larger input
                # resolution (resize_keep_ratio is an illustrative helper)
                data['img'][i] = resize_keep_ratio(data['img'][i], (512, 1408))

        return data
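
The get_lidar2cam helper used in get_data_info is not shown above. A minimal sketch of one possible implementation, assuming the info dicts store [w, x, y, z] quaternions as written by the converter in step 1.3:

    def get_lidar2cam(self, info, cam_name):
        """Hypothetical helper: LiDAR -> camera rotation and translation."""
        from pyquaternion import Quaternion
        cam = info['cams'][cam_name]
        # sensor-to-ego transforms for the LiDAR and the camera
        lidar2ego = np.eye(4)
        lidar2ego[:3, :3] = Quaternion(info['lidar2ego_rotation']).rotation_matrix
        lidar2ego[:3, 3] = info['lidar2ego_translation']
        cam2ego = np.eye(4)
        cam2ego[:3, :3] = Quaternion(cam['sensor2ego_rotation']).rotation_matrix
        cam2ego[:3, 3] = cam['sensor2ego_translation']
        # LiDAR -> ego -> camera
        lidar2cam = np.linalg.inv(cam2ego) @ lidar2ego
        return lidar2cam[:3, :3], lidar2cam[:3, 3]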

2.1 Register the Dataset

# mmdet3d/datasets/__init__.py

from .custom_dataset import CustomDataset

__all__ = [
    ...,
    'CustomDataset',
]

Step 3: Config File Changes

3.1 Base Config

# configs/custom/default.yaml

dataset_type: CustomDataset
dataset_root: data/custom_dataset/

# LiDAR config (80 beams, higher resolution)
reduce_beams: 80  # changed from 32 to 80
load_dim: 4       # x, y, z, intensity
use_dim: 4        # use all dimensions

# Point cloud range (adjust for your vehicle)
point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]

# Voxel size (can be smaller to exploit the 80-beam resolution)
voxel_size: [0.05, 0.05, 0.2]  # changed from 0.075 to 0.05 (finer)

# Camera config (4 cameras)
image_size: [512, 1408]  # adjust as needed

# Camera name mapping
cam_names: 
  - FRONT_WIDE
  - FRONT_TELE
  - FRONT_LEFT
  - FRONT_RIGHT

# Class definitions
object_classes:
  - car
  - truck
  - bus
  - motorcycle
  - bicycle
  - pedestrian
  - traffic_cone
  - barrier

map_classes:
  - drivable_area
  - lane
  - ped_crossing
  - boundary

# Data augmentation parameters
augment2d:
  resize: [[0.4, 0.6], [0.5, 0.5]]
  rotate: [-5.4, 5.4]
  gridmask:
    prob: 0.0
    fixed_prob: true

augment3d:
  scale: [0.95, 1.05]  # more conservative; the 80-beam LiDAR is more precise
  rotate: [-0.78539816, 0.78539816]
  translate: 0.5

# Modality config
input_modality:
  use_lidar: true
  use_camera: true
  use_radar: false
  use_map: false
  use_external: false

3.2 Model Config (Adapted for 4 Cameras)

# configs/custom/bevfusion_4cam_80lidar.yaml

_base_: ./default.yaml

model:
  type: BEVFusion
  
  encoders:
    camera:
      backbone:
        type: SwinTransformer
        embed_dims: 96
        depths: [2, 2, 6, 2]
        num_heads: [3, 6, 12, 24]
        window_size: 7
        # ... SwinTransformer settings
      
      neck:
        type: GeneralizedLSSFPN
        in_channels: [192, 384, 768]
        out_channels: 256
        start_level: 0
        num_outs: 3
      
      vtransform:
        type: DepthLSSTransform
        in_channels: 256
        out_channels: 80
        image_size: ${image_size}
        feature_size: ${[image_size[0] // 8, image_size[1] // 8]}
        xbound: [-54.0, 54.0, 0.3]
        ybound: [-54.0, 54.0, 0.3]
        zbound: [-10.0, 10.0, 20.0]
        dbound: [1.0, 60.0, 0.5]
        downsample: 2
        # special handling for the telephoto camera
        camera_aware: true  # enable camera-aware processing (per-camera handling)
    
    lidar:
      voxelize:
        max_num_points: 20  # raised from 10 to 20; 80 beams yield more points
        point_cloud_range: ${point_cloud_range}
        voxel_size: ${voxel_size}
        max_voxels: [180000, 240000]  # increased (finer voxels)
      
      backbone:
        type: SparseEncoder
        in_channels: 4  # x, y, z, intensity
        sparse_shape: [2160, 2160, 41]  # matches the 0.05 m voxel size
        output_channels: 256  # wider output channels (stronger features)
        encoder_channels:
          - [16, 16, 32]
          - [32, 32, 64]
          - [64, 64, 128]
          - [128, 128, 256]  # one extra stage
        encoder_paddings:
          - [0, 0, 1]
          - [0, 0, 1]
          - [0, 0, [1, 1, 0]]
          - [0, 0]
        block_type: basicblock
  
  fuser:
    type: ConvFuser
    in_channels: [80, 256]  # camera and LiDAR output channels
    out_channels: 256
  
  decoder:
    backbone:
      type: SECOND
      in_channels: 256
      out_channels: [128, 256]
      layer_nums: [5, 5]
      layer_strides: [1, 2]
    
    neck:
      type: SECONDFPN
      in_channels: [128, 256]
      out_channels: [256, 256]
      upsample_strides: [1, 2]
  
  heads:
    # 3D detection
    object:
      type: TransFusionHead
      in_channels: 512
      num_proposals: 200
      num_classes: 8  # your number of classes
      # ... other settings
    
    # BEV segmentation
    map:
      type: BEVSegmentationHead
      in_channels: 512
      classes: ${map_classes}
  
  loss_scale:
    object: 1.0
    map: 1.0

# Data config
data:
  samples_per_gpu: 1  # 4 cameras use less memory, so this can be raised
  workers_per_gpu: 0  # adjust to your machine
  
  train:
    type: CBGSDataset
    dataset:
      type: ${dataset_type}
      dataset_root: ${dataset_root}
      ann_file: ${dataset_root + "custom_infos_train.pkl"}
      pipeline: ${train_pipeline}
      object_classes: ${object_classes}
      map_classes: ${map_classes}
      modality: ${input_modality}
      test_mode: false
      box_type_3d: LiDAR
  
  val:
    type: ${dataset_type}
    dataset_root: ${dataset_root}
    ann_file: ${dataset_root + "custom_infos_val.pkl"}
    pipeline: ${test_pipeline}
    object_classes: ${object_classes}
    map_classes: ${map_classes}
    modality: ${input_modality}
    test_mode: true
    box_type_3d: LiDAR

# Training config
max_epochs: 24
optimizer:
  type: AdamW
  lr: 2.0e-4
  weight_decay: 0.01

Step 4: Data Pipeline Adjustments

4.1 Modify LoadMultiViewImageFromFiles

# mmdet3d/datasets/pipelines/loading.py

import mmcv
import numpy as np

@PIPELINES.register_module()
class LoadMultiViewImageFromFiles:
    """Load multi-view images (supports 4 cameras + telephoto)."""
    
    def __init__(self, to_float32=False, color_type='color', num_views=4):
        self.to_float32 = to_float32
        self.color_type = color_type
        self.num_views = num_views  # set to 4
    
    def __call__(self, results):
        """
        Read the images from all 4 cameras.
        
        Special case:
          - the front_tele camera may need different preprocessing
        """
        filename = results['img_filename']
        images = []
        
        for i, name in enumerate(filename):
            img = mmcv.imread(name, self.color_type)
            
            # Check whether this is the telephoto camera
            if 'tele' in name.lower():
                # Telephoto-specific handling,
                # e.g. different normalization parameters
                pass
            
            if self.to_float32:
                img = img.astype(np.float32)
            
            images.append(img)
        
        results['img'] = images
        results['img_shape'] = [img.shape for img in images]
        results['ori_shape'] = [img.shape for img in images]
        
        # Record the 4-camera setup
        results['num_views'] = self.num_views
        
        return results

4.2 ImageAug3D Adjustments

# pipeline section of configs/custom/default.yaml

train_pipeline:
  - type: LoadMultiViewImageFromFiles
    to_float32: true
    num_views: 4  # ← changed to 4
  
  - type: LoadPointsFromFile
    coord_type: LIDAR
    load_dim: 4   # x, y, z, intensity
    use_dim: 4
    reduce_beams: 80  # ← 80-beam LiDAR
  
  # If multi-frame point clouds are available
  - type: LoadPointsFromMultiSweeps
    sweeps_num: 9
    load_dim: 4
    use_dim: 4
    reduce_beams: 80
    pad_empty_sweeps: true
    remove_close: true
  
  - type: LoadAnnotations3D
    with_bbox_3d: true
    with_label_3d: true
  
  # Data augmentation
  - type: ImageAug3D
    final_dim: ${image_size}
    resize_lim: ${augment2d.resize[0]}
    bot_pct_lim: [0.0, 0.0]
    rot_lim: ${augment2d.rotate}
    rand_flip: true
    is_train: true
    num_views: 4  # ← 4 cameras
  
  - type: GlobalRotScaleTrans
    resize_lim: ${augment3d.scale}
    rot_lim: ${augment3d.rotate}
    trans_lim: ${augment3d.translate}
    is_train: true
  
  - type: LoadBEVSegmentation
    dataset_root: ${dataset_root}
    xbound: [-50.0, 50.0, 0.5]
    ybound: [-50.0, 50.0, 0.5]
    classes: ${map_classes}
  
  - type: RandomFlip3D
  
  - type: PointsRangeFilter
    point_cloud_range: ${point_cloud_range}
  
  - type: ObjectRangeFilter
    point_cloud_range: ${point_cloud_range}
  
  - type: ObjectNameFilter
    classes: ${object_classes}
  
  - type: ImageNormalize
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
  
  - type: DefaultFormatBundle3D
    classes: ${object_classes}
  
  - type: Collect3D
    keys:
      - img               # (4, C, H, W) ← 4 cameras
      - points
      - gt_bboxes_3d
      - gt_labels_3d
      - gt_masks_bev
    meta_keys:
      - camera_intrinsics
      - camera2ego
      - lidar2ego
      - lidar2camera
      - camera2lidar
      - lidar2image
      - img_aug_matrix
      - lidar_aug_matrix

Step 5: Special Handling for the Telephoto Camera

5.1 Why Is Special Handling Needed?

Front wide-angle camera:
  - FOV: 120°
  - Focal length: short
  - Strength: near-range, wide-area perception
  - Resolution demand: moderate

Front telephoto camera:
  - FOV: 30° ← narrow
  - Focal length: long
  - Strength: long-range, small-object detection
  - Resolution demand: high

Problem:
  With identical processing for both:
    - the telephoto's long-range information is wasted
    - the wide-angle's near-range coverage falls short
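
Some back-of-the-envelope numbers make the gap concrete. Assuming both cameras have 1920-pixel-wide sensors, the pinhole relation fx = W / (2·tan(FOV/2)) gives the lateral pixel density at range d as fx / d:

import math

def pixels_per_meter(fov_deg, width_px, range_m):
    """Lateral pixels covering 1 m at a given range (pinhole model)."""
    fx = width_px / (2 * math.tan(math.radians(fov_deg) / 2))
    return fx / range_m

for name, fov in [('wide 120°', 120.0), ('tele 30°', 30.0)]:
    for d in (20, 50, 100):
        print(f'{name} @ {d:3d} m: {pixels_per_meter(fov, 1920, d):6.1f} px/m')

# The telephoto puts roughly 6.5x more pixels on a distant object; a shared,
# uniformly-resized pipeline throws exactly that signal away.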

5.2 Option A: Dual-Branch Processing (Recommended)

# mmdet3d/models/vtransforms/dual_cam_lss.py

import torch
from torch import nn

class DualCameraLSS(nn.Module):
    """
    Dual-branch processing for wide-angle and telephoto cameras.
    """
    
    def __init__(self, **kwargs):
        super().__init__()
        
        # Wide-angle branch (near range)
        self.wide_branch = LSSTransform(
            xbound=[-54.0, 54.0, 0.3],  # wide area
            ybound=[-54.0, 54.0, 0.3],
            dbound=[1.0, 60.0, 0.5],     # near to far
        )
        
        # Telephoto branch (long range)
        self.tele_branch = LSSTransform(
            xbound=[-30.0, 30.0, 0.15],  # narrow area, finer grid
            ybound=[10.0, 100.0, 0.3],   # far ahead
            dbound=[30.0, 150.0, 1.0],   # long range only
        )
    
    def forward(self, x, camera_types, **kwargs):
        """
        Args:
            x: (B, N, C, H, W) - N = 4 cameras
            camera_types: ['wide', 'tele', 'wide', 'wide']
        """
        bev_features = []
        
        for i, cam_type in enumerate(camera_types):
            cam_feat = x[:, i]  # (B, C, H, W)
            
            if cam_type == 'wide':
                bev = self.wide_branch(cam_feat, **kwargs)
            elif cam_type == 'tele':
                bev = self.tele_branch(cam_feat, **kwargs)
            
            bev_features.append(bev)
        
        # Fuse the BEV features of all 4 cameras
        combined_bev = self.combine_multi_cam_bev(bev_features)
        
        return combined_bev
    
    def combine_multi_cam_bev(self, bev_list):
        """
        Fuse the BEV features of the 4 cameras.
        
        Strategy:
          - wide-angle cameras contribute the near region
          - the telephoto camera contributes the far region
          - blend with distance-based weights
        
        NOTE: this assumes all branches are rasterized (or resampled)
        onto the same H x W BEV grid before blending.
        """
        B, C, H, W = bev_list[0].shape
        combined = torch.zeros(B, C, H, W).to(bev_list[0].device)
        
        # Distance weights
        y_coords = torch.arange(H).float().to(combined.device)
        
        for i, bev in enumerate(bev_list):
            if i == 1:  # front_tele
                # telephoto: higher weight at long range
                weight = (y_coords / H).view(1, 1, H, 1)
            else:  # wide-angle
                # wide-angle: higher weight at close range
                weight = (1 - y_coords / H).view(1, 1, H, 1)
            
            combined += bev * weight
        
        return combined

5.3 Option B: Unified Processing + Attention Mechanism

class CameraAwareLSS(nn.Module):
    """
    Camera-aware LSS.
    
    Learns a separate processing path for each camera.
    """
    
    def __init__(self, num_cameras=4, **kwargs):
        super().__init__()
        
        # Shared LSS
        self.lss = LSSTransform(...)
        
        # Per-camera adapters
        self.camera_adapters = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(256, 256, 1),
                nn.BatchNorm2d(256),
                nn.ReLU(),
            ) for _ in range(num_cameras)
        ])
        
        # Camera-type embedding
        self.camera_type_embed = nn.Embedding(2, 256)  # 0: wide, 1: tele
    
    def forward(self, x, camera_types, **kwargs):
        B, N, C, H, W = x.shape  # N = 4
        
        bev_features = []
        for i in range(N):
            # Camera features
            cam_feat = x[:, i]  # (B, C, H, W)
            
            # Add camera-type information
            cam_type_id = 1 if camera_types[i] == 'tele' else 0
            type_embed = self.camera_type_embed(
                torch.tensor(cam_type_id).to(cam_feat.device)
            )
            # Inject it into the features
            cam_feat = cam_feat + type_embed.view(1, -1, 1, 1)
            
            # Per-camera processing
            cam_feat = self.camera_adapters[i](cam_feat)
            
            # LSS transform to BEV
            bev = self.lss(cam_feat, **kwargs)
            bev_features.append(bev)
        
        # Fuse
        combined = torch.stack(bev_features, dim=1).sum(dim=1)
        return combined

Step 6: 80-Beam LiDAR Optimization

6.1 Exploiting the Higher Resolution

# Finer voxelization
lidar:
  voxelize:
    voxel_size: [0.05, 0.05, 0.2]  # from 0.075 → 0.05
    max_num_points: 20             # from 10 → 20
    max_voxels: [180000, 240000]   # larger capacity
  
  backbone:
    sparse_shape: [2160, 2160, 41]  # matches the 0.05 m voxel size
    # 108 m range / 0.05 m = 2160
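
sparse_shape must stay consistent with point_cloud_range and voxel_size. A small sketch that derives it instead of hard-coding it (the +1 on the z axis follows the convention used by the mmdet3d configs above):

pc_range = [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]
voxel_size = [0.05, 0.05, 0.2]

nx = round((pc_range[3] - pc_range[0]) / voxel_size[0])      # 108 / 0.05 = 2160
ny = round((pc_range[4] - pc_range[1]) / voxel_size[1])      # 2160
nz = round((pc_range[5] - pc_range[2]) / voxel_size[2]) + 1  # 40 + 1 = 41
print([nx, ny, nz])  # [2160, 2160, 41]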

6.2 Multi-Sweep Fusion

# With 80-beam density you can afford more sweeps
LoadPointsFromMultiSweeps:
  sweeps_num: 9  # can be raised to 15-20
  # the 80-beam LiDAR has more points per frame, so multi-sweep context is richer

Step 7: Training Strategy

7.1 Transfer Learning from nuScenes Pretraining

# Stage 1: pretrain on nuScenes (already done)
# Use the existing bevfusion-det.pth or the currently training model

# Stage 2: fine-tune on the custom data
export PATH=/opt/conda/bin:$PATH
cd /workspace/bevfusion

torchpack dist-run -np 8 python tools/train.py \
  configs/custom/bevfusion_4cam_80lidar.yaml \
  --load_from runs/run-326653dc-74184412/epoch_5.pth \
  --data.workers_per_gpu 0

# Key points:
# --load_from: load the model trained on nuScenes
# Most parameters transfer directly (encoders/fuser/decoder)
# Only the task heads need fine-tuning (the classes may differ)
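
If the class count differs from nuScenes (8 here vs 10 there), --load_from will hit shape mismatches in the detection head. A common workaround, sketched as a standalone script (the key prefixes are assumptions; check them against model.state_dict().keys() for your build):

# strip_mismatched_heads.py (hypothetical helper)
import torch

ckpt = torch.load('runs/run-326653dc-74184412/epoch_5.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)

# Drop the class-dependent layers so they are re-initialized fresh;
# adjust the prefixes to match your checkpoint.
drop_prefixes = ('heads.object.heatmap_head', 'heads.object.prediction_heads')
filtered = {k: v for k, v in state.items() if not k.startswith(drop_prefixes)}
print(f'kept {len(filtered)}/{len(state)} tensors')

torch.save({'state_dict': filtered}, 'pretrained/bevfusion_no_head.pth')
# Then fine-tune with --load_from pretrained/bevfusion_no_head.pth;
# the missing keys are initialized by the model itself.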

7.2 Adjusting the Learning Rate and Schedule

# Transfer-learning config
optimizer:
  type: AdamW
  lr: 5.0e-5  # smaller learning rate for fine-tuning
  weight_decay: 0.01
  paramwise_cfg:
    custom_keys:
      # smaller learning rate for the backbones
      encoders:
        lr_mult: 0.1
      # normal learning rate for the heads
      heads:
        lr_mult: 1.0

lr_config:
  policy: CosineAnnealing
  warmup: linear
  warmup_iters: 500
  warmup_ratio: 0.1
  min_lr_ratio: 1.0e-4

# Training epochs (fine-tuning usually needs fewer)
max_epochs: 12

Step 8: Handling the 4-Camera Coverage Problem

8.1 Coverage Analysis

nuScenes (6 cameras):
  full 360° coverage

Your configuration (4 cameras):
  Front: 2 cameras (wide + tele) ✅ reinforced coverage
  Front left: 1 camera ✅
  Front right: 1 camera ✅
  Rear: no camera ❌ blind zone
  
Suggested BEV ranges:
  Front: [-54, 54] × [0, 108]    full coverage
  Sides: [-54, 54] × [-54, 0]    partial coverage
  Rear: rely on LiDAR

8.2 Adjusting the BEV Range Config

# Option A: forward-facing BEV (recommended)
vtransform:
  xbound: [-54.0, 54.0, 0.3]   # lateral
  ybound: [0.0, 108.0, 0.3]     # front only
  zbound: [-5.0, 5.0, 20.0]
  dbound: [1.0, 100.0, 0.5]

point_cloud_range: [-54.0, 0.0, -5.0, 54.0, 108.0, 3.0]

# Option B: keep 360° (rear relies on LiDAR)
vtransform:
  xbound: [-54.0, 54.0, 0.3]
  ybound: [-54.0, 54.0, 0.3]  # keep 360°
  # but the rear region depends mainly on LiDAR

8.3 LiDAR Weight Adjustment

# Increase the LiDAR fusion weight in the rear region
fuser:
  type: AdaptiveConvFuser  # adaptive fusion (custom module; see the sketch below)
  in_channels: [80, 256]
  out_channels: 256
  # rear region: more weight on LiDAR
  # front region: balance camera and LiDAR
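
AdaptiveConvFuser is not a stock BEVFusion module. A minimal sketch of the idea, assuming the BEV y-axis points forward so the rows behind the ego down-weight the (blind) camera features:

import torch
from torch import nn

class AdaptiveConvFuser(nn.Module):
    """Hypothetical fuser: position-dependent camera/LiDAR blending."""

    def __init__(self, in_channels=(80, 256), out_channels=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(sum(in_channels), out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(True),
        )

    def forward(self, cam_bev, lidar_bev):
        B, _, H, W = lidar_bev.shape
        # Camera weight ramps from 0.2 (rear rows, no camera) to 1.0 (front);
        # LiDAR features pass through at full weight everywhere.
        cam_w = torch.linspace(0.2, 1.0, H, device=cam_bev.device).view(1, 1, H, 1)
        return self.conv(torch.cat([cam_bev * cam_w, lidar_bev], dim=1))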

Step 9: Implementation Scripts

9.1 Data Conversion

#!/bin/bash
# tools/convert_custom_data.sh

export PATH=/opt/conda/bin:$PATH
cd /workspace/bevfusion

# Convert the training split
python tools/data_converter/custom_to_mmdet3d.py \
  --dataroot data/custom_dataset \
  --split train \
  --output data/custom_dataset/custom_infos_train.pkl

# Convert the validation split
python tools/data_converter/custom_to_mmdet3d.py \
  --dataroot data/custom_dataset \
  --split val \
  --output data/custom_dataset/custom_infos_val.pkl

echo "Data conversion complete!"

9.2 Training Script

#!/bin/bash
# scripts/train_custom_dataset.sh

export PATH=/opt/conda/bin:$PATH
cd /workspace/bevfusion

echo "========================================"
echo "Custom dataset training"
echo "Sensors: 4 cameras + 80-beam LiDAR"
echo "========================================"

# Fine-tune from the nuScenes-pretrained model
torchpack dist-run -np 8 python tools/train.py \
  configs/custom/bevfusion_4cam_80lidar.yaml \
  --load_from runs/run-326653dc-74184412/epoch_5.pth \
  --data.workers_per_gpu 0

echo "Training complete!"

Step 10: FAQ and Solutions

Q1: How are the features from 4 cameras handled?

A: Adjust the model input:

# mmdet3d/models/fusion_models/bevfusion.py

def extract_camera_features(self, x, *args, **kwargs):
    B, N, C, H, W = x.size()
    # N changed from 6 to 4
    assert N == 4, f"Expected 4 cameras, got {N}"
    
    x = x.view(B * N, C, H, W)  # (B*4, C, H, W)
    
    x = self.encoders["camera"]["backbone"](x)
    x = self.encoders["camera"]["neck"](x)
    
    # ... downstream processing

Q2: How is the telephoto camera handled separately?

A: Add a camera-type tag:

# Pass the camera types into forward
camera_types = ['wide', 'tele', 'wide', 'wide']

# The view transform picks a strategy per camera type
def vtransform_with_cam_type(features, camera_types):
    for i, cam_type in enumerate(camera_types):
        if cam_type == 'tele':
            # telephoto: focus on long range
            features[i] = process_tele(features[i])
        else:
            # wide-angle: focus on near range
            features[i] = process_wide(features[i])
    return features

Q3: What about the rear blind zone?

A: Three options:

Option 1: restrict the BEV range (predict the front only)
  point_cloud_range: [-54, 0, -5, 54, 108, 3]
  
Option 2: rely entirely on LiDAR behind
  in the fuser, use only LiDAR features for the rear region
  
Option 3: add rear-facing cameras (hardware upgrade)
  add 2 rear cameras → 6-camera configuration

Q4: The 80-beam LiDAR produces too many points and memory runs out

A: Optimization strategies:

# 1. Dynamic voxelization (no per-voxel point cap)
lidar:
  voxelize:
    max_num_points: -1  # dynamic mode
    type: DynamicScatter

# 2. Larger voxels
voxel_size: [0.075, 0.075, 0.2]  # if 0.05 is too dense

# 3. Restrict the point cloud range
point_cloud_range: [-50, -50, -5, 50, 50, 3]  # smaller range

# 4. Beam downsampling
LoadPointsFromFile:
  load_dim: 4
  use_dim: 4
  reduce_beams: 40  # downsample from 80 to 40 beams

Step 11: Full Implementation Flow

Phase 1: Data Preparation (1-2 days)

# 1. Organize the data directories
mkdir -p data/custom_dataset/{lidar,camera,calibration,annotations}

# 2. Convert the calibration format
python tools/convert_calibration.py

# 3. Generate the info files
python tools/data_converter/custom_to_mmdet3d.py

# 4. Verify the data
python tools/visualize_custom_data.py

Phase 2: Code Changes (2-3 days)

# 1. Create CustomDataset
vim mmdet3d/datasets/custom_dataset.py

# 2. Modify the pipeline for 4 cameras
vim mmdet3d/datasets/pipelines/loading.py

# 3. Write the config files
vim configs/custom/bevfusion_4cam_80lidar.yaml

# 4. (Optional) add telephoto handling
vim mmdet3d/models/vtransforms/dual_cam_lss.py

Phase 3: Training (3-5 days)

# 1. Small-scale sanity check (100 samples)
python tools/train.py configs/custom/test_100samples.yaml

# 2. Full training (fine-tune from the nuScenes model)
torchpack dist-run -np 8 python tools/train.py \
  configs/custom/bevfusion_4cam_80lidar.yaml \
  --load_from pretrained/bevfusion-det.pth

# 3. Tuning
# Adjust hyperparameters based on validation-set performance

📊 Expected Performance

Comparison with nuScenes

Metric                 nuScenes (6 cams + 32 beams)   Your config (4 cams + 80 beams)
LiDAR point density    32 beams                       80 beams (+150%)
Camera coverage        360°                           ~240° ⚠️
Long-range detection   average                        boosted by telephoto
Near-range detection   good                           good
Rear detection         covered                        relies on LiDAR ⚠️
Expected mAP           68%                            65-70%
Expected mIoU          60%                            55-65%

Analysis:

  • The 80-beam LiDAR should lift performance (denser point clouds)
  • The telephoto camera improves long-range detection
  • ⚠️ 4 cameras may score slightly lower at the rear and sides
  • 🎯 Overall performance should be comparable or better

📝 Config File Templates

I have created a complete set of config templates for you; they can be used directly:

# Create the custom config directory
mkdir -p /workspace/bevfusion/configs/custom

# Config file inventory
configs/custom/
├── default.yaml                    base config
├── bevfusion_4cam_80lidar.yaml     full model config
├── test_100samples.yaml            small-scale test config
└── README.md                       usage notes

🚀 Quick Start (After the Current Training Finishes)

1. Wait for the current training run to finish

Current progress: Epoch 6/20 (30%)
Estimated completion: in 2 days

2. Prepare your data

# Organize the data in the format described above
# Write the calibration conversion script
# Generate the info files

3. Test data loading

# Verify the data format is correct
from mmdet3d.datasets import CustomDataset

dataset = CustomDataset(
    ann_file='data/custom_dataset/custom_infos_val.pkl',
    pipeline=[...],
)

# Load one sample as a test
data = dataset[0]
print(data.keys())
# Should include: img (4 cameras), points, gt_bboxes_3d, gt_labels_3d

4. Start fine-tuning

# Initialize from the current multi-task model
torchpack dist-run -np 8 python tools/train.py \
  configs/custom/bevfusion_4cam_80lidar.yaml \
  --load_from runs/run-326653dc-74184412/latest.pth \
  --data.workers_per_gpu 0

💡 Key Caveats

1. Calibration Accuracy

❗ Most important! Inaccurate calibration severely degrades performance.

Must be calibrated accurately:
  - camera intrinsics (including distortion parameters)
  - camera extrinsics (pose relative to the vehicle)
  - LiDAR-to-vehicle transform
  - time synchronization

Verification methods:
  - project LiDAR points onto the images and check alignment
  - multi-frame consistency checks

2. Telephoto Camera Handling

Not recommended:
  ❌ processing it exactly like the wide-angle cameras
  
Recommended:
  ✅ a different depth range
  ✅ a different BEV range
  ✅ or a dual-branch design

3. Data Augmentation

Needs adjustment:
  - the 4-camera flip strategy (a naive left-right flip breaks the camera correspondence; see the sketch below)
  - the rotation range (match your application scenario)
  - the scale range (the 80-beam LiDAR is more precise, so stay conservative)
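
A horizontal flip only stays geometrically consistent if the whole scene is mirrored and the left/right cameras swap roles. A sketch of the idea (the index order follows the config above; points are assumed to be an (N, C) numpy array, and x is the lateral axis as in the BEV ranges above):

import numpy as np

def flip_scene_4cam(results, img_w):
    """Hypothetical camera-consistent flip: mirror about the ego x-axis."""
    F3 = np.diag([-1.0, 1.0, 1.0, 1.0])            # 3D mirror: x -> -x
    F2 = np.array([[-1.0, 0.0, img_w - 1.0, 0.0],  # 2D mirror: u -> W-1-u
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
    results['points'][:, 0] *= -1                  # mirror the point cloud
    results['img'] = [img[:, ::-1, :].copy() for img in results['img']]
    results['lidar2img'] = [F2 @ M @ F3 for M in results['lidar2img']]
    # A mirrored left camera now shows the right side of the road: swap the
    # pair so index 2 still means FRONT_LEFT and index 3 FRONT_RIGHT.
    L, R = 2, 3
    for key in ('img', 'lidar2img'):
        results[key][L], results[key][R] = results[key][R], results[key][L]
    return results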

4. Class Mapping

If your classes differ from nuScenes:
  - change the object_classes definition
  - adjust num_classes
  - retrain the classification head
  - the detection head can be initialized from nuScenes, but the final layer must be adapted

🔧 Utility Scripts

Visualization Tool

# tools/visualize_custom_data.py

def visualize_4cam_lidar(data_info):
    """Visualize the 4 cameras + LiDAR.
    
    load_image, load_lidar, project_lidar_to_cam and plot_bev are
    project-specific helpers assumed to exist.
    """
    import matplotlib.pyplot as plt
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # 4 cameras (first two columns of the 2x3 grid)
    for i, cam_name in enumerate(['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT']):
        ax = axes[i // 2, i % 2]
        
        # Load the image
        img = load_image(data_info['cams'][cam_name]['data_path'])
        
        # Project the LiDAR points
        lidar_points = load_lidar(data_info['lidar_path'])
        projected = project_lidar_to_cam(lidar_points, data_info, cam_name)
        
        # Draw
        ax.imshow(img)
        ax.scatter(projected[:, 0], projected[:, 1], c=projected[:, 2], s=1)
        ax.set_title(f'{cam_name}')
    
    # BEV view
    ax = axes[1, 2]
    plot_bev(lidar_points, data_info['gt_boxes'], ax)
    ax.set_title('BEV View')
    
    plt.tight_layout()
    plt.savefig('visualization.png')
    print("Visualization saved to visualization.png")

Calibration Verification Tool

# tools/verify_calibration.py

def verify_calibration(data_info):
    """Check calibration accuracy by projecting LiDAR into each camera."""
    
    lidar_points = load_lidar(data_info['lidar_path'])
    
    errors = []
    for cam_name in ['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT']:
        # Project the LiDAR points into the camera
        projected = project_lidar_to_cam(lidar_points, data_info, cam_name)
        
        # Check whether the projected points fall inside the image
        h, w = data_info['cams'][cam_name]['height'], data_info['cams'][cam_name]['width']
        valid_mask = (
            (projected[:, 0] >= 0) & (projected[:, 0] < w) &
            (projected[:, 1] >= 0) & (projected[:, 1] < h) &
            (projected[:, 2] > 0)  # positive depth
        )
        
        valid_ratio = valid_mask.sum() / len(projected)
        print(f"{cam_name}: {valid_ratio*100:.1f}% of points valid")
        
        if valid_ratio < 0.1:
            errors.append(f"{cam_name} calibration looks suspect")
    
    if errors:
        print("Warnings:", errors)
    else:
        print("✅ Calibration check passed!")

📖 Full Config Example

configs/custom/bevfusion_4cam_80lidar.yaml

# BEVFusion config for the custom dataset
_base_: ../nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml

# Dataset config
dataset_type: CustomDataset
dataset_root: data/custom_dataset/

# LiDAR config (80 beams)
reduce_beams: 80
load_dim: 4
use_dim: 4
voxel_size: [0.05, 0.05, 0.2]
point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0]

# Camera config (4 cameras)
num_cameras: 4
camera_names: ['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT']
image_size: [512, 1408]

# Model config
model:
  encoders:
    camera:
      # number of cameras changed from 6 to 4
      num_views: 4
      
      vtransform:
        # BEV range adapted to 4 cameras
        xbound: [-54.0, 54.0, 0.3]
        ybound: [-54.0, 54.0, 0.3]
        
        # special telephoto handling
        camera_aware: true
        tele_camera_idx: 1  # the 2nd camera is the telephoto
    
    lidar:
      voxelize:
        max_num_points: 20  # 80 beams yield more points
        point_cloud_range: ${point_cloud_range}
        voxel_size: ${voxel_size}
        max_voxels: [180000, 240000]
      
      backbone:
        sparse_shape: [2160, 2160, 41]  # matches the 0.05 m voxels
        output_channels: 256  # can be increased
  
  heads:
    object:
      num_classes: 8  # set to your number of classes
    map:
      classes: ${map_classes}

# Training config (fine-tuning)
optimizer:
  lr: 5.0e-5  # smaller learning rate
  paramwise_cfg:
    custom_keys:
      encoders:
        lr_mult: 0.1  # 10% learning rate for the backbones

max_epochs: 12

# Data config
data:
  train:
    type: ${dataset_type}
    dataset_root: ${dataset_root}
    ann_file: ${dataset_root + "custom_infos_train.pkl"}
    # ...

🎯 Implementation Timeline

Week 1: Data Preparation

  • Day 1-2: organize the data, convert formats
  • Day 3: verify calibration
  • Day 4: generate info files and annotations
  • Day 5: visualize and validate the data

Week 2: Code Development

  • Day 6-7: implement CustomDataset
  • Day 8: modify the pipeline
  • Day 9: write the config files
  • Day 10: small-scale test

Week 3: Training and Tuning

  • Day 11-13: full training (fine-tuning)
  • Day 14-15: performance tuning
  • Day 16-17: evaluation and visualization

Total: roughly 3 weeks for the full migration


💻 Ready-to-Use Code Templates

I can create for you:

  1. Data conversion script (tools/data_converter/custom_to_mmdet3d.py)
  2. CustomDataset class (mmdet3d/datasets/custom_dataset.py)
  3. Config file (configs/custom/bevfusion_4cam_80lidar.yaml)
  4. Visualization tool (tools/visualize_custom_data.py)
  5. Training script (scripts/train_custom.sh)

🌟 Optimization Suggestions

Exploit the 80-Beam LiDAR

# 1. Finer voxelization
voxel_size: [0.05, 0.05, 0.2]  # nuScenes uses 0.075

# 2. A stronger LiDAR backbone
lidar:
  backbone:
    output_channels: 256  # nuScenes uses 128
    encoder_channels:
      - [32, 32, 64]      # doubled channel widths
      - [64, 64, 128]
      - [128, 128, 256]

# 3. Adjusted fusion weights (more weight on LiDAR)
fuser:
  type: ConvFuser
  in_channels: [80, 256]
  # or use AddFuser, which can apply different per-modality weights

Exploit the Telephoto Camera

# A dedicated long-range detection branch
heads:
  object:
    # add anchors for distant small objects
    anchor_generator:
      ranges: [[0, -40.0, ..., 40.0, 100.0, ...]]  # extended to 100 m
  
  # or add a dedicated long-range detection head
  object_long_range:
    type: TransFusionHead
    point_cloud_range: [0, 50, -5, 50, 150, 3]  # front long range only

Migration Checklist

Before migrating, confirm:

  • Data organized in the mmdet3d format
  • Calibration files prepared (intrinsics + extrinsics)
  • Timestamps synchronized (cameras and LiDAR)
  • 3D box annotations in the correct format (LiDAR coordinate frame)
  • BEV segmentation labels prepared (if needed)
  • Dataset splits done (train/val/test)
  • CustomDataset class implemented
  • Config files adapted to 4 cameras
  • Pipeline modified
  • Visualization check passed
  • Small-scale test passed

🎓 Summary

Advantages of your sensor configuration:

  • 80-beam LiDAR: 2.5× the point density of nuScenes
  • Telephoto camera: stronger long-range detection
  • Better forward coverage (2 front-facing cameras)

Points to watch:

  • ⚠️ Rear blind zone: adjust the BEV range or lean harder on LiDAR
  • ⚠️ Telephoto camera: needs dedicated handling logic
  • ⚠️ Calibration: must be precise

Expected results:

  • Forward detection may exceed nuScenes (telephoto + 80 beams)
  • Near range on par with nuScenes
  • Rear slightly below nuScenes (no rear cameras)
  • Overall: 65-70% mAP, 55-65% mIoU

Shall I help you:

  1. Create the complete code templates?
  2. Write the data conversion script?
  3. Design the telephoto handling scheme?

Let me know what you need next! 😊


Generated: 2025-10-17