# BEVFusion迁移到自定义传感器配置指南 ## 🎯 目标配置 ### nuScenes配置 (原始) ``` LiDAR: 1个32线旋转式LiDAR Camera: 6个环视相机 - Front (前) - Front Left (左前) - Front Right (右前) - Back (后) - Back Left (左后) - Back Right (右后) ``` ### 您的配置 (目标) ``` LiDAR: 1个80线360度激光雷达 ✅ 更高分辨率 Camera: 4路相机 - Front Wide (前视广角) - Front Tele (前视长焦) ← 新增 - Front Left (左前) - Front Right (右前) ``` **关键差异**: - ✅ LiDAR升级:32线→80线(点云密度更高) - ⚠️ 相机减少:6个→4个(减少后向覆盖) - ⚠️ 新增长焦:需要特殊处理 --- ## 📋 迁移步骤总览 ``` 步骤1: 数据格式转换 → 转为mmdet3d格式 步骤2: 标定参数处理 → 相机内外参、LiDAR标定 步骤3: 数据集类定义 → 自定义Dataset 步骤4: 配置文件修改 → 适配4相机+80线LiDAR 步骤5: Pipeline调整 → 数据增强和预处理 步骤6: 训练和调优 → 开始训练 ``` --- ## 步骤1: 数据格式转换 ### 1.1 原始数据组织 建议的目录结构: ``` data/custom_dataset/ ├── lidar/ LiDAR点云数据 │ ├── scene_001/ │ │ ├── 000000.bin (N, 4) 格式:x,y,z,intensity │ │ ├── 000001.bin │ │ └── ... │ └── scene_002/ │ └── ... ├── camera/ 相机图片 │ ├── scene_001/ │ │ ├── front_wide/ │ │ │ ├── 000000.jpg │ │ │ └── ... │ │ ├── front_tele/ │ │ │ └── ... │ │ ├── front_left/ │ │ │ └── ... │ │ └── front_right/ │ │ └── ... ├── calibration/ 标定数据 │ ├── scene_001_calib.json │ └── ... ├── annotations/ 标注数据 │ ├── scene_001_anno.json 3D框标注 │ └── scene_001_seg.png BEV分割标注 └── splits/ 数据集划分 ├── train.txt ├── val.txt └── test.txt ``` ### 1.2 标定文件格式 ```json { "scene_id": "scene_001", "timestamp": 1634567890, "lidar_to_ego": { "translation": [0.0, 0.0, 1.8], "rotation": [1.0, 0.0, 0.0, 0.0] }, "cameras": { "front_wide": { "intrinsic": [ [fx, 0, cx], [0, fy, cy], [0, 0, 1] ], "extrinsic": { "translation": [1.5, 0.0, 1.5], "rotation": [1.0, 0.0, 0.0, 0.0] }, "distortion": [k1, k2, p1, p2, k3], "image_size": [1920, 1080] }, "front_tele": { "intrinsic": [ [fx_tele, 0, cx_tele], [0, fy_tele, cy_tele], [0, 0, 1] ], "extrinsic": { "translation": [1.5, 0.0, 1.5], "rotation": [1.0, 0.0, 0.0, 0.0] }, "distortion": [...], "image_size": [1920, 1080], "fov": 30.0 }, "front_left": {...}, "front_right": {...} }, "annotations": { "boxes_3d": [ { "center": [x, y, z], "size": [w, l, h], "rotation": yaw, "velocity": [vx, vy], "class": "car", "track_id": 1 }, ... ], "segmentation": { "file": "annotations/scene_001_seg.png", "classes": { "0": "background", "1": "drivable_area", "2": "lane", ... } } } } ``` ### 1.3 数据转换脚本 ```python # tools/data_converter/custom_to_mmdet3d.py import numpy as np import pickle import json from pathlib import Path def convert_custom_to_mmdet3d(data_root, output_dir): """ 将自定义数据集转换为mmdet3d格式 """ data_infos = [] # 读取数据列表 scenes = sorted(Path(data_root).glob('*/')) for scene_dir in scenes: # 加载标定 calib = load_calibration(scene_dir / 'calibration.json') # 遍历帧 lidar_files = sorted((scene_dir / 'lidar').glob('*.bin')) for frame_idx, lidar_file in enumerate(lidar_files): timestamp = int(lidar_file.stem) # 构建info字典 info = { 'lidar_path': str(lidar_file), 'token': f"{scene_dir.name}_{timestamp}", 'timestamp': timestamp, # 相机信息(4个相机) 'cams': { 'FRONT_WIDE': { 'data_path': str(scene_dir / f'camera/front_wide/{timestamp:06d}.jpg'), 'type': 'camera', 'sample_data_token': f'cam_front_wide_{timestamp}', 'sensor2ego_translation': calib['cameras']['front_wide']['translation'], 'sensor2ego_rotation': calib['cameras']['front_wide']['rotation'], 'ego2global_translation': [0, 0, 0], 'ego2global_rotation': [1, 0, 0, 0], 'timestamp': timestamp, 'camera_intrinsic': calib['cameras']['front_wide']['intrinsic'], 'width': 1920, 'height': 1080, }, 'FRONT_TELE': { 'data_path': str(scene_dir / f'camera/front_tele/{timestamp:06d}.jpg'), 'type': 'camera', 'sample_data_token': f'cam_front_tele_{timestamp}', 'sensor2ego_translation': calib['cameras']['front_tele']['translation'], 'sensor2ego_rotation': calib['cameras']['front_tele']['rotation'], 'ego2global_translation': [0, 0, 0], 'ego2global_rotation': [1, 0, 0, 0], 'timestamp': timestamp, 'camera_intrinsic': calib['cameras']['front_tele']['intrinsic'], 'width': 1920, 'height': 1080, 'is_tele': True, # 标记为长焦相机 }, 'FRONT_LEFT': {...}, 'FRONT_RIGHT': {...}, }, # LiDAR信息 'lidar2ego_translation': calib['lidar_to_ego']['translation'], 'lidar2ego_rotation': calib['lidar_to_ego']['rotation'], 'ego2global_translation': [0, 0, 0], 'ego2global_rotation': [1, 0, 0, 0], # 标注信息 'gt_boxes': load_annotations(scene_dir / f'annotations/{timestamp:06d}.json'), 'gt_names': [...], 'gt_velocity': [...], 'num_lidar_pts': [...], 'num_radar_pts': [0] * len(gt_boxes), # 无radar 'valid_flag': [True] * len(gt_boxes), } data_infos.append(info) # 保存为pkl文件 output_file = Path(output_dir) / 'custom_infos_train.pkl' with open(output_file, 'wb') as f: pickle.dump(data_infos, f) print(f"转换完成!生成{len(data_infos)}个样本") print(f"保存到: {output_file}") return data_infos def load_calibration(calib_file): """加载标定文件""" with open(calib_file, 'r') as f: calib = json.load(f) return calib def load_annotations(anno_file): """加载3D框标注""" with open(anno_file, 'r') as f: anno = json.load(f) boxes = [] for obj in anno['objects']: box = np.array([ obj['center'][0], obj['center'][1], obj['center'][2], obj['size'][0], # w obj['size'][1], # l obj['size'][2], # h obj['rotation'], ]) boxes.append(box) return np.array(boxes) # 使用方法 if __name__ == '__main__': convert_custom_to_mmdet3d( data_root='data/custom_dataset', output_dir='data/custom_dataset' ) ``` --- ## 步骤2: 自定义Dataset类 ```python # mmdet3d/datasets/custom_dataset.py from .nuscenes_dataset import NuScenesDataset from mmdet.datasets import DATASETS @DATASETS.register_module() class CustomDataset(NuScenesDataset): """自定义数据集(4相机+80线LiDAR)""" # 定义类别(根据您的标注) CLASSES = ( 'car', 'truck', 'bus', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone', 'barrier' ) # 相机名称(4个相机) CAM_SENSORS = [ 'FRONT_WIDE', # 前视广角 'FRONT_TELE', # 前视长焦 'FRONT_LEFT', # 左前 'FRONT_RIGHT', # 右前 ] def __init__( self, ann_file, pipeline=None, dataset_root=None, object_classes=None, map_classes=None, modality=None, box_type_3d='LiDAR', filter_empty_gt=True, test_mode=False, **kwargs ): # 设置相机数量 self.num_cams = 4 # 修改为4(原来是6) super().__init__( ann_file=ann_file, pipeline=pipeline, dataset_root=dataset_root, object_classes=object_classes, map_classes=map_classes, modality=modality, box_type_3d=box_type_3d, filter_empty_gt=filter_empty_gt, test_mode=test_mode, **kwargs ) def get_data_info(self, index): """获取数据信息""" info = self.data_infos[index] # 准备相机数据(4个相机) image_paths = [] lidar2img_rts = [] lidar2cam_rts = [] cam_intrinsics = [] for cam_name in self.CAM_SENSORS: cam_info = info['cams'][cam_name] # 图片路径 image_paths.append(cam_info['data_path']) # 计算变换矩阵 lidar2cam_r, lidar2cam_t = self.get_lidar2cam(info, cam_name) lidar2cam_rt = np.eye(4) lidar2cam_rt[:3, :3] = lidar2cam_r lidar2cam_rt[:3, 3] = lidar2cam_t # 相机内参 intrinsic = np.array(cam_info['camera_intrinsic']) viewpad = np.eye(4) viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic # lidar2img变换 lidar2img_rt = viewpad @ lidar2cam_rt lidar2img_rts.append(lidar2img_rt) lidar2cam_rts.append(lidar2cam_rt) cam_intrinsics.append(viewpad) # 构建输入字典 input_dict = { 'sample_idx': index, 'pts_filename': info['lidar_path'], 'sweeps': [], # 如果有多帧点云sweep 'timestamp': info['timestamp'], 'img_filename': image_paths, 'lidar2img': lidar2img_rts, 'cam_intrinsic': cam_intrinsics, 'lidar2cam': lidar2cam_rts, } # 添加标注(如果不是测试模式) if not self.test_mode: annos = self.get_ann_info(index) input_dict['ann_info'] = annos return input_dict def handle_tele_camera(self, data): """ 处理长焦相机的特殊逻辑 长焦相机的特点: - FOV小(如30度 vs 广角的120度) - 分辨率高 - 适合远距离检测 处理方式: 1. 单独的resize策略 2. 不同的crop范围 3. 可能需要单独的backbone分支 """ # 检测是否是长焦相机 for i, cam_name in enumerate(self.CAM_SENSORS): if 'TELE' in cam_name: # 长焦相机特殊处理 # 例如:使用更大的输入分辨率 data['img'][i] = resize_keep_ratio(data['img'][i], (512, 1408)) return data ``` ### 1.4 注册Dataset ```python # mmdet3d/datasets/__init__.py from .custom_dataset import CustomDataset __all__ = [ ..., 'CustomDataset', ] ``` --- ## 步骤3: 配置文件修改 ### 3.1 基础配置 ```yaml # configs/custom/default.yaml dataset_type: CustomDataset dataset_root: data/custom_dataset/ # LiDAR配置(80线,更高分辨率) reduce_beams: 80 # 从32改为80 load_dim: 4 # x,y,z,intensity use_dim: 4 # 使用全部维度 # 点云范围(根据您的车辆调整) point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0] # 体素大小(可以更小,利用80线的高分辨率) voxel_size: [0.05, 0.05, 0.2] # 从0.075改为0.05(更精细) # 相机配置(4个相机) image_size: [512, 1408] # 可以根据需要调整 # 相机名称映射 cam_names: - FRONT_WIDE - FRONT_TELE - FRONT_LEFT - FRONT_RIGHT # 类别定义 object_classes: - car - truck - bus - motorcycle - bicycle - pedestrian - traffic_cone - barrier map_classes: - drivable_area - lane - ped_crossing - boundary # 数据增强参数 augment2d: resize: [[0.4, 0.6], [0.5, 0.5]] rotate: [-5.4, 5.4] gridmask: prob: 0.0 fixed_prob: true augment3d: scale: [0.95, 1.05] # 更保守(80线LiDAR更精确) rotate: [-0.78539816, 0.78539816] translate: 0.5 # 模态配置 input_modality: use_lidar: true use_camera: true use_radar: false use_map: false use_external: false ``` ### 3.2 模型配置(适配4相机) ```yaml # configs/custom/bevfusion_4cam_80lidar.yaml _base_: ./default.yaml model: type: BEVFusion encoders: camera: backbone: type: SwinTransformer embed_dims: 96 depths: [2, 2, 6, 2] num_heads: [3, 6, 12, 24] window_size: 7 # ... SwinTransformer配置 neck: type: GeneralizedLSSFPN in_channels: [192, 384, 768] out_channels: 256 start_level: 0 num_outs: 3 vtransform: type: DepthLSSTransform in_channels: 256 out_channels: 80 image_size: ${image_size} feature_size: ${[image_size[0] // 8, image_size[1] // 8]} xbound: [-54.0, 54.0, 0.3] ybound: [-54.0, 54.0, 0.3] zbound: [-10.0, 10.0, 20.0] dbound: [1.0, 60.0, 0.5] downsample: 2 # 特殊处理长焦相机 camera_aware: true # 启用相机感知(不同相机不同处理) lidar: voxelize: max_num_points: 20 # 从10改为20(80线点更多) point_cloud_range: ${point_cloud_range} voxel_size: ${voxel_size} max_voxels: [180000, 240000] # 增加(更精细的体素) backbone: type: SparseEncoder in_channels: 4 # x,y,z,intensity sparse_shape: [2160, 2160, 41] # 适配0.05体素大小 output_channels: 256 # 增加输出通道(更强的特征) encoder_channels: - [16, 16, 32] - [32, 32, 64] - [64, 64, 128] - [128, 128, 256] # 增加一层 encoder_paddings: - [0, 0, 1] - [0, 0, 1] - [0, 0, [1, 1, 0]] - [0, 0] block_type: basicblock fuser: type: ConvFuser in_channels: [80, 256] # camera和lidar的输出通道 out_channels: 256 decoder: backbone: type: SECOND in_channels: 256 out_channels: [128, 256] layer_nums: [5, 5] layer_strides: [1, 2] neck: type: SECONDFPN in_channels: [128, 256] out_channels: [256, 256] upsample_strides: [1, 2] heads: # 3D检测 object: type: TransFusionHead in_channels: 512 num_proposals: 200 num_classes: 8 # 您的类别数 # ... 其他配置 # BEV分割 map: type: BEVSegmentationHead in_channels: 512 classes: ${map_classes} loss_scale: object: 1.0 map: 1.0 # 数据配置 data: samples_per_gpu: 1 # 4相机内存占用较少,可以增大 workers_per_gpu: 0 # 根据实际调整 train: type: CBGSDataset dataset: type: ${dataset_type} dataset_root: ${dataset_root} ann_file: ${dataset_root + "custom_infos_train.pkl"} pipeline: ${train_pipeline} object_classes: ${object_classes} map_classes: ${map_classes} modality: ${input_modality} test_mode: false box_type_3d: LiDAR val: type: ${dataset_type} dataset_root: ${dataset_root} ann_file: ${dataset_root + "custom_infos_val.pkl"} pipeline: ${test_pipeline} object_classes: ${object_classes} map_classes: ${map_classes} modality: ${input_modality} test_mode: true box_type_3d: LiDAR # 训练配置 max_epochs: 24 optimizer: type: AdamW lr: 2.0e-4 weight_decay: 0.01 ``` --- ## 步骤4: 数据Pipeline调整 ### 4.1 修改LoadMultiViewImageFromFiles ```python # mmdet3d/datasets/pipelines/loading.py @PIPELINES.register_module() class LoadMultiViewImageFromFiles: """加载多视角图像(支持4相机+长焦)""" def __init__(self, to_float32=False, color_type='color', num_views=4): self.to_float32 = to_float32 self.color_type = color_type self.num_views = num_views # 设置为4 def __call__(self, results): """ 读取4个相机的图像 特殊处理: - front_tele相机可能需要不同的预处理 """ filename = results['img_filename'] images = [] for i, name in enumerate(filename): img = mmcv.imread(name, self.color_type) # 检查是否是长焦相机 if 'tele' in name.lower(): # 长焦相机特殊处理 # 例如:不同的归一化参数 pass if self.to_float32: img = img.astype(np.float32) images.append(img) results['img'] = images results['img_shape'] = [img.shape for img in images] results['ori_shape'] = [img.shape for img in images] # 设置为4相机 results['num_views'] = self.num_views return results ``` ### 4.2 ImageAug3D调整 ```python # configs/custom/default.yaml 中的pipeline train_pipeline: - type: LoadMultiViewImageFromFiles to_float32: true num_views: 4 # ← 修改为4 - type: LoadPointsFromFile coord_type: LIDAR load_dim: 4 # x,y,z,intensity use_dim: 4 reduce_beams: 80 # ← 80线LiDAR # 如果有多帧点云 - type: LoadPointsFromMultiSweeps sweeps_num: 9 load_dim: 4 use_dim: 4 reduce_beams: 80 pad_empty_sweeps: true remove_close: true - type: LoadAnnotations3D with_bbox_3d: true with_label_3d: true # 数据增强 - type: ImageAug3D final_dim: ${image_size} resize_lim: ${augment2d.resize[0]} bot_pct_lim: [0.0, 0.0] rot_lim: ${augment2d.rotate} rand_flip: true is_train: true num_views: 4 # ← 4个相机 - type: GlobalRotScaleTrans resize_lim: ${augment3d.scale} rot_lim: ${augment3d.rotate} trans_lim: ${augment3d.translate} is_train: true - type: LoadBEVSegmentation dataset_root: ${dataset_root} xbound: [-50.0, 50.0, 0.5] ybound: [-50.0, 50.0, 0.5] classes: ${map_classes} - type: RandomFlip3D - type: PointsRangeFilter point_cloud_range: ${point_cloud_range} - type: ObjectRangeFilter point_cloud_range: ${point_cloud_range} - type: ObjectNameFilter classes: ${object_classes} - type: ImageNormalize mean: [0.485, 0.456, 0.406] std: [0.229, 0.224, 0.225] - type: DefaultFormatBundle3D classes: ${object_classes} - type: Collect3D keys: - img # (4, C, H, W) ← 4个相机 - points - gt_bboxes_3d - gt_labels_3d - gt_masks_bev meta_keys: - camera_intrinsics - camera2ego - lidar2ego - lidar2camera - camera2lidar - lidar2image - img_aug_matrix - lidar_aug_matrix ``` --- ## 步骤5: 长焦相机特殊处理 ### 5.1 为什么需要特殊处理? ``` 前视广角相机: - FOV: 120度 - 焦距: 短 - 擅长: 近距离、大范围感知 - 分辨率需求: 中等 前视长焦相机: - FOV: 30度 ← 窄 - 焦距: 长 - 擅长: 远距离、小物体检测 - 分辨率需求: 高 问题: 如果用同样的处理方式: - 长焦的远距离信息会被浪费 - 广角的近距离覆盖会不足 ``` ### 5.2 方案A: Dual-Branch处理(推荐) ```python # mmdet3d/models/vtransforms/dual_cam_lss.py class DualCameraLSS(nn.Module): """ 双分支处理广角和长焦相机 """ def __init__(self, ...): # 广角相机分支(近距离) self.wide_branch = LSSTransform( xbound=[-54.0, 54.0, 0.3], # 大范围 ybound=[-54.0, 54.0, 0.3], dbound=[1.0, 60.0, 0.5], # 近到远 ) # 长焦相机分支(远距离) self.tele_branch = LSSTransform( xbound=[-30.0, 30.0, 0.15], # 窄范围,更精细 ybound=[10.0, 100.0, 0.3], # 前方远距离 dbound=[30.0, 150.0, 1.0], # 只关注远处 ) def forward(self, x, camera_types, ...): """ Args: x: (B, N, C, H, W) - N=4个相机 camera_types: ['wide', 'tele', 'wide', 'wide'] """ bev_features = [] for i, cam_type in enumerate(camera_types): cam_feat = x[:, i] # (B, C, H, W) if cam_type == 'wide': bev = self.wide_branch(cam_feat, ...) elif cam_type == 'tele': bev = self.tele_branch(cam_feat, ...) bev_features.append(bev) # 融合4个相机的BEV combined_bev = self.combine_multi_cam_bev(bev_features) return combined_bev def combine_multi_cam_bev(self, bev_list): """ 融合4个相机的BEV特征 策略: - 广角相机:贡献近距离区域 - 长焦相机:贡献远距离区域 - 使用距离加权融合 """ B, C, H, W = bev_list[0].shape combined = torch.zeros(B, C, H, W).to(bev_list[0].device) # 距离权重 y_coords = torch.arange(H).float().to(combined.device) for i, bev in enumerate(bev_list): if i == 1: # front_tele # 长焦:远距离权重高 weight = (y_coords / H).view(1, 1, H, 1) else: # 广角 # 广角:近距离权重高 weight = (1 - y_coords / H).view(1, 1, H, 1) combined += bev * weight return combined ``` ### 5.3 方案B: 统一处理+注意力机制 ```python class CameraAwareLSS(nn.Module): """ 相机感知的LSS 为每个相机学习不同的处理权重 """ def __init__(self, num_cameras=4, ...): super().__init__() # 统一的LSS self.lss = LSSTransform(...) # 相机特定的adapter self.camera_adapters = nn.ModuleList([ nn.Sequential( nn.Conv2d(256, 256, 1), nn.BatchNorm2d(256), nn.ReLU(), ) for _ in range(num_cameras) ]) # 相机类型embedding self.camera_type_embed = nn.Embedding(2, 256) # 0:wide, 1:tele def forward(self, x, camera_types, ...): B, N, C, H, W = x.shape # N=4 bev_features = [] for i in range(N): # 相机特征 cam_feat = x[:, i] # (B, C, H, W) # 添加相机类型信息 cam_type_id = 1 if camera_types[i] == 'tele' else 0 type_embed = self.camera_type_embed( torch.tensor(cam_type_id).to(cam_feat.device) ) # 融入特征 cam_feat = cam_feat + type_embed.view(1, -1, 1, 1) # 相机特定处理 cam_feat = self.camera_adapters[i](cam_feat) # LSS转换到BEV bev = self.lss(cam_feat, ...) bev_features.append(bev) # 融合 combined = torch.stack(bev_features, dim=1).sum(dim=1) return combined ``` --- ## 步骤6: 80线LiDAR优化 ### 6.1 利用更高分辨率 ```yaml # 更精细的体素化 lidar: voxelize: voxel_size: [0.05, 0.05, 0.2] # 从0.075→0.05 max_num_points: 20 # 从10→20 max_voxels: [180000, 240000] # 增加容量 backbone: sparse_shape: [2160, 2160, 41] # 对应0.05体素大小 # 108m范围 / 0.05m = 2160 ``` ### 6.2 多sweep融合 ```yaml # 利用80线的高密度,可以用更多sweep LoadPointsFromMultiSweeps: sweeps_num: 9 # 可以增加到15-20 # 80线LiDAR每帧点更多,多sweep信息更丰富 ``` --- ## 步骤7: 训练策略 ### 7.1 从nuScenes预训练迁移学习 ```bash # 阶段1: 在nuScenes上预训练(已有模型) # 使用现有的bevfusion-det.pth或当前训练的模型 # 阶段2: 在自定义数据上fine-tune export PATH=/opt/conda/bin:$PATH cd /workspace/bevfusion torchpack dist-run -np 8 python tools/train.py \ configs/custom/bevfusion_4cam_80lidar.yaml \ --load_from runs/run-326653dc-74184412/epoch_5.pth \ --data.workers_per_gpu 0 # 关键: # --load_from: 加载在nuScenes上训练的模型 # 大部分参数可以复用(encoder/fuser/decoder) # 只需要fine-tune task head(类别可能不同) ``` ### 7.2 调整学习率和训练策略 ```yaml # 迁移学习配置 optimizer: type: AdamW lr: 5.0e-5 # 更小的学习率(fine-tuning) weight_decay: 0.01 paramwise_cfg: custom_keys: # backbone用更小的学习率 encoders: lr_mult: 0.1 # head用正常学习率 heads: lr_mult: 1.0 lr_config: policy: CosineAnnealing warmup: linear warmup_iters: 500 warmup_ratio: 0.1 min_lr_ratio: 1.0e-4 # 训练epoch(fine-tuning通常需要较少) max_epochs: 12 ``` --- ## 步骤8: 处理4相机覆盖范围问题 ### 8.1 覆盖范围分析 ``` nuScenes (6相机): 360度全覆盖 您的配置 (4相机): 前方: 2个相机(广角+长焦)✅ 覆盖加强 左前: 1个相机 ✅ 右前: 1个相机 ✅ 后方: 无相机 ❌ 盲区 BEV范围建议: 前方: [-54, 54] × [0, 108] 全覆盖 左右: [-54, 54] × [-54, 0] 部分覆盖 后方: 依赖LiDAR ``` ### 8.2 调整BEV范围配置 ```yaml # 方案A: 前向BEV(推荐) vtransform: xbound: [-54.0, 54.0, 0.3] # 左右方向 ybound: [0.0, 108.0, 0.3] # 只关注前方 zbound: [-5.0, 5.0, 20.0] dbound: [1.0, 100.0, 0.5] point_cloud_range: [-54.0, 0.0, -5.0, 54.0, 108.0, 3.0] # 方案B: 保持360度,后方依赖LiDAR vtransform: xbound: [-54.0, 54.0, 0.3] ybound: [-54.0, 54.0, 0.3] # 保持360 # 但后方区域主要靠LiDAR ``` ### 8.3 LiDAR权重调整 ```yaml # 在后方区域增加LiDAR的融合权重 fuser: type: AdaptiveConvFuser # 自适应融合 in_channels: [80, 256] out_channels: 256 # 后方区域:增加LiDAR权重 # 前方区域:平衡Camera和LiDAR ``` --- ## 步骤9: 实现脚本 ### 9.1 数据转换 ```bash # tools/convert_custom_data.sh #!/bin/bash export PATH=/opt/conda/bin:$PATH cd /workspace/bevfusion # 转换训练数据 python tools/data_converter/custom_to_mmdet3d.py \ --dataroot data/custom_dataset \ --split train \ --output data/custom_dataset/custom_infos_train.pkl # 转换验证数据 python tools/data_converter/custom_to_mmdet3d.py \ --dataroot data/custom_dataset \ --split val \ --output data/custom_dataset/custom_infos_val.pkl echo "数据转换完成!" ``` ### 9.2 训练脚本 ```bash # scripts/train_custom_dataset.sh #!/bin/bash export PATH=/opt/conda/bin:$PATH cd /workspace/bevfusion echo "========================================" echo "自定义数据集训练" echo "传感器: 4相机 + 80线LiDAR" echo "========================================" # 从nuScenes预训练模型fine-tune torchpack dist-run -np 8 python tools/train.py \ configs/custom/bevfusion_4cam_80lidar.yaml \ --load_from runs/run-326653dc-74184412/epoch_5.pth \ --data.workers_per_gpu 0 echo "训练完成!" ``` --- ## 步骤10: 常见问题和解决方案 ### Q1: 4个相机的特征如何处理? **A**: 修改模型输入: ```python # mmdet3d/models/fusion_models/bevfusion.py def extract_camera_features(self, x, ...): B, N, C, H, W = x.size() # N从6改为4 assert N == 4, f"Expected 4 cameras, got {N}" x = x.view(B * N, C, H, W) # (B*4, C, H, W) x = self.encoders["camera"]["backbone"](x) x = self.encoders["camera"]["neck"](x) # ... 后续处理 ``` ### Q2: 长焦相机如何单独处理? **A**: 添加相机类型标记: ```python # 在forward时传入相机类型 camera_types = ['wide', 'tele', 'wide', 'wide'] # VTransform根据类型选择处理策略 def vtransform_with_cam_type(features, camera_types): for i, cam_type in enumerate(camera_types): if cam_type == 'tele': # 长焦:关注远距离 features[i] = process_tele(features[i]) else: # 广角:关注近距离 features[i] = process_wide(features[i]) ``` ### Q3: 后方盲区怎么办? **A**: 三种方案: ``` 方案1: 调整BEV范围,只预测前方 point_cloud_range: [-54, 0, -5, 54, 108, 3] 方案2: 后方完全依赖LiDAR 在fuser中:后方区域只用LiDAR特征 方案3: 添加后向相机(硬件升级) 增加2个后向相机 → 6相机配置 ``` ### Q4: 80线LiDAR的点太多,内存不够? **A**: 优化策略: ```yaml # 1. 动态体素化(不限制点数) lidar: voxelize: max_num_points: -1 # 动态模式 type: DynamicScatter # 2. 增加体素大小 voxel_size: [0.075, 0.075, 0.2] # 如果0.05太密 # 3. 限制点云范围 point_cloud_range: [-50, -50, -5, 50, 50, 3] # 减小范围 # 4. 下采样 LoadPointsFromFile: load_dim: 4 use_dim: 4 reduce_beams: 40 # 从80降采样到40 ``` --- ## 步骤11: 完整实施流程 ### 第一阶段:数据准备(1-2天) ```bash # 1. 组织数据目录 mkdir -p data/custom_dataset/{lidar,camera,calibration,annotations} # 2. 转换标定格式 python tools/convert_calibration.py # 3. 生成info文件 python tools/data_converter/custom_to_mmdet3d.py # 4. 验证数据 python tools/visualize_custom_data.py ``` ### 第二阶段:代码修改(2-3天) ```bash # 1. 创建CustomDataset vim mmdet3d/datasets/custom_dataset.py # 2. 修改pipeline(处理4相机) vim mmdet3d/datasets/pipelines/loading.py # 3. 创建配置文件 vim configs/custom/bevfusion_4cam_80lidar.yaml # 4. (可选)添加长焦处理 vim mmdet3d/models/vtransforms/dual_cam_lss.py ``` ### 第三阶段:训练(3-5天) ```bash # 1. 小规模验证(100个样本) python tools/train.py configs/custom/test_100samples.yaml # 2. 完整训练(从nuScenes模型fine-tune) torchpack dist-run -np 8 python tools/train.py \ configs/custom/bevfusion_4cam_80lidar.yaml \ --load_from pretrained/bevfusion-det.pth # 3. 调优 # 根据验证集性能调整超参数 ``` --- ## 📊 预期性能 ### 与nuScenes对比 | 指标 | nuScenes (6相机+32线) | 您的配置 (4相机+80线) | |------|----------------------|---------------------| | LiDAR点云密度 | 32线 | 80线 (+150%) ✅ | | 相机覆盖 | 360度 | ~240度 ⚠️ | | 远距离检测 | 一般 | 长焦加强 ✅ | | 近距离检测 | 好 | 好 ✅ | | 后方检测 | 好 | 依赖LiDAR ⚠️ | | **预期mAP** | 68% | **65-70%** | | **预期mIoU** | 60% | **55-65%** | **分析**: - ✅ 80线LiDAR会提升性能(点云更密集) - ✅ 长焦相机提升远距离检测 - ⚠️ 4相机可能在后方和侧方略低 - 🎯 总体性能预期相当甚至更好 --- ## 📝 配置文件模板 我为您创建了完整的配置模板,可以直接使用: ```bash # 创建自定义配置目录 mkdir -p /workspace/bevfusion/configs/custom # 配置文件清单 configs/custom/ ├── default.yaml 基础配置 ├── bevfusion_4cam_80lidar.yaml 完整模型配置 ├── test_100samples.yaml 小规模测试配置 └── README.md 使用说明 ``` --- ## 🚀 快速开始(当前训练完成后) ### 1. 等待当前训练完成 ``` 当前进度: Epoch 6/20 (30%) 预计完成: 2天后 ``` ### 2. 准备您的数据 ```bash # 按照上述格式组织数据 # 编写标定转换脚本 # 生成info文件 ``` ### 3. 测试数据加载 ```python # 验证数据格式正确 from mmdet3d.datasets import CustomDataset dataset = CustomDataset( ann_file='data/custom_dataset/custom_infos_val.pkl', pipeline=[...], ) # 测试加载一个样本 data = dataset[0] print(data.keys()) # 应该包含: img (4个相机), points, gt_bboxes_3d, gt_labels_3d ``` ### 4. 开始fine-tuning ```bash # 使用当前多任务模型作为初始化 torchpack dist-run -np 8 python tools/train.py \ configs/custom/bevfusion_4cam_80lidar.yaml \ --load_from runs/run-326653dc-74184412/latest.pth \ --data.workers_per_gpu 0 ``` --- ## 💡 关键注意事项 ### 1. 标定精度 ``` ❗ 最重要!标定不准会严重影响性能 必须准确标定: - 相机内参(畸变参数) - 相机外参(相对车身位置) - LiDAR到车身的变换 - 时间同步 验证方法: - 投影LiDAR点到图像,检查对齐 - 多帧一致性检查 ``` ### 2. 长焦相机处理 ``` 不推荐: ❌ 和广角相机完全相同处理 推荐: ✅ 不同的depth范围 ✅ 不同的BEV范围 ✅ 或使用dual-branch ``` ### 3. 数据增强 ``` 需要调整: - 4相机的flip策略(不能左右flip,会导致相机不匹配) - rotation范围(根据您的应用场景) - scale范围(80线LiDAR更精确,可以更保守) ``` ### 4. 类别映射 ``` 如果您的类别与nuScenes不同: - 修改object_classes定义 - 调整num_classes - 重新训练分类head - 检测head可以从nuScenes初始化,但需要调整最后一层 ``` --- ## 🔧 工具脚本 ### 可视化工具 ```python # tools/visualize_custom_data.py def visualize_4cam_lidar(data_info): """可视化4相机+LiDAR数据""" import matplotlib.pyplot as plt fig, axes = plt.subplots(2, 3, figsize=(18, 12)) # 4个相机 for i, cam_name in enumerate(['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT']): ax = axes[i // 2, i % 2] # 加载图像 img = load_image(data_info['cams'][cam_name]['data_path']) # 投影LiDAR点 lidar_points = load_lidar(data_info['lidar_path']) projected = project_lidar_to_cam(lidar_points, data_info, cam_name) # 绘制 ax.imshow(img) ax.scatter(projected[:, 0], projected[:, 1], c=projected[:, 2], s=1) ax.set_title(f'{cam_name}') # BEV视图 ax = axes[1, 2] plot_bev(lidar_points, data_info['gt_boxes'], ax) ax.set_title('BEV View') plt.tight_layout() plt.savefig('visualization.png') print("可视化已保存到 visualization.png") ``` ### 标定验证工具 ```python # tools/verify_calibration.py def verify_calibration(data_info): """验证标定准确性""" lidar_points = load_lidar(data_info['lidar_path']) errors = [] for cam_name in ['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT']: # 投影LiDAR到相机 projected = project_lidar_to_cam(lidar_points, data_info, cam_name) # 检查投影点是否在图像内 h, w = data_info['cams'][cam_name]['height'], data_info['cams'][cam_name]['width'] valid_mask = ( (projected[:, 0] >= 0) & (projected[:, 0] < w) & (projected[:, 1] >= 0) & (projected[:, 1] < h) & (projected[:, 2] > 0) # 深度为正 ) valid_ratio = valid_mask.sum() / len(projected) print(f"{cam_name}: {valid_ratio*100:.1f}% 点有效") if valid_ratio < 0.1: errors.append(f"{cam_name}标定可能有问题") if errors: print("警告:", errors) else: print("✅ 标定验证通过!") ``` --- ## 📖 完整配置示例 ### configs/custom/bevfusion_4cam_80lidar.yaml ```yaml # 自定义数据集BEVFusion配置 _base_: ../nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml # 数据集配置 dataset_type: CustomDataset dataset_root: data/custom_dataset/ # LiDAR配置(80线) reduce_beams: 80 load_dim: 4 use_dim: 4 voxel_size: [0.05, 0.05, 0.2] point_cloud_range: [-54.0, -54.0, -5.0, 54.0, 54.0, 3.0] # 相机配置(4个) num_cameras: 4 camera_names: ['FRONT_WIDE', 'FRONT_TELE', 'FRONT_LEFT', 'FRONT_RIGHT'] image_size: [512, 1408] # 模型配置 model: encoders: camera: # 相机数量从6改为4 num_views: 4 vtransform: # 适配4相机的BEV范围 xbound: [-54.0, 54.0, 0.3] ybound: [-54.0, 54.0, 0.3] # 长焦相机特殊处理 camera_aware: true tele_camera_idx: 1 # 第2个相机是长焦 lidar: voxelize: max_num_points: 20 # 80线点更多 point_cloud_range: ${point_cloud_range} voxel_size: ${voxel_size} max_voxels: [180000, 240000] backbone: sparse_shape: [2160, 2160, 41] # 适配0.05体素 output_channels: 256 # 可以增加 heads: object: num_classes: 8 # 根据您的类别数 map: classes: ${map_classes} # 训练配置(fine-tuning) optimizer: lr: 5.0e-5 # 更小的学习率 paramwise_cfg: custom_keys: encoders: lr_mult: 0.1 # backbone用10%的学习率 max_epochs: 12 # 数据配置 data: train: type: ${dataset_type} dataset_root: ${dataset_root} ann_file: ${dataset_root + "custom_infos_train.pkl"} # ... ``` --- ## 🎯 实施时间表 ### 第一周:数据准备 - Day 1-2: 组织数据,转换格式 - Day 3: 标定验证 - Day 4: 生成info文件和标注 - Day 5: 数据可视化验证 ### 第二周:代码开发 - Day 6-7: 实现CustomDataset - Day 8: 修改pipeline - Day 9: 配置文件编写 - Day 10: 小规模测试 ### 第三周:训练调优 - Day 11-13: 完整训练(fine-tuning) - Day 14-15: 性能调优 - Day 16-17: 评估和可视化 **总计**: 约3周完成迁移 --- ## 💻 立即可用的代码模板 我可以为您创建: 1. **数据转换脚本** (`tools/data_converter/custom_to_mmdet3d.py`) 2. **CustomDataset类** (`mmdet3d/datasets/custom_dataset.py`) 3. **配置文件** (`configs/custom/bevfusion_4cam_80lidar.yaml`) 4. **可视化工具** (`tools/visualize_custom_data.py`) 5. **训练脚本** (`scripts/train_custom.sh`) --- ## 🌟 优化建议 ### 利用80线LiDAR的优势 ```yaml # 1. 更精细的体素化 voxel_size: [0.05, 0.05, 0.2] # nuScenes用0.075 # 2. 更强的LiDAR backbone lidar: backbone: output_channels: 256 # nuScenes用128 encoder_channels: - [32, 32, 64] # 加倍通道数 - [64, 64, 128] - [128, 128, 256] # 3. 调整融合权重(LiDAR权重增加) fuser: type: ConvFuser in_channels: [80, 256] # 或使用AddFuser,可以设置不同权重 ``` ### 利用长焦相机的优势 ```yaml # 专门的远距离检测分支 heads: object: # 增加远距离小物体的anchor anchor_generator: ranges: [[0, -40.0, ..., 40.0, 100.0, ...]] # 扩展到100米 # 或添加专门的长距离检测head object_long_range: type: TransFusionHead point_cloud_range: [0, 50, -5, 50, 150, 3] # 只关注前方远距离 ``` --- ## ✅ 迁移检查清单 迁移前请确认: - [ ] 数据已按照mmdet3d格式组织 - [ ] 标定文件已准备(内参+外参) - [ ] 时间戳同步(相机和LiDAR) - [ ] 3D框标注格式正确(LiDAR坐标系) - [ ] BEV分割标注准备(如果需要) - [ ] 数据集划分完成(train/val/test) - [ ] CustomDataset类已实现 - [ ] 配置文件已适配4相机 - [ ] Pipeline已修改 - [ ] 可视化验证通过 - [ ] 小规模测试通过 --- ## 🎓 总结 **您的传感器配置优势**: - ✅ 80线LiDAR:点云密度是nuScenes的2.5倍 - ✅ 长焦相机:远距离检测能力更强 - ✅ 前向覆盖更好:2个前视相机 **需要注意**: - ⚠️ 后方盲区:需要调整BEV范围或增强LiDAR - ⚠️ 长焦相机:需要特殊处理逻辑 - ⚠️ 数据标定:必须精确 **预期效果**: - 前方检测:可能优于nuScenes(长焦+80线) - 近距离:与nuScenes相当 - 后方:略低于nuScenes(无后向相机) - **整体:65-70% mAP,55-65% mIoU** --- 需要我帮您: 1. 创建完整的代码模板? 2. 编写数据转换脚本? 3. 设计长焦相机处理方案? 请告诉我下一步需要什么!😊 --- 生成时间: 2025-10-17