
nuScenes Data Format and Real-Vehicle Deployment Annotation Guide

📋 Table of Contents

  1. nuScenes GT Data Format in Detail
  2. BEVFusion Input and Output Formats
  3. Real-Vehicle Camera Configuration Changes
  4. Data Collection and Annotation Workflow
  5. Tools and Code Examples
  6. Complete Real-Vehicle Deployment Workflow
  7. Common Issues and Solutions
  8. Performance Optimization Suggestions
  9. Summary

1. nuScenes GT Data Format in Detail

1.1 Data Organization

The nuScenes dataset is organized in the following hierarchy:

nuscenes/
├── v1.0-trainval/          # Metadata (JSON)
│   ├── sample.json         # Keyframe samples
│   ├── sample_data.json    # Sensor data records
│   ├── scene.json          # Scene information
│   ├── calibrated_sensor.json  # Sensor calibration
│   ├── ego_pose.json       # Ego-vehicle poses
│   ├── instance.json       # Object instances
│   ├── category.json       # Object categories
│   └── attribute.json      # Object attributes
├── samples/                # Keyframe data
│   ├── CAM_FRONT/
│   ├── CAM_FRONT_LEFT/
│   ├── CAM_FRONT_RIGHT/
│   ├── CAM_BACK/
│   ├── CAM_BACK_LEFT/
│   ├── CAM_BACK_RIGHT/
│   ├── LIDAR_TOP/
│   └── RADAR_FRONT/
├── sweeps/                 # Intermediate (non-keyframe) frames
└── maps/                   # HD map data
    ├── expansion/
    └── basemap/
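
The JSON tables above are linked by tokens. A minimal sketch using the official nuscenes-devkit shows how a sample, its front-camera record, and the corresponding calibration and pose entries chain together (assumes the devkit is installed and /data/nuscenes is a placeholder data root):

# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-trainval', dataroot='/data/nuscenes')  # assumed data root

sample = nusc.sample[0]                                    # one entry of sample.json
sd = nusc.get('sample_data', sample['data']['CAM_FRONT'])  # sample_data.json record
cs = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])  # calibrated_sensor.json entry
pose = nusc.get('ego_pose', sd['ego_pose_token'])           # ego_pose.json record

print(sd['filename'])          # e.g. samples/CAM_FRONT/....jpg
print(cs['camera_intrinsic'])  # 3x3 intrinsic matrix
print(pose['translation'])     # ego position in the global frame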

1.2 Data Structure of a Single Sample

The processed info format consumed by BEVFusion (pkl file):

{
    'token': 'ca9a282c9e77460f8360f564131a8af5',  # unique sample ID
    'timestamp': 1532402927647951,                  # timestamp (microseconds)
    'scene_token': 'scene-0001',                    # scene ID
    
    # ========== LiDAR data ==========
    'lidar_path': 'samples/LIDAR_TOP/n015-2018-07-24-11-22-45+0800__LIDAR_TOP__1532402927647951.pcd.bin',
    'sweeps': [                                     # multi-frame LiDAR sweeps
        {
            'data_path': 'sweeps/LIDAR_TOP/...',
            'sensor2lidar_rotation': [[...], [...], [...]],  # 3x3 rotation matrix
            'sensor2lidar_translation': [x, y, z],
            'timestamp': 1532402927547951
        },
        # ... 9 sweeps in total
    ],
    
    # ========== LiDAR pose ==========
    'lidar2ego_translation': [x, y, z],             # LiDAR -> ego-vehicle frame
    'lidar2ego_rotation': [w, x, y, z],             # quaternion
    'ego2global_translation': [x, y, z],            # ego-vehicle -> global frame
    'ego2global_rotation': [w, x, y, z],
    
    # ========== Camera data (6 cameras) ==========
    'cams': {
        'CAM_FRONT': {
            'data_path': 'samples/CAM_FRONT/...',
            'cam_intrinsic': [                      # 3x3 intrinsic matrix
                [fx, 0,  cx],
                [0,  fy, cy],
                [0,  0,  1]
            ],
            'sensor2ego_translation': [x, y, z],    # camera -> ego-vehicle frame
            'sensor2ego_rotation': [w, x, y, z],
            'sensor2lidar_rotation': [[...], [...], [...]],  # camera -> LiDAR
            'sensor2lidar_translation': [x, y, z],
            'timestamp': 1532402927612460
        },
        'CAM_FRONT_RIGHT': { ... },
        'CAM_FRONT_LEFT': { ... },
        'CAM_BACK': { ... },
        'CAM_BACK_LEFT': { ... },
        'CAM_BACK_RIGHT': { ... }
    },
    
    # ========== 3D detection annotations ==========
    'gt_boxes': [                                   # N x 7 array
        # [x, y, z, w, l, h, yaw]
        # x, y, z: box center (in the LiDAR frame of the processed infos)
        # w, l, h: width, length, height (meters)
        # yaw: heading angle (radians)
        # velocity (vx, vy) is stored separately in gt_velocity and is
        # concatenated onto the boxes (9-dim) when the dataset is loaded
    ],
    'gt_names': ['car', 'pedestrian', ...],         # class names
    'gt_velocity': [[vx, vy], ...],                 # velocities (m/s)
    'num_lidar_pts': [34, 12, ...],                 # LiDAR points inside each box
    'num_radar_pts': [5, 2, ...],                   # radar points inside each box
    'valid_flag': [True, True, ...],                # whether the annotation is valid
    
    # ========== Object classes (10 classes) ==========
    # car, truck, construction_vehicle, bus, trailer,
    # barrier, motorcycle, bicycle, pedestrian, traffic_cone
}
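
How these fields fit together at load time: the per-camera LiDAR-to-image projection used for camera-LiDAR fusion can be assembled from cam_intrinsic and the sensor2lidar transform. A minimal sketch, assuming an info pkl with the structure above (the file path is a placeholder):

import pickle
import numpy as np

with open('data/nuscenes/nuscenes_infos_train.pkl', 'rb') as f:  # placeholder path
    info = pickle.load(f)['infos'][0]

cam = info['cams']['CAM_FRONT']

# camera -> LiDAR as a 4x4 homogeneous transform
cam2lidar = np.eye(4)
cam2lidar[:3, :3] = np.array(cam['sensor2lidar_rotation'])
cam2lidar[:3, 3] = np.array(cam['sensor2lidar_translation'])

# LiDAR -> camera, then camera -> pixels
lidar2cam = np.linalg.inv(cam2lidar)
K = np.eye(4)
K[:3, :3] = np.array(cam['cam_intrinsic'])
lidar2img = K @ lidar2cam  # projects homogeneous LiDAR-frame points onto the image plane

pt = np.array([10.0, 0.0, 0.0, 1.0])   # example: a LiDAR-frame point 10 m ahead
uvw = lidar2img @ pt
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(u, v)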

1.3 BEV Semantic Segmentation Annotations

BEV segmentation labels are generated on the fly from the nuScenes map data:

# Generated on the fly in LoadBEVSegmentation
gt_masks_bev: np.ndarray              # (num_classes, H, W)
    # H=200, W=200 (covers 100m x 100m at 0.5m resolution)
    # num_classes=6:
    #   [0] drivable_area   - drivable area
    #   [1] ped_crossing    - pedestrian crossing
    #   [2] walkway         - walkway
    #   [3] stop_line       - stop line
    #   [4] carpark_area    - parking area
    #   [5] divider         - lane/road divider
    # values: 0/1 (binary masks)

Generation process:

  1. Query the nuScenes Map API for the map data around the current position
  2. Determine the BEV patch from the LiDAR pose (default ±50m in x and y, i.e. 100m x 100m)
  3. Rasterize the 6 semantic classes onto the 200x200 grid
  4. Apply the coordinate transform and rotation alignment (see the sketch below)
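
A minimal sketch of this rasterization using the nuscenes-devkit map API (the map name, ego pose, and patch size are placeholder values; this illustrates the approach rather than reproducing BEVFusion's LoadBEVSegmentation, which additionally merges road_divider and lane_divider into the divider class as done here):

import numpy as np
from nuscenes.map_expansion.map_api import NuScenesMap

nusc_map = NuScenesMap(dataroot='/data/nuscenes', map_name='singapore-onenorth')

# Patch centered on the ego position (placeholder values), 100m x 100m, aligned with the heading
ego_x, ego_y, ego_yaw_deg = 600.0, 1600.0, 30.0
patch_box = (ego_x, ego_y, 100.0, 100.0)          # (center_x, center_y, height, width) in meters
layers = ['drivable_area', 'ped_crossing', 'walkway', 'stop_line', 'carpark_area']

# (num_layers, 200, 200) binary masks rotated into the ego heading
masks = nusc_map.get_map_mask(patch_box, ego_yaw_deg, layers, (200, 200))

# The divider class is the union of the two divider layers
divider = nusc_map.get_map_mask(patch_box, ego_yaw_deg,
                                ['road_divider', 'lane_divider'], (200, 200))
gt_masks_bev = np.concatenate([masks, divider.any(axis=0, keepdims=True)], axis=0)  # (6, 200, 200)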

5. Tools and Code Examples

5.1 Data Conversion Toolkit

5.1.1 Converting Custom Data to the nuScenes Format

#!/usr/bin/env python3
# tools/data_converter/custom_to_nuscenes.py

import os
import pickle
import numpy as np
from tqdm import tqdm
from pyquaternion import Quaternion
from pathlib import Path
import json

class CustomDataConverter:
    """将实车采集数据转换为nuScenes格式"""
    
    def __init__(
        self,
        custom_data_root: str,
        output_root: str,
        camera_names: list = None,
    ):
        self.custom_root = Path(custom_data_root)
        self.output_root = Path(output_root)
        
        # Default camera configuration
        self.camera_names = camera_names or [
            'CAM_FRONT',
            'CAM_FRONT_LEFT', 
            'CAM_FRONT_RIGHT',
            'CAM_BACK'
        ]
        
        # Load calibration data
        self.calib = self.load_calibration()
        
    def load_calibration(self):
        """加载相机和LiDAR标定参数"""
        calib_file = self.custom_root / 'calibration' / 'sensors.json'
        with open(calib_file, 'r') as f:
            calib = json.load(f)
        return calib
    
    def convert_sample(self, sample_id: str):
        """转换单个样本"""
        
        # 1. LiDAR data path
        lidar_path = self.custom_root / 'lidar' / f'{sample_id}.pcd'
        
        # 2. Load the camera images
        cam_data = {}
        for cam_name in self.camera_names:
            img_path = self.custom_root / 'camera' / cam_name / f'{sample_id}.jpg'
            
            # Calibration parameters of this camera
            cam_calib = self.calib['cameras'][cam_name]
            
            # Build the transform matrices
            cam2ego = self._build_transform_matrix(
                cam_calib['translation'],
                cam_calib['rotation']
            )
            
            # lidar2ego maps LiDAR coordinates into the ego frame; its inverse maps ego -> LiDAR
            lidar2ego = self._build_transform_matrix(
                self.calib['lidar']['translation'],
                self.calib['lidar']['rotation']
            )
            ego2lidar = np.linalg.inv(lidar2ego)
            
            cam2lidar = ego2lidar @ cam2ego
            
            cam_data[cam_name] = {
                # paths are stored relative to the raw-data root; copy or symlink the
                # data under the output root so these relative paths stay valid
                'data_path': str(img_path.relative_to(self.custom_root)),
                'cam_intrinsic': np.array(cam_calib['intrinsic']),
                'sensor2ego_translation': cam_calib['translation'],
                'sensor2ego_rotation': cam_calib['rotation'],
                'sensor2lidar_rotation': cam2lidar[:3, :3],
                'sensor2lidar_translation': cam2lidar[:3, 3],
                'timestamp': self._get_timestamp(sample_id)
            }
        
        # 3. Load annotations (if any)
        anno_file = self.custom_root / 'annotations' / f'{sample_id}.json'
        if anno_file.exists():
            with open(anno_file, 'r') as f:
                annos = json.load(f)
            
            gt_boxes, gt_names, gt_velocity = self._parse_annotations(annos)
        else:
            gt_boxes = np.zeros((0, 7))
            gt_names = []
            gt_velocity = np.zeros((0, 2))
        
        # 4. Build the info dict
        info = {
            'token': sample_id,
            'timestamp': self._get_timestamp(sample_id),
            'lidar_path': str(lidar_path.relative_to(self.custom_root)),
            'sweeps': [],  # add multi-frame sweeps here if available
            'cams': cam_data,
            
            # LiDAR pose
            'lidar2ego_translation': self.calib['lidar']['translation'],
            'lidar2ego_rotation': self.calib['lidar']['rotation'],
            'ego2global_translation': [0, 0, 0],  # fill in from GPS/INS if available
            'ego2global_rotation': [1, 0, 0, 0],
            
            # Annotations
            'gt_boxes': gt_boxes,
            'gt_names': gt_names,
            'gt_velocity': gt_velocity,
            'num_lidar_pts': np.array([0] * len(gt_boxes)),  # should be counted from the point cloud
            'valid_flag': np.array([True] * len(gt_boxes)),
        }
        
        return info
    
    def _build_transform_matrix(self, translation, rotation):
        """构建4x4变换矩阵"""
        T = np.eye(4)
        T[:3, :3] = Quaternion(rotation).rotation_matrix
        T[:3, 3] = translation
        return T
    
    def _parse_annotations(self, annos):
        """解析标注数据"""
        gt_boxes = []
        gt_names = []
        gt_velocity = []
        
        for anno in annos['annotations']:
            # nuScenes-style box: [x, y, z, w, l, h, yaw]
            box = [
                anno['translation'][0],
                anno['translation'][1],
                anno['translation'][2],
                anno['size'][0],  # width
                anno['size'][1],  # length
                anno['size'][2],  # height
                self._quaternion_to_yaw(anno['rotation'])
            ]
            gt_boxes.append(box)
            gt_names.append(anno['category_name'])
            gt_velocity.append(anno.get('velocity', [0, 0]))
        
        return (
            np.array(gt_boxes),
            np.array(gt_names),
            np.array(gt_velocity)
        )
    
    def _quaternion_to_yaw(self, q):
        """四元数转偏航角"""
        return Quaternion(q).yaw_pitch_roll[0]
    
    def _get_timestamp(self, sample_id):
        """Get the timestamp in microseconds"""
        # Extracted from the file name (or from metadata if available).
        # Assumed format: sample_id = "2024_01_15_10_30_25_123456"
        # (year_month_day_hour_minute_second_microseconds) -> Unix timestamp in microseconds
        from datetime import datetime
        parts = sample_id.split('_')
        dt = datetime(*map(int, parts[:6]))
        return int(dt.timestamp() * 1e6) + int(parts[6])
    
    def convert_all(self, split='train'):
        """转换所有数据"""
        # 获取样本列表
        sample_list_file = self.custom_root / f'{split}_samples.txt'
        with open(sample_list_file, 'r') as f:
            sample_ids = [line.strip() for line in f]
        
        infos = []
        for sample_id in tqdm(sample_ids, desc=f"Converting {split} set"):
            info = self.convert_sample(sample_id)
            infos.append(info)
        
        # Save as pkl
        output_file = self.output_root / f'custom_infos_{split}.pkl'
        data = {
            'infos': infos,
            'metadata': {
                'version': 'custom-v1.0',
                'num_cameras': len(self.camera_names),
                'camera_names': self.camera_names
            }
        }
        
        with open(output_file, 'wb') as f:
            pickle.dump(data, f)
        
        print(f"✅ 已转换 {len(infos)} 个样本到 {output_file}")
        return output_file


# ========== Usage example ==========
if __name__ == '__main__':
    converter = CustomDataConverter(
        custom_data_root='/data/custom_vehicle_data',
        output_root='/data/custom_nuscenes_format',
        camera_names=['CAM_FRONT', 'CAM_LEFT', 'CAM_RIGHT', 'CAM_BACK']
    )
    
    # Convert the training and validation splits
    converter.convert_all(split='train')
    converter.convert_all(split='val')

5.1.2 Calibration Verification Tool

#!/usr/bin/env python3
# tools/verify_calibration.py

import numpy as np
import cv2
import open3d as o3d
from pathlib import Path

def verify_camera_lidar_calibration(
    lidar_file: str,
    image_file: str,
    calib_file: str,
    output_file: str = None
):
    """
    验证相机-LiDAR外参标定精度
    将LiDAR点投影到图像上目视检查对齐情况
    """
    
    # 1. 加载点云
    pcd = o3d.io.read_point_cloud(lidar_file)
    points = np.asarray(pcd.points)  # (N, 3)
    
    # 2. 加载图像
    img = cv2.imread(image_file)
    h, w = img.shape[:2]
    
    # 3. 加载标定参数
    import json
    with open(calib_file, 'r') as f:
        calib = json.load(f)
    
    intrinsic = np.array(calib['intrinsic'])  # (3, 3)
    lidar2cam = np.array(calib['lidar2cam'])  # (4, 4)
    
    # 4. 投影到相机坐标系
    points_homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    points_cam = points_homo @ lidar2cam.T  # (N, 4)
    points_cam = points_cam[:, :3]  # (N, 3)
    
    # 5. 过滤相机后方的点
    mask = points_cam[:, 2] > 0
    points_cam = points_cam[mask]
    
    # 6. 投影到图像平面
    points_img = points_cam @ intrinsic.T  # (N, 3)
    points_img = points_img[:, :2] / points_img[:, 2:]  # (N, 2)
    
    # 7. 过滤图像外的点
    mask = (
        (points_img[:, 0] >= 0) & (points_img[:, 0] < w) &
        (points_img[:, 1] >= 0) & (points_img[:, 1] < h)
    )
    points_img = points_img[mask]
    depths = points_cam[mask, 2]
    
    # 8. Visualize
    img_with_points = img.copy()
    
    # Color points by depth
    depths_norm = (depths - depths.min()) / (depths.max() - depths.min())
    colors = cv2.applyColorMap(
        (depths_norm * 255).astype(np.uint8).reshape(-1, 1),
        cv2.COLORMAP_JET
    )
    
    for i, (u, v) in enumerate(points_img.astype(int)):
        color = tuple(map(int, colors[i][0]))
        cv2.circle(img_with_points, (u, v), 2, color, -1)
    
    # 9. Save the result
    if output_file:
        cv2.imwrite(output_file, img_with_points)
        print(f"✅ Verification result saved to: {output_file}")
    
    # 10. Print projection statistics
    print(f"\nCalibration verification statistics:")
    print(f"  Projected points: {len(points_img)}")
    print(f"  Depth range: {depths.min():.2f}m - {depths.max():.2f}m")
    print(f"  Image size: {w}x{h}")
    
    return img_with_points


# ========== Batch verification ==========
def batch_verify_calibration(data_root: str, num_samples: int = 10):
    """Verify the calibration on multiple samples"""
    
    data_path = Path(data_root)
    calib_file = data_path / 'calibration' / 'CAM_FRONT.json'
    
    # Randomly pick samples
    import random
    sample_ids = list((data_path / 'lidar').glob('*.pcd'))
    sample_ids = random.sample(sample_ids, min(num_samples, len(sample_ids)))
    
    output_dir = data_path / 'calibration_verification'
    output_dir.mkdir(exist_ok=True)
    
    for lidar_file in sample_ids:
        sample_id = lidar_file.stem
        image_file = data_path / 'camera' / 'CAM_FRONT' / f'{sample_id}.jpg'
        output_file = output_dir / f'{sample_id}_verification.jpg'
        
        verify_camera_lidar_calibration(
            str(lidar_file),
            str(image_file),
            str(calib_file),
            str(output_file)
        )
    
    print(f"\n✅ 已验证 {num_samples} 个样本")
    print(f"   结果保存在: {output_dir}")


if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_root', required=True)
    parser.add_argument('--num_samples', type=int, default=10)
    args = parser.parse_args()
    
    batch_verify_calibration(args.data_root, args.num_samples)

5.1.3 BEV Segmentation Annotation Tool

#!/usr/bin/env python3
# tools/annotate_bev_segmentation.py

import numpy as np
import cv2
from pathlib import Path
import json

class BEVSegmentationAnnotator:
    """BEV语义分割标注工具"""
    
    def __init__(
        self,
        xbound=(-50, 50, 0.5),
        ybound=(-50, 50, 0.5),
        classes=None
    ):
        self.xbound = xbound
        self.ybound = ybound
        self.classes = classes or [
            'drivable_area',
            'ped_crossing',
            'walkway',
            'stop_line',
            'carpark_area',
            'divider'
        ]
        
        # Compute the BEV grid size
        self.bev_h = int((ybound[1] - ybound[0]) / ybound[2])  # 200
        self.bev_w = int((xbound[1] - xbound[0]) / xbound[2])  # 200
        
    def render_bev_from_lidar(self, lidar_file: str, output_file: str):
        """从LiDAR点云渲染BEV俯视图"""
        import open3d as o3d
        
        # 加载点云
        pcd = o3d.io.read_point_cloud(lidar_file)
        points = np.asarray(pcd.points)
        
        # 创建BEV图像
        bev_img = np.zeros((self.bev_h, self.bev_w, 3), dtype=np.uint8)
        
        # 将点投影到BEV网格
        for point in points:
            x, y, z = point
            
            # 过滤范围外的点
            if not (self.xbound[0] <= x < self.xbound[1] and
                    self.ybound[0] <= y < self.ybound[1]):
                continue
            
            # 转换为网格坐标
            i = int((y - self.ybound[0]) / self.ybound[2])
            j = int((x - self.xbound[0]) / self.xbound[2])
            
            # 根据高度着色
            height_norm = np.clip((z + 2) / 4, 0, 1)  # -2m到2m
            color = int(height_norm * 255)
            bev_img[i, j] = [color, color, color]
        
        # 翻转Y轴 (图像坐标系)
        bev_img = np.flip(bev_img, axis=0).copy()
        
        # 保存
        cv2.imwrite(output_file, bev_img)
        print(f"✅ BEV图像保存至: {output_file}")
        
        return bev_img
    
    def polygon_to_mask(
        self,
        polygon_json: str,
        class_name: str,
        output_file: str = None
    ):
        """将labelme标注的多边形转换为mask"""
        
        # 加载labelme的JSON
        with open(polygon_json, 'r') as f:
            data = json.load(f)
        
        # 创建mask
        mask = np.zeros((self.bev_h, self.bev_w), dtype=np.uint8)
        
        # 绘制多边形
        for shape in data['shapes']:
            if shape['label'] == class_name:
                points = np.array(shape['points'], dtype=np.int32)
                cv2.fillPoly(mask, [points], 1)
        
        if output_file:
            np.save(output_file, mask)
            print(f"✅ Mask保存至: {output_file}")
        
        return mask
    
    def merge_all_classes(self, polygon_dir: str, output_file: str):
        """合并所有类别的mask"""
        
        polygon_dir = Path(polygon_dir)
        masks = np.zeros((len(self.classes), self.bev_h, self.bev_w), dtype=np.uint8)
        
        for i, class_name in enumerate(self.classes):
            json_file = polygon_dir / f'{class_name}.json'
            if json_file.exists():
                mask = self.polygon_to_mask(str(json_file), class_name)
                masks[i] = mask
            else:
                print(f"⚠️  未找到类别 {class_name} 的标注")
        
        # 保存
        np.save(output_file, masks)
        print(f"✅ 完整BEV mask保存至: {output_file}")
        print(f"   形状: {masks.shape}")
        
        return masks


# ========== Usage workflow ==========
if __name__ == '__main__':
    annotator = BEVSegmentationAnnotator()
    
    # Step 1: render the BEV image
    bev_img = annotator.render_bev_from_lidar(
        'data/lidar/sample_001.pcd',
        'data/bev_images/sample_001.png'
    )
    
    # Step 2: annotate polygons with labelme
    # labelme data/bev_images/sample_001.png
    # Draw polygons for each class and save them as JSON
    
    # Step 3: convert to masks
    masks = annotator.merge_all_classes(
        polygon_dir='data/bev_annotations/sample_001',
        output_file='data/bev_masks/sample_001.npy'
    )
    
    print("✅ BEV分割标注完成!")

6. Complete Real-Vehicle Deployment Workflow

6.1 Data Collection

# 1. Camera data recording
roslaunch vehicle_perception camera_record.launch \
    cameras:="front,left,right,back" \
    fps:=10 \
    output:=/data/raw/camera

# 2. LiDAR data recording
roslaunch vehicle_perception lidar_record.launch \
    topic:=/velodyne_points \
    output:=/data/raw/lidar

# 3. Synchronization check
python tools/check_sync.py \
    --camera_dir /data/raw/camera \
    --lidar_dir /data/raw/lidar \
    --max_time_diff 0.05  # 50ms
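
The sync check matches each LiDAR frame to the nearest camera frame by timestamp and flags pairs that differ by more than the threshold. A minimal sketch of that logic (check_sync.py itself is not included in this document, and the filename-to-timestamp parsing below is an assumption):

from pathlib import Path

def check_sync(camera_dir, lidar_dir, max_time_diff=0.05):
    """Flag LiDAR frames without a camera frame within max_time_diff seconds."""
    # Assumes file stems encode the timestamp in microseconds, e.g. 1532402927647951.jpg
    cam_ts = sorted(int(p.stem) / 1e6 for p in Path(camera_dir).glob('*.jpg'))
    bad = []
    for p in Path(lidar_dir).glob('*.pcd'):
        t = int(p.stem) / 1e6
        nearest = min(cam_ts, key=lambda c: abs(c - t))
        if abs(nearest - t) > max_time_diff:
            bad.append((p.name, abs(nearest - t)))
    print(f"{len(bad)} unsynchronized LiDAR frames (> {max_time_diff * 1000:.0f} ms)")
    return bad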

6.2 Calibration

# Camera intrinsic calibration
python tools/calibrate_camera.py \
    --images /data/calibration/checkerboard \
    --pattern 9x6 \
    --square_size 0.05  # 5cm

# LiDAR-camera extrinsic calibration
python tools/calibrate_lidar_camera.py \
    --lidar /data/calibration/lidar.pcd \
    --image /data/calibration/cam_front.jpg \
    --target_type checkerboard \
    --output calib/lidar2cam.json

# Verify the calibration
python tools/verify_calibration.py \
    --data_root /data/raw \
    --num_samples 10
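
A minimal sketch of what the intrinsic-calibration step does under the hood, using OpenCV's standard checkerboard routine (calibrate_camera.py is referenced above but not shown here, so this is only an illustration; the 9x6 pattern and 5 cm square size match the command above):

import glob
import cv2
import numpy as np

pattern = (9, 6)          # inner corners per row / column
square_size = 0.05        # meters

# 3D corner positions on the board plane (z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for path in glob.glob('/data/calibration/checkerboard/*.jpg'):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# rms is the mean reprojection error in pixels (target: < 1 px)
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print('reprojection error:', rms)
print('intrinsic matrix K:\n', K)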

6.3 Data Conversion

# Convert to the nuScenes format
python tools/data_converter/custom_to_nuscenes.py \
    --custom_root /data/raw \
    --output /data/nuscenes_format \
    --camera_names CAM_FRONT CAM_LEFT CAM_RIGHT CAM_BACK

# Verify the conversion result
python tools/check_data_integrity.py \
    --data_root /data/nuscenes_format \
    --check_images \
    --check_lidar \
    --check_annotations
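
A minimal sketch of the kind of integrity check intended here: load the generated pkl and confirm that every referenced image and point-cloud file actually exists (check_data_integrity.py itself is not included in this document; the pkl layout follows the converter above):

import pickle
from pathlib import Path

data_root = Path('/data/nuscenes_format')
with open(data_root / 'custom_infos_train.pkl', 'rb') as f:
    infos = pickle.load(f)['infos']

missing = []
for info in infos:
    paths = [info['lidar_path']] + [c['data_path'] for c in info['cams'].values()]
    missing += [p for p in paths if not (data_root / p).exists()]

print(f"{len(infos)} samples checked, {len(missing)} missing files")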

6.4 Annotation

# Start the CVAT annotation service
cd cvat
docker-compose up -d

# Import data
python tools/import_to_cvat.py \
    --data_root /data/nuscenes_format \
    --task_name "vehicle_annotation_batch_001"

# Export once annotation is finished
python tools/export_from_cvat.py \
    --task_id 123 \
    --output /data/annotations/batch_001.json

# Convert to the training format
python tools/cvat_to_nuscenes_anno.py \
    --cvat_json /data/annotations/batch_001.json \
    --output /data/nuscenes_format/annotations

6.5 Adjusting the Training Configuration

# configs/custom_vehicle/custom_4cam.yaml

# Inherit the base config
_base_:
  - ../nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/multitask_enhanced_phase1_HIGHRES.yaml

# Adjust for the new camera count
model:
  encoders:
    camera:
      backbone:
        # Use pretrained weights
        init_cfg:
          checkpoint: runs/enhanced_from_epoch19/epoch_10.pth
      vtransform:
        # 4-camera configuration
        num_cameras: 4

# Data configuration
data:
  train:
    type: NuScenesDataset
    dataset_root: /data/nuscenes_format
    ann_file: /data/nuscenes_format/custom_infos_train.pkl
    pipeline: ${train_pipeline}
  
  val:
    type: NuScenesDataset
    dataset_root: /data/nuscenes_format
    ann_file: /data/nuscenes_format/custom_infos_val.pkl
    pipeline: ${val_pipeline}

# Training parameters (transfer learning)
train:
  max_epochs: 20  # fine-tune for 20 epochs
  lr:
    warmup_iters: 500
    max_lr: 0.0001  # reduced learning rate

6.6 Fine-Tuning

# Launch training
torchpack dist-run -np 6 python tools/train.py \
    configs/custom_vehicle/custom_4cam.yaml \
    --model.encoders.camera.backbone.init_cfg.checkpoint=runs/enhanced_from_epoch19/epoch_10.pth \
    --run-dir runs/custom_vehicle_finetune

# Monitor training
tensorboard --logdir runs/custom_vehicle_finetune

7. Common Issues and Solutions

Q1: How do I handle a different number of cameras?

# Option 1: change the model's camera input configuration
# mmdet3d/models/encoders/camera_encoder.py

class CameraEncoder:
    def __init__(self, num_cameras=6, **kwargs):  # change 6 to 4
        self.num_cameras = num_cameras
        # the rest of the encoder adapts to the camera count automatically

# Option 2: pad with virtual cameras
# If the model expects 6 cameras but only 4 are available,
# duplicate the front camera or pad with black images (see the sketch below)
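
A minimal sketch of option 2, assuming images are batched as a (B, N, C, H, W) tensor with matching per-camera transform matrices (this layout is an assumption for illustration and must match your actual data pipeline):

import torch

def pad_cameras(imgs, cam2lidar, intrinsics, target_n=6):
    """Pad a 4-camera batch to the 6 cameras the pretrained model expects
    by duplicating the front camera (index 0)."""
    b, n = imgs.shape[:2]
    pad = target_n - n
    if pad <= 0:
        return imgs, cam2lidar, intrinsics
    imgs = torch.cat([imgs, imgs[:, :1].repeat(1, pad, 1, 1, 1)], dim=1)
    cam2lidar = torch.cat([cam2lidar, cam2lidar[:, :1].repeat(1, pad, 1, 1)], dim=1)
    intrinsics = torch.cat([intrinsics, intrinsics[:, :1].repeat(1, pad, 1, 1)], dim=1)
    return imgs, cam2lidar, intrinsics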

Q2: How do I handle a different image resolution?

# Adjust in the config
model:
  encoders:
    camera:
      vtransform:
        image_size: [384, 1024]  # change from [256, 704] to the actual input size
        feature_size: [48, 128]  # adjust accordingly
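
Note that whenever the input images are resized, the camera intrinsics fed to the view transform must be scaled by the same factors. In BEVFusion this is normally handled inside the image augmentation pipeline; the small helper below only makes the relationship explicit (the example intrinsic values are approximate nuScenes front-camera numbers, used purely for illustration):

import numpy as np

def scale_intrinsics(K, orig_hw, new_hw):
    """Scale fx, cx by the width ratio and fy, cy by the height ratio."""
    sy = new_hw[0] / orig_hw[0]
    sx = new_hw[1] / orig_hw[1]
    K = np.array(K, dtype=np.float64).copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

# Example: a 900x1600 (H x W) image resized to 384x1024
K_scaled = scale_intrinsics([[1266.4, 0, 816.3], [0, 1266.4, 491.5], [0, 0, 1]],
                            orig_hw=(900, 1600), new_hw=(384, 1024))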

Q3: How do I handle FOV differences?

# Re-calibrate to obtain accurate intrinsics.
# If the FOV differs by more than ~30 degrees, retraining is recommended;
# for differences under ~30 degrees, fine-tuning is usually enough to adapt.
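
The horizontal FOV can be computed directly from the calibrated intrinsics and image width, which makes the comparison above concrete (a small helper for illustration):

import math

def horizontal_fov_deg(fx, image_width):
    """Horizontal field of view in degrees, from focal length (pixels) and image width."""
    return math.degrees(2 * math.atan(image_width / (2 * fx)))

# Example: fx ≈ 1266 px at 1600 px width -> ~64.5 degrees
print(horizontal_fov_deg(1266.4, 1600))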

Q4: What if there are no BEV segmentation labels?

# Option 1: train 3D detection only and disable the segmentation head
model:
  heads:
    object: ...  # keep
    map: null    # disable

# Option 2: keep the pretrained segmentation head and freeze it
model:
  heads:
    map:
      freeze: true  # freeze the segmentation-head weights
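
If the config does not expose such a freeze switch, the same effect can be obtained in the training script by disabling gradients on the map head. A minimal fragment, assuming the model exposes its heads as a module dict keyed 'map' as in the configs above (this structure is an assumption about the code):

# Freeze the BEV segmentation head so only the detection branch is fine-tuned
for param in model.heads['map'].parameters():
    param.requires_grad = False
model.heads['map'].eval()  # also fix BatchNorm statistics in the frozen head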

8. Performance Optimization Suggestions

8.1 Data Augmentation

# Adjust the augmentation strategy for the real-vehicle scenario
augment2d:
  resize: [[0.4, 0.6], [0.5, 0.5]]  # wider resize range
  rotate: [-10, 10]                  # larger rotation range
  brightness: 0.2                    # brightness augmentation
  contrast: 0.2                      # contrast augmentation

8.2 Hard Example Mining

# Tool: find the scenes where the model performs poorly
python tools/find_hard_samples.py \
    --checkpoint runs/custom_vehicle_finetune/epoch_10.pth \
    --data_root /data/nuscenes_format \
    --output hard_samples.txt

# Then collect and annotate more data for those scenes

8.3 Evaluation Metrics

# Evaluate on the real-vehicle data
python tools/test.py \
    configs/custom_vehicle/custom_4cam.yaml \
    runs/custom_vehicle_finetune/epoch_10.pth \
    --eval bbox segm

# Analyze the results
python tools/analyze_results.py \
    --results results.pkl \
    --by_scene \
    --by_distance \
    --by_weather

9. Summary

Key-Step Checklist

□ Data collection
  □ Camera data complete (4-6 views)
  □ LiDAR data complete
  □ Timestamps synchronized (<50ms)
  □ Sufficient scene coverage

□ Sensor calibration
  □ Camera intrinsic accuracy (reprojection error <1px)
  □ LiDAR-camera extrinsic accuracy (rotation <0.5°, translation <2cm)
  □ Calibration verification passed

□ Data conversion
  □ Converted to the nuScenes format
  □ Data integrity checked
  □ Visualization spot-check passed

□ Annotation
  □ 3D box annotation (IoU>0.7)
  □ BEV segmentation annotation (optional)
  □ Quality review passed
  □ Dataset size (>5000 frames)

□ Training
  □ Config files adjusted
  □ Transfer learning / fine-tuning
  □ Performance validated
  □ On-vehicle testing

Minimum Viable Deployment

Minimum dataset: 5000 frames
Minimum annotation: 3D detection only
Training time: 2-3 days (6x V100)
Expected performance: NDS 0.55-0.65

Full Deployment

Recommended dataset: 10000-15000 frames
Full annotation: 3D detection + BEV segmentation
Training time: 5-7 days
Expected performance: NDS 0.65-0.75

Document version: v1.0
Last updated: 2025-10-25
Applies to: BEVFusion + custom sensor configurations