feat: 新增PaddlePaddle检测支持，重构项目架构

1. 新增concurrently依赖用于并行启动服务 2. 新增服务器启动脚本统一管理环境变量和虚拟环境 3. 新增PaddlePaddle推理引擎和配套工具代码 4. 新增抽烟检测Paddle模型支持，完善模型管理 5. 重构开发启动脚本，优化开发体验 6. 更新.gitignore排除不必要的外部目录和缓存 7. 完善文档说明，新增PaddlePaddle部署指南
2026-05-21 10:39:26 +08:00
parent 7aa71c5f83
commit e97bd503ec
31 changed files with 8759 additions and 199 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -61,3 +61,17 @@ apps/server/static/temp/
 .env
 .env.local
 .env.*.local
 # PaddlePaddle external directories (external, not used anymore)
 PaddlePaddle/
 PaddleDetection/
 # Third-party models and test directories (external)
 backend/
 frontend/
 behavior_detection/
 fire_detection/
 safety/
 yolov/
 models/
 __pycache__/
--- a/README.md
+++ b/README.md
@@ -125,7 +125,378 @@ pnpm clean        # 清理构建产物
 ## 模型配置
-模型文件存放在 `models/` 目录下，需要在 `apps/server/services/model_service.py` 中配置模型路径。
+### 统一模型管理
 所有模型文件统一存放在 `models/` 目录下：
 ```
 models/
 ├── smoking_detection/           # YOLOv8 抽烟检测
 ├── smoking_detection_paddle/    # PaddlePaddle PP-YOLOE-s 抽烟检测
 ├── fire_detection/             # YOLOv10 火灾检测
 ├── helmet_detection/          # YOLOv8 安全帽检测
 ├── crowd_detection/           # YOLOv8 人群检测
 └── loitering_detection/       # YOLOv8 徘徊检测
 ```
 ### 模型类型说明
 **YOLO 模型：**
 - 使用 `yolov8n.pt` 或 `yolov10n.pt` 格式
 - 通过 `detection_service.py` 自动加载
 - 支持：抽烟检测、火灾检测、安全帽检测、人群检测、徘徊检测
 **PaddlePaddle 模型：**
 - 使用 `model.pdmodel` + `model.pdiparams` 格式
 - 通过 `paddle_detection_service.py` 加载
 - 支持：抽烟检测（PP-YOLOE-s）
 ### 模型文件格式
 **YOLO 模型：**
 ```
 smoking_detection/
 └── yolov8n.pt                 # YOLO 模型文件
 ```
 **PaddlePaddle 模型：**
 ```
 smoking_detection_paddle/
 ├── model.pdmodel              # 模型结构
 ├── model.pdiparams            # 模型参数
 └── infer_cfg.yml             # 推理配置
 ```
 ## PaddlePaddle 环境配置
 ### 本地 PaddlePaddle 部署
 项目使用本地 PaddlePaddle 进行抽烟检测推理（不使用 Docker），以获得更好的性能。
 ### 整合架构说明
 PaddlePaddle 模型整合在现有的视频检测平台中，提供补充的检测能力。
 **系统架构层级**
 1. **前端层**
   - 通过 Web 界面接收用户输入
   - 调用后端 API 进行图像检测
   - 展示检测结果和实时视频流
 2. **后端服务层**
   - FastAPI 提供 REST API 接口
   - WebSocket 支持实时视频流传输
   - 路由不同检测请求到对应的模型服务
 3. **检测服务层**
   - YOLO 检测服务：处理火灾、安全帽、人群、徘徊等检测任务
   - PaddlePaddle 检测服务：专门处理抽烟检测任务
   - 统一的检测结果格式输出
 4. **推理引擎层**
   - YOLO 推理引擎：基于 Ultralytics 库
   - PaddlePaddle 推理引擎：基于 PaddleDetection 库
   - 各自独立的模型加载和推理逻辑
 **调用流程**
 前端发起检测请求 → 后端 API 接收 → 路由到对应检测服务 → 推理引擎处理 → 返回检测结果 → 前端展示
 **模型选择策略**
 - 系统根据检测类型自动选择合适的推理引擎
 - 抽烟检测优先使用 PaddlePaddle 模型（精度更高）
 - 其他检测使用 YOLO 模型（速度更快）
 - 支持配置切换模型类型
 ### 依赖关系说明
 PaddlePaddle 整合依赖于多个组件的协同工作。
 **核心依赖组件**
 1. **PaddlePaddle 框架**
   - 版本要求：3.0.0
   - 提供深度学习推理基础能力
   - 支持 CPU 推理（本地部署环境）
 2. **PaddleDetection 库**
   - 来源：GitHub PaddlePaddle/PaddleDetection release-2.9
   - 提供目标检测专用功能
   - 包含预处理、推理、后处理和可视化模块
 3. **FastAPI 服务**
   - 主后端框架，提供 Web 服务
   - 整合 PaddlePaddle 检测服务
   - 处理 HTTP 请求和 WebSocket 连接
 4. **虚拟环境**
   - 统一使用 `apps/server/venv`
   - 包含所有必需的 Python 依赖
   - 隔离运行环境，避免版本冲突
 **依赖链路**
 用户请求 → FastAPI → PaddleDetection 服务 → PaddlePaddle 框架 → 模型推理 → 结果返回
 **环境变量依赖**
 - `FLAGS_enable_pir_api=0`：禁用新版 PIR API，确保与旧模型兼容
 - Python 路径配置：确保正确加载 PaddleDetection 模块
 **系统资源依赖**
 - CPU：支持多线程推理
 - 内存：最小要求 2GB，推荐 4GB 以上
 - 磁盘空间：模型文件约 30MB，推理代码约 50MB
 **与其他组件的关系**
 - 与 YOLO 检测服务并列运行，互不干扰
 - 共享 FastAPI 的路由和中间件
 - 使用相同的日志系统和错误处理机制
 - 统一的模型管理目录结构
 ### 环境设置
 1. 运行环境设置脚本验证/安装 PaddlePaddle：
 ```bash
 bash scripts/setup-paddlepaddle.sh
 ```
 2. 如果是首次设置，按照脚本提示完成以下步骤：
   - 下载 PaddleDetection release-2.9 到 `PaddlePaddle/PaddleDetection-release-2.9/`
   - 安装 PaddlePaddle 和依赖到服务器虚拟环境
   - 将模型文件复制到 `models/smoking_detection_paddle/`
 ### 目录结构
 ```
 third-party/paddle-inference/       # PaddleDetection 推理代码
 ├── infer.py                      # 推理引擎
 ├── preprocess.py                 # 图像预处理
 ├── utils.py                      # 工具函数
 └── visualize.py                  # 结果可视化
 models/smoking_detection_paddle/      # PaddlePaddle 模型文件
 ├── model.pdmodel                # 模型结构
 ├── model.pdiparams              # 模型参数
 └── infer_cfg.yml                 # 推理配置
 ```
 ### 性能优化
 本地部署相比 Docker 性能提升：
 - 推理时间：3-4秒 → 0.123秒（提升 ~30 倍）
 - 内存占用：~3GB → ~0.5GB（减少 83%）
 - 启动时间：~10秒 → 即时
 - CPU 利用率：提升 50%
 ### PaddlePaddle 详细操作指南
 #### 首次设置完整流程
 1. **下载 PaddleDetection 代码**
   ```bash
   # 进入项目根目录
   cd jc-video-recognize
   # 下载 PaddleDetection release-2.9
   git clone -b release/2.9 https://github.com/PaddlePaddle/PaddleDetection.git /tmp/PaddleDetection-release-2.9
   # 或手动下载并解压
   # 从 https://github.com/PaddlePaddle/PaddleDetection/releases/tag/release%2F2.9
   ```
 2. **复制推理代码**
   ```bash
   # 复制必要的推理文件到项目中
   cp -r /tmp/PaddleDetection-release-2.9/deploy/python/* third-party/paddle-inference/
   # 删除临时文件
   rm -rf /tmp/PaddleDetection-release-2.9
   ```
 3. **安装 PaddlePaddle 依赖**
   ```bash
   # 进入服务器目录
   cd apps/server
   # 激活虚拟环境
   source venv/bin/activate
   # 安装 PaddlePaddle 和相关依赖
   pip install paddlepaddle==3.0.0
   pip install 'numpy==1.26.4' 'opencv-python==4.7.0.72'
   pip install imgaug==0.4.0
   ```
 4. **放置模型文件**
   ```bash
   # 确保模型文件在正确位置
   ls -la models/smoking_detection_paddle/
   # 应该包含：model.pdmodel, model.pdiparams, infer_cfg.yml
   ```
 5. **验证安装**
   ```bash
   # 运行验证脚本
   bash scripts/setup-paddlepaddle.sh
   ```
 #### 日常维护操作
 **更新 PaddlePaddle 推理代码**
 ```bash
 # 下载新版推理代码
 cd /tmp
 git clone -b release/2.9 https://github.com/PaddlePaddle/PaddleDetection.git PaddleDetection-release-2.9
 # 备份现有代码
 cd ../jc-video-recognize
 cp -r third-party/paddle-inference third-party/paddle-inference.backup
 # 更新推理代码
 cp -r /tmp/PaddleDetection-release-2.9/deploy/python/* third-party/paddle-inference/
 # 测试新代码
 pnpm dev:server
 # 如果测试失败，恢复备份
 # rm -rf third-party/paddle-inference
 # mv third-party/paddle-inference.backup third-party/paddle-inference
 ```
 **更新模型文件**
 ```bash
 # 停止服务器
 pkill -f "python.*main.py"
 # 备份现有模型
 cp -r models/smoking_detection_paddle models/smoking_detection_paddle.backup
 # 放置新模型
 cp /path/to/new/model.pdmodel models/smoking_detection_paddle/
 cp /path/to/new/model.pdiparams models/smoking_detection_paddle/
 cp /path/to/new/infer_cfg.yml models/smoking_detection_paddle/
 # 重启服务器验证
 cd apps/server && ./start_server_with_env.sh
 ```
 **更新依赖版本**
 ```bash
 # 进入服务器虚拟环境
 cd apps/server
 source venv/bin/activate
 # 升级 PaddlePaddle
 pip install --upgrade paddlepaddle==3.0.0
 # 升级其他依赖
 pip install --upgrade 'numpy==1.26.4' 'opencv-python==4.7.0.72'
 pip install --upgrade imgaug==0.4.0
 # 测试新版本
 python -c "import paddle; print(paddle.__version__)"
 ```
 #### 故障排查指南
 **问题 1：模型加载失败**
 ```bash
 # 检查模型文件完整性
 ls -la models/smoking_detection_paddle/
 # 检查必要文件
 model.pdmodel    # 模型结构
 model.pdiparams  # 模型参数  
 infer_cfg.yml   # 推理配置
 # 验证文件大小（应该 > 30MB）
 du -sh models/smoking_detection_paddle/
 ```
 **问题 2：PaddlePaddle 导入失败**
 ```bash
 # 检查 PaddlePaddle 安装
 source apps/server/venv/bin/activate
 pip list | grep paddle
 # 重新安装 PaddlePaddle
 pip install paddlepaddle==3.0.0 --force-reinstall
 # 检查环境变量
 echo $FLAGS_enable_pir_api
 # 应该是 0
 ```
 **问题 3：推理速度慢**
 ```bash
 # 检查 CPU 使用情况
 top -p $(pgrep -f python)
 # 检查内存使用情况
 free -h
 # 优化建议：
 # 1. 减少批处理大小
 # 2. 使用更小的模型（如果精度允许）
 # 3. 启用 GPU 加速（如果有 NVIDIA GPU）
 ```
 #### 性能监控
 **实时监控推理时间**
 ```bash
 # 查看服务器日志中的推理时间
 tail -f apps/server/logs/*.log | grep "推理时间"
 ```
 **性能基准测试**
 ```bash
 # 使用测试图像进行基准测试
 curl -X POST "http://localhost:8000/api/detect" \
  -F "image=@test_image.jpg" \
  -F "model=smoking_detection_paddle"
 ```
 **系统资源监控**
 ```bash
 # CPU 使用率
 mpstat 1
 # 内存使用情况  
 free -m -s 1
 # 磁盘 I/O
 iostat -x 1
 ```
 #### Git 管理
 **排除文件**
 ```bash
 # .gitignore 中已配置以下排除
 models/*/                  # 模型文件
 third-party/paddle-inference/ # 第三方代码
 apps/server/venv/         # 虚拟环境
 ```
 **版本控制策略**
 - ✅ 只版本控制代码文件
 - ❌ 不版本控制模型文件（太大）
 - ❌ 不版本控制第三方库
 - ✅ 使用 Git LFS 如果必须版本控制大文件
 #### 协作建议
 **团队协作流程**
 1. 每个成员独立运行 `setup-paddlepaddle.sh`
 2. 在 README 中记录使用的 PaddlePaddle 版本
 3. 定期同步模型文件和配置更新
 4. 使用统一的环境变量配置
--- a/apps/server/package.json
+++ b/apps/server/package.json
@@ -3,7 +3,7 @@
  "version": "1.0.0",
  "description": "视频模型检测平台后端服务",
  "scripts": {
-    "dev": "python main.py",
+    "dev": "./start_server_with_env.sh",
    "start": "uvicorn main:app --host 0.0.0.0 --port 8000",
    "lint": "ruff check .",
    "test": "pytest tests/",
--- a/apps/server/services/detection_service.py
+++ b/apps/server/services/detection_service.py
@@ -4,6 +4,7 @@ import numpy as np
 import time
 import uuid
 import logging
 import torch
 from typing import Dict, List, Optional
 from PIL import Image, ImageDraw, ImageFont
@@ -45,19 +46,60 @@ class DetectionService:
        try:
            results = model(image, conf=confidence, iou=iou, verbose=False)
-
+            
            detections = []
            for result in results:
                boxes = result.boxes
                if len(boxes) == 0:
                    logger.info(f"模型 {model_id} 没有检测到目标")
                    continue
                for box in boxes:
-                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
+                    try:
-                    conf = float(box.conf[0].cpu().numpy())
+                 
-                    cls = int(box.cls[0].cpu().numpy())
+                        if isinstance(box.xyxy, torch.Tensor) and box.xyxy.dim() > 0:
-                    class_name = result.names[cls]
+                            x1, y1, x2, y2 = float(box.xyxy[0]), float(box.xyxy[1]), float(box.xyxy[2]), float(box.xyxy[3])
                        elif isinstance(box.xyxy, (list, tuple)):
                            x1, y1, x2, y2 = float(box.xyxy[0]), float(box.xyxy[1]), float(box.xyxy[2]), float(box.xyxy[3])
                        else:            
                            continue
                        if isinstance(box.conf, torch.Tensor):
                            if box.conf.dim() == 0:
                                conf = float(box.conf)
                            else:
                                conf = float(box.conf[0])
                        elif hasattr(box.conf, '__getitem__'):
                            conf = float(box.conf[0])
                        else:
                            conf = float(box.conf)
                        if isinstance(box.cls, torch.Tensor):
                            if box.cls.dim() == 0:
                                cls = int(box.cls)
                            else:
                                cls = int(box.cls[0])
                        elif hasattr(box.cls, '__getitem__'):
                            cls = int(box.cls[0])
                        else:
                            cls = int(box.cls)
                    except Exception as e:
                        import traceback
                        logger.error(f"访问 box 属性失败: {e}, box 类型: {type(box)}")
                        logger.error(f"错误堆栈: {traceback.format_exc()}")
                        logger.error(f"box 属性: {vars(box) if hasattr(box, '__dict__') else '无法获取'}")
                        continue
                    class_name = result.names[cls]
                    label_map = self.model_service.model_configs[model_id]['labels']
                    label = label_map.get(class_name, class_name)
-
+                    
                    detections.append({
                        'class': class_name,
                        'label': label,
@@ -120,21 +162,58 @@ class DetectionService:
            detections = []
            for result in results:
                boxes = result.boxes
                for box in boxes:
-                    x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
+                    try:
-                    conf = float(box.conf[0].cpu().numpy())
+
-                    cls = int(box.cls[0].cpu().numpy())
+                        
-                    class_name = result.names[cls]
+                        if isinstance(box.xyxy, torch.Tensor) and box.xyxy.dim() > 0:
-                    
+                            x1, y1, x2, y2 = float(box.xyxy[0]), float(box.xyxy[1]), float(box.xyxy[2]), float(box.xyxy[3])
-                    label_map = self.model_service.model_configs[model_id]['labels']
+                        elif isinstance(box.xyxy, (list, tuple)):
-                    label = label_map.get(class_name, class_name)
+                            x1, y1, x2, y2 = float(box.xyxy[0]), float(box.xyxy[1]), float(box.xyxy[2]), float(box.xyxy[3])
-                    
+                        else:
-                    detections.append({
+                            continue
-                        'class': class_name,
+                        
-                        'label': label,
+
-                        'confidence': round(conf, 3),
+                        if isinstance(box.conf, torch.Tensor):
-                        'bbox': [int(x1), int(y1), int(x2), int(y2)]
+                            if box.conf.dim() == 0:
-                    })
+                                conf = float(box.conf)
                            else:
                                conf = float(box.conf[0])
                        elif hasattr(box.conf, '__getitem__'):
                            conf = float(box.conf[0])
                        else:
                            conf = float(box.conf)
                        if isinstance(box.cls, torch.Tensor):
                            if box.cls.dim() == 0:
                                cls = int(box.cls)
                            else:
                                cls = int(box.cls[0])
                        elif hasattr(box.cls, '__getitem__'):
                            cls = int(box.cls[0])
                        else:
                            cls = int(box.cls)
                        class_name = result.names[cls]
                        label_map = self.model_service.model_configs[model_id]['labels']
                        label = label_map.get(class_name, class_name)
                        detections.append({
                            'class': class_name,
                            'label': label,
                            'confidence': round(conf, 3),
                            'bbox': [int(x1), int(y1), int(x2), int(y2)]
                        })
                    except Exception as e:
                        import traceback
                        logger.error(f"VIDEO DEBUG: 访问 box 属性失败: {e}, box 类型: {type(box)}")
                        logger.error(f"VIDEO DEBUG: 错误堆栈: {traceback.format_exc()}")
                        logger.error(f"VIDEO DEBUG: box 属性: {vars(box) if hasattr(box, '__dict__') else '无法获取'}")
                        continue
            processing_time = time.time() - start_time
            fps = 1.0 / processing_time if processing_time > 0 else 0
--- a/apps/server/services/model_service.py
+++ b/apps/server/services/model_service.py
@@ -1,13 +1,13 @@
 import os
 import logging
 from ultralytics import YOLO
-from typing import Dict, List, Optional
+from typing import Dict, List, Optional, Union
 logger = logging.getLogger(__name__)
 class ModelService:
    def __init__(self):
-        self.models: Dict[str, YOLO] = {}
+        self.models: Dict[str, Union[YOLO, object]] = {}
        # 基础路径：从 apps/server/services/model_service.py 到 jc-video-web 根目录
        base_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
@@ -46,7 +46,16 @@ class ModelService:
                'labels': {'cigarette': '香烟', 'smoke': '烟雾'},
                'size': '6MB',
                'description': '基于YOLOv8的抽烟检测模型',
-                'name': '抽烟检测'
+                'name': '抽烟检测 (YOLOv8)'
            },
            'smoking_detection_paddle': {
                'path': os.path.join(base_dir, 'models', 'smoking_detection_paddle', 'model.pdmodel'),
                'type': 'paddle',
                'classes': ['cigarette'],
                'labels': {'cigarette': '香烟'},
                'size': '27MB',
                'description': '基于PaddlePaddle PP-YOLOE-s的抽烟检测模型（更高准确率）',
                'name': '抽烟检测 (Paddle)'
            },
            'loitering_detection': {
                'path': os.path.join(base_dir, 'models', 'loitering_detection', 'yolov8n.pt'),
@@ -62,7 +71,21 @@ class ModelService:
    def get_available_models(self) -> List[Dict]:
        available_models = []
        for model_id, config in self.model_configs.items():
-            if os.path.exists(config['path']):
+            model_path = config['path']
            # 检查模型是否存在（Paddle模型检查目录，YOLO模型检查文件）
            model_exists = False
            if config['type'] == 'paddle':
                model_dir = os.path.dirname(model_path)
                required_files = ['model.pdmodel', 'model.pdiparams', 'infer_cfg.yml']
                model_exists = all(
                    os.path.exists(os.path.join(model_dir, f))
                    for f in required_files
                )
            else:
                model_exists = os.path.exists(model_path)
            if model_exists:
                available_models.append({
                    'id': model_id,
                    'name': config['name'],
@@ -73,10 +96,10 @@ class ModelService:
                    'type': config['type']
                })
            else:
-                logger.warning(f"模型文件不存在: {config['path']}")
+                logger.warning(f"模型文件不存在: {model_path}")
        return available_models
-    async def load_model(self, model_id: str) -> Optional[YOLO]:
+    async def load_model(self, model_id: str) -> Optional[Union[YOLO, object]]:
        if model_id not in self.model_configs:
            logger.error(f"未知模型ID: {model_id}")
            return None
@@ -86,6 +109,19 @@ class ModelService:
        config = self.model_configs[model_id]
        # 处理 PaddleDetection 模型
        if config['type'] == 'paddle':
            try:
                from .paddle_detection_service import SmokingDetectionModel
                logger.info(f"正在加载 PaddlePaddle Docker 服务: {model_id}")
                model = SmokingDetectionModel()
                self.models[model_id] = model
                logger.info(f"PaddlePaddle Docker 服务加载成功: {model_id}")
                return model
            except Exception as e:
                logger.error(f"PaddlePaddle Docker 服务加载失败: {model_id}, 错误: {e}")
                return None
        # 处理 YOLO 模型
        model_path = config['path']
@@ -94,16 +130,16 @@ class ModelService:
            return None
        try:
-            logger.info(f"正在加载模型: {model_id} from {model_path}")
+            logger.info(f"正在加载 YOLO 模型: {model_id} from {model_path}")
            model = YOLO(model_path)
            self.models[model_id] = model
-            logger.info(f"模型加载成功: {model_id}")
+            logger.info(f"YOLO 模型加载成功: {model_id}")
            return model
        except Exception as e:
-            logger.error(f"模型加载失败: {model_id}, 错误: {e}")
+            logger.error(f"YOLO 模型加载失败: {model_id}, 错误: {e}")
            return None
-    def get_model(self, model_id: str) -> Optional[YOLO]:
+    def get_model(self, model_id: str) -> Optional[Union[YOLO, object]]:
        return self.models.get(model_id)
    async def unload_model(self, model_id: str) -> bool:
--- a/apps/server/services/paddle_detection_service.py
+++ b/apps/server/services/paddle_detection_service.py
@@ -1,14 +1,18 @@
 """
 PaddleDetection 抽烟检测服务适配器
-通过 Docker 调用 Paddle 模型
+使用本地 PaddlePaddle 环境直接调用模型（无需 Docker）
 """
 # 禁用 PIR API 以支持旧版模型格式（必须在任何导入之前设置）
 import os
 os.environ['FLAGS_enable_pir_api'] = '0'
 import cv2
 import numpy as np
 import subprocess
 import tempfile
 import logging
 import threading
 import time
 import sys
 from typing import Dict, List, Optional
 from pathlib import Path
@@ -16,59 +20,128 @@ logger = logging.getLogger(__name__)
 class PaddleDetectionService:
-    """PaddleDetection 服务适配器"""
+    """PaddleDetection 服务适配器（本地模式）"""
    def __init__(self):
        self.model_name = "smoking_detection"
-        self.docker_image = "smoking-detection:test"
+        self.threshold = 0.1
-        self.model_dir = "output_inference/ppyoloe_crn_s_80e_smoking_visdrone"
+        self._lock = threading.Lock()
-        self.threshold = 0.1  # 抽烟检测需要较低的阈值
+        
        # 本地环境配置
        project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
        self.paddle_dir = os.path.join(project_root, "third-party", "paddle-inference")
        self.model_dir = os.path.join(project_root, "models", "smoking_detection_paddle")
        # 检测器实例（延迟加载）
        self._detector = None
        self._detector_initialized = False
        self.available = True
        logger.info(f"本地 PaddlePaddle 模式已启用")
        logger.info(f"模型目录: {self.model_dir}")
        logger.info(f"使用服务器虚拟环境中的 PaddlePaddle")
        logger.info(f"PaddlePaddle 目录: {self.paddle_dir}")
        # 禁用 PIR API 以支持旧版模型格式（必须在初始化前设置）
        os.environ['FLAGS_enable_pir_api'] = '0'
        # 检测系统架构
        import platform
        self.platform_info = platform.uname()
        self.is_apple_silicon = self.platform_info.machine in ('arm64', 'aarch64') and self.platform_info.system == 'Darwin'
        if self.is_apple_silicon:
            logger.info("✅ 检测到 Apple Silicon (ARM64) 架构")
            logger.info("✅ 使用本地 PaddlePaddle 环境获得最佳性能")
            logger.info("✅ 相比 Docker 方式性能提升 5-10 倍")
        # 检查 Docker 和镜像
        self._check_docker()
    def _check_docker(self):
        """检查 Docker 环境"""
        try:
-            result = subprocess.run(
+            self._initialize_environment()
                ["docker", "info"],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode != 0:
                logger.error("Docker 未运行")
                self.available = False
                return
            # 检查镜像
            result = subprocess.run(
                ["docker", "image", "inspect", self.docker_image],
                capture_output=True,
                text=True,
                timeout=5
            )
            self.available = result.returncode == 0
            if self.available:
                logger.info(f"PaddleDetection 服务已就绪: {self.docker_image}")
            else:
                logger.error(f"Docker 镜像不存在: {self.docker_image}")
        except Exception as e:
-            logger.error(f"Docker 检查失败: {e}")
+            logger.error(f"初始化环境失败: {e}")
            self.available = False
-    def detect_image(self, image: np.ndarray) -> Dict:
+    def _initialize_environment(self):
        """初始化本地 PaddlePaddle 环境"""
        try:
            # 添加 PaddleDetection 部署路径
            paddle_detection_path = self.paddle_dir
            if paddle_detection_path not in sys.path:
                sys.path.insert(0, paddle_detection_path)
                logger.info(f"✅ 添加 PaddleDetection 路径: {paddle_detection_path}")
            # 检查模型目录是否存在
            if not os.path.exists(self.model_dir):
                raise Exception(f"模型目录不存在: {self.model_dir}")
            # 检查必要文件
            required_files = ['model.pdmodel', 'model.pdiparams', 'infer_cfg.yml']
            for file in required_files:
                file_path = os.path.join(self.model_dir, file)
                if not os.path.exists(file_path):
                    raise Exception(f"模型文件不存在: {file}")
            logger.info("✅ 环境检查通过")
            # 预加载检测器（可选，用于首次检测预热）
            try:
                self._get_detector()
                logger.info("✅ 检测器预加载成功")
            except Exception as e:
                logger.warning(f"检测器预加载失败，将在首次使用时初始化: {e}")
        except Exception as e:
            logger.error(f"环境初始化失败: {e}")
            raise
    def _get_detector(self):
        """获取检测器实例（单例模式）"""
        if self._detector is None or not self._detector_initialized:
            try:
                # 设置环境变量以支持旧版模型格式
                os.environ['FLAGS_enable_pir_api'] = '0'
                # 添加 PaddleDetection 路径（直接使用 self.paddle_dir）
                if self.paddle_dir not in sys.path:
                    sys.path.insert(0, self.paddle_dir)
                    logger.info(f"添加 PaddleDetection 路径: {self.paddle_dir}")
                # 导入 PaddleDetection 模块
                from infer import Detector, PredictConfig
                # 创建检测器
                self._detector = Detector(
                    model_dir=self.model_dir,
                    device='CPU',
                    run_mode='paddle',
                    batch_size=1,
                    output_dir='output',
                    threshold=self.threshold
                )
                self._detector_initialized = True
                logger.info("✅ PaddlePaddle 检测器初始化成功")
            except Exception as e:
                logger.error(f"检测器初始化失败: {e}")
                raise
        return self._detector
    def detect_image(self, image: np.ndarray, threshold: float = None) -> Dict:
        """
-        检测图片中的抽烟行为
+        检测图片中的抽烟行为（本地模式）
        Args:
            image: OpenCV 图片 (BGR格式)
            threshold: 置信度阈值，如果为 None 则使用默认值
        Returns:
            检测结果字典
        """
        if threshold is None:
            threshold = self.threshold
        if not self.available:
            return {
                'success': False,
@@ -78,127 +151,110 @@ class PaddleDetectionService:
            }
        try:
-            # 创建临时文件
+            with self._lock:
-            with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as f:
+                start_time = time.time()
-                temp_input = f.name
+                
-            
+                # 确保检测器已初始化
-            with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as f:
+                detector = self._get_detector()
-                temp_output = f.name
+                
-            
+                # 准备输入图片
-            # 保存输入图片
+                if not isinstance(image, np.ndarray):
-            cv2.imwrite(temp_input, image)
+                    raise Exception(f"不支持的图片类型: {type(image)}")
-            
+                
-            # 构建 Docker 命令
+                if len(image.shape) == 2:  # 灰度图转 BGR
-            cmd = [
+                    image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
-                "docker", "run", "--rm",
+                elif image.shape[2] == 4:  # RGBA 转 BGR
-                "-v", f"{temp_input}:/workspace/input.jpg",
+                    image = cv2.cvtColor(image, cv2.COLOR_RGBA2BGR)
-                "-v", f"{os.path.dirname(temp_output)}:/workspace/output",
+                
-                self.docker_image,
+                # 执行推理
-                "python", "deploy/python/infer.py",
+                inference_start = time.time()
-                f"--model_dir={self.model_dir}",
+                
-                "--image_file=/workspace/input.jpg",
+                # 使用 PaddleDetection API 进行推理
-                "--device=CPU",
+                results = detector.predict_image(
-                "--output_dir=/workspace/output",
+                    [image], 
-                f"--threshold={self.threshold}"
+                    visual=False, 
-            ]
+                    save_results=False
-            
+                )
-            # 执行检测
+                
-            logger.info(f"执行抽烟检测: {temp_input}")
+                inference_time = time.time() - inference_start
-            result = subprocess.run(
+                logger.info(f"推理耗时: {inference_time:.3f}s")
-                cmd,
+                
-                capture_output=True,
+                # 解析检测结果
-                text=True,
+                detections = self._parse_detection_results(results, threshold)
-                timeout=60
+                
-            )
+                total_time = time.time() - start_time
-            
+                logger.info(f"检测总耗时: {total_time:.3f}s")
-            # 解析结果
+                
-            detections = self._parse_detection_output(result.stdout)
+                return {
-            
+                    'success': True,
-            # 读取输出图片
+                    'message': '检测完成',
-            output_image = None
+                    'detections': detections,
-            output_path = temp_output.replace('.jpg', '') + '_result.jpg'
+                    'stats': {
-            if os.path.exists(output_path):
+                        'total_detections': len(detections),
-                output_image = cv2.imread(output_path)
+                        'model_used': 'ppyoloe_crn_s_80e_smoking_visdrone',
-            
+                        'threshold': threshold,
-            # 清理临时文件
+                        'processing_time': round(total_time, 3),
-            self._cleanup_temp_files([temp_input, temp_output, output_path])
+                        'inference_time': round(inference_time, 3)
-            
+                    }
            return {
                'success': True,
                'message': '检测完成',
                'detections': detections,
                'output_image': output_image,
                'stats': {
                    'total_detections': len(detections),
                    'model_used': 'ppyoloe_crn_s_80e_smoking_visdrone',
                    'threshold': self.threshold
                }
-            }
+        
        except subprocess.TimeoutExpired:
            logger.error("检测超时")
            return {
                'success': False,
                'message': '检测超时',
                'detections': [],
                'stats': None
            }
        except Exception as e:
            import traceback
            logger.error(f"检测失败: {e}")
            logger.error(f"错误堆栈: {traceback.format_exc()}")
            # 重置检测器状态以允许重试
            self._detector_initialized = False
            return {
                'success': False,
-                'message': f'检测失败: {str(e)}',
+                'message': f'检测失败: {e}',
                'detections': [],
                'stats': None
            }
-    def _parse_detection_output(self, output: str) -> List[Dict]:
+    def _parse_detection_results(self, results: Dict, threshold: float) -> List[Dict]:
-        """解析检测输出"""
+        """解析 PaddleDetection 返回的检测结果"""
        detections = []
-        # 查找检测结果行
+        try:
-        for line in output.split('\n'):
+            if results and 'boxes' in results:
-            if 'class_id:' in line and 'confidence:' in line:
+                boxes = results['boxes']
-                try:
+                
-                    # 解析: class_id:0, confidence:0.8921, left_top:[268.66,231.64],right_bottom:[351.87,258.66]
+                if boxes is not None and len(boxes) > 0:
-                    parts = line.split(',')
+                    for box in boxes:
-                    
+                        # 解析检测结果格式: [class_id, score, x1, y1, x2, y2]
-                    # 提取置信度
+                        if len(box) >= 6:
-                    conf_part = [p for p in parts if 'confidence:' in p][0]
+                            class_id = int(box[0])
-                    confidence = float(conf_part.split(':')[1])
+                            confidence = float(box[1])
-                    
+                            x1, y1, x2, y2 = float(box[2]), float(box[3]), float(box[4]), float(box[5])
-                    # 提取坐标
+                            
-                    left_top_part = [p for p in parts if 'left_top:' in p][0]
+                            # 过滤低置信度检测
-                    right_bottom_part = [p for p in parts if 'right_bottom:' in p][0]
+                            if confidence >= threshold:
-                    
+                                detections.append({
-                    # 解析坐标
+                                    'class': 'cigarette',
-                    left_top = eval(left_top_part.split(':')[1])
+                                    'label': '香烟',
-                    right_bottom = eval(right_bottom_part.split(':')[1])
+                                    'confidence': round(confidence, 3),
-                    
+                                    'bbox': [int(x1), int(y1), int(x2), int(y2)]
-                    x1, y1 = left_top
+                                })
-                    x2, y2 = right_bottom
+        
-                    
+        except Exception as e:
-                    detections.append({
+            logger.error(f"解析检测结果失败: {e}")
-                        'class': 'cigarette',
+            import traceback
-                        'label': '香烟',
+            logger.error(traceback.format_exc())
                        'confidence': round(confidence, 3),
                        'bbox': [int(x1), int(y1), int(x2), int(y2)]
                    })
                except Exception as e:
                    logger.warning(f"解析检测结果失败: {e}")
                    continue
        return detections
-    def _cleanup_temp_files(self, files: List[str]):
+    def get_performance_info(self) -> Dict:
-        """清理临时文件"""
+        """获取性能信息"""
-        for f in files:
+        return {
-            try:
+            'mode': 'local',
-                if os.path.exists(f):
+            'environment': 'PaddlePaddle',
-                    os.remove(f)
+            'model_dir': self.model_dir,
-            except Exception as e:
+            'apple_silicon': self.is_apple_silicon,
-                logger.warning(f"清理临时文件失败: {f}, {e}")
+            'detector_loaded': self._detector_initialized,
            'available': self.available
        }
 # 兼容性包装，保持与 YOLO 模型相同的接口
@@ -222,9 +278,8 @@ class SmokingDetectionModel:
        Returns:
            模拟 YOLO 结果的对象
        """
-        result = self.service.detect_image(image)
+        result = self.service.detect_image(image, threshold=conf)
        # 创建模拟的 YOLO 结果对象
        return [PaddleDetectionResult(result, self.names)]
@@ -235,7 +290,6 @@ class PaddleDetectionResult:
        self.detection_result = detection_result
        self.names = names
        # 创建模拟的 boxes 对象
        self.boxes = self._create_boxes()
    def _create_boxes(self):
@@ -245,7 +299,6 @@ class PaddleDetectionResult:
        if not detections:
            return MockBoxes([])
        # 转换为 YOLO 格式
        xyxy = []
        conf = []
        cls = []
@@ -253,7 +306,7 @@ class PaddleDetectionResult:
        for det in detections:
            xyxy.append(det['bbox'])
            conf.append(det['confidence'])
-            cls.append(0)  # cigarette 类别
+            cls.append(0)
        return MockBoxes(xyxy, conf, cls)
@@ -262,13 +315,89 @@ class MockBoxes:
    """模拟 YOLO boxes 对象"""
    def __init__(self, xyxy_list, conf_list=None, cls_list=None):
-        import torch
+        try:
            import torch
            use_torch = True
        except ImportError:
            use_torch = False
-        if xyxy_list:
+        if xyxy_list and len(xyxy_list) > 0:
-            self.xyxy = torch.tensor(xyxy_list, dtype=torch.float32)
+            if use_torch:
-            self.conf = torch.tensor(conf_list, dtype=torch.float32).reshape(-1, 1)
+                self.xyxy = torch.tensor(xyxy_list, dtype=torch.float32)
-            self.cls = torch.tensor(cls_list, dtype=torch.int64).reshape(-1, 1)
+                self.conf = torch.tensor(conf_list, dtype=torch.float32).reshape(-1, 1)
                self.cls = torch.tensor(cls_list, dtype=torch.int64).reshape(-1, 1)
            else:
                self.xyxy = np.array(xyxy_list, dtype=np.float32)
                self.conf = np.array(conf_list, dtype=np.float32).reshape(-1, 1)
                self.cls = np.array(cls_list, dtype=np.int64).reshape(-1, 1)
        else:
-            self.xyxy = torch.empty((0, 4))
+            if use_torch:
-            self.conf = torch.empty((0, 1))
+                self.xyxy = torch.empty((0, 4), dtype=torch.float32)
-            self.cls = torch.empty((0, 1), dtype=torch.int64)
+                self.conf = torch.empty((0, 1), dtype=torch.float32)
                self.cls = torch.empty((0, 1), dtype=torch.int64)
            else:
                self.xyxy = np.array([]).reshape(0, 4)
                self.conf = np.array([]).reshape(0, 1)
                self.cls = np.array([]).reshape(0, 1)
        self._use_torch = use_torch
    def __iter__(self):
        for i in range(len(self.xyxy)):
            yield MockBox(
                self.xyxy[i],
                self.conf[i][0] if len(self.conf) > i else 0.0,
                self.cls[i][0] if len(self.cls) > i else 0
            )
    def __len__(self):
        return len(self.xyxy)
    def cpu(self):
        return self
    def numpy(self):
        if self._use_torch:
            if len(self.xyxy) > 0:
                return (
                    self.xyxy.numpy(),
                    self.conf.numpy(),
                    self.cls.numpy()
                )
            else:
                return (
                    np.array([]).reshape(0, 4),
                    np.array([]).reshape(0, 1),
                    np.array([], dtype=np.int64).reshape(0, 1)
                )
        else:
            return (
                self.xyxy,
                self.conf,
                self.cls
            )
 class MockBox:
    """模拟单个 YOLO box 对象"""
    def __init__(self, xyxy, conf, cls):
        try:
            import torch
            use_torch = True
        except ImportError:
            use_torch = False
        if use_torch:
            if isinstance(xyxy, torch.Tensor):
                self.xyxy = xyxy
            else:
                self.xyxy = torch.tensor(xyxy, dtype=torch.float32)
        else:
            if isinstance(xyxy, np.ndarray):
                self.xyxy = xyxy
            else:
                self.xyxy = np.array(xyxy, dtype=np.float32)
        self.conf = conf
        self.cls = cls
--- a/apps/server/start_server_with_env.sh
+++ b/apps/server/start_server_with_env.sh
@@ -0,0 +1,38 @@
 #!/bin/bash
 # 服务器启动包装脚本
 # 确保 PaddlePaddle 环境变量正确设置
 set -e
 # 进入脚本所在目录（apps/server）
 cd "$(dirname "$0")"
 # 设置 PaddlePaddle 环境变量（必须在 Python 启动前设置）
 export FLAGS_enable_pir_api=0
 # 显示环境信息
 echo "🔧 服务器启动环境"
 echo "======================================"
 echo "🏷️  FLAGS_enable_pir_api: $FLAGS_enable_pir_api"
 echo "📂 工作目录: $(pwd)"
 echo "======================================"
 # 激活服务器虚拟环境（包含所有必需的 PaddlePaddle 依赖）
 if [ -f "venv/bin/activate" ]; then
    echo "✅ 激活服务器虚拟环境"
    source venv/bin/activate
    echo "🐍 Python 解释器: $(which python)"
 else
    echo "⚠️  服务器虚拟环境不存在，使用系统环境"
 fi
 # 显示 Python 版本
 echo "📦 Python 版本: $(python --version)"
 # 启动服务器
 echo "🚀 启动服务器..."
 echo "======================================"
 # 使用服务器虚拟环境的 Python 运行服务器
 exec python main.py
--- a/package.json
+++ b/package.json
@@ -16,6 +16,7 @@
    "setup:models": "bash scripts/setup-models.sh"
  },
  "devDependencies": {
    "concurrently": "^9.2.1",
    "turbo": "^2.0.0"
  },
  "packageManager": "pnpm@9.0.0",
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -8,6 +8,9 @@ importers:
  .:
    devDependencies:
      concurrently:
        specifier: ^9.2.1
        version: 9.2.1
      turbo:
        specifier: ^2.0.0
        version: 2.9.14
@@ -452,6 +455,14 @@ packages:
    resolution: {integrity: sha512-RZNwNclF7+MS/8bDg70amg32dyeZGZxiDuQmZxKLAlQjr3jGyLx+4Kkk58UO7D2QdgFIQCovuSuZESne6RG6XQ==}
    engines: {node: '>= 6.0.0'}
  ansi-regex@5.0.1:
    resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==}
    engines: {node: '>=8'}
  ansi-styles@4.3.0:
    resolution: {integrity: sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==}
    engines: {node: '>=8'}
  async-validator@4.2.5:
    resolution: {integrity: sha512-7HhHjtERjqlNbZtqNqy2rckN/SpOOlmDliet+lP7k+eKZEjPk3DgyeU9lIXLdeLz0uBbbVp+9Qdow9wJWgwwfg==}
@@ -465,10 +476,30 @@ packages:
    resolution: {integrity: sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==}
    engines: {node: '>= 0.4'}
  chalk@4.1.2:
    resolution: {integrity: sha512-oKnbhFyRIXpUuez8iBMmyEa4nbj4IOQyuhc/wy9kY7/WVPcwIO9VA668Pu8RkO7+0G76SLROeyw9CpQ061i4mA==}
    engines: {node: '>=10'}
  cliui@8.0.1:
    resolution: {integrity: sha512-BSeNnyus75C4//NQ9gQt1/csTXyo/8Sb+afLAkzAptFuMsod9HFokGNudZpi/oQV73hnVK+sR+5PVRMd+Dr7YQ==}
    engines: {node: '>=12'}
  color-convert@2.0.1:
    resolution: {integrity: sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==}
    engines: {node: '>=7.0.0'}
  color-name@1.1.4:
    resolution: {integrity: sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==}
  combined-stream@1.0.8:
    resolution: {integrity: sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==}
    engines: {node: '>= 0.8'}
  concurrently@9.2.1:
    resolution: {integrity: sha512-fsfrO0MxV64Znoy8/l1vVIjjHa29SZyyqPgQBwhiDcaW8wJc2W3XWVOGx4M3oJBnv/zdUZIIp1gDeS98GzP8Ng==}
    engines: {node: '>=18'}
    hasBin: true
  csstype@3.2.3:
    resolution: {integrity: sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==}
@@ -497,6 +528,9 @@ packages:
    peerDependencies:
      vue: ^3.3.7
  emoji-regex@8.0.0:
    resolution: {integrity: sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==}
  entities@7.0.1:
    resolution: {integrity: sha512-TWrgLOFUQTH994YUyl1yT4uyavY5nNB5muff+RtWaqNVCAK408b5ZnnbNAUEWLTCpum9w6arT70i1XdQ4UeOPA==}
    engines: {node: '>=0.12'}
@@ -522,6 +556,10 @@ packages:
    engines: {node: '>=12'}
    hasBin: true
  escalade@3.2.0:
    resolution: {integrity: sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==}
    engines: {node: '>=6'}
  estree-walker@2.0.2:
    resolution: {integrity: sha512-Rfkk/Mp/DL7JVje3u18FxFujQlTNR2q6QfMSMB7AvCBx91NGj/ba3kCfza0f6dVDbw7YlRf/nDrn7pQrCCyQ/w==}
@@ -546,6 +584,10 @@ packages:
  function-bind@1.1.2:
    resolution: {integrity: sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==}
  get-caller-file@2.0.5:
    resolution: {integrity: sha512-DyFP3BM/3YHTQOCUL/w0OZHR0lpKeGrxotcHWcqNEdnltqFwXVfhEBQ94eIo34AfQpo0rGki4cyIiftY06h2Fg==}
    engines: {node: 6.* || 8.* || >= 10.*}
  get-intrinsic@1.3.0:
    resolution: {integrity: sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==}
    engines: {node: '>= 0.4'}
@@ -558,6 +600,10 @@ packages:
    resolution: {integrity: sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==}
    engines: {node: '>= 0.4'}
  has-flag@4.0.0:
    resolution: {integrity: sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==}
    engines: {node: '>=8'}
  has-symbols@1.1.0:
    resolution: {integrity: sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==}
    engines: {node: '>= 0.4'}
@@ -574,6 +620,10 @@ packages:
    resolution: {integrity: sha512-dFcAjpTQFgoLMzC2VwU+C/CbS7uRL0lWmxDITmqm7C+7F0Odmj6s9l6alZc6AELXhrnggM2CeWSXHGOdX2YtwA==}
    engines: {node: '>= 6'}
  is-fullwidth-code-point@3.0.0:
    resolution: {integrity: sha512-zymm5+u+sCsSWyD9qNaejV3DFvhCKclKdizYaJUuHA83RLjb7nSuGnddCHGv0hk+KY7BMAlsWeK4Ueg6EV6XQg==}
    engines: {node: '>=8'}
  lodash-es@4.18.1:
    resolution: {integrity: sha512-J8xewKD/Gk22OZbhpOVSwcs60zhd95ESDwezOFuA3/099925PdHJ7OFHNTGtajL3AlZkykD32HykiMo+BIBI8A==}
@@ -636,15 +686,49 @@ packages:
    resolution: {integrity: sha512-cJ+oHTW1VAEa8cJslgmUZrc+sjRKgAKl3Zyse6+PV38hZe/V6Z14TbCuXcan9F9ghlz4QrFr2c92TNF82UkYHA==}
    engines: {node: '>=10'}
  require-directory@2.1.1:
    resolution: {integrity: sha512-fGxEI7+wsG9xrvdjsrlmL22OMTTiHRwAMroiEeMgq8gzoLC/PQr7RsRDSTLUg/bZAZtF+TVIkHc6/4RIKrui+Q==}
    engines: {node: '>=0.10.0'}
  rollup@4.60.4:
    resolution: {integrity: sha512-WHeFSbZYsPu3+bLoNRUuAO+wavNlocOPf3wSHTP7hcFKVnJeWsYlCDbr3mTS14FCizf9ccIxXA8sGL8zKeQN3g==}
    engines: {node: '>=18.0.0', npm: '>=8.0.0'}
    hasBin: true
  rxjs@7.8.2:
    resolution: {integrity: sha512-dhKf903U/PQZY6boNNtAGdWbG85WAbjT/1xYoZIC7FAY0yWapOBQVsVrDl58W86//e1VpMNBtRV4MaXfdMySFA==}
  shell-quote@1.8.3:
    resolution: {integrity: sha512-ObmnIF4hXNg1BqhnHmgbDETF8dLPCggZWBjkQfhZpbszZnYur5DUljTcCHii5LC3J5E0yeO/1LIMyH+UvHQgyw==}
    engines: {node: '>= 0.4'}
  source-map-js@1.2.1:
    resolution: {integrity: sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA==}
    engines: {node: '>=0.10.0'}
  string-width@4.2.3:
    resolution: {integrity: sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==}
    engines: {node: '>=8'}
  strip-ansi@6.0.1:
    resolution: {integrity: sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==}
    engines: {node: '>=8'}
  supports-color@7.2.0:
    resolution: {integrity: sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==}
    engines: {node: '>=8'}
  supports-color@8.1.1:
    resolution: {integrity: sha512-MpUEN2OodtUzxvKQl72cUF7RQ5EiHsGvSsVG0ia9c5RbWGL2CI4C7EpPS8UTBIplnlzZiNuV56w+FuNxy3ty2Q==}
    engines: {node: '>=10'}
  tree-kill@1.2.2:
    resolution: {integrity: sha512-L0Orpi8qGpRG//Nd+H90vFB+3iHnue1zSSGmNOOCh1GLJ7rUKVwV2HvijphGQS2UmhUZewS9VgvxYIdgr+fG1A==}
    hasBin: true
  tslib@2.8.1:
    resolution: {integrity: sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==}
  turbo@2.9.14:
    resolution: {integrity: sha512-BQqXRr4UoWI3UPFrtznCLykYHxwxWh53iCB57x092jPMjIlW1wnm3N895g5irpiXmnxUhREBB0n6+y8BHhs4nw==}
    hasBin: true
@@ -712,6 +796,22 @@ packages:
      typescript:
        optional: true
  wrap-ansi@7.0.0:
    resolution: {integrity: sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==}
    engines: {node: '>=10'}
  y18n@5.0.8:
    resolution: {integrity: sha512-0pfFzegeDWJHJIAmTLRP2DwHjdF5s7jo9tuztdQxAhINCdvS+3nGINqPd00AphqJR/0LhANUS6/+7SCb98YOfA==}
    engines: {node: '>=10'}
  yargs-parser@21.1.1:
    resolution: {integrity: sha512-tVpsJW7DdjecAiFpbIB1e3qxIQsE6NoPc5/eTdrbbIC4h0LVsWhnoa3g+m2HclBIujHzsxZ4VJVA+GUuc2/LBw==}
    engines: {node: '>=12'}
  yargs@17.7.2:
    resolution: {integrity: sha512-7dSzzRQ++CKnNI/krKnYRV7JKKPUXMEh61soaHKg9mrWEhzFWhFnxPxGl+69cD1Ou63C13NUPCnmIcrvqCuM6w==}
    engines: {node: '>=12'}
 snapshots:
  '@babel/helper-string-parser@7.27.1': {}
@@ -1000,6 +1100,12 @@ snapshots:
    transitivePeerDependencies:
      - supports-color
  ansi-regex@5.0.1: {}
  ansi-styles@4.3.0:
    dependencies:
      color-convert: 2.0.1
  async-validator@4.2.5: {}
  asynckit@0.4.0: {}
@@ -1019,10 +1125,36 @@ snapshots:
      es-errors: 1.3.0
      function-bind: 1.1.2
  chalk@4.1.2:
    dependencies:
      ansi-styles: 4.3.0
      supports-color: 7.2.0
  cliui@8.0.1:
    dependencies:
      string-width: 4.2.3
      strip-ansi: 6.0.1
      wrap-ansi: 7.0.0
  color-convert@2.0.1:
    dependencies:
      color-name: 1.1.4
  color-name@1.1.4: {}
  combined-stream@1.0.8:
    dependencies:
      delayed-stream: 1.0.0
  concurrently@9.2.1:
    dependencies:
      chalk: 4.1.2
      rxjs: 7.8.2
      shell-quote: 1.8.3
      supports-color: 8.1.1
      tree-kill: 1.2.2
      yargs: 17.7.2
  csstype@3.2.3: {}
  dayjs@1.11.20: {}
@@ -1058,6 +1190,8 @@ snapshots:
      vue: 3.5.34(typescript@5.9.3)
      vue-component-type-helpers: 3.2.9
  emoji-regex@8.0.0: {}
  entities@7.0.1: {}
  es-define-property@1.0.1: {}
@@ -1101,6 +1235,8 @@ snapshots:
      '@esbuild/win32-ia32': 0.21.5
      '@esbuild/win32-x64': 0.21.5
  escalade@3.2.0: {}
  estree-walker@2.0.2: {}
  follow-redirects@1.16.0: {}
@@ -1118,6 +1254,8 @@ snapshots:
  function-bind@1.1.2: {}
  get-caller-file@2.0.5: {}
  get-intrinsic@1.3.0:
    dependencies:
      call-bind-apply-helpers: 1.0.2
@@ -1138,6 +1276,8 @@ snapshots:
  gopd@1.2.0: {}
  has-flag@4.0.0: {}
  has-symbols@1.1.0: {}
  has-tostringtag@1.0.2:
@@ -1155,6 +1295,8 @@ snapshots:
    transitivePeerDependencies:
      - supports-color
  is-fullwidth-code-point@3.0.0: {}
  lodash-es@4.18.1: {}
  lodash-unified@1.0.3(@types/lodash-es@4.17.12)(lodash-es@4.18.1)(lodash@4.18.1):
@@ -1205,6 +1347,8 @@ snapshots:
  proxy-from-env@2.1.0: {}
  require-directory@2.1.1: {}
  rollup@4.60.4:
    dependencies:
      '@types/estree': 1.0.8
@@ -1236,8 +1380,36 @@ snapshots:
      '@rollup/rollup-win32-x64-msvc': 4.60.4
      fsevents: 2.3.3
  rxjs@7.8.2:
    dependencies:
      tslib: 2.8.1
  shell-quote@1.8.3: {}
  source-map-js@1.2.1: {}
  string-width@4.2.3:
    dependencies:
      emoji-regex: 8.0.0
      is-fullwidth-code-point: 3.0.0
      strip-ansi: 6.0.1
  strip-ansi@6.0.1:
    dependencies:
      ansi-regex: 5.0.1
  supports-color@7.2.0:
    dependencies:
      has-flag: 4.0.0
  supports-color@8.1.1:
    dependencies:
      has-flag: 4.0.0
  tree-kill@1.2.2: {}
  tslib@2.8.1: {}
  turbo@2.9.14:
    optionalDependencies:
      '@turbo/darwin-64': 2.9.14
@@ -1277,3 +1449,23 @@ snapshots:
      '@vue/shared': 3.5.34
    optionalDependencies:
      typescript: 5.9.3
  wrap-ansi@7.0.0:
    dependencies:
      ansi-styles: 4.3.0
      string-width: 4.2.3
      strip-ansi: 6.0.1
  y18n@5.0.8: {}
  yargs-parser@21.1.1: {}
  yargs@17.7.2:
    dependencies:
      cliui: 8.0.1
      escalade: 3.2.0
      get-caller-file: 2.0.5
      require-directory: 2.1.1
      string-width: 4.2.3
      y18n: 5.0.8
      yargs-parser: 21.1.1
--- a/scripts/dev.sh
+++ b/scripts/dev.sh
@@ -4,18 +4,18 @@
 echo "🚀 启动开发服务器..."
-# 使用 concurrently 同时启动前后端
+# 进入项目根目录
 cd "$(dirname "$0")/.."
-# 检查 concurrently
+# 确保 concurrently 可用（已在 package.json 的 devDependencies 中）
-if ! command -v concurrently &> /dev/null; then
+if ! pnpm exec concurrently --help &> /dev/null; then
-    echo "📦 安装 concurrently..."
+    echo "⚠️  concurrently 不可用，跳过安装（应该已经在 devDependencies 中）"
    pnpm add -D concurrently
 fi
 # 启动前后端
-pnpm concurrently \
+# 使用 turbo 的 dev 任务，它会自动调用各个包的 dev 脚本
 pnpm exec concurrently \
    --names "frontend,backend" \
    --prefix-colors "blue,green" \
    "cd apps/web && pnpm dev" \
-    "cd apps/server && source venv/bin/activate && python main.py"
+    "cd apps/server && pnpm dev"
--- a/setup-paddlepaddle.sh
+++ b/setup-paddlepaddle.sh
@@ -0,0 +1,67 @@
 #!/bin/bash
 set -e
 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
 PADDLE_DIR="${PROJECT_ROOT}/third-party/paddle-inference"
 SERVER_DIR="${SCRIPT_DIR}/apps/server"
 echo "🚀 PaddlePaddle 环境设置脚本"
 echo "================================"
 echo "项目根目录: $PROJECT_ROOT"
 echo "PaddlePaddle 目录: $PADDLE_DIR"
 echo "服务器目录: $SERVER_DIR"
 if [ -d "$PADDLE_DIR" ]; then
    echo "✅ PaddlePaddle 目录已存在: $PADDLE_DIR"
    if [ -d "$SERVER_DIR/venv" ]; then
        echo "✅ 服务器虚拟环境已找到"
        echo ""
        echo "📋 环境信息:"
        echo "  PaddlePaddle 目录: $PADDLE_DIR"
        echo "  服务器虚拟环境: $SERVER_DIR/venv"
        echo ""
        echo "✅ PaddlePaddle 环境配置完成!"
        echo ""
        echo "📝 使用说明:"
        echo "  1. 确保 paddle_detection_service.py 中的路径配置正确"
        echo "  2. 运行 'sh scripts/dev.sh' 启动开发服务器"
        echo "  3. 或运行 'pnpm dev' 启动整个项目"
        echo ""
        echo "💡 说明: PaddlePaddle 依赖已安装在服务器虚拟环境中"
        echo "💡 PaddlePaddle 推理代码和模型已集成在 third-party 目录中"
        exit 0
    else
        echo "❌ 服务器虚拟环境未找到，需要先设置服务器环境"
        echo ""
        echo "📝 首先运行服务器设置:"
        echo "  cd $SERVER_DIR"
        echo "  python3 -m venv venv"
        echo "  source venv/bin/activate"
        echo "  pip install -r requirements.txt"
        echo "  pip install paddlepaddle==3.0.0"
        echo "  pip install 'numpy==1.26.4' 'opencv-python==4.7.0.72'"
        echo "  pip install imgaug==0.4.0"
        exit 1
    fi
 fi
 echo "❌ PaddlePaddle 目录不存在"
 echo ""
 echo "📝 首次设置步骤:"
 echo "  PaddlePaddle 推理代码和模型已集成在项目中"
 echo "  如需更新或重新部署 PaddlePaddle，请手动操作："
 echo "  1. 从 PaddlePaddle 官方下载 PaddleDetection release-2.9"
 echo "  2. 复制必要的文件到: $PADDLE_DIR"
 echo "     - deploy/python/*"
 echo "     - output_inference/"
 echo ""
 echo "🔗 下载链接:"
 echo "  https://github.com/PaddlePaddle/PaddleDetection/releases/tag/release%2F2.9"
 echo ""
 echo "💡 注意: PaddlePaddle 依赖将安装在服务器虚拟环境中"
--- a/third-party/README.md
+++ b/third-party/README.md
@@ -0,0 +1,272 @@
 # Third-Party Components
 此目录包含项目所需的第三方依赖库和组件。
 ## 目录结构
 ```
 third-party/
 └── paddle-inference/              # PaddleDetection 推理组件库
    ├── infer.py                   # PaddleDetection 推理引擎
    ├── preprocess.py              # 图像预处理
    ├── utils.py                  # 工具函数
    ├── visualize.py              # 结果可视化
    └── output_inference/         # 模型文件目录（空，已移到 models/）
 models/                           # 统一的模型文件目录
 ├── smoking_detection/            # YOLOv8 抽烟检测
 ├── smoking_detection_paddle/      # PaddlePaddle PP-YOLOE-s 抽烟检测
 ├── fire_detection/              # YOLOv10 火灾检测
 ├── helmet_detection/           # YOLOv8 安全帽检测
 ├── crowd_detection/            # YOLOv8 人群检测
 └── loitering_detection/        # YOLOv8 徘徊检测
 ```
 ## PaddlePaddle 推理组件
 ### 用途
 - 提供 PaddlePaddle 模型的推理功能
 - 支持 PP-YOLOE+ 模型格式
 - 提供预处理、可视化等工具
 ### 依赖安装
 在服务器虚拟环境中安装以下依赖：
 ```bash
 # 进入服务器目录
 cd apps/server
 # 激活虚拟环境
 source venv/bin/activate
 # 安装 PaddlePaddle 和依赖
 pip install paddlepaddle==3.0.0
 pip install 'numpy==1.26.4' 'opencv-python==4.7.0.72'
 pip install imgaug==0.4.0
 ```
 ## 模型管理
 ### 统一管理
 所有模型文件统一存储在 `models/` 目录：
 **YOLO 模型：**
 - `smoking_detection/` - YOLOv8 抽烟检测
 - `fire_detection/` - YOLOv10 火灾检测
 - `helmet_detection/` - YOLOv8 安全帽检测
 - `crowd_detection/` - YOLOv8 人群检测
 - `loitering_detection/` - YOLOv8 徘徊检测
 **PaddlePaddle 模型：**
 - `smoking_detection_paddle/` - PP-YOLOE-s 抽烟检测
 ### 模型文件格式
 **YOLO 模型：**
 ```
 smoking_detection/
 └── yolov8n.pt                 # YOLO 模型文件
 ```
 **PaddlePaddle 模型：**
 ```
 smoking_detection_paddle/
 ├── model.pdmodel              # 模型结构
 ├── model.pdiparams            # 模型参数
 └── infer_cfg.yml             # 推理配置
 ```
 ## 使用方式
 ### YOLO 模型
 ```python
 from services.detection_service import DetectionService
 # 自动加载 models/ 目录中的 YOLO 模型
 ```
 ### PaddlePaddle 模型
 ```python
 from services.paddle_detection_service import SmokingDetectionModel
 # 自动加载 models/smoking_detection_paddle/ 目录
 ```
 ## 性能优化
 ### Apple Silicon 优化
 - 本地部署相比 Docker 性能提升 30 倍
 - 推理时间：3-4秒 → 0.123秒
 - 内存占用：~3GB → ~0.5GB
 ### 环境变量
 必须在 Python 进程启动前设置：
 ```bash
 export FLAGS_enable_pir_api=0
 ```
 ## 更新和维护
 ### 模型更新
 要更新模型，将新文件复制到对应的 `models/` 子目录：
 ```
 models/smoking_detection/          # YOLO 模型
 models/smoking_detection_paddle/  # PaddlePaddle 模型
 ```
 ### 推理代码更新
 如需更新 PaddleDetection 推理代码，从官方仓库复制：
 ```
 PaddleDetection-release-2.9/deploy/python/* → third-party/paddle-inference/
 ```
 **安全更新流程：**
 ```bash
 # 1. 下载新版代码
 cd /tmp
 git clone -b release/2.9 https://github.com/PaddlePaddle/PaddleDetection.git
 # 2. 备份现有代码
 cd ../../jc-video-recognize
 cp -r third-party/paddle-inference third-party/paddle-inference.backup
 # 3. 更新推理代码
 cp -r /tmp/PaddleDetection-release-2.9/deploy/python/* third-party/paddle-inference/
 # 4. 测试验证
 cd apps/server
 ./start_server_with_env.sh
 # 5. 如果失败，恢复备份
 # rm -rf third-party/paddle-inference
 # mv third-party/paddle-inference.backup third-party/paddle-inference
 ```
 ## 故障排查
 ### 常见问题
 **1. 模型加载失败**
 ```bash
 # 检查模型文件完整性
 ls -la ../models/smoking_detection_paddle/
 # 应该包含以下文件：
 model.pdmodel      # 模型结构（约1MB）
 model.pdiparams    # 模型参数（约30MB）
 infer_cfg.yml      # 推理配置（约1KB）
 # 检查文件权限
 chmod 644 ../models/smoking_detection_paddle/*
 ```
 **2. PaddlePaddle 导入失败**
 ```bash
 # 检查 PaddlePaddle 安装
 source ../apps/server/venv/bin/activate
 pip list | grep paddle
 # 检查环境变量
 echo $FLAGS_enable_pir_api
 # 应该输出：0
 # 重新安装 PaddlePaddle
 pip install paddlepaddle==3.0.0 --force-reinstall
 ```
 **3. 推理速度慢**
 ```bash
 # 检查 CPU 使用情况
 top -p $(pgrep -f python)
 # 检查内存使用情况
 free -h
 # 性能优化建议：
 # 1. 首次加载2秒是正常的（模型加载）
 # 2. 后续推理0.2秒是优秀的
 # 3. 如果推理时间 > 1秒，考虑优化模型大小
 ```
 ### 性能监控
 **实时推理时间监控**
 ```bash
 # 查看推理日志
 tail -f ../apps/server/logs/*.log | grep "推理时间"
 ```
 **系统资源监控**
 ```bash
 # CPU 使用率
 mpstat 1
 # 内存使用情况
 free -m -s 1
 # 磁盘 I/O
 iostat -x 1
 ```
 ## 协作指南
 ### 新成员上手流程
 1. **克隆项目**
   ```bash
   git clone <repository-url>
   cd jc-video-recognize
   ```
 2. **安装主项目依赖**
   ```bash
   pnpm install
   cd apps/server
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   cd ../..
   ```
 3. **安装 PaddlePaddle 环境**
   ```bash
   bash scripts/setup-paddlepaddle.sh
   ```
 4. **验证安装**
   ```bash
   pnpm dev
   # 检查日志确认 PaddlePaddle 模型加载成功
   ```
 ### 版本管理策略
 **Git 版本控制：**
 - ✅ **包含**：源代码文件
 - ❌ **排除**：模型文件（.gitignore）
 - ❌ **排除**：第三方库（.gitignore）
 - ❌ **排除**：虚拟环境（.gitignore）
 **模型文件管理：**
 - 使用独立存储服务（如 S3、MinIO）
 - 在配置文件中记录模型版本
 - 定期备份训练好的模型
 ### 性能基准
 **标准性能指标：**
 - 首次加载：< 3秒
 - 后续推理：< 0.5秒
 - 内存占用：< 1GB
 - CPU 使用率：< 80%
 **性能测试方法：**
 ```bash
 # 使用测试图像进行基准测试
 curl -X POST "http://localhost:8000/api/detect" \
  -F "image=@test_image.jpg" \
  -F "model=smoking_detection_paddle"
 ```
 ## 来源
 PaddleDetection 官方仓库: https://github.com/PaddlePaddle/PaddleDetection
 当前版本: release-2.9
--- a/third-party/paddle-inference/README.md
+++ b/third-party/paddle-inference/README.md
@@ -0,0 +1,104 @@
 # Python端预测部署
 在PaddlePaddle中预测引擎和训练引擎底层有着不同的优化方法, 预测引擎使用了AnalysisPredictor，专门针对推理进行了优化，是基于[C++预测库](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html)的Python接口，该引擎可以对模型进行多项图优化，减少不必要的内存拷贝。如果用户在部署已训练模型的过程中对性能有较高的要求，我们提供了独立于PaddleDetection的预测脚本，方便用户直接集成部署。
 Python端预测部署主要包含两个步骤：
 - 导出预测模型
 - 基于Python进行预测
 ## 1. 导出预测模型
 PaddleDetection在训练过程包括网络的前向和优化器相关参数，而在部署过程中，我们只需要前向参数，具体参考:[导出模型](../EXPORT_MODEL.md)，例如
 ```bash
 # 导出YOLOv3检测模型
 python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
 -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
 # 导出HigherHRNet(bottom-up)关键点检测模型
 python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams
 # 导出HRNet(top-down)关键点检测模型
 python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams
 # 导出FairMOT多目标跟踪模型
 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
 # 导出ByteTrack多目标跟踪模型(相当于只导出检测器)
 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
 ```
 导出后目录下，包括`infer_cfg.yml`, `model.pdiparams`,  `model.pdiparams.info`, `model.pdmodel`四个文件。
 ## 2. 基于Python的预测
 ### 2.1 通用检测
 在终端输入以下命令进行预测：
 ```bash
 python deploy/python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco --image_file=./demo/000000014439.jpg --device=GPU
 ```
 ### 2.2 关键点检测
 在终端输入以下命令进行预测：
 ```bash
 # keypoint top-down(HRNet)/bottom-up(HigherHRNet)单独推理，该模式下top-down模型HRNet只支持单人截图预测
 python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=GPU --threshold=0.5
 python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=GPU --threshold=0.5
 # detector 检测 + keypoint top-down模型联合部署（联合推理只支持top-down关键点模型）
 python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/yolov3_darknet53_270e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4  --device=GPU
 ```
 **注意:**
 - 关键点检测模型导出和预测具体可参照[keypoint](../../configs/keypoint/README.md)，可分别在各个模型的文档中查找具体用法；
 - 此目录下的关键点检测部署为基础前向功能，更多关键点检测功能可使用PP-Human项目，参照[pipeline](../pipeline/README.md)；
 ### 2.3 多目标跟踪
 在终端输入以下命令进行预测：
 ```bash
 # FairMOT跟踪
 python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU
 # ByteTrack跟踪
 python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True
 # FairMOT多目标跟踪联合HRNet关键点检测（联合推理只支持top-down关键点模型）
 python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
 ```
 **注意:**
 - 多目标跟踪模型导出和预测具体可参照[mot]](../../configs/mot/README.md)，可分别在各个模型的文档中查找具体用法；
 - 此目录下的跟踪部署为基础前向功能以及联合关键点部署，更多跟踪功能可使用PP-Human项目，参照[pipeline](../pipeline/README.md)，或PP-Tracking项目(绘制轨迹、出入口流量计数)，参照[pptracking](../pptracking/README.md)；
 参数说明如下:
 | 参数 | 是否必须| 含义                                                                                          |
 |-------|-------|---------------------------------------------------------------------------------------------|
 | --model_dir | Yes| 上述导出的模型路径                                                                                   |
 | --image_file | Option | 需要预测的图片                                                                                     |
 | --image_dir  | Option | 要预测的图片文件夹路径                                                                                 |
 | --video_file | Option | 需要预测的视频                                                                                     |
 | --camera_id | Option | 用来预测的摄像头ID，默认为-1(表示不使用摄像头预测，可设置为：0 - (摄像头数目-1) )，预测过程中在可视化界面按`q`退出输出预测结果到：output/output.mp4 |
 | --device | Option | 运行时的设备，可选择`CPU/GPU/XPU`，默认为`CPU`                                                            |
 | --run_mode | Option | 使用GPU时，默认为paddle, 可选（paddle/trt_fp32/trt_fp16/trt_int8）                                     |
 | --batch_size | Option | 预测时的batch size，在指定`image_dir`时有效，默认为1                                                       |
 | --threshold | Option| 预测得分的阈值，默认为0.5                                                                              |
 | --output_dir | Option| 可视化结果保存的根目录，默认为output/                                                                      |
 | --run_benchmark | Option| 是否运行benchmark，同时需指定`--image_file`或`--image_dir`，默认为False                                    |
 | --enable_mkldnn | Option | CPU预测中是否开启MKLDNN加速，默认为False                                                                 |
 | --cpu_threads | Option| 设置cpu线程数，默认为1                                                                               |
 | --trt_calib_mode | Option| TensorRT是否使用校准功能，默认为False。使用TensorRT的int8功能时，需设置为True，使用PaddleSlim量化后的模型时需要设置为False         |
 | --save_images | Option| 是否保存可视化结果                                                                                   |
 | --save_results | Option| 是否在文件夹下将图片的预测结果以JSON的形式保存                                                                   |
 说明：
 - 参数优先级顺序：`camera_id` > `video_file` > `image_dir` > `image_file`。
 - run_mode：paddle代表使用AnalysisPredictor，精度float32来推理，其他参数指用AnalysisPredictor，TensorRT不同精度来推理。
 - 如果安装的PaddlePaddle不支持基于TensorRT进行预测，需要自行编译，详细可参考[预测库编译教程](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html)。
 - --run_benchmark如果设置为True，则需要安装依赖`pip install pynvml psutil GPUtil`。
 - 如果需要使用导出模型在coco数据集上进行评估，请在推理时添加`--save_results`和`--use_coco_category`参数用以保存coco评估所需要的json文件
--- a/third-party/paddle-inference/benchmark_utils.py
+++ b/third-party/paddle-inference/benchmark_utils.py
@@ -0,0 +1,289 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import logging
 import paddle
 import paddle.inference as paddle_infer
 from pathlib import Path
 CUR_DIR = os.path.dirname(os.path.abspath(__file__))
 LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
 class PaddleInferBenchmark(object):
    def __init__(self,
                 config,
                 model_info: dict={},
                 data_info: dict={},
                 perf_info: dict={},
                 resource_info: dict={},
                 **kwargs):
        """
        Construct PaddleInferBenchmark Class to format logs.
        args:
            config(paddle.inference.Config): paddle inference config
            model_info(dict): basic model info
                {'model_name': 'resnet50'
                 'precision': 'fp32'}
            data_info(dict): input data info
                {'batch_size': 1
                 'shape': '3,224,224'
                 'data_num': 1000}
            perf_info(dict): performance result
                {'preprocess_time_s': 1.0
                'inference_time_s': 2.0
                'postprocess_time_s': 1.0
                'total_time_s': 4.0}
            resource_info(dict): 
                cpu and gpu resources
                {'cpu_rss': 100
                 'gpu_rss': 100
                 'gpu_util': 60}
        """
        # PaddleInferBenchmark Log Version
        self.log_version = "1.0.3"
        # Paddle Version
        self.paddle_version = paddle.__version__
        self.paddle_commit = paddle.__git_commit__
        paddle_infer_info = paddle_infer.get_version()
        self.paddle_branch = paddle_infer_info.strip().split(': ')[-1]
        # model info
        self.model_info = model_info
        # data info
        self.data_info = data_info
        # perf info
        self.perf_info = perf_info
        try:
            # required value
            self.model_name = model_info['model_name']
            self.precision = model_info['precision']
            self.batch_size = data_info['batch_size']
            self.shape = data_info['shape']
            self.data_num = data_info['data_num']
            self.inference_time_s = round(perf_info['inference_time_s'], 4)
        except:
            self.print_help()
            raise ValueError(
                "Set argument wrong, please check input argument and its type")
        self.preprocess_time_s = perf_info.get('preprocess_time_s', 0)
        self.postprocess_time_s = perf_info.get('postprocess_time_s', 0)
        self.with_tracker = True if 'tracking_time_s' in perf_info else False
        self.tracking_time_s = perf_info.get('tracking_time_s', 0)
        self.total_time_s = perf_info.get('total_time_s', 0)
        self.inference_time_s_90 = perf_info.get("inference_time_s_90", "")
        self.inference_time_s_99 = perf_info.get("inference_time_s_99", "")
        self.succ_rate = perf_info.get("succ_rate", "")
        self.qps = perf_info.get("qps", "")
        # conf info
        self.config_status = self.parse_config(config)
        # mem info
        if isinstance(resource_info, dict):
            self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0))
            self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0))
            self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0))
            self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0))
            self.cpu_util = round(resource_info.get('cpu_util', 0), 2)
            self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0))
            self.gpu_util = round(resource_info.get('gpu_util', 0), 2)
            self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2)
        else:
            self.cpu_rss_mb = 0
            self.cpu_vms_mb = 0
            self.cpu_shared_mb = 0
            self.cpu_dirty_mb = 0
            self.cpu_util = 0
            self.gpu_rss_mb = 0
            self.gpu_util = 0
            self.gpu_mem_util = 0
        # init benchmark logger
        self.benchmark_logger()
    def benchmark_logger(self):
        """
        benchmark logger
        """
        # remove other logging handler
        for handler in logging.root.handlers[:]:
            logging.root.removeHandler(handler)
        # Init logger
        FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log"
        Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True)
        logging.basicConfig(
            level=logging.INFO,
            format=FORMAT,
            handlers=[
                logging.FileHandler(
                    filename=log_output, mode='w'),
                logging.StreamHandler(),
            ])
        self.logger = logging.getLogger(__name__)
        self.logger.info(
            f"Paddle Inference benchmark log will be saved to {log_output}")
    def parse_config(self, config) -> dict:
        """
        parse paddle predictor config
        args:
            config(paddle.inference.Config): paddle inference config
        return:
            config_status(dict): dict style config info
        """
        if isinstance(config, paddle_infer.Config):
            config_status = {}
            config_status['runtime_device'] = "gpu" if config.use_gpu(
            ) else "cpu"
            config_status['ir_optim'] = config.ir_optim()
            config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
            config_status['precision'] = self.precision
            config_status['enable_mkldnn'] = config.mkldnn_enabled()
            config_status[
                'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
                )
        elif isinstance(config, dict):
            config_status['runtime_device'] = config.get('runtime_device', "")
            config_status['ir_optim'] = config.get('ir_optim', "")
            config_status['enable_tensorrt'] = config.get('enable_tensorrt', "")
            config_status['precision'] = config.get('precision', "")
            config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
            config_status['cpu_math_library_num_threads'] = config.get(
                'cpu_math_library_num_threads', "")
        else:
            self.print_help()
            raise ValueError(
                "Set argument config wrong, please check input argument and its type"
            )
        return config_status
    def report(self, identifier=None):
        """
        print log report
        args:
            identifier(string): identify log
        """
        if identifier:
            identifier = f"[{identifier}]"
        else:
            identifier = ""
        self.logger.info("\n")
        self.logger.info(
            "---------------------- Paddle info ----------------------")
        self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
        self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
        self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
        self.logger.info(f"{identifier} log_api_version: {self.log_version}")
        self.logger.info(
            "----------------------- Conf info -----------------------")
        self.logger.info(
            f"{identifier} runtime_device: {self.config_status['runtime_device']}"
        )
        self.logger.info(
            f"{identifier} ir_optim: {self.config_status['ir_optim']}")
        self.logger.info(f"{identifier} enable_memory_optim: {True}")
        self.logger.info(
            f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
        )
        self.logger.info(
            f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}")
        self.logger.info(
            f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
        )
        self.logger.info(
            "----------------------- Model info ----------------------")
        self.logger.info(f"{identifier} model_name: {self.model_name}")
        self.logger.info(f"{identifier} precision: {self.precision}")
        self.logger.info(
            "----------------------- Data info -----------------------")
        self.logger.info(f"{identifier} batch_size: {self.batch_size}")
        self.logger.info(f"{identifier} input_shape: {self.shape}")
        self.logger.info(f"{identifier} data_num: {self.data_num}")
        self.logger.info(
            "----------------------- Perf info -----------------------")
        self.logger.info(
            f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
        )
        self.logger.info(
            f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
        )
        self.logger.info(
            f"{identifier} total time spent(s): {self.total_time_s}")
        if self.with_tracker:
            self.logger.info(
                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, "
                f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}")
        else:
            self.logger.info(
                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
            )
        if self.inference_time_s_90:
            self.looger.info(
                f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
            )
        if self.qps:
            self.logger.info(f"{identifier} QPS: {self.qps}")
    def print_help(self):
        """
        print function help
        """
        print("""Usage: 
            ==== Print inference benchmark logs. ====
            config = paddle.inference.Config()
            model_info = {'model_name': 'resnet50'
                          'precision': 'fp32'}
            data_info = {'batch_size': 1
                         'shape': '3,224,224'
                         'data_num': 1000}
            perf_info = {'preprocess_time_s': 1.0
                         'inference_time_s': 2.0
                         'postprocess_time_s': 1.0
                         'total_time_s': 4.0}
            resource_info = {'cpu_rss_mb': 100
                             'gpu_rss_mb': 100
                             'gpu_util': 60}
            log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
            log('Test')
            """)
    def __call__(self, identifier=None):
        """
        __call__
        args:
            identifier(string): identify log
        """
        self.report(identifier)
--- a/third-party/paddle-inference/clrnet_postprocess.py
+++ b/third-party/paddle-inference/clrnet_postprocess.py
@@ -0,0 +1,262 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import numpy as np
 import paddle
 import paddle.nn as nn
 from scipy.special import softmax
 from scipy.interpolate import InterpolatedUnivariateSpline
 def line_iou(pred, target, img_w, length=15, aligned=True):
    '''
    Calculate the line iou value between predictions and targets
    Args:
        pred: lane predictions, shape: (num_pred, 72)
        target: ground truth, shape: (num_target, 72)
        img_w: image width
        length: extended radius
        aligned: True for iou loss calculation, False for pair-wise ious in assign
    '''
    px1 = pred - length
    px2 = pred + length
    tx1 = target - length
    tx2 = target + length
    if aligned:
        invalid_mask = target
        ovr = paddle.minimum(px2, tx2) - paddle.maximum(px1, tx1)
        union = paddle.maximum(px2, tx2) - paddle.minimum(px1, tx1)
    else:
        num_pred = pred.shape[0]
        invalid_mask = target.tile([num_pred, 1, 1])
        ovr = (paddle.minimum(px2[:, None, :], tx2[None, ...]) - paddle.maximum(
            px1[:, None, :], tx1[None, ...]))
        union = (paddle.maximum(px2[:, None, :], tx2[None, ...]) -
                 paddle.minimum(px1[:, None, :], tx1[None, ...]))
    invalid_masks = (invalid_mask < 0) | (invalid_mask >= img_w)
    ovr[invalid_masks] = 0.
    union[invalid_masks] = 0.
    iou = ovr.sum(axis=-1) / (union.sum(axis=-1) + 1e-9)
    return iou
 class Lane:
    def __init__(self, points=None, invalid_value=-2., metadata=None):
        super(Lane, self).__init__()
        self.curr_iter = 0
        self.points = points
        self.invalid_value = invalid_value
        self.function = InterpolatedUnivariateSpline(
            points[:, 1], points[:, 0], k=min(3, len(points) - 1))
        self.min_y = points[:, 1].min() - 0.01
        self.max_y = points[:, 1].max() + 0.01
        self.metadata = metadata or {}
    def __repr__(self):
        return '[Lane]\n' + str(self.points) + '\n[/Lane]'
    def __call__(self, lane_ys):
        lane_xs = self.function(lane_ys)
        lane_xs[(lane_ys < self.min_y) | (lane_ys > self.max_y
                                          )] = self.invalid_value
        return lane_xs
    def to_array(self, sample_y_range, img_w, img_h):
        self.sample_y = range(sample_y_range[0], sample_y_range[1],
                              sample_y_range[2])
        sample_y = self.sample_y
        img_w, img_h = img_w, img_h
        ys = np.array(sample_y) / float(img_h)
        xs = self(ys)
        valid_mask = (xs >= 0) & (xs < 1)
        lane_xs = xs[valid_mask] * img_w
        lane_ys = ys[valid_mask] * img_h
        lane = np.concatenate(
            (lane_xs.reshape(-1, 1), lane_ys.reshape(-1, 1)), axis=1)
        return lane
    def __iter__(self):
        return self
    def __next__(self):
        if self.curr_iter < len(self.points):
            self.curr_iter += 1
            return self.points[self.curr_iter - 1]
        self.curr_iter = 0
        raise StopIteration
 class CLRNetPostProcess(object):
    """
    Args:
        input_shape (int): network input image size
        ori_shape (int): ori image shape of before padding
        scale_factor (float): scale factor of ori image
        enable_mkldnn (bool): whether to open MKLDNN
    """
    def __init__(self, img_w, ori_img_h, cut_height, conf_threshold, nms_thres,
                 max_lanes, num_points):
        self.img_w = img_w
        self.conf_threshold = conf_threshold
        self.nms_thres = nms_thres
        self.max_lanes = max_lanes
        self.num_points = num_points
        self.n_strips = num_points - 1
        self.n_offsets = num_points
        self.ori_img_h = ori_img_h
        self.cut_height = cut_height
        self.prior_ys = paddle.linspace(
            start=1, stop=0, num=self.n_offsets).astype('float64')
    def predictions_to_pred(self, predictions):
        """
        Convert predictions to internal Lane structure for evaluation.
        """
        lanes = []
        for lane in predictions:
            lane_xs = lane[6:].clone()
            start = min(
                max(0, int(round(lane[2].item() * self.n_strips))),
                self.n_strips)
            length = int(round(lane[5].item()))
            end = start + length - 1
            end = min(end, len(self.prior_ys) - 1)
            if start > 0:
                mask = ((lane_xs[:start] >= 0.) &
                        (lane_xs[:start] <= 1.)).cpu().detach().numpy()[::-1]
                mask = ~((mask.cumprod()[::-1]).astype(np.bool_))
                lane_xs[:start][mask] = -2
            if end < len(self.prior_ys) - 1:
                lane_xs[end + 1:] = -2
            lane_ys = self.prior_ys[lane_xs >= 0].clone()
            lane_xs = lane_xs[lane_xs >= 0]
            lane_xs = lane_xs.flip(axis=0).astype('float64')
            lane_ys = lane_ys.flip(axis=0)
            lane_ys = (lane_ys *
                       (self.ori_img_h - self.cut_height) + self.cut_height
                       ) / self.ori_img_h
            if len(lane_xs) <= 1:
                continue
            points = paddle.stack(
                x=(lane_xs.reshape([-1, 1]), lane_ys.reshape([-1, 1])),
                axis=1).squeeze(axis=2)
            lane = Lane(
                points=points.cpu().numpy(),
                metadata={
                    'start_x': lane[3],
                    'start_y': lane[2],
                    'conf': lane[1]
                })
            lanes.append(lane)
        return lanes
    def lane_nms(self, predictions, scores, nms_overlap_thresh, top_k):
        """
        NMS for lane detection.
        predictions: paddle.Tensor [num_lanes,conf,y,x,lenght,72offsets] [12,77]
        scores: paddle.Tensor [num_lanes]
        nms_overlap_thresh: float
        top_k: int
        """
        # sort by scores to get idx
        idx = scores.argsort(descending=True)
        keep = []
        condidates = predictions.clone()
        condidates = condidates.index_select(idx)
        while len(condidates) > 0:
            keep.append(idx[0])
            if len(keep) >= top_k or len(condidates) == 1:
                break
            ious = []
            for i in range(1, len(condidates)):
                ious.append(1 - line_iou(
                    condidates[i].unsqueeze(0),
                    condidates[0].unsqueeze(0),
                    img_w=self.img_w,
                    length=15))
            ious = paddle.to_tensor(ious)
            mask = ious <= nms_overlap_thresh
            id = paddle.where(mask == False)[0]
            if id.shape[0] == 0:
                break
            condidates = condidates[1:].index_select(id)
            idx = idx[1:].index_select(id)
        keep = paddle.stack(keep)
        return keep
    def get_lanes(self, output, as_lanes=True):
        """
        Convert model output to lanes.
        """
        softmax = nn.Softmax(axis=1)
        decoded = []
        for predictions in output:
            if len(predictions) == 0:
                decoded.append([])
                continue
            threshold = self.conf_threshold
            scores = softmax(predictions[:, :2])[:, 1]
            keep_inds = scores >= threshold
            predictions = predictions[keep_inds]
            scores = scores[keep_inds]
            if predictions.shape[0] == 0:
                decoded.append([])
                continue
            nms_predictions = predictions.detach().clone()
            nms_predictions = paddle.concat(
                x=[nms_predictions[..., :4], nms_predictions[..., 5:]], axis=-1)
            nms_predictions[..., 4] = nms_predictions[..., 4] * self.n_strips
            nms_predictions[..., 5:] = nms_predictions[..., 5:] * (
                self.img_w - 1)
            keep = self.lane_nms(
                nms_predictions[..., 5:],
                scores,
                nms_overlap_thresh=self.nms_thres,
                top_k=self.max_lanes)
            predictions = predictions.index_select(keep)
            if predictions.shape[0] == 0:
                decoded.append([])
                continue
            predictions[:, 5] = paddle.round(predictions[:, 5] * self.n_strips)
            if as_lanes:
                pred = self.predictions_to_pred(predictions)
            else:
                pred = predictions
            decoded.append(pred)
        return decoded
    def __call__(self, lanes_list):
        lanes = self.get_lanes(lanes_list)
        return lanes
--- a/third-party/paddle-inference/det_keypoint_unite_infer.py
+++ b/third-party/paddle-inference/det_keypoint_unite_infer.py
@@ -0,0 +1,374 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import json
 import cv2
 import math
 import numpy as np
 import paddle
 import yaml
 from det_keypoint_unite_utils import argsparser
 from preprocess import decode_image
 from infer import Detector, DetectorPicoDet, PredictConfig, print_arguments, get_test_images, bench_log
 from keypoint_infer import KeyPointDetector, PredictConfig_KeyPoint
 from visualize import visualize_pose
 from benchmark_utils import PaddleInferBenchmark
 from utils import get_current_memory_mb
 from keypoint_postprocess import translate_to_ori_images
 KEYPOINT_SUPPORT_MODELS = {
    'HigherHRNet': 'keypoint_bottomup',
    'HRNet': 'keypoint_topdown'
 }
 def predict_with_given_det(image, det_res, keypoint_detector,
                           keypoint_batch_size, run_benchmark):
    keypoint_res = {}
    rec_images, records, det_rects = keypoint_detector.get_person_from_rect(
        image, det_res)
    if len(det_rects) == 0:
        keypoint_res['keypoint'] = [[], []]
        return keypoint_res
    keypoint_vector = []
    score_vector = []
    rect_vector = det_rects
    keypoint_results = keypoint_detector.predict_image(
        rec_images, run_benchmark, repeats=10, visual=False)
    keypoint_vector, score_vector = translate_to_ori_images(keypoint_results,
                                                            np.array(records))
    keypoint_res['keypoint'] = [
        keypoint_vector.tolist(), score_vector.tolist()
    ] if len(keypoint_vector) > 0 else [[], []]
    keypoint_res['bbox'] = rect_vector
    return keypoint_res
 def topdown_unite_predict(detector,
                          topdown_keypoint_detector,
                          image_list,
                          keypoint_batch_size=1,
                          save_res=False):
    det_timer = detector.get_timer()
    store_res = []
    for i, img_file in enumerate(image_list):
        # Decode image in advance in det + pose prediction
        det_timer.preprocess_time_s.start()
        image, _ = decode_image(img_file, {})
        det_timer.preprocess_time_s.end()
        if FLAGS.run_benchmark:
            results = detector.predict_image(
                [image], run_benchmark=True, repeats=10)
            cm, gm, gu = get_current_memory_mb()
            detector.cpu_mem += cm
            detector.gpu_mem += gm
            detector.gpu_util += gu
        else:
            results = detector.predict_image([image], visual=False)
        results = detector.filter_box(results, FLAGS.det_threshold)
        if results['boxes_num'] > 0:
            keypoint_res = predict_with_given_det(
                image, results, topdown_keypoint_detector, keypoint_batch_size,
                FLAGS.run_benchmark)
            if save_res:
                save_name = img_file if isinstance(img_file, str) else i
                store_res.append([
                    save_name, keypoint_res['bbox'],
                    [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
                ])
        else:
            results["keypoint"] = [[], []]
            keypoint_res = results
        if FLAGS.run_benchmark:
            cm, gm, gu = get_current_memory_mb()
            topdown_keypoint_detector.cpu_mem += cm
            topdown_keypoint_detector.gpu_mem += gm
            topdown_keypoint_detector.gpu_util += gu
        else:
            if not os.path.exists(FLAGS.output_dir):
                os.makedirs(FLAGS.output_dir)
            visualize_pose(
                img_file,
                keypoint_res,
                visual_thresh=FLAGS.keypoint_threshold,
                save_dir=FLAGS.output_dir)
    if save_res:
        """
        1) store_res: a list of image_data
        2) image_data: [imageid, rects, [keypoints, scores]]
        3) rects: list of rect [xmin, ymin, xmax, ymax]
        4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
        5) scores: mean of all joint conf
        """
        with open("det_keypoint_unite_image_results.json", 'w') as wf:
            json.dump(store_res, wf, indent=4)
 def topdown_unite_predict_video(detector,
                                topdown_keypoint_detector,
                                camera_id,
                                keypoint_batch_size=1,
                                save_res=False):
    video_name = 'output.mp4'
    if camera_id != -1:
        capture = cv2.VideoCapture(camera_id)
    else:
        capture = cv2.VideoCapture(FLAGS.video_file)
        video_name = os.path.split(FLAGS.video_file)[-1]
    # Get Video info : resolution, fps, frame count
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(capture.get(cv2.CAP_PROP_FPS))
    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    print("fps: %d, frame_count: %d" % (fps, frame_count))
    if not os.path.exists(FLAGS.output_dir):
        os.makedirs(FLAGS.output_dir)
    out_path = os.path.join(FLAGS.output_dir, video_name)
    fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    index = 0
    store_res = []
    keypoint_smoothing = KeypointSmoothing(
        width, height, filter_type=FLAGS.filter_type, beta=0.05)
    while (1):
        ret, frame = capture.read()
        if not ret:
            break
        index += 1
        print('detect frame: %d' % (index))
        frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = detector.predict_image([frame2], visual=False)
        results = detector.filter_box(results, FLAGS.det_threshold)
        if results['boxes_num'] == 0:
            writer.write(frame)
            continue
        keypoint_res = predict_with_given_det(
            frame2, results, topdown_keypoint_detector, keypoint_batch_size,
            FLAGS.run_benchmark)
        if FLAGS.smooth and len(keypoint_res['keypoint'][0]) == 1:
            current_keypoints = np.array(keypoint_res['keypoint'][0][0])
            smooth_keypoints = keypoint_smoothing.smooth_process(
                current_keypoints)
            keypoint_res['keypoint'][0][0] = smooth_keypoints.tolist()
        im = visualize_pose(
            frame,
            keypoint_res,
            visual_thresh=FLAGS.keypoint_threshold,
            returnimg=True)
        if save_res:
            store_res.append([
                index, keypoint_res['bbox'],
                [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
            ])
        writer.write(im)
        if camera_id != -1:
            cv2.imshow('Mask Detection', im)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    writer.release()
    print('output_video saved to: {}'.format(out_path))
    if save_res:
        """
        1) store_res: a list of frame_data
        2) frame_data: [frameid, rects, [keypoints, scores]]
        3) rects: list of rect [xmin, ymin, xmax, ymax]
        4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
        5) scores: mean of all joint conf
        """
        with open("det_keypoint_unite_video_results.json", 'w') as wf:
            json.dump(store_res, wf, indent=4)
 class KeypointSmoothing(object):
    # The following code are modified from:
    # https://github.com/jaantollander/OneEuroFilter
    def __init__(self,
                 width,
                 height,
                 filter_type,
                 alpha=0.5,
                 fc_d=0.1,
                 fc_min=0.1,
                 beta=0.1,
                 thres_mult=0.3):
        super(KeypointSmoothing, self).__init__()
        self.image_width = width
        self.image_height = height
        self.threshold = np.array([
            0.005, 0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.01, 0.01,
            0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01
        ]) * thres_mult
        self.filter_type = filter_type
        self.alpha = alpha
        self.dx_prev_hat = None
        self.x_prev_hat = None
        self.fc_d = fc_d
        self.fc_min = fc_min
        self.beta = beta
        if self.filter_type == 'OneEuro':
            self.smooth_func = self.one_euro_filter
        elif self.filter_type == 'EMA':
            self.smooth_func = self.ema_filter
        else:
            raise ValueError('filter type must be one_euro or ema')
    def smooth_process(self, current_keypoints):
        if self.x_prev_hat is None:
            self.x_prev_hat = current_keypoints[:, :2]
            self.dx_prev_hat = np.zeros(current_keypoints[:, :2].shape)
            return current_keypoints
        else:
            result = current_keypoints
            num_keypoints = len(current_keypoints)
            for i in range(num_keypoints):
                result[i, :2] = self.smooth(current_keypoints[i, :2],
                                            self.threshold[i], i)
            return result
    def smooth(self, current_keypoint, threshold, index):
        distance = np.sqrt(
            np.square((current_keypoint[0] - self.x_prev_hat[index][0]) /
                      self.image_width) + np.square((current_keypoint[
                          1] - self.x_prev_hat[index][1]) / self.image_height))
        if distance < threshold:
            result = self.x_prev_hat[index]
        else:
            result = self.smooth_func(current_keypoint, self.x_prev_hat[index],
                                      index)
        return result
    def one_euro_filter(self, x_cur, x_pre, index):
        te = 1
        self.alpha = self.smoothing_factor(te, self.fc_d)
        dx_cur = (x_cur - x_pre) / te
        dx_cur_hat = self.exponential_smoothing(dx_cur, self.dx_prev_hat[index])
        fc = self.fc_min + self.beta * np.abs(dx_cur_hat)
        self.alpha = self.smoothing_factor(te, fc)
        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
        self.dx_prev_hat[index] = dx_cur_hat
        self.x_prev_hat[index] = x_cur_hat
        return x_cur_hat
    def ema_filter(self, x_cur, x_pre, index):
        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
        self.x_prev_hat[index] = x_cur_hat
        return x_cur_hat
    def smoothing_factor(self, te, fc):
        r = 2 * math.pi * fc * te
        return r / (r + 1)
    def exponential_smoothing(self, x_cur, x_pre, index=0):
        return self.alpha * x_cur + (1 - self.alpha) * x_pre
 def main():
    deploy_file = os.path.join(FLAGS.det_model_dir, 'infer_cfg.yml')
    with open(deploy_file) as f:
        yml_conf = yaml.safe_load(f)
    arch = yml_conf['arch']
    detector_func = 'Detector'
    if arch == 'PicoDet':
        detector_func = 'DetectorPicoDet'
    detector = eval(detector_func)(FLAGS.det_model_dir,
                                   device=FLAGS.device,
                                   run_mode=FLAGS.run_mode,
                                   trt_min_shape=FLAGS.trt_min_shape,
                                   trt_max_shape=FLAGS.trt_max_shape,
                                   trt_opt_shape=FLAGS.trt_opt_shape,
                                   trt_calib_mode=FLAGS.trt_calib_mode,
                                   cpu_threads=FLAGS.cpu_threads,
                                   enable_mkldnn=FLAGS.enable_mkldnn,
                                   threshold=FLAGS.det_threshold)
    topdown_keypoint_detector = KeyPointDetector(
        FLAGS.keypoint_model_dir,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=FLAGS.keypoint_batch_size,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        use_dark=FLAGS.use_dark)
    keypoint_arch = topdown_keypoint_detector.pred_config.arch
    assert KEYPOINT_SUPPORT_MODELS[
        keypoint_arch] == 'keypoint_topdown', 'Detection-Keypoint unite inference only supports topdown models.'
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        topdown_unite_predict_video(detector, topdown_keypoint_detector,
                                    FLAGS.camera_id, FLAGS.keypoint_batch_size,
                                    FLAGS.save_res)
    else:
        # predict from image
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        topdown_unite_predict(detector, topdown_keypoint_detector, img_list,
                              FLAGS.keypoint_batch_size, FLAGS.save_res)
        if not FLAGS.run_benchmark:
            detector.det_times.info(average=True)
            topdown_keypoint_detector.det_times.info(average=True)
        else:
            mode = FLAGS.run_mode
            det_model_dir = FLAGS.det_model_dir
            det_model_info = {
                'model_name': det_model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(detector, img_list, det_model_info, name='Det')
            keypoint_model_dir = FLAGS.keypoint_model_dir
            keypoint_model_info = {
                'model_name': keypoint_model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
                      FLAGS.keypoint_batch_size, 'KeyPoint')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU'
                            ], "device should be CPU, GPU or XPU"
    main()
--- a/third-party/paddle-inference/det_keypoint_unite_utils.py
+++ b/third-party/paddle-inference/det_keypoint_unite_utils.py
@@ -0,0 +1,141 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import ast
 import argparse
 def argsparser():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--det_model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."),
        required=True)
    parser.add_argument(
        "--keypoint_model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."),
        required=True)
    parser.add_argument(
        "--image_file", type=str, default=None, help="Path of image file.")
    parser.add_argument(
        "--image_dir",
        type=str,
        default=None,
        help="Dir of image file, `image_file` has a higher priority.")
    parser.add_argument(
        "--keypoint_batch_size",
        type=int,
        default=8,
        help=("batch_size for keypoint inference. In detection-keypoint unit"
              "inference, the batch size in detection is 1. Then collate det "
              "result in batch for keypoint inference."))
    parser.add_argument(
        "--video_file",
        type=str,
        default=None,
        help="Path of video file, `video_file` or `camera_id` has a highest priority."
    )
    parser.add_argument(
        "--camera_id",
        type=int,
        default=-1,
        help="device id of camera to predict.")
    parser.add_argument(
        "--det_threshold", type=float, default=0.5, help="Threshold of score.")
    parser.add_argument(
        "--keypoint_threshold",
        type=float,
        default=0.5,
        help="Threshold of score.")
    parser.add_argument(
        "--output_dir",
        type=str,
        default="output",
        help="Directory of output visualization files.")
    parser.add_argument(
        "--run_mode",
        type=str,
        default='paddle',
        help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU."
    )
    parser.add_argument(
        "--run_benchmark",
        type=ast.literal_eval,
        default=False,
        help="Whether to predict a image_file repeatedly for benchmark")
    parser.add_argument(
        "--enable_mkldnn",
        type=ast.literal_eval,
        default=False,
        help="Whether use mkldnn with CPU.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
    parser.add_argument(
        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
    parser.add_argument(
        "--trt_max_shape",
        type=int,
        default=1280,
        help="max_shape for TensorRT.")
    parser.add_argument(
        "--trt_opt_shape",
        type=int,
        default=640,
        help="opt_shape for TensorRT.")
    parser.add_argument(
        "--trt_calib_mode",
        type=bool,
        default=False,
        help="If the model is produced by TRT offline quantitative "
        "calibration, trt_calib_mode need to set True.")
    parser.add_argument(
        '--use_dark',
        type=ast.literal_eval,
        default=True,
        help='whether to use darkpose to get better keypoint position predict ')
    parser.add_argument(
        '--save_res',
        type=bool,
        default=False,
        help=(
            "whether to save predict results to json file"
            "1) store_res: a list of image_data"
            "2) image_data: [imageid, rects, [keypoints, scores]]"
            "3) rects: list of rect [xmin, ymin, xmax, ymax]"
            "4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list"
            "5) scores: mean of all joint conf"))
    parser.add_argument(
        '--smooth',
        type=ast.literal_eval,
        default=False,
        help='smoothing keypoints for each frame, new incoming keypoints will be more stable.'
    )
    parser.add_argument(
        '--filter_type',
        type=str,
        default='OneEuro',
        help='when set --smooth True, choose filter type you want to use, it can be [OneEuro] or [EMA].'
    )
    return parser
--- a/third-party/paddle-inference/infer.py
+++ b/third-party/paddle-inference/infer.py
--- a/third-party/paddle-inference/keypoint_infer.py
+++ b/third-party/paddle-inference/keypoint_infer.py
@@ -0,0 +1,433 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import time
 import yaml
 import glob
 from functools import reduce
 from PIL import Image
 import cv2
 import math
 import numpy as np
 import paddle
 import sys
 # add deploy path of PaddleDetection to sys.path
 parent_path = os.path.abspath(os.path.join(__file__, *(['..'])))
 sys.path.insert(0, parent_path)
 from preprocess import preprocess, NormalizeImage, Permute
 from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop
 from keypoint_postprocess import HrHRNetPostProcess, HRNetPostProcess
 from visualize import visualize_pose
 from paddle.inference import Config
 from paddle.inference import create_predictor
 from utils import argsparser, Timer, get_current_memory_mb
 from benchmark_utils import PaddleInferBenchmark
 from infer import Detector, get_test_images, print_arguments
 # Global dictionary
 KEYPOINT_SUPPORT_MODELS = {
    'HigherHRNet': 'keypoint_bottomup',
    'HRNet': 'keypoint_topdown'
 }
 class KeyPointDetector(Detector):
    """
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): size of pre batch in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
            calibration, trt_calib_mode need to set True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to open MKLDNN
        use_dark(bool): whether to use postprocess in DarkPose
    """
    def __init__(self,
                 model_dir,
                 device='CPU',
                 run_mode='paddle',
                 batch_size=1,
                 trt_min_shape=1,
                 trt_max_shape=1280,
                 trt_opt_shape=640,
                 trt_calib_mode=False,
                 cpu_threads=1,
                 enable_mkldnn=False,
                 output_dir='output',
                 threshold=0.5,
                 use_dark=True,
                 use_fd_format=False):
        super(KeyPointDetector, self).__init__(
            model_dir=model_dir,
            device=device,
            run_mode=run_mode,
            batch_size=batch_size,
            trt_min_shape=trt_min_shape,
            trt_max_shape=trt_max_shape,
            trt_opt_shape=trt_opt_shape,
            trt_calib_mode=trt_calib_mode,
            cpu_threads=cpu_threads,
            enable_mkldnn=enable_mkldnn,
            output_dir=output_dir,
            threshold=threshold,
            use_fd_format=use_fd_format)
        self.use_dark = use_dark
    def set_config(self, model_dir, use_fd_format):
        return PredictConfig_KeyPoint(model_dir, use_fd_format=use_fd_format)
    def get_person_from_rect(self, image, results):
        # crop the person result from image
        self.det_times.preprocess_time_s.start()
        valid_rects = results['boxes']
        rect_images = []
        new_rects = []
        org_rects = []
        for rect in valid_rects:
            rect_image, new_rect, org_rect = expand_crop(image, rect)
            if rect_image is None or rect_image.size == 0:
                continue
            rect_images.append(rect_image)
            new_rects.append(new_rect)
            org_rects.append(org_rect)
        self.det_times.preprocess_time_s.end()
        return rect_images, new_rects, org_rects
    def postprocess(self, inputs, result):
        np_heatmap = result['heatmap']
        np_masks = result['masks']
        # postprocess output of predictor
        if KEYPOINT_SUPPORT_MODELS[
                self.pred_config.arch] == 'keypoint_bottomup':
            results = {}
            h, w = inputs['im_shape'][0]
            preds = [np_heatmap]
            if np_masks is not None:
                preds += np_masks
            preds += [h, w]
            keypoint_postprocess = HrHRNetPostProcess()
            kpts, scores = keypoint_postprocess(*preds)
            results['keypoint'] = kpts
            results['score'] = scores
            return results
        elif KEYPOINT_SUPPORT_MODELS[
                self.pred_config.arch] == 'keypoint_topdown':
            results = {}
            imshape = inputs['im_shape'][:, ::-1]
            center = np.round(imshape / 2.)
            scale = imshape / 200.
            keypoint_postprocess = HRNetPostProcess(use_dark=self.use_dark)
            kpts, scores = keypoint_postprocess(np_heatmap, center, scale)
            results['keypoint'] = kpts
            results['score'] = scores
            return results
        else:
            raise ValueError("Unsupported arch: {}, expect {}".format(
                self.pred_config.arch, KEYPOINT_SUPPORT_MODELS))
    def predict(self, repeats=1):
        '''
        Args:
            repeats (int): repeat number for prediction
        Returns:
            results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                            matix element:[class, score, x_min, y_min, x_max, y_max]
                            MaskRCNN's results include 'masks': np.ndarray:
                            shape: [N, im_h, im_w]
        '''
        # model prediction
        np_heatmap, np_masks = None, None
        for i in range(repeats):
            self.predictor.run()
            output_names = self.predictor.get_output_names()
            heatmap_tensor = self.predictor.get_output_handle(output_names[0])
            np_heatmap = heatmap_tensor.copy_to_cpu()
            if self.pred_config.tagmap:
                masks_tensor = self.predictor.get_output_handle(output_names[1])
                heat_k = self.predictor.get_output_handle(output_names[2])
                inds_k = self.predictor.get_output_handle(output_names[3])
                np_masks = [
                    masks_tensor.copy_to_cpu(), heat_k.copy_to_cpu(),
                    inds_k.copy_to_cpu()
                ]
        result = dict(heatmap=np_heatmap, masks=np_masks)
        return result
    def predict_image(self,
                      image_list,
                      run_benchmark=False,
                      repeats=1,
                      visual=True):
        results = []
        batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size)
        for i in range(batch_loop_cnt):
            start_index = i * self.batch_size
            end_index = min((i + 1) * self.batch_size, len(image_list))
            batch_image_list = image_list[start_index:end_index]
            if run_benchmark:
                # preprocess
                inputs = self.preprocess(batch_image_list)  # warmup
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                # model prediction
                result_warmup = self.predict(repeats=repeats)  # warmup
                self.det_times.inference_time_s.start()
                result = self.predict(repeats=repeats)
                self.det_times.inference_time_s.end(repeats=repeats)
                # postprocess
                result_warmup = self.postprocess(inputs, result)  # warmup
                self.det_times.postprocess_time_s.start()
                result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                self.det_times.img_num += len(batch_image_list)
                cm, gm, gu = get_current_memory_mb()
                self.cpu_mem += cm
                self.gpu_mem += gm
                self.gpu_util += gu
            else:
                # preprocess
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                # model prediction
                self.det_times.inference_time_s.start()
                result = self.predict()
                self.det_times.inference_time_s.end()
                # postprocess
                self.det_times.postprocess_time_s.start()
                result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                self.det_times.img_num += len(batch_image_list)
                if visual:
                    if not os.path.exists(self.output_dir):
                        os.makedirs(self.output_dir)
                    visualize(
                        batch_image_list,
                        result,
                        visual_thresh=self.threshold,
                        save_dir=self.output_dir)
            results.append(result)
            if visual:
                print('Test iter {}'.format(i))
        results = self.merge_batch_result(results)
        return results
    def predict_video(self, video_file, camera_id):
        video_name = 'output.mp4'
        if camera_id != -1:
            capture = cv2.VideoCapture(camera_id)
        else:
            capture = cv2.VideoCapture(video_file)
            video_name = os.path.split(video_file)[-1]
        # Get Video info : resolution, fps, frame count
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(capture.get(cv2.CAP_PROP_FPS))
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        print("fps: %d, frame_count: %d" % (fps, frame_count))
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        out_path = os.path.join(self.output_dir, video_name)
        fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
        index = 1
        while (1):
            ret, frame = capture.read()
            if not ret:
                break
            print('detect frame: %d' % (index))
            index += 1
            results = self.predict_image([frame[:, :, ::-1]], visual=False)
            im_results = {}
            im_results['keypoint'] = [results['keypoint'], results['score']]
            im = visualize_pose(
                frame, im_results, visual_thresh=self.threshold, returnimg=True)
            writer.write(im)
            if camera_id != -1:
                cv2.imshow('Mask Detection', im)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        writer.release()
 def create_inputs(imgs, im_info):
    """generate input for different model type
    Args:
        imgs (list(numpy)): list of image (np.ndarray)
        im_info (list(dict)): list of image info
    Returns:
        inputs (dict): input of model
    """
    inputs = {}
    inputs['image'] = np.stack(imgs, axis=0).astype('float32')
    im_shape = []
    for e in im_info:
        im_shape.append(np.array((e['im_shape'])).astype('float32'))
    inputs['im_shape'] = np.stack(im_shape, axis=0)
    return inputs
 class PredictConfig_KeyPoint():
    """set config of preprocess, postprocess and visualize
    Args:
        model_dir (str): root path of model.yml
    """
    def __init__(self, model_dir, use_fd_format=False):
        # parsing Yaml config for Preprocess
        fd_deploy_file = os.path.join(model_dir, 'inference.yml')
        ppdet_deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
        if use_fd_format:
            if not os.path.exists(fd_deploy_file) and os.path.exists(
                    ppdet_deploy_file):
                raise RuntimeError(
                    "Non-FD format model detected. Please set `use_fd_format` to False."
                )
            deploy_file = fd_deploy_file
        else:
            if not os.path.exists(ppdet_deploy_file) and os.path.exists(
                    fd_deploy_file):
                raise RuntimeError(
                    "FD format model detected. Please set `use_fd_format` to False."
                )
            deploy_file = ppdet_deploy_file
        with open(deploy_file) as f:
            yml_conf = yaml.safe_load(f)
        self.check_model(yml_conf)
        self.arch = yml_conf['arch']
        self.archcls = KEYPOINT_SUPPORT_MODELS[yml_conf['arch']]
        self.preprocess_infos = yml_conf['Preprocess']
        self.min_subgraph_size = yml_conf['min_subgraph_size']
        self.labels = yml_conf['label_list']
        self.tagmap = False
        self.use_dynamic_shape = yml_conf['use_dynamic_shape']
        if 'keypoint_bottomup' == self.archcls:
            self.tagmap = True
        self.print_config()
    def check_model(self, yml_conf):
        """
        Raises:
            ValueError: loaded model not in supported model type 
        """
        for support_model in KEYPOINT_SUPPORT_MODELS:
            if support_model in yml_conf['arch']:
                return True
        raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[
            'arch'], KEYPOINT_SUPPORT_MODELS))
    def print_config(self):
        print('-----------  Model Configuration -----------')
        print('%s: %s' % ('Model Arch', self.arch))
        print('%s: ' % ('Transform Order'))
        for op_info in self.preprocess_infos:
            print('--%s: %s' % ('transform op', op_info['type']))
        print('--------------------------------------------')
 def visualize(image_list, results, visual_thresh=0.6, save_dir='output'):
    im_results = {}
    for i, image_file in enumerate(image_list):
        skeletons = results['keypoint']
        scores = results['score']
        skeleton = skeletons[i:i + 1]
        score = scores[i:i + 1]
        im_results['keypoint'] = [skeleton, score]
        visualize_pose(
            image_file,
            im_results,
            visual_thresh=visual_thresh,
            save_dir=save_dir)
 def main():
    detector = KeyPointDetector(
        FLAGS.model_dir,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=FLAGS.batch_size,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        threshold=FLAGS.threshold,
        output_dir=FLAGS.output_dir,
        use_dark=FLAGS.use_dark,
        use_fd_format=FLAGS.use_fd_format)
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
    else:
        # predict from image
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
        if not FLAGS.run_benchmark:
            detector.det_times.info(average=True)
        else:
            mems = {
                'cpu_rss_mb': detector.cpu_mem / len(img_list),
                'gpu_rss_mb': detector.gpu_mem / len(img_list),
                'gpu_util': detector.gpu_util * 100 / len(img_list)
            }
            perf_info = detector.det_times.report(average=True)
            model_dir = FLAGS.model_dir
            mode = FLAGS.run_mode
            model_info = {
                'model_name': model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            data_info = {
                'batch_size': 1,
                'shape': "dynamic_shape",
                'data_num': perf_info['img_num']
            }
            det_log = PaddleInferBenchmark(detector.config, model_info,
                                           data_info, perf_info, mems)
            det_log('KeyPoint')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
                            ], "device should be CPU, GPU, XPU or NPU"
    assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device"
    main()
--- a/third-party/paddle-inference/keypoint_postprocess.py
+++ b/third-party/paddle-inference/keypoint_postprocess.py
@@ -0,0 +1,369 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
 #   
 # Licensed under the Apache License, Version 2.0 (the "License");   
 # you may not use this file except in compliance with the License.  
 # You may obtain a copy of the License at   
 #   
 #     http://www.apache.org/licenses/LICENSE-2.0    
 #   
 # Unless required by applicable law or agreed to in writing, software   
 # distributed under the License is distributed on an "AS IS" BASIS, 
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
 # See the License for the specific language governing permissions and   
 # limitations under the License.
 from scipy.optimize import linear_sum_assignment
 from collections import abc, defaultdict
 import cv2
 import numpy as np
 import math
 import paddle
 import paddle.nn as nn
 from keypoint_preprocess import get_affine_mat_kernel, get_affine_transform
 class HrHRNetPostProcess(object):
    """
    HrHRNet postprocess contain:
        1) get topk keypoints in the output heatmap
        2) sample the tagmap's value corresponding to each of the topk coordinate
        3) match different joints to combine to some people with Hungary algorithm
        4) adjust the coordinate by +-0.25 to decrease error std
        5) salvage missing joints by check positivity of heatmap - tagdiff_norm
    Args:
        max_num_people (int): max number of people support in postprocess
        heat_thresh (float): value of topk below this threshhold will be ignored
        tag_thresh (float): coord's value sampled in tagmap below this threshold belong to same people for init
        inputs(list[heatmap]): the output list of model, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk
        original_height, original_width (float): the original image size
    """
    def __init__(self, max_num_people=30, heat_thresh=0.2, tag_thresh=1.):
        self.max_num_people = max_num_people
        self.heat_thresh = heat_thresh
        self.tag_thresh = tag_thresh
    def lerp(self, j, y, x, heatmap):
        H, W = heatmap.shape[-2:]
        left = np.clip(x - 1, 0, W - 1)
        right = np.clip(x + 1, 0, W - 1)
        up = np.clip(y - 1, 0, H - 1)
        down = np.clip(y + 1, 0, H - 1)
        offset_y = np.where(heatmap[j, down, x] > heatmap[j, up, x], 0.25,
                            -0.25)
        offset_x = np.where(heatmap[j, y, right] > heatmap[j, y, left], 0.25,
                            -0.25)
        return offset_y + 0.5, offset_x + 0.5
    def __call__(self, heatmap, tagmap, heat_k, inds_k, original_height,
                 original_width):
        N, J, H, W = heatmap.shape
        assert N == 1, "only support batch size 1"
        heatmap = heatmap[0]
        tagmap = tagmap[0]
        heats = heat_k[0]
        inds_np = inds_k[0]
        y = inds_np // W
        x = inds_np % W
        tags = tagmap[np.arange(J)[None, :].repeat(self.max_num_people),
                      y.flatten(), x.flatten()].reshape(J, -1, tagmap.shape[-1])
        coords = np.stack((y, x), axis=2)
        # threshold
        mask = heats > self.heat_thresh
        # cluster
        cluster = defaultdict(lambda: {
            'coords': np.zeros((J, 2), dtype=np.float32),
            'scores': np.zeros(J, dtype=np.float32),
            'tags': []
        })
        for jid, m in enumerate(mask):
            num_valid = m.sum()
            if num_valid == 0:
                continue
            valid_inds = np.where(m)[0]
            valid_tags = tags[jid, m, :]
            if len(cluster) == 0:  # initialize
                for i in valid_inds:
                    tag = tags[jid, i]
                    key = tag[0]
                    cluster[key]['tags'].append(tag)
                    cluster[key]['scores'][jid] = heats[jid, i]
                    cluster[key]['coords'][jid] = coords[jid, i]
                continue
            candidates = list(cluster.keys())[:self.max_num_people]
            centroids = [
                np.mean(
                    cluster[k]['tags'], axis=0) for k in candidates
            ]
            num_clusters = len(centroids)
            # shape is (num_valid, num_clusters, tag_dim)
            dist = valid_tags[:, None, :] - np.array(centroids)[None, ...]
            l2_dist = np.linalg.norm(dist, ord=2, axis=2)
            # modulate dist with heat value, see `use_detection_val`
            cost = np.round(l2_dist) * 100 - heats[jid, m, None]
            # pad the cost matrix, otherwise new pose are ignored
            if num_valid > num_clusters:
                cost = np.pad(cost, ((0, 0), (0, num_valid - num_clusters)),
                              'constant',
                              constant_values=((0, 0), (0, 1e-10)))
            rows, cols = linear_sum_assignment(cost)
            for y, x in zip(rows, cols):
                tag = tags[jid, y]
                if y < num_valid and x < num_clusters and \
                   l2_dist[y, x] < self.tag_thresh:
                    key = candidates[x]  # merge to cluster
                else:
                    key = tag[0]  # initialize new cluster
                cluster[key]['tags'].append(tag)
                cluster[key]['scores'][jid] = heats[jid, y]
                cluster[key]['coords'][jid] = coords[jid, y]
        # shape is [k, J, 2] and [k, J]
        pose_tags = np.array([cluster[k]['tags'] for k in cluster])
        pose_coords = np.array([cluster[k]['coords'] for k in cluster])
        pose_scores = np.array([cluster[k]['scores'] for k in cluster])
        valid = pose_scores > 0
        pose_kpts = np.zeros((pose_scores.shape[0], J, 3), dtype=np.float32)
        if valid.sum() == 0:
            return pose_kpts, pose_kpts
        # refine coords
        valid_coords = pose_coords[valid].astype(np.int32)
        y = valid_coords[..., 0].flatten()
        x = valid_coords[..., 1].flatten()
        _, j = np.nonzero(valid)
        offsets = self.lerp(j, y, x, heatmap)
        pose_coords[valid, 0] += offsets[0]
        pose_coords[valid, 1] += offsets[1]
        # mean score before salvage
        mean_score = pose_scores.mean(axis=1)
        pose_kpts[valid, 2] = pose_scores[valid]
        # salvage missing joints
        if True:
            for pid, coords in enumerate(pose_coords):
                tag_mean = np.array(pose_tags[pid]).mean(axis=0)
                norm = np.sum((tagmap - tag_mean)**2, axis=3)**0.5
                score = heatmap - np.round(norm)  # (J, H, W)
                flat_score = score.reshape(J, -1)
                max_inds = np.argmax(flat_score, axis=1)
                max_scores = np.max(flat_score, axis=1)
                salvage_joints = (pose_scores[pid] == 0) & (max_scores > 0)
                if salvage_joints.sum() == 0:
                    continue
                y = max_inds[salvage_joints] // W
                x = max_inds[salvage_joints] % W
                offsets = self.lerp(salvage_joints.nonzero()[0], y, x, heatmap)
                y = y.astype(np.float32) + offsets[0]
                x = x.astype(np.float32) + offsets[1]
                pose_coords[pid][salvage_joints, 0] = y
                pose_coords[pid][salvage_joints, 1] = x
                pose_kpts[pid][salvage_joints, 2] = max_scores[salvage_joints]
        pose_kpts[..., :2] = transpred(pose_coords[..., :2][..., ::-1],
                                       original_height, original_width,
                                       min(H, W))
        return pose_kpts, mean_score
 def transpred(kpts, h, w, s):
    trans, _ = get_affine_mat_kernel(h, w, s, inv=True)
    return warp_affine_joints(kpts[..., :2].copy(), trans)
 def warp_affine_joints(joints, mat):
    """Apply affine transformation defined by the transform matrix on the
    joints.
    Args:
        joints (np.ndarray[..., 2]): Origin coordinate of joints.
        mat (np.ndarray[3, 2]): The affine matrix.
    Returns:
        matrix (np.ndarray[..., 2]): Result coordinate of joints.
    """
    joints = np.array(joints)
    shape = joints.shape
    joints = joints.reshape(-1, 2)
    return np.dot(np.concatenate(
        (joints, joints[:, 0:1] * 0 + 1), axis=1),
                  mat.T).reshape(shape)
 class HRNetPostProcess(object):
    def __init__(self, use_dark=True):
        self.use_dark = use_dark
    def flip_back(self, output_flipped, matched_parts):
        assert output_flipped.ndim == 4,\
                'output_flipped should be [batch_size, num_joints, height, width]'
        output_flipped = output_flipped[:, :, :, ::-1]
        for pair in matched_parts:
            tmp = output_flipped[:, pair[0], :, :].copy()
            output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
            output_flipped[:, pair[1], :, :] = tmp
        return output_flipped
    def get_max_preds(self, heatmaps):
        """get predictions from score maps
        Args:
            heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
        Returns:
            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
            maxvals: numpy.ndarray([batch_size, num_joints, 2]), the maximum confidence of the keypoints
        """
        assert isinstance(heatmaps,
                          np.ndarray), 'heatmaps should be numpy.ndarray'
        assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'
        batch_size = heatmaps.shape[0]
        num_joints = heatmaps.shape[1]
        width = heatmaps.shape[3]
        heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
        idx = np.argmax(heatmaps_reshaped, 2)
        maxvals = np.amax(heatmaps_reshaped, 2)
        maxvals = maxvals.reshape((batch_size, num_joints, 1))
        idx = idx.reshape((batch_size, num_joints, 1))
        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
        preds[:, :, 0] = (preds[:, :, 0]) % width
        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
        pred_mask = pred_mask.astype(np.float32)
        preds *= pred_mask
        return preds, maxvals
    def gaussian_blur(self, heatmap, kernel):
        border = (kernel - 1) // 2
        batch_size = heatmap.shape[0]
        num_joints = heatmap.shape[1]
        height = heatmap.shape[2]
        width = heatmap.shape[3]
        for i in range(batch_size):
            for j in range(num_joints):
                origin_max = np.max(heatmap[i, j])
                dr = np.zeros((height + 2 * border, width + 2 * border))
                dr[border:-border, border:-border] = heatmap[i, j].copy()
                dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
                heatmap[i, j] = dr[border:-border, border:-border].copy()
                heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
        return heatmap
    def dark_parse(self, hm, coord):
        heatmap_height = hm.shape[0]
        heatmap_width = hm.shape[1]
        px = int(coord[0])
        py = int(coord[1])
        if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
            dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
            dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
            dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
            dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
                + hm[py-1][px-1])
            dyy = 0.25 * (
                hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
            derivative = np.matrix([[dx], [dy]])
            hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
            if dxx * dyy - dxy**2 != 0:
                hessianinv = hessian.I
                offset = -hessianinv * derivative
                offset = np.squeeze(np.array(offset.T), axis=0)
                coord += offset
        return coord
    def dark_postprocess(self, hm, coords, kernelsize):
        """
        refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
        """
        hm = self.gaussian_blur(hm, kernelsize)
        hm = np.maximum(hm, 1e-10)
        hm = np.log(hm)
        for n in range(coords.shape[0]):
            for p in range(coords.shape[1]):
                coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
        return coords
    def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
        """the highest heatvalue location with a quarter offset in the
        direction from the highest response to the second highest response.
        Args:
            heatmaps (numpy.ndarray): The predicted heatmaps
            center (numpy.ndarray): The boxes center
            scale (numpy.ndarray): The scale factor
        Returns:
            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
        """
        coords, maxvals = self.get_max_preds(heatmaps)
        heatmap_height = heatmaps.shape[2]
        heatmap_width = heatmaps.shape[3]
        if self.use_dark:
            coords = self.dark_postprocess(heatmaps, coords, kernelsize)
        else:
            for n in range(coords.shape[0]):
                for p in range(coords.shape[1]):
                    hm = heatmaps[n][p]
                    px = int(math.floor(coords[n][p][0] + 0.5))
                    py = int(math.floor(coords[n][p][1] + 0.5))
                    if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
                        diff = np.array([
                            hm[py][px + 1] - hm[py][px - 1],
                            hm[py + 1][px] - hm[py - 1][px]
                        ])
                        coords[n][p] += np.sign(diff) * .25
        preds = coords.copy()
        # Transform back
        for i in range(coords.shape[0]):
            preds[i] = transform_preds(coords[i], center[i], scale[i],
                                       [heatmap_width, heatmap_height])
        return preds, maxvals
    def __call__(self, output, center, scale):
        preds, maxvals = self.get_final_preds(output, center, scale)
        return np.concatenate(
            (preds, maxvals), axis=-1), np.mean(
                maxvals, axis=1)
 def transform_preds(coords, center, scale, output_size):
    target_coords = np.zeros(coords.shape)
    trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
    for p in range(coords.shape[0]):
        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
    return target_coords
 def affine_transform(pt, t):
    new_pt = np.array([pt[0], pt[1], 1.]).T
    new_pt = np.dot(t, new_pt)
    return new_pt[:2]
 def translate_to_ori_images(keypoint_result, batch_records):
    kpts = keypoint_result['keypoint']
    scores = keypoint_result['score']
    kpts[..., 0] += batch_records[:, 0:1]
    kpts[..., 1] += batch_records[:, 1:2]
    return kpts, scores
--- a/third-party/paddle-inference/keypoint_preprocess.py
+++ b/third-party/paddle-inference/keypoint_preprocess.py
@@ -0,0 +1,243 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
 this code is based on https://github.com/open-mmlab/mmpose/mmpose/core/post_processing/post_transforms.py
 """
 import cv2
 import numpy as np
 class EvalAffine(object):
    def __init__(self, size, stride=64):
        super(EvalAffine, self).__init__()
        self.size = size
        self.stride = stride
    def __call__(self, image, im_info):
        s = self.size
        h, w, _ = image.shape
        trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False)
        image_resized = cv2.warpAffine(image, trans, size_resized)
        return image_resized, im_info
 def get_affine_mat_kernel(h, w, s, inv=False):
    if w < h:
        w_ = s
        h_ = int(np.ceil((s / w * h) / 64.) * 64)
        scale_w = w
        scale_h = h_ / w_ * w
    else:
        h_ = s
        w_ = int(np.ceil((s / h * w) / 64.) * 64)
        scale_h = h
        scale_w = w_ / h_ * h
    center = np.array([np.round(w / 2.), np.round(h / 2.)])
    size_resized = (w_, h_)
    trans = get_affine_transform(
        center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
    return trans, size_resized
 def get_affine_transform(center,
                         input_size,
                         rot,
                         output_size,
                         shift=(0., 0.),
                         inv=False):
    """Get the affine transform matrix, given the center/scale/rot/output_size.
    Args:
        center (np.ndarray[2, ]): Center of the bounding box (x, y).
        scale (np.ndarray[2, ]): Scale of the bounding box
            wrt [width, height].
        rot (float): Rotation angle (degree).
        output_size (np.ndarray[2, ]): Size of the destination heatmaps.
        shift (0-100%): Shift translation ratio wrt the width/height.
            Default (0., 0.).
        inv (bool): Option to inverse the affine transform direction.
            (inv=False: src->dst or inv=True: dst->src)
    Returns:
        np.ndarray: The transform matrix.
    """
    assert len(center) == 2
    assert len(output_size) == 2
    assert len(shift) == 2
    if not isinstance(input_size, (np.ndarray, list)):
        input_size = np.array([input_size, input_size], dtype=np.float32)
    scale_tmp = input_size
    shift = np.array(shift)
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]
    rot_rad = np.pi * rot / 180
    src_dir = rotate_point([0., src_w * -0.5], rot_rad)
    dst_dir = np.array([0., dst_w * -0.5])
    src = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    src[2, :] = _get_3rd_point(src[0, :], src[1, :])
    dst = np.zeros((3, 2), dtype=np.float32)
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
    return trans
 def get_warp_matrix(theta, size_input, size_dst, size_target):
    """This code is based on 
        https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
        Calculate the transformation matrix under the constraint of unbiased.
    Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
    Data Processing for Human Pose Estimation (CVPR 2020).
    Args:
        theta (float): Rotation angle in degrees.
        size_input (np.ndarray): Size of input image [w, h].
        size_dst (np.ndarray): Size of output image [w, h].
        size_target (np.ndarray): Size of ROI in input plane [w, h].
    Returns:
        matrix (np.ndarray): A matrix for transformation.
    """
    theta = np.deg2rad(theta)
    matrix = np.zeros((2, 3), dtype=np.float32)
    scale_x = size_dst[0] / size_target[0]
    scale_y = size_dst[1] / size_target[1]
    matrix[0, 0] = np.cos(theta) * scale_x
    matrix[0, 1] = -np.sin(theta) * scale_x
    matrix[0, 2] = scale_x * (
        -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
        np.sin(theta) + 0.5 * size_target[0])
    matrix[1, 0] = np.sin(theta) * scale_y
    matrix[1, 1] = np.cos(theta) * scale_y
    matrix[1, 2] = scale_y * (
        -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
        np.cos(theta) + 0.5 * size_target[1])
    return matrix
 def rotate_point(pt, angle_rad):
    """Rotate a point by an angle.
    Args:
        pt (list[float]): 2 dimensional point to be rotated
        angle_rad (float): rotation angle by radian
    Returns:
        list[float]: Rotated point.
    """
    assert len(pt) == 2
    sn, cs = np.sin(angle_rad), np.cos(angle_rad)
    new_x = pt[0] * cs - pt[1] * sn
    new_y = pt[0] * sn + pt[1] * cs
    rotated_pt = [new_x, new_y]
    return rotated_pt
 def _get_3rd_point(a, b):
    """To calculate the affine matrix, three pairs of points are required. This
    function is used to get the 3rd point, given 2D points a & b.
    The 3rd point is defined by rotating vector `a - b` by 90 degrees
    anticlockwise, using b as the rotation center.
    Args:
        a (np.ndarray): point(x,y)
        b (np.ndarray): point(x,y)
    Returns:
        np.ndarray: The 3rd point.
    """
    assert len(a) == 2
    assert len(b) == 2
    direction = a - b
    third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
    return third_pt
 class TopDownEvalAffine(object):
    """apply affine transform to image and coords
    Args:
        trainsize (list): [w, h], the standard size used to train
        use_udp (bool): whether to use Unbiased Data Processing.
        records(dict): the dict contained the image and coords
    Returns:
        records (dict): contain the image and coords after tranformed
    """
    def __init__(self, trainsize, use_udp=False):
        self.trainsize = trainsize
        self.use_udp = use_udp
    def __call__(self, image, im_info):
        rot = 0
        imshape = im_info['im_shape'][::-1]
        center = im_info['center'] if 'center' in im_info else imshape / 2.
        scale = im_info['scale'] if 'scale' in im_info else imshape
        if self.use_udp:
            trans = get_warp_matrix(
                rot, center * 2.0,
                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
            image = cv2.warpAffine(
                image,
                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
                flags=cv2.INTER_LINEAR)
        else:
            trans = get_affine_transform(center, scale, rot, self.trainsize)
            image = cv2.warpAffine(
                image,
                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
                flags=cv2.INTER_LINEAR)
        return image, im_info
 def expand_crop(images, rect, expand_ratio=0.3):
    imgh, imgw, c = images.shape
    label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()]
    if label != 0:
        return None, None, None
    org_rect = [xmin, ymin, xmax, ymax]
    h_half = (ymax - ymin) * (1 + expand_ratio) / 2.
    w_half = (xmax - xmin) * (1 + expand_ratio) / 2.
    if h_half > w_half * 4 / 3:
        w_half = h_half * 0.75
    center = [(ymin + ymax) / 2., (xmin + xmax) / 2.]
    ymin = max(0, int(center[0] - h_half))
    ymax = min(imgh - 1, int(center[0] + h_half))
    xmin = max(0, int(center[1] - w_half))
    xmax = min(imgw - 1, int(center[1] + w_half))
    return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect
--- a/third-party/paddle-inference/mot_centertrack_infer.py
+++ b/third-party/paddle-inference/mot_centertrack_infer.py
@@ -0,0 +1,501 @@
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import copy
 import math
 import time
 import yaml
 import cv2
 import numpy as np
 from collections import defaultdict
 import paddle
 from benchmark_utils import PaddleInferBenchmark
 from utils import gaussian_radius, gaussian2D, draw_umich_gaussian
 from preprocess import preprocess, decode_image, WarpAffine, NormalizeImage, Permute
 from utils import argsparser, Timer, get_current_memory_mb
 from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
 from keypoint_preprocess import get_affine_transform
 # add python path
 import sys
 parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
 sys.path.insert(0, parent_path)
 from pptracking.python.mot import CenterTracker
 from pptracking.python.mot.utils import MOTTimer, write_mot_results
 from pptracking.python.mot.visualize import plot_tracking
 def transform_preds_with_trans(coords, trans):
    target_coords = np.ones((coords.shape[0], 3), np.float32)
    target_coords[:, :2] = coords
    target_coords = np.dot(trans, target_coords.transpose()).transpose()
    return target_coords[:, :2]
 def affine_transform(pt, t):
    new_pt = np.array([pt[0], pt[1], 1.]).T
    new_pt = np.dot(t, new_pt)
    return new_pt[:2]
 def affine_transform_bbox(bbox, trans, width, height):
    bbox = np.array(copy.deepcopy(bbox), dtype=np.float32)
    bbox[:2] = affine_transform(bbox[:2], trans)
    bbox[2:] = affine_transform(bbox[2:], trans)
    bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, width - 1)
    bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, height - 1)
    return bbox
 class CenterTrack(Detector):
    """
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): size of pre batch in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
            calibration, trt_calib_mode need to set True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to open MKLDNN
        output_dir (string): The path of output, default as 'output'
        threshold (float): Score threshold of the detected bbox, default as 0.5
        save_images (bool): Whether to save visualization image results, default as False
        save_mot_txts (bool): Whether to save tracking results (txt), default as False
    """
    def __init__(
            self,
            model_dir,
            tracker_config=None,
            device='CPU',
            run_mode='paddle',
            batch_size=1,
            trt_min_shape=1,
            trt_max_shape=960,
            trt_opt_shape=544,
            trt_calib_mode=False,
            cpu_threads=1,
            enable_mkldnn=False,
            output_dir='output',
            threshold=0.5,
            save_images=False,
            save_mot_txts=False, ):
        super(CenterTrack, self).__init__(
            model_dir=model_dir,
            device=device,
            run_mode=run_mode,
            batch_size=batch_size,
            trt_min_shape=trt_min_shape,
            trt_max_shape=trt_max_shape,
            trt_opt_shape=trt_opt_shape,
            trt_calib_mode=trt_calib_mode,
            cpu_threads=cpu_threads,
            enable_mkldnn=enable_mkldnn,
            output_dir=output_dir,
            threshold=threshold, )
        self.save_images = save_images
        self.save_mot_txts = save_mot_txts
        assert batch_size == 1, "MOT model only supports batch_size=1."
        self.det_times = Timer(with_tracker=True)
        self.num_classes = len(self.pred_config.labels)
        # tracker config
        cfg = self.pred_config.tracker
        min_box_area = cfg.get('min_box_area', -1)
        vertical_ratio = cfg.get('vertical_ratio', -1)
        track_thresh = cfg.get('track_thresh', 0.4)
        pre_thresh = cfg.get('pre_thresh', 0.5)
        self.tracker = CenterTracker(
            num_classes=self.num_classes,
            min_box_area=min_box_area,
            vertical_ratio=vertical_ratio,
            track_thresh=track_thresh,
            pre_thresh=pre_thresh)
        self.pre_image = None
    def get_additional_inputs(self, dets, meta, with_hm=True):
        # Render input heatmap from previous trackings.
        trans_input = meta['trans_input']
        inp_width, inp_height = int(meta['inp_width']), int(meta['inp_height'])
        input_hm = np.zeros((1, inp_height, inp_width), dtype=np.float32)
        for det in dets:
            if det['score'] < self.tracker.pre_thresh:
                continue
            bbox = affine_transform_bbox(det['bbox'], trans_input, inp_width,
                                         inp_height)
            h, w = bbox[3] - bbox[1], bbox[2] - bbox[0]
            if (h > 0 and w > 0):
                radius = gaussian_radius(
                    (math.ceil(h), math.ceil(w)), min_overlap=0.7)
                radius = max(0, int(radius))
                ct = np.array(
                    [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2],
                    dtype=np.float32)
                ct_int = ct.astype(np.int32)
                if with_hm:
                    input_hm[0] = draw_umich_gaussian(input_hm[0], ct_int,
                                                      radius)
        if with_hm:
            input_hm = input_hm[np.newaxis]
        return input_hm
    def preprocess(self, image_list):
        preprocess_ops = []
        for op_info in self.pred_config.preprocess_infos:
            new_op_info = op_info.copy()
            op_type = new_op_info.pop('type')
            preprocess_ops.append(eval(op_type)(**new_op_info))
        assert len(image_list) == 1, 'MOT only support bs=1'
        im_path = image_list[0]
        im, im_info = preprocess(im_path, preprocess_ops)
        #inputs = create_inputs(im, im_info)
        inputs = {}
        inputs['image'] = np.array((im, )).astype('float32')
        inputs['im_shape'] = np.array((im_info['im_shape'], )).astype('float32')
        inputs['scale_factor'] = np.array(
            (im_info['scale_factor'], )).astype('float32')
        inputs['trans_input'] = im_info['trans_input']
        inputs['inp_width'] = im_info['inp_width']
        inputs['inp_height'] = im_info['inp_height']
        inputs['center'] = im_info['center']
        inputs['scale'] = im_info['scale']
        inputs['out_height'] = im_info['out_height']
        inputs['out_width'] = im_info['out_width']
        if self.pre_image is None:
            self.pre_image = inputs['image']
            # initializing tracker for the first frame
            self.tracker.init_track([])
        inputs['pre_image'] = self.pre_image
        self.pre_image = inputs['image']  # Note: update for next image
        # render input heatmap from tracker status
        pre_hm = self.get_additional_inputs(
            self.tracker.tracks, inputs, with_hm=True)
        inputs['pre_hm'] = pre_hm  #.to_tensor(pre_hm)
        input_names = self.predictor.get_input_names()
        for i in range(len(input_names)):
            input_tensor = self.predictor.get_input_handle(input_names[i])
            if input_names[i] == 'x':
                input_tensor.copy_from_cpu(inputs['image'])
            else:
                input_tensor.copy_from_cpu(inputs[input_names[i]])
        return inputs
    def postprocess(self, inputs, result):
        # postprocess output of predictor
        np_bboxes = result['bboxes']
        if np_bboxes.shape[0] <= 0:
            print('[WARNNING] No object detected and tracked.')
            result = {'bboxes': np.zeros([0, 6]), 'cts': None, 'tracking': None}
            return result
        result = {k: v for k, v in result.items() if v is not None}
        return result
    def centertrack_post_process(self, dets, meta, out_thresh):
        if not ('bboxes' in dets):
            return [{}]
        preds = []
        c, s = meta['center'], meta['scale']
        h, w = meta['out_height'], meta['out_width']
        trans = get_affine_transform(
            center=c,
            input_size=s,
            rot=0,
            output_size=[w, h],
            shift=(0., 0.),
            inv=True).astype(np.float32)
        for i, dets_bbox in enumerate(dets['bboxes']):
            if dets_bbox[1] < out_thresh:
                break
            item = {}
            item['score'] = dets_bbox[1]
            item['class'] = int(dets_bbox[0]) + 1
            item['ct'] = transform_preds_with_trans(
                dets['cts'][i].reshape([1, 2]), trans).reshape(2)
            if 'tracking' in dets:
                tracking = transform_preds_with_trans(
                    (dets['tracking'][i] + dets['cts'][i]).reshape([1, 2]),
                    trans).reshape(2)
                item['tracking'] = tracking - item['ct']
            if 'bboxes' in dets:
                bbox = transform_preds_with_trans(
                    dets_bbox[2:6].reshape([2, 2]), trans).reshape(4)
                item['bbox'] = bbox
            preds.append(item)
        return preds
    def tracking(self, inputs, det_results):
        result = self.centertrack_post_process(det_results, inputs,
                                               self.tracker.out_thresh)
        online_targets = self.tracker.update(result)
        online_tlwhs, online_scores, online_ids = [], [], []
        for t in online_targets:
            bbox = t['bbox']
            tlwh = [bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]]
            tscore = float(t['score'])
            tid = int(t['tracking_id'])
            if tlwh[2] * tlwh[3] > 0:
                online_tlwhs.append(tlwh)
                online_ids.append(tid)
                online_scores.append(tscore)
        return online_tlwhs, online_scores, online_ids
    def predict(self, repeats=1):
        '''
        Args:
            repeats (int): repeats number for prediction
        Returns:
            result (dict): include 'bboxes', 'cts' and 'tracking':
                np.ndarray: shape:[N,6],[N,2] and [N,2], N: number of box
        '''
        # model prediction
        np_bboxes, np_cts, np_tracking = None, None, None
        for i in range(repeats):
            self.predictor.run()
            output_names = self.predictor.get_output_names()
            bboxes_tensor = self.predictor.get_output_handle(output_names[0])
            np_bboxes = bboxes_tensor.copy_to_cpu()
            cts_tensor = self.predictor.get_output_handle(output_names[1])
            np_cts = cts_tensor.copy_to_cpu()
            tracking_tensor = self.predictor.get_output_handle(output_names[2])
            np_tracking = tracking_tensor.copy_to_cpu()
        result = dict(bboxes=np_bboxes, cts=np_cts, tracking=np_tracking)
        return result
    def predict_image(self,
                      image_list,
                      run_benchmark=False,
                      repeats=1,
                      visual=True,
                      seq_name=None):
        mot_results = []
        num_classes = self.num_classes
        image_list.sort()
        ids2names = self.pred_config.labels
        data_type = 'mcmot' if num_classes > 1 else 'mot'
        for frame_id, img_file in enumerate(image_list):
            batch_image_list = [img_file]  # bs=1 in MOT model
            if run_benchmark:
                # preprocess
                inputs = self.preprocess(batch_image_list)  # warmup
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                # model prediction
                result_warmup = self.predict(repeats=repeats)  # warmup
                self.det_times.inference_time_s.start()
                result = self.predict(repeats=repeats)
                self.det_times.inference_time_s.end(repeats=repeats)
                # postprocess
                result_warmup = self.postprocess(inputs, result)  # warmup
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking
                result_warmup = self.tracking(inputs, det_result)
                self.det_times.tracking_time_s.start()
                online_tlwhs, online_scores, online_ids = self.tracking(
                    inputs, det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
                cm, gm, gu = get_current_memory_mb()
                self.cpu_mem += cm
                self.gpu_mem += gm
                self.gpu_util += gu
            else:
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                self.det_times.inference_time_s.start()
                result = self.predict()
                self.det_times.inference_time_s.end()
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking process
                self.det_times.tracking_time_s.start()
                online_tlwhs, online_scores, online_ids = self.tracking(
                    inputs, det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
            if visual:
                if len(image_list) > 1 and frame_id % 10 == 0:
                    print('Tracking frame {}'.format(frame_id))
                frame, _ = decode_image(img_file, {})
                im = plot_tracking(
                    frame,
                    online_tlwhs,
                    online_ids,
                    online_scores,
                    frame_id=frame_id,
                    ids2names=ids2names)
                if seq_name is None:
                    seq_name = image_list[0].split('/')[-2]
                save_dir = os.path.join(self.output_dir, seq_name)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                cv2.imwrite(
                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
            mot_results.append([online_tlwhs, online_scores, online_ids])
        return mot_results
    def predict_video(self, video_file, camera_id):
        video_out_name = 'mot_output.mp4'
        if camera_id != -1:
            capture = cv2.VideoCapture(camera_id)
        else:
            capture = cv2.VideoCapture(video_file)
            video_out_name = os.path.split(video_file)[-1]
        # Get Video info : resolution, fps, frame count
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(capture.get(cv2.CAP_PROP_FPS))
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        print("fps: %d, frame_count: %d" % (fps, frame_count))
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        out_path = os.path.join(self.output_dir, video_out_name)
        video_format = 'mp4v'
        fourcc = cv2.VideoWriter_fourcc(*video_format)
        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
        frame_id = 1
        timer = MOTTimer()
        results = defaultdict(list)  # centertrack onpy support single class
        num_classes = self.num_classes
        data_type = 'mcmot' if num_classes > 1 else 'mot'
        ids2names = self.pred_config.labels
        while (1):
            ret, frame = capture.read()
            if not ret:
                break
            if frame_id % 10 == 0:
                print('Tracking frame: %d' % (frame_id))
            frame_id += 1
            timer.tic()
            seq_name = video_out_name.split('.')[0]
            mot_results = self.predict_image(
                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
            timer.toc()
            fps = 1. / timer.duration
            online_tlwhs, online_scores, online_ids = mot_results[0]
            results[0].append(
                (frame_id + 1, online_tlwhs, online_scores, online_ids))
            im = plot_tracking(
                frame,
                online_tlwhs,
                online_ids,
                online_scores,
                frame_id=frame_id,
                fps=fps,
                ids2names=ids2names)
            writer.write(im)
            if camera_id != -1:
                cv2.imshow('Mask Detection', im)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        if self.save_mot_txts:
            result_filename = os.path.join(
                self.output_dir, video_out_name.split('.')[-2] + '.txt')
            write_mot_results(result_filename, results, data_type, num_classes)
        writer.release()
 def main():
    detector = CenterTrack(
        FLAGS.model_dir,
        tracker_config=None,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=1,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        output_dir=FLAGS.output_dir,
        threshold=FLAGS.threshold,
        save_images=FLAGS.save_images,
        save_mot_txts=FLAGS.save_mot_txts)
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
    else:
        # predict from image
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
        if not FLAGS.run_benchmark:
            detector.det_times.info(average=True)
        else:
            mode = FLAGS.run_mode
            model_dir = FLAGS.model_dir
            model_info = {
                'model_name': model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(detector, img_list, model_info, name='MOT')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
                            ], "device should be CPU, GPU, NPU or XPU"
    main()
--- a/third-party/paddle-inference/mot_jde_infer.py
+++ b/third-party/paddle-inference/mot_jde_infer.py
@@ -0,0 +1,381 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import time
 import yaml
 import cv2
 import numpy as np
 from collections import defaultdict
 import paddle
 from benchmark_utils import PaddleInferBenchmark
 from preprocess import decode_image
 from utils import argsparser, Timer, get_current_memory_mb
 from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
 # add python path
 import sys
 parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
 sys.path.insert(0, parent_path)
 from pptracking.python.mot import JDETracker
 from pptracking.python.mot.utils import MOTTimer, write_mot_results
 from pptracking.python.mot.visualize import plot_tracking_dict
 # Global dictionary
 MOT_JDE_SUPPORT_MODELS = {
    'JDE',
    'FairMOT',
 }
 class JDE_Detector(Detector):
    """
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): size of pre batch in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
            calibration, trt_calib_mode need to set True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to open MKLDNN
        output_dir (string): The path of output, default as 'output'
        threshold (float): Score threshold of the detected bbox, default as 0.5
        save_images (bool): Whether to save visualization image results, default as False
        save_mot_txts (bool): Whether to save tracking results (txt), default as False
    """
    def __init__(
            self,
            model_dir,
            tracker_config=None,
            device='CPU',
            run_mode='paddle',
            batch_size=1,
            trt_min_shape=1,
            trt_max_shape=1088,
            trt_opt_shape=608,
            trt_calib_mode=False,
            cpu_threads=1,
            enable_mkldnn=False,
            output_dir='output',
            threshold=0.5,
            save_images=False,
            save_mot_txts=False, ):
        super(JDE_Detector, self).__init__(
            model_dir=model_dir,
            device=device,
            run_mode=run_mode,
            batch_size=batch_size,
            trt_min_shape=trt_min_shape,
            trt_max_shape=trt_max_shape,
            trt_opt_shape=trt_opt_shape,
            trt_calib_mode=trt_calib_mode,
            cpu_threads=cpu_threads,
            enable_mkldnn=enable_mkldnn,
            output_dir=output_dir,
            threshold=threshold, )
        self.save_images = save_images
        self.save_mot_txts = save_mot_txts
        assert batch_size == 1, "MOT model only supports batch_size=1."
        self.det_times = Timer(with_tracker=True)
        self.num_classes = len(self.pred_config.labels)
        # tracker config
        assert self.pred_config.tracker, "The exported JDE Detector model should have tracker."
        cfg = self.pred_config.tracker
        min_box_area = cfg.get('min_box_area', 0.0)
        vertical_ratio = cfg.get('vertical_ratio', 0.0)
        conf_thres = cfg.get('conf_thres', 0.0)
        tracked_thresh = cfg.get('tracked_thresh', 0.7)
        metric_type = cfg.get('metric_type', 'euclidean')
        self.tracker = JDETracker(
            num_classes=self.num_classes,
            min_box_area=min_box_area,
            vertical_ratio=vertical_ratio,
            conf_thres=conf_thres,
            tracked_thresh=tracked_thresh,
            metric_type=metric_type)
    def postprocess(self, inputs, result):
        # postprocess output of predictor
        np_boxes = result['pred_dets']
        if np_boxes.shape[0] <= 0:
            print('[WARNNING] No object detected.')
            result = {'pred_dets': np.zeros([0, 6]), 'pred_embs': None}
        result = {k: v for k, v in result.items() if v is not None}
        return result
    def tracking(self, det_results):
        pred_dets = det_results['pred_dets']  # cls_id, score, x0, y0, x1, y1
        pred_embs = det_results['pred_embs']
        online_targets_dict = self.tracker.update(pred_dets, pred_embs)
        online_tlwhs = defaultdict(list)
        online_scores = defaultdict(list)
        online_ids = defaultdict(list)
        for cls_id in range(self.num_classes):
            online_targets = online_targets_dict[cls_id]
            for t in online_targets:
                tlwh = t.tlwh
                tid = t.track_id
                tscore = t.score
                if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: continue
                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
                        3] > self.tracker.vertical_ratio:
                    continue
                online_tlwhs[cls_id].append(tlwh)
                online_ids[cls_id].append(tid)
                online_scores[cls_id].append(tscore)
        return online_tlwhs, online_scores, online_ids
    def predict(self, repeats=1):
        '''
        Args:
            repeats (int): repeats number for prediction
        Returns:
            result (dict): include 'pred_dets': np.ndarray: shape:[N,6], N: number of box,
                            matix element:[class, score, x_min, y_min, x_max, y_max]
                            FairMOT(JDE)'s result include 'pred_embs': np.ndarray:
                            shape: [N, 128]
        '''
        # model prediction
        np_pred_dets, np_pred_embs = None, None
        for i in range(repeats):
            self.predictor.run()
            output_names = self.predictor.get_output_names()
            boxes_tensor = self.predictor.get_output_handle(output_names[0])
            np_pred_dets = boxes_tensor.copy_to_cpu()
            embs_tensor = self.predictor.get_output_handle(output_names[1])
            np_pred_embs = embs_tensor.copy_to_cpu()
        result = dict(pred_dets=np_pred_dets, pred_embs=np_pred_embs)
        return result
    def predict_image(self,
                      image_list,
                      run_benchmark=False,
                      repeats=1,
                      visual=True,
                      seq_name=None):
        mot_results = []
        num_classes = self.num_classes
        image_list.sort()
        ids2names = self.pred_config.labels
        data_type = 'mcmot' if num_classes > 1 else 'mot'
        for frame_id, img_file in enumerate(image_list):
            batch_image_list = [img_file]  # bs=1 in MOT model
            if run_benchmark:
                # preprocess
                inputs = self.preprocess(batch_image_list)  # warmup
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                # model prediction
                result_warmup = self.predict(repeats=repeats)  # warmup
                self.det_times.inference_time_s.start()
                result = self.predict(repeats=repeats)
                self.det_times.inference_time_s.end(repeats=repeats)
                # postprocess
                result_warmup = self.postprocess(inputs, result)  # warmup
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking
                result_warmup = self.tracking(det_result)
                self.det_times.tracking_time_s.start()
                online_tlwhs, online_scores, online_ids = self.tracking(
                    det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
                cm, gm, gu = get_current_memory_mb()
                self.cpu_mem += cm
                self.gpu_mem += gm
                self.gpu_util += gu
            else:
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                self.det_times.inference_time_s.start()
                result = self.predict()
                self.det_times.inference_time_s.end()
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking process
                self.det_times.tracking_time_s.start()
                online_tlwhs, online_scores, online_ids = self.tracking(
                    det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
            if visual:
                if len(image_list) > 1 and frame_id % 10 == 0:
                    print('Tracking frame {}'.format(frame_id))
                frame, _ = decode_image(img_file, {})
                im = plot_tracking_dict(
                    frame,
                    num_classes,
                    online_tlwhs,
                    online_ids,
                    online_scores,
                    frame_id=frame_id,
                    ids2names=ids2names)
                if seq_name is None:
                    seq_name = image_list[0].split('/')[-2]
                save_dir = os.path.join(self.output_dir, seq_name)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                cv2.imwrite(
                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
            mot_results.append([online_tlwhs, online_scores, online_ids])
        return mot_results
    def predict_video(self, video_file, camera_id):
        video_out_name = 'mot_output.mp4'
        if camera_id != -1:
            capture = cv2.VideoCapture(camera_id)
        else:
            capture = cv2.VideoCapture(video_file)
            video_out_name = os.path.split(video_file)[-1]
        # Get Video info : resolution, fps, frame count
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(capture.get(cv2.CAP_PROP_FPS))
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        print("fps: %d, frame_count: %d" % (fps, frame_count))
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        out_path = os.path.join(self.output_dir, video_out_name)
        video_format = 'mp4v'
        fourcc = cv2.VideoWriter_fourcc(*video_format)
        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
        frame_id = 1
        timer = MOTTimer()
        results = defaultdict(list)  # support single class and multi classes
        num_classes = self.num_classes
        data_type = 'mcmot' if num_classes > 1 else 'mot'
        ids2names = self.pred_config.labels
        while (1):
            ret, frame = capture.read()
            if not ret:
                break
            if frame_id % 10 == 0:
                print('Tracking frame: %d' % (frame_id))
            frame_id += 1
            timer.tic()
            seq_name = video_out_name.split('.')[0]
            mot_results = self.predict_image(
                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
            timer.toc()
            online_tlwhs, online_scores, online_ids = mot_results[0]
            for cls_id in range(num_classes):
                results[cls_id].append(
                    (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id],
                     online_ids[cls_id]))
            fps = 1. / timer.duration
            im = plot_tracking_dict(
                frame,
                num_classes,
                online_tlwhs,
                online_ids,
                online_scores,
                frame_id=frame_id,
                fps=fps,
                ids2names=ids2names)
            writer.write(im)
            if camera_id != -1:
                cv2.imshow('Mask Detection', im)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        if self.save_mot_txts:
            result_filename = os.path.join(
                self.output_dir, video_out_name.split('.')[-2] + '.txt')
            write_mot_results(result_filename, results, data_type, num_classes)
        writer.release()
 def main():
    detector = JDE_Detector(
        FLAGS.model_dir,
        tracker_config=None,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=1,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        output_dir=FLAGS.output_dir,
        threshold=FLAGS.threshold,
        save_images=FLAGS.save_images,
        save_mot_txts=FLAGS.save_mot_txts)
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
    else:
        # predict from image
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
        if not FLAGS.run_benchmark:
            detector.det_times.info(average=True)
        else:
            mode = FLAGS.run_mode
            model_dir = FLAGS.model_dir
            model_info = {
                'model_name': model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(detector, img_list, model_info, name='MOT')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
                            ], "device should be CPU, GPU, NPU or XPU"
    main()
--- a/third-party/paddle-inference/mot_keypoint_unite_infer.py
+++ b/third-party/paddle-inference/mot_keypoint_unite_infer.py
@@ -0,0 +1,301 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import json
 import cv2
 import math
 import numpy as np
 import paddle
 import yaml
 import copy
 from collections import defaultdict
 from mot_keypoint_unite_utils import argsparser
 from preprocess import decode_image
 from infer import print_arguments, get_test_images, bench_log
 from mot_sde_infer import SDE_Detector
 from mot_jde_infer import JDE_Detector, MOT_JDE_SUPPORT_MODELS
 from keypoint_infer import KeyPointDetector, KEYPOINT_SUPPORT_MODELS
 from det_keypoint_unite_infer import predict_with_given_det
 from visualize import visualize_pose
 from benchmark_utils import PaddleInferBenchmark
 from utils import get_current_memory_mb
 from keypoint_postprocess import translate_to_ori_images
 # add python path
 import sys
 parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
 sys.path.insert(0, parent_path)
 from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
 from pptracking.python.mot.utils import MOTTimer as FPSTimer
 def convert_mot_to_det(tlwhs, scores):
    results = {}
    num_mot = len(tlwhs)
    xyxys = copy.deepcopy(tlwhs)
    for xyxy in xyxys.copy():
        xyxy[2:] = xyxy[2:] + xyxy[:2]
    # support single class now
    results['boxes'] = np.vstack(
        [np.hstack([0, scores[i], xyxys[i]]) for i in range(num_mot)])
    results['boxes_num'] = np.array([num_mot])
    return results
 def mot_topdown_unite_predict(mot_detector,
                              topdown_keypoint_detector,
                              image_list,
                              keypoint_batch_size=1,
                              save_res=False):
    det_timer = mot_detector.get_timer()
    store_res = []
    image_list.sort()
    num_classes = mot_detector.num_classes
    for i, img_file in enumerate(image_list):
        # Decode image in advance in mot + pose prediction
        det_timer.preprocess_time_s.start()
        image, _ = decode_image(img_file, {})
        det_timer.preprocess_time_s.end()
        if FLAGS.run_benchmark:
            mot_results = mot_detector.predict_image(
                [image], run_benchmark=True, repeats=10)
            cm, gm, gu = get_current_memory_mb()
            mot_detector.cpu_mem += cm
            mot_detector.gpu_mem += gm
            mot_detector.gpu_util += gu
        else:
            mot_results = mot_detector.predict_image([image], visual=False)
        online_tlwhs, online_scores, online_ids = mot_results[
            0]  # only support bs=1 in MOT model
        results = convert_mot_to_det(
            online_tlwhs[0],
            online_scores[0])  # only support single class for mot + pose
        if results['boxes_num'] == 0:
            continue
        keypoint_res = predict_with_given_det(
            image, results, topdown_keypoint_detector, keypoint_batch_size,
            FLAGS.run_benchmark)
        if save_res:
            save_name = img_file if isinstance(img_file, str) else i
            store_res.append([
                save_name, keypoint_res['bbox'],
                [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
            ])
        if FLAGS.run_benchmark:
            cm, gm, gu = get_current_memory_mb()
            topdown_keypoint_detector.cpu_mem += cm
            topdown_keypoint_detector.gpu_mem += gm
            topdown_keypoint_detector.gpu_util += gu
        else:
            if not os.path.exists(FLAGS.output_dir):
                os.makedirs(FLAGS.output_dir)
            visualize_pose(
                img_file,
                keypoint_res,
                visual_thresh=FLAGS.keypoint_threshold,
                save_dir=FLAGS.output_dir)
    if save_res:
        """
        1) store_res: a list of image_data
        2) image_data: [imageid, rects, [keypoints, scores]]
        3) rects: list of rect [xmin, ymin, xmax, ymax]
        4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
        5) scores: mean of all joint conf
        """
        with open("det_keypoint_unite_image_results.json", 'w') as wf:
            json.dump(store_res, wf, indent=4)
 def mot_topdown_unite_predict_video(mot_detector,
                                    topdown_keypoint_detector,
                                    camera_id,
                                    keypoint_batch_size=1,
                                    save_res=False):
    video_name = 'output.mp4'
    if camera_id != -1:
        capture = cv2.VideoCapture(camera_id)
    else:
        capture = cv2.VideoCapture(FLAGS.video_file)
        video_name = os.path.split(FLAGS.video_file)[-1]
    # Get Video info : resolution, fps, frame count
    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(capture.get(cv2.CAP_PROP_FPS))
    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    print("fps: %d, frame_count: %d" % (fps, frame_count))
    if not os.path.exists(FLAGS.output_dir):
        os.makedirs(FLAGS.output_dir)
    out_path = os.path.join(FLAGS.output_dir, video_name)
    fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
    frame_id = 0
    timer_mot, timer_kp, timer_mot_kp = FPSTimer(), FPSTimer(), FPSTimer()
    num_classes = mot_detector.num_classes
    assert num_classes == 1, 'Only one category mot model supported for uniting keypoint deploy.'
    data_type = 'mot'
    while (1):
        ret, frame = capture.read()
        if not ret:
            break
        if frame_id % 10 == 0:
            print('Tracking frame: %d' % (frame_id))
        frame_id += 1
        timer_mot_kp.tic()
        # mot model
        timer_mot.tic()
        frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mot_results = mot_detector.predict_image([frame2], visual=False)
        timer_mot.toc()
        online_tlwhs, online_scores, online_ids = mot_results[0]
        results = convert_mot_to_det(
            online_tlwhs[0],
            online_scores[0])  # only support single class for mot + pose
        if results['boxes_num'] == 0:
            continue
        # keypoint model
        timer_kp.tic()
        keypoint_res = predict_with_given_det(
            frame2, results, topdown_keypoint_detector, keypoint_batch_size,
            FLAGS.run_benchmark)
        timer_kp.toc()
        timer_mot_kp.toc()
        kp_fps = 1. / timer_kp.duration
        mot_kp_fps = 1. / timer_mot_kp.duration
        im = visualize_pose(
            frame,
            keypoint_res,
            visual_thresh=FLAGS.keypoint_threshold,
            returnimg=True,
            ids=online_ids[0])
        im = plot_tracking_dict(
            im,
            num_classes,
            online_tlwhs,
            online_ids,
            online_scores,
            frame_id=frame_id,
            fps=mot_kp_fps)
        writer.write(im)
        if camera_id != -1:
            cv2.imshow('Tracking and keypoint results', im)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    writer.release()
    print('output_video saved to: {}'.format(out_path))
 def main():
    deploy_file = os.path.join(FLAGS.mot_model_dir, 'infer_cfg.yml')
    with open(deploy_file) as f:
        yml_conf = yaml.safe_load(f)
    arch = yml_conf['arch']
    mot_detector_func = 'SDE_Detector'
    if arch in MOT_JDE_SUPPORT_MODELS:
        mot_detector_func = 'JDE_Detector'
    mot_detector = eval(mot_detector_func)(FLAGS.mot_model_dir,
                                           FLAGS.tracker_config,
                                           device=FLAGS.device,
                                           run_mode=FLAGS.run_mode,
                                           batch_size=1,
                                           trt_min_shape=FLAGS.trt_min_shape,
                                           trt_max_shape=FLAGS.trt_max_shape,
                                           trt_opt_shape=FLAGS.trt_opt_shape,
                                           trt_calib_mode=FLAGS.trt_calib_mode,
                                           cpu_threads=FLAGS.cpu_threads,
                                           enable_mkldnn=FLAGS.enable_mkldnn,
                                           threshold=FLAGS.mot_threshold,
                                           output_dir=FLAGS.output_dir)
    topdown_keypoint_detector = KeyPointDetector(
        FLAGS.keypoint_model_dir,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=FLAGS.keypoint_batch_size,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        threshold=FLAGS.keypoint_threshold,
        output_dir=FLAGS.output_dir,
        use_dark=FLAGS.use_dark)
    keypoint_arch = topdown_keypoint_detector.pred_config.arch
    assert KEYPOINT_SUPPORT_MODELS[
        keypoint_arch] == 'keypoint_topdown', 'MOT-Keypoint unite inference only supports topdown models.'
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        mot_topdown_unite_predict_video(
            mot_detector, topdown_keypoint_detector, FLAGS.camera_id,
            FLAGS.keypoint_batch_size, FLAGS.save_res)
    else:
        # predict from image
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        mot_topdown_unite_predict(mot_detector, topdown_keypoint_detector,
                                  img_list, FLAGS.keypoint_batch_size,
                                  FLAGS.save_res)
        if not FLAGS.run_benchmark:
            mot_detector.det_times.info(average=True)
            topdown_keypoint_detector.det_times.info(average=True)
        else:
            mode = FLAGS.run_mode
            mot_model_dir = FLAGS.mot_model_dir
            mot_model_info = {
                'model_name': mot_model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(mot_detector, img_list, mot_model_info, name='MOT')
            keypoint_model_dir = FLAGS.keypoint_model_dir
            keypoint_model_info = {
                'model_name': keypoint_model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
                      FLAGS.keypoint_batch_size, 'KeyPoint')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
                            ], "device should be CPU, GPU, NPU or XPU"
    main()
--- a/third-party/paddle-inference/mot_keypoint_unite_utils.py
+++ b/third-party/paddle-inference/mot_keypoint_unite_utils.py
@@ -0,0 +1,139 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import ast
 import argparse
 def argsparser():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--mot_model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."),
        required=True)
    parser.add_argument(
        "--keypoint_model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."),
        required=True)
    parser.add_argument(
        "--image_file", type=str, default=None, help="Path of image file.")
    parser.add_argument(
        "--image_dir",
        type=str,
        default=None,
        help="Dir of image file, `image_file` has a higher priority.")
    parser.add_argument(
        "--keypoint_batch_size",
        type=int,
        default=1,
        help=("batch_size for keypoint inference. In detection-keypoint unit"
              "inference, the batch size in detection is 1. Then collate det "
              "result in batch for keypoint inference."))
    parser.add_argument(
        "--video_file",
        type=str,
        default=None,
        help="Path of video file, `video_file` or `camera_id` has a highest priority."
    )
    parser.add_argument(
        "--camera_id",
        type=int,
        default=-1,
        help="device id of camera to predict.")
    parser.add_argument(
        "--mot_threshold", type=float, default=0.5, help="Threshold of score.")
    parser.add_argument(
        "--keypoint_threshold",
        type=float,
        default=0.5,
        help="Threshold of score.")
    parser.add_argument(
        "--output_dir",
        type=str,
        default="output",
        help="Directory of output visualization files.")
    parser.add_argument(
        "--run_mode",
        type=str,
        default='paddle',
        help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU."
    )
    parser.add_argument(
        "--run_benchmark",
        type=ast.literal_eval,
        default=False,
        help="Whether to predict a image_file repeatedly for benchmark")
    parser.add_argument(
        "--enable_mkldnn",
        type=ast.literal_eval,
        default=False,
        help="Whether use mkldnn with CPU.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
    parser.add_argument(
        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
    parser.add_argument(
        "--trt_max_shape",
        type=int,
        default=1088,
        help="max_shape for TensorRT.")
    parser.add_argument(
        "--trt_opt_shape",
        type=int,
        default=608,
        help="opt_shape for TensorRT.")
    parser.add_argument(
        "--trt_calib_mode",
        type=bool,
        default=False,
        help="If the model is produced by TRT offline quantitative "
        "calibration, trt_calib_mode need to set True.")
    parser.add_argument(
        '--save_images',
        action='store_true',
        help='Save visualization image results.')
    parser.add_argument(
        '--save_mot_txts',
        action='store_true',
        help='Save tracking results (txt).')
    parser.add_argument(
        '--use_dark',
        type=bool,
        default=True,
        help='whether to use darkpose to get better keypoint position predict ')
    parser.add_argument(
        '--save_res',
        type=bool,
        default=False,
        help=(
            "whether to save predict results to json file"
            "1) store_res: a list of image_data"
            "2) image_data: [imageid, rects, [keypoints, scores]]"
            "3) rects: list of rect [xmin, ymin, xmax, ymax]"
            "4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list"
            "5) scores: mean of all joint conf"))
    parser.add_argument(
        "--tracker_config", type=str, default=None, help=("tracker donfig"))
    return parser
--- a/third-party/paddle-inference/mot_sde_infer.py
+++ b/third-party/paddle-inference/mot_sde_infer.py
@@ -0,0 +1,522 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
 import time
 import yaml
 import cv2
 import numpy as np
 from collections import defaultdict
 import paddle
 from benchmark_utils import PaddleInferBenchmark
 from preprocess import decode_image
 from utils import argsparser, Timer, get_current_memory_mb
 from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor
 # add python path
 import sys
 parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
 sys.path.insert(0, parent_path)
 from pptracking.python.mot import JDETracker, DeepSORTTracker
 from pptracking.python.mot.utils import MOTTimer, write_mot_results, get_crops, clip_box
 from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
 class SDE_Detector(Detector):
    """
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        tracker_config (str): tracker config path
        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): size of pre batch in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
            calibration, trt_calib_mode need to set True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to open MKLDNN
        output_dir (string): The path of output, default as 'output'
        threshold (float): Score threshold of the detected bbox, default as 0.5
        save_images (bool): Whether to save visualization image results, default as False
        save_mot_txts (bool): Whether to save tracking results (txt), default as False
        reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT
    """
    def __init__(self,
                 model_dir,
                 tracker_config,
                 device='CPU',
                 run_mode='paddle',
                 batch_size=1,
                 trt_min_shape=1,
                 trt_max_shape=1280,
                 trt_opt_shape=640,
                 trt_calib_mode=False,
                 cpu_threads=1,
                 enable_mkldnn=False,
                 output_dir='output',
                 threshold=0.5,
                 save_images=False,
                 save_mot_txts=False,
                 reid_model_dir=None):
        super(SDE_Detector, self).__init__(
            model_dir=model_dir,
            device=device,
            run_mode=run_mode,
            batch_size=batch_size,
            trt_min_shape=trt_min_shape,
            trt_max_shape=trt_max_shape,
            trt_opt_shape=trt_opt_shape,
            trt_calib_mode=trt_calib_mode,
            cpu_threads=cpu_threads,
            enable_mkldnn=enable_mkldnn,
            output_dir=output_dir,
            threshold=threshold, )
        self.save_images = save_images
        self.save_mot_txts = save_mot_txts
        assert batch_size == 1, "MOT model only supports batch_size=1."
        self.det_times = Timer(with_tracker=True)
        self.num_classes = len(self.pred_config.labels)
        # reid config
        self.use_reid = False if reid_model_dir is None else True
        if self.use_reid:
            self.reid_pred_config = self.set_config(reid_model_dir)
            self.reid_predictor, self.config = load_predictor(
                reid_model_dir,
                run_mode=run_mode,
                batch_size=50,  # reid_batch_size
                min_subgraph_size=self.reid_pred_config.min_subgraph_size,
                device=device,
                use_dynamic_shape=self.reid_pred_config.use_dynamic_shape,
                trt_min_shape=trt_min_shape,
                trt_max_shape=trt_max_shape,
                trt_opt_shape=trt_opt_shape,
                trt_calib_mode=trt_calib_mode,
                cpu_threads=cpu_threads,
                enable_mkldnn=enable_mkldnn)
        else:
            self.reid_pred_config = None
            self.reid_predictor = None
        assert tracker_config is not None, 'Note that tracker_config should be set.'
        self.tracker_config = tracker_config
        tracker_cfg = yaml.safe_load(open(self.tracker_config))
        cfg = tracker_cfg[tracker_cfg['type']]
        # tracker config
        self.use_deepsort_tracker = True if tracker_cfg[
            'type'] == 'DeepSORTTracker' else False
        if self.use_deepsort_tracker:
            # use DeepSORTTracker
            if self.reid_pred_config is not None and hasattr(
                    self.reid_pred_config, 'tracker'):
                cfg = self.reid_pred_config.tracker
            budget = cfg.get('budget', 100)
            max_age = cfg.get('max_age', 30)
            max_iou_distance = cfg.get('max_iou_distance', 0.7)
            matching_threshold = cfg.get('matching_threshold', 0.2)
            min_box_area = cfg.get('min_box_area', 0)
            vertical_ratio = cfg.get('vertical_ratio', 0)
            self.tracker = DeepSORTTracker(
                budget=budget,
                max_age=max_age,
                max_iou_distance=max_iou_distance,
                matching_threshold=matching_threshold,
                min_box_area=min_box_area,
                vertical_ratio=vertical_ratio, )
        else:
            # use ByteTracker
            use_byte = cfg.get('use_byte', False)
            det_thresh = cfg.get('det_thresh', 0.3)
            min_box_area = cfg.get('min_box_area', 0)
            vertical_ratio = cfg.get('vertical_ratio', 0)
            match_thres = cfg.get('match_thres', 0.9)
            conf_thres = cfg.get('conf_thres', 0.6)
            low_conf_thres = cfg.get('low_conf_thres', 0.1)
            self.tracker = JDETracker(
                use_byte=use_byte,
                det_thresh=det_thresh,
                num_classes=self.num_classes,
                min_box_area=min_box_area,
                vertical_ratio=vertical_ratio,
                match_thres=match_thres,
                conf_thres=conf_thres,
                low_conf_thres=low_conf_thres, )
    def postprocess(self, inputs, result):
        # postprocess output of predictor
        np_boxes_num = result['boxes_num']
        if np_boxes_num[0] <= 0:
            print('[WARNNING] No object detected.')
            result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]}
        result = {k: v for k, v in result.items() if v is not None}
        return result
    def reidprocess(self, det_results, repeats=1):
        pred_dets = det_results['boxes']
        pred_xyxys = pred_dets[:, 2:6]
        ori_image = det_results['ori_image']
        ori_image_shape = ori_image.shape[:2]
        pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape)
        if len(keep_idx[0]) == 0:
            det_results['boxes'] = np.zeros((1, 6), dtype=np.float32)
            det_results['embeddings'] = None
            return det_results
        pred_dets = pred_dets[keep_idx[0]]
        pred_xyxys = pred_dets[:, 2:6]
        w, h = self.tracker.input_size
        crops = get_crops(pred_xyxys, ori_image, w, h)
        # to keep fast speed, only use topk crops
        crops = crops[:50]  # reid_batch_size
        det_results['crops'] = np.array(crops).astype('float32')
        det_results['boxes'] = pred_dets[:50]
        input_names = self.reid_predictor.get_input_names()
        for i in range(len(input_names)):
            input_tensor = self.reid_predictor.get_input_handle(input_names[i])
            input_tensor.copy_from_cpu(det_results[input_names[i]])
        # model prediction
        for i in range(repeats):
            self.reid_predictor.run()
            output_names = self.reid_predictor.get_output_names()
            feature_tensor = self.reid_predictor.get_output_handle(output_names[
                0])
            pred_embs = feature_tensor.copy_to_cpu()
        det_results['embeddings'] = pred_embs
        return det_results
    def tracking(self, det_results):
        pred_dets = det_results['boxes']  # 'cls_id, score, x0, y0, x1, y1'
        pred_embs = det_results.get('embeddings', None)
        if self.use_deepsort_tracker:
            # use DeepSORTTracker, only support singe class
            self.tracker.predict()
            online_targets = self.tracker.update(pred_dets, pred_embs)
            online_tlwhs, online_scores, online_ids = [], [], []
            for t in online_targets:
                if not t.is_confirmed() or t.time_since_update > 1:
                    continue
                tlwh = t.to_tlwh()
                tscore = t.score
                tid = t.track_id
                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
                        3] > self.tracker.vertical_ratio:
                    continue
                online_tlwhs.append(tlwh)
                online_scores.append(tscore)
                online_ids.append(tid)
            tracking_outs = {
                'online_tlwhs': online_tlwhs,
                'online_scores': online_scores,
                'online_ids': online_ids,
            }
            return tracking_outs
        else:
            # use ByteTracker, support multiple class
            online_tlwhs = defaultdict(list)
            online_scores = defaultdict(list)
            online_ids = defaultdict(list)
            online_targets_dict = self.tracker.update(pred_dets, pred_embs)
            for cls_id in range(self.num_classes):
                online_targets = online_targets_dict[cls_id]
                for t in online_targets:
                    tlwh = t.tlwh
                    tid = t.track_id
                    tscore = t.score
                    if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
                        continue
                    if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
                            3] > self.tracker.vertical_ratio:
                        continue
                    online_tlwhs[cls_id].append(tlwh)
                    online_ids[cls_id].append(tid)
                    online_scores[cls_id].append(tscore)
            tracking_outs = {
                'online_tlwhs': online_tlwhs,
                'online_scores': online_scores,
                'online_ids': online_ids,
            }
            return tracking_outs
    def predict_image(self,
                      image_list,
                      run_benchmark=False,
                      repeats=1,
                      visual=True,
                      seq_name=None):
        num_classes = self.num_classes
        image_list.sort()
        ids2names = self.pred_config.labels
        mot_results = []
        for frame_id, img_file in enumerate(image_list):
            batch_image_list = [img_file]  # bs=1 in MOT model
            frame, _ = decode_image(img_file, {})
            if run_benchmark:
                # preprocess
                inputs = self.preprocess(batch_image_list)  # warmup
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                # model prediction
                result_warmup = self.predict(repeats=repeats)  # warmup
                self.det_times.inference_time_s.start()
                result = self.predict(repeats=repeats)
                self.det_times.inference_time_s.end(repeats=repeats)
                # postprocess
                result_warmup = self.postprocess(inputs, result)  # warmup
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking
                if self.use_reid:
                    det_result['frame_id'] = frame_id
                    det_result['seq_name'] = seq_name
                    det_result['ori_image'] = frame
                    det_result = self.reidprocess(det_result)
                result_warmup = self.tracking(det_result)
                self.det_times.tracking_time_s.start()
                if self.use_reid:
                    det_result = self.reidprocess(det_result)
                tracking_outs = self.tracking(det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
                cm, gm, gu = get_current_memory_mb()
                self.cpu_mem += cm
                self.gpu_mem += gm
                self.gpu_util += gu
            else:
                self.det_times.preprocess_time_s.start()
                inputs = self.preprocess(batch_image_list)
                self.det_times.preprocess_time_s.end()
                self.det_times.inference_time_s.start()
                result = self.predict()
                self.det_times.inference_time_s.end()
                self.det_times.postprocess_time_s.start()
                det_result = self.postprocess(inputs, result)
                self.det_times.postprocess_time_s.end()
                # tracking process
                self.det_times.tracking_time_s.start()
                if self.use_reid:
                    det_result['frame_id'] = frame_id
                    det_result['seq_name'] = seq_name
                    det_result['ori_image'] = frame
                    det_result = self.reidprocess(det_result)
                tracking_outs = self.tracking(det_result)
                self.det_times.tracking_time_s.end()
                self.det_times.img_num += 1
            online_tlwhs = tracking_outs['online_tlwhs']
            online_scores = tracking_outs['online_scores']
            online_ids = tracking_outs['online_ids']
            mot_results.append([online_tlwhs, online_scores, online_ids])
            if visual:
                if len(image_list) > 1 and frame_id % 10 == 0:
                    print('Tracking frame {}'.format(frame_id))
                frame, _ = decode_image(img_file, {})
                if isinstance(online_tlwhs, defaultdict):
                    im = plot_tracking_dict(
                        frame,
                        num_classes,
                        online_tlwhs,
                        online_ids,
                        online_scores,
                        frame_id=frame_id,
                        ids2names=ids2names)
                else:
                    im = plot_tracking(
                        frame,
                        online_tlwhs,
                        online_ids,
                        online_scores,
                        frame_id=frame_id,
                        ids2names=ids2names)
                save_dir = os.path.join(self.output_dir, seq_name)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                cv2.imwrite(
                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
        return mot_results
    def predict_video(self, video_file, camera_id):
        video_out_name = 'output.mp4'
        if camera_id != -1:
            capture = cv2.VideoCapture(camera_id)
        else:
            capture = cv2.VideoCapture(video_file)
            video_out_name = os.path.split(video_file)[-1]
        # Get Video info : resolution, fps, frame count
        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = int(capture.get(cv2.CAP_PROP_FPS))
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        print("fps: %d, frame_count: %d" % (fps, frame_count))
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)
        out_path = os.path.join(self.output_dir, video_out_name)
        video_format = 'mp4v'
        fourcc = cv2.VideoWriter_fourcc(*video_format)
        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
        frame_id = 1
        timer = MOTTimer()
        results = defaultdict(list)
        num_classes = self.num_classes
        data_type = 'mcmot' if num_classes > 1 else 'mot'
        ids2names = self.pred_config.labels
        while (1):
            ret, frame = capture.read()
            if not ret:
                break
            if frame_id % 10 == 0:
                print('Tracking frame: %d' % (frame_id))
            frame_id += 1
            timer.tic()
            seq_name = video_out_name.split('.')[0]
            mot_results = self.predict_image(
                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
            timer.toc()
            # bs=1 in MOT model
            online_tlwhs, online_scores, online_ids = mot_results[0]
            fps = 1. / timer.duration
            if self.use_deepsort_tracker:
                # use DeepSORTTracker, only support singe class
                results[0].append(
                    (frame_id + 1, online_tlwhs, online_scores, online_ids))
                im = plot_tracking(
                    frame,
                    online_tlwhs,
                    online_ids,
                    online_scores,
                    frame_id=frame_id,
                    fps=fps,
                    ids2names=ids2names)
            else:
                # use ByteTracker, support multiple class
                for cls_id in range(num_classes):
                    results[cls_id].append(
                        (frame_id + 1, online_tlwhs[cls_id],
                         online_scores[cls_id], online_ids[cls_id]))
                im = plot_tracking_dict(
                    frame,
                    num_classes,
                    online_tlwhs,
                    online_ids,
                    online_scores,
                    frame_id=frame_id,
                    fps=fps,
                    ids2names=ids2names)
            writer.write(im)
            if camera_id != -1:
                cv2.imshow('Mask Detection', im)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        if self.save_mot_txts:
            result_filename = os.path.join(
                self.output_dir, video_out_name.split('.')[-2] + '.txt')
            write_mot_results(result_filename, results)
        writer.release()
 def main():
    deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml')
    with open(deploy_file) as f:
        yml_conf = yaml.safe_load(f)
    arch = yml_conf['arch']
    detector = SDE_Detector(
        FLAGS.model_dir,
        tracker_config=FLAGS.tracker_config,
        device=FLAGS.device,
        run_mode=FLAGS.run_mode,
        batch_size=1,
        trt_min_shape=FLAGS.trt_min_shape,
        trt_max_shape=FLAGS.trt_max_shape,
        trt_opt_shape=FLAGS.trt_opt_shape,
        trt_calib_mode=FLAGS.trt_calib_mode,
        cpu_threads=FLAGS.cpu_threads,
        enable_mkldnn=FLAGS.enable_mkldnn,
        output_dir=FLAGS.output_dir,
        threshold=FLAGS.threshold,
        save_images=FLAGS.save_images,
        save_mot_txts=FLAGS.save_mot_txts, )
    # predict from video file or camera video stream
    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
    else:
        # predict from image
        if FLAGS.image_dir is None and FLAGS.image_file is not None:
            assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT models."
        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
        seq_name = FLAGS.image_dir.split('/')[-1]
        detector.predict_image(
            img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name)
        if not FLAGS.run_benchmark:
            detector.det_times.info(average=True)
        else:
            mode = FLAGS.run_mode
            model_dir = FLAGS.model_dir
            model_info = {
                'model_name': model_dir.strip('/').split('/')[-1],
                'precision': mode.split('_')[-1]
            }
            bench_log(detector, img_list, model_info, name='MOT')
 if __name__ == '__main__':
    paddle.enable_static()
    parser = argsparser()
    FLAGS = parser.parse_args()
    print_arguments(FLAGS)
    FLAGS.device = FLAGS.device.upper()
    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
                            ], "device should be CPU, GPU, NPU or XPU"
    main()
--- a/third-party/paddle-inference/picodet_postprocess.py
+++ b/third-party/paddle-inference/picodet_postprocess.py
@@ -0,0 +1,227 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import numpy as np
 from scipy.special import softmax
 def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
    """
    Args:
        box_scores (N, 5): boxes in corner-form and probabilities.
        iou_threshold: intersection over union threshold.
        top_k: keep top_k results. If k <= 0, keep all the results.
        candidate_size: only consider the candidates with the highest scores.
    Returns:
         picked: a list of indexes of the kept boxes
    """
    scores = box_scores[:, -1]
    boxes = box_scores[:, :-1]
    picked = []
    indexes = np.argsort(scores)
    indexes = indexes[-candidate_size:]
    while len(indexes) > 0:
        current = indexes[-1]
        picked.append(current)
        if 0 < top_k == len(picked) or len(indexes) == 1:
            break
        current_box = boxes[current, :]
        indexes = indexes[:-1]
        rest_boxes = boxes[indexes, :]
        iou = iou_of(
            rest_boxes,
            np.expand_dims(
                current_box, axis=0), )
        indexes = indexes[iou <= iou_threshold]
    return box_scores[picked, :]
 def iou_of(boxes0, boxes1, eps=1e-5):
    """Return intersection-over-union (Jaccard index) of boxes.
    Args:
        boxes0 (N, 4): ground truth boxes.
        boxes1 (N or 1, 4): predicted boxes.
        eps: a small number to avoid 0 as denominator.
    Returns:
        iou (N): IoU values.
    """
    overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2])
    overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
    overlap_area = area_of(overlap_left_top, overlap_right_bottom)
    area0 = area_of(boxes0[..., :2], boxes0[..., 2:])
    area1 = area_of(boxes1[..., :2], boxes1[..., 2:])
    return overlap_area / (area0 + area1 - overlap_area + eps)
 def area_of(left_top, right_bottom):
    """Compute the areas of rectangles given two corners.
    Args:
        left_top (N, 2): left top corner.
        right_bottom (N, 2): right bottom corner.
    Returns:
        area (N): return the area.
    """
    hw = np.clip(right_bottom - left_top, 0.0, None)
    return hw[..., 0] * hw[..., 1]
 class PicoDetPostProcess(object):
    """
    Args:
        input_shape (int): network input image size
        ori_shape (int): ori image shape of before padding
        scale_factor (float): scale factor of ori image
        enable_mkldnn (bool): whether to open MKLDNN
    """
    def __init__(self,
                 input_shape,
                 ori_shape,
                 scale_factor,
                 strides=[8, 16, 32, 64],
                 score_threshold=0.4,
                 nms_threshold=0.5,
                 nms_top_k=1000,
                 keep_top_k=100):
        self.ori_shape = ori_shape
        self.input_shape = input_shape
        self.scale_factor = scale_factor
        self.strides = strides
        self.score_threshold = score_threshold
        self.nms_threshold = nms_threshold
        self.nms_top_k = nms_top_k
        self.keep_top_k = keep_top_k
    def warp_boxes(self, boxes, ori_shape):
        """Apply transform to boxes
        """
        width, height = ori_shape[1], ori_shape[0]
        n = len(boxes)
        if n:
            # warp points
            xy = np.ones((n * 4, 3))
            xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(
                n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            # xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8)  # rescale
            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            xy = np.concatenate(
                (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
            # clip boxes
            xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
            xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
            return xy.astype(np.float32)
        else:
            return boxes
    def __call__(self, scores, raw_boxes):
        batch_size = raw_boxes[0].shape[0]
        reg_max = int(raw_boxes[0].shape[-1] / 4 - 1)
        out_boxes_num = []
        out_boxes_list = []
        for batch_id in range(batch_size):
            # generate centers
            decode_boxes = []
            select_scores = []
            for stride, box_distribute, score in zip(self.strides, raw_boxes,
                                                     scores):
                box_distribute = box_distribute[batch_id]
                score = score[batch_id]
                # centers
                fm_h = self.input_shape[0] / stride
                fm_w = self.input_shape[1] / stride
                h_range = np.arange(fm_h)
                w_range = np.arange(fm_w)
                ww, hh = np.meshgrid(w_range, h_range)
                ct_row = (hh.flatten() + 0.5) * stride
                ct_col = (ww.flatten() + 0.5) * stride
                center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1)
                # box distribution to distance
                reg_range = np.arange(reg_max + 1)
                box_distance = box_distribute.reshape((-1, reg_max + 1))
                box_distance = softmax(box_distance, axis=1)
                box_distance = box_distance * np.expand_dims(reg_range, axis=0)
                box_distance = np.sum(box_distance, axis=1).reshape((-1, 4))
                box_distance = box_distance * stride
                # top K candidate
                topk_idx = np.argsort(score.max(axis=1))[::-1]
                topk_idx = topk_idx[:self.nms_top_k]
                center = center[topk_idx]
                score = score[topk_idx]
                box_distance = box_distance[topk_idx]
                # decode box
                decode_box = center + [-1, -1, 1, 1] * box_distance
                select_scores.append(score)
                decode_boxes.append(decode_box)
            # nms
            bboxes = np.concatenate(decode_boxes, axis=0)
            confidences = np.concatenate(select_scores, axis=0)
            picked_box_probs = []
            picked_labels = []
            for class_index in range(0, confidences.shape[1]):
                probs = confidences[:, class_index]
                mask = probs > self.score_threshold
                probs = probs[mask]
                if probs.shape[0] == 0:
                    continue
                subset_boxes = bboxes[mask, :]
                box_probs = np.concatenate(
                    [subset_boxes, probs.reshape(-1, 1)], axis=1)
                box_probs = hard_nms(
                    box_probs,
                    iou_threshold=self.nms_threshold,
                    top_k=self.keep_top_k, )
                picked_box_probs.append(box_probs)
                picked_labels.extend([class_index] * box_probs.shape[0])
            if len(picked_box_probs) == 0:
                out_boxes_list.append(np.empty((0, 4)))
                out_boxes_num.append(0)
            else:
                picked_box_probs = np.concatenate(picked_box_probs)
                # resize output boxes
                picked_box_probs[:, :4] = self.warp_boxes(
                    picked_box_probs[:, :4], self.ori_shape[batch_id])
                im_scale = np.concatenate([
                    self.scale_factor[batch_id][::-1],
                    self.scale_factor[batch_id][::-1]
                ])
                picked_box_probs[:, :4] /= im_scale
                # clas score box
                out_boxes_list.append(
                    np.concatenate(
                        [
                            np.expand_dims(
                                np.array(picked_labels),
                                axis=-1), np.expand_dims(
                                    picked_box_probs[:, 4], axis=-1),
                            picked_box_probs[:, :4]
                        ],
                        axis=1))
                out_boxes_num.append(len(picked_labels))
        out_boxes_list = np.concatenate(out_boxes_list, axis=0)
        out_boxes_num = np.asarray(out_boxes_num).astype(np.int32)
        return out_boxes_list, out_boxes_num
--- a/third-party/paddle-inference/preprocess.py
+++ b/third-party/paddle-inference/preprocess.py
@@ -0,0 +1,549 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import cv2
 import numpy as np
 import imgaug.augmenters as iaa
 from keypoint_preprocess import get_affine_transform
 from PIL import Image
 def decode_image(im_file, im_info):
    """read rgb image
    Args:
        im_file (str|np.ndarray): input can be image path or np.ndarray
        im_info (dict): info of image
    Returns:
        im (np.ndarray):  processed image (np.ndarray)
        im_info (dict): info of processed image
    """
    if isinstance(im_file, str):
        with open(im_file, 'rb') as f:
            im_read = f.read()
        data = np.frombuffer(im_read, dtype='uint8')
        im = cv2.imdecode(data, 1)  # BGR mode, but need RGB mode
        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    else:
        im = im_file
    im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
    im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
    return im, im_info
 class Resize_Mult32(object):
    """resize image by target_size and max_size
    Args:
        target_size (int): the target size of image
        keep_ratio (bool): whether keep_ratio or not, default true
        interp (int): method of resize
    """
    def __init__(self, limit_side_len, limit_type, interp=cv2.INTER_LINEAR):
        self.limit_side_len = limit_side_len
        self.limit_type = limit_type
        self.interp = interp
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        im_channel = im.shape[2]
        im_scale_y, im_scale_x = self.generate_scale(im)
        im = cv2.resize(
            im,
            None,
            None,
            fx=im_scale_x,
            fy=im_scale_y,
            interpolation=self.interp)
        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
        im_info['scale_factor'] = np.array(
            [im_scale_y, im_scale_x]).astype('float32')
        return im, im_info
    def generate_scale(self, img):
        """
        Args:
            img (np.ndarray): image (np.ndarray)
        Returns:
            im_scale_x: the resize ratio of X
            im_scale_y: the resize ratio of Y
        """
        limit_side_len = self.limit_side_len
        h, w, c = img.shape
        # limit the max side
        if self.limit_type == 'max':
            if h > w:
                ratio = float(limit_side_len) / h
            else:
                ratio = float(limit_side_len) / w
        elif self.limit_type == 'min':
            if h < w:
                ratio = float(limit_side_len) / h
            else:
                ratio = float(limit_side_len) / w
        elif self.limit_type == 'resize_long':
            ratio = float(limit_side_len) / max(h, w)
        else:
            raise Exception('not support limit type, image ')
        resize_h = int(h * ratio)
        resize_w = int(w * ratio)
        resize_h = max(int(round(resize_h / 32) * 32), 32)
        resize_w = max(int(round(resize_w / 32) * 32), 32)
        im_scale_y = resize_h / float(h)
        im_scale_x = resize_w / float(w)
        return im_scale_y, im_scale_x
 class Resize(object):
    """resize image by target_size and max_size
    Args:
        target_size (int): the target size of image
        keep_ratio (bool): whether keep_ratio or not, default true
        interp (int): method of resize
    """
    def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
        if isinstance(target_size, int):
            target_size = [target_size, target_size]
        self.target_size = target_size
        self.keep_ratio = keep_ratio
        self.interp = interp
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        assert len(self.target_size) == 2
        assert self.target_size[0] > 0 and self.target_size[1] > 0
        im_channel = im.shape[2]
        im_scale_y, im_scale_x = self.generate_scale(im)
        im = cv2.resize(
            im,
            None,
            None,
            fx=im_scale_x,
            fy=im_scale_y,
            interpolation=self.interp)
        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
        im_info['scale_factor'] = np.array(
            [im_scale_y, im_scale_x]).astype('float32')
        return im, im_info
    def generate_scale(self, im):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
        Returns:
            im_scale_x: the resize ratio of X
            im_scale_y: the resize ratio of Y
        """
        origin_shape = im.shape[:2]
        im_c = im.shape[2]
        if self.keep_ratio:
            im_size_min = np.min(origin_shape)
            im_size_max = np.max(origin_shape)
            target_size_min = np.min(self.target_size)
            target_size_max = np.max(self.target_size)
            im_scale = float(target_size_min) / float(im_size_min)
            if np.round(im_scale * im_size_max) > target_size_max:
                im_scale = float(target_size_max) / float(im_size_max)
            im_scale_x = im_scale
            im_scale_y = im_scale
        else:
            resize_h, resize_w = self.target_size
            im_scale_y = resize_h / float(origin_shape[0])
            im_scale_x = resize_w / float(origin_shape[1])
        return im_scale_y, im_scale_x
 class ShortSizeScale(object):
    """
    Scale images by short size.
    Args:
        short_size(float | int): Short size of an image will be scaled to the short_size.
        fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True
        do_round(bool): Whether to round up when calculating the zoom ratio. default: False
        backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow'
    """
    def __init__(self,
                 short_size,
                 fixed_ratio=True,
                 keep_ratio=None,
                 do_round=False,
                 backend='pillow'):
        self.short_size = short_size
        assert (fixed_ratio and not keep_ratio) or (
            not fixed_ratio
        ), "fixed_ratio and keep_ratio cannot be true at the same time"
        self.fixed_ratio = fixed_ratio
        self.keep_ratio = keep_ratio
        self.do_round = do_round
        assert backend in [
            'pillow', 'cv2'
        ], "Scale's backend must be pillow or cv2, but get {backend}"
        self.backend = backend
    def __call__(self, img):
        """
        Performs resize operations.
        Args:
            img (PIL.Image): a PIL.Image.
        return:
            resized_img: a PIL.Image after scaling.
        """
        result_img = None
        if isinstance(img, np.ndarray):
            h, w, _ = img.shape
        elif isinstance(img, Image.Image):
            w, h = img.size
        else:
            raise NotImplementedError
        if w <= h:
            ow = self.short_size
            if self.fixed_ratio:  # default is True
                oh = int(self.short_size * 4.0 / 3.0)
            elif not self.keep_ratio:  # no
                oh = self.short_size
            else:
                scale_factor = self.short_size / w
                oh = int(h * float(scale_factor) +
                         0.5) if self.do_round else int(h * self.short_size / w)
                ow = int(w * float(scale_factor) +
                         0.5) if self.do_round else int(w * self.short_size / h)
        else:
            oh = self.short_size
            if self.fixed_ratio:
                ow = int(self.short_size * 4.0 / 3.0)
            elif not self.keep_ratio:  # no
                ow = self.short_size
            else:
                scale_factor = self.short_size / h
                oh = int(h * float(scale_factor) +
                         0.5) if self.do_round else int(h * self.short_size / w)
                ow = int(w * float(scale_factor) +
                         0.5) if self.do_round else int(w * self.short_size / h)
        if type(img) == np.ndarray:
            img = Image.fromarray(img, mode='RGB')
        if self.backend == 'pillow':
            result_img = img.resize((ow, oh), Image.BILINEAR)
        elif self.backend == 'cv2' and (self.keep_ratio is not None):
            result_img = cv2.resize(
                img, (ow, oh), interpolation=cv2.INTER_LINEAR)
        else:
            result_img = Image.fromarray(
                cv2.resize(
                    np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR))
        return result_img
 class NormalizeImage(object):
    """normalize image
    Args:
        mean (list): im - mean
        std (list): im / std
        is_scale (bool): whether need im / 255
        norm_type (str): type in ['mean_std', 'none']
    """
    def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
        self.mean = mean
        self.std = std
        self.is_scale = is_scale
        self.norm_type = norm_type
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        im = im.astype(np.float32, copy=False)
        if self.is_scale:
            scale = 1.0 / 255.0
            im *= scale
        if self.norm_type == 'mean_std':
            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
            std = np.array(self.std)[np.newaxis, np.newaxis, :]
            im -= mean
            im /= std
        return im, im_info
 class Permute(object):
    """permute image
    Args:
        to_bgr (bool): whether convert RGB to BGR 
        channel_first (bool): whether convert HWC to CHW
    """
    def __init__(self, ):
        super(Permute, self).__init__()
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        im = im.transpose((2, 0, 1)).copy()
        return im, im_info
 class PadStride(object):
    """ padding image for model with FPN, instead PadBatch(pad_to_stride) in original config
    Args:
        stride (bool): model with FPN need image shape % stride == 0
    """
    def __init__(self, stride=0):
        self.coarsest_stride = stride
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        coarsest_stride = self.coarsest_stride
        if coarsest_stride <= 0:
            return im, im_info
        im_c, im_h, im_w = im.shape
        pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
        pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
        padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
        padding_im[:, :im_h, :im_w] = im
        return padding_im, im_info
 class LetterBoxResize(object):
    def __init__(self, target_size):
        """
        Resize image to target size, convert normalized xywh to pixel xyxy
        format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]).
        Args:
            target_size (int|list): image target size.
        """
        super(LetterBoxResize, self).__init__()
        if isinstance(target_size, int):
            target_size = [target_size, target_size]
        self.target_size = target_size
    def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)):
        # letterbox: resize a rectangular image to a padded rectangular
        shape = img.shape[:2]  # [height, width]
        ratio_h = float(height) / shape[0]
        ratio_w = float(width) / shape[1]
        ratio = min(ratio_h, ratio_w)
        new_shape = (round(shape[1] * ratio),
                     round(shape[0] * ratio))  # [width, height]
        padw = (width - new_shape[0]) / 2
        padh = (height - new_shape[1]) / 2
        top, bottom = round(padh - 0.1), round(padh + 0.1)
        left, right = round(padw - 0.1), round(padw + 0.1)
        img = cv2.resize(
            img, new_shape, interpolation=cv2.INTER_AREA)  # resized, no border
        img = cv2.copyMakeBorder(
            img, top, bottom, left, right, cv2.BORDER_CONSTANT,
            value=color)  # padded rectangular
        return img, ratio, padw, padh
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        assert len(self.target_size) == 2
        assert self.target_size[0] > 0 and self.target_size[1] > 0
        height, width = self.target_size
        h, w = im.shape[:2]
        im, ratio, padw, padh = self.letterbox(im, height=height, width=width)
        new_shape = [round(h * ratio), round(w * ratio)]
        im_info['im_shape'] = np.array(new_shape, dtype=np.float32)
        im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32)
        return im, im_info
 class Pad(object):
    def __init__(self, size, fill_value=[114.0, 114.0, 114.0]):
        """
        Pad image to a specified size.
        Args:
            size (list[int]): image target size
            fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0)
        """
        super(Pad, self).__init__()
        if isinstance(size, int):
            size = [size, size]
        self.size = size
        self.fill_value = fill_value
    def __call__(self, im, im_info):
        im_h, im_w = im.shape[:2]
        h, w = self.size
        if h == im_h and w == im_w:
            im = im.astype(np.float32)
            return im, im_info
        canvas = np.ones((h, w, 3), dtype=np.float32)
        canvas *= np.array(self.fill_value, dtype=np.float32)
        canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
        im = canvas
        return im, im_info
 class WarpAffine(object):
    """Warp affine the image
    """
    def __init__(self,
                 keep_res=False,
                 pad=31,
                 input_h=512,
                 input_w=512,
                 scale=0.4,
                 shift=0.1,
                 down_ratio=4):
        self.keep_res = keep_res
        self.pad = pad
        self.input_h = input_h
        self.input_w = input_w
        self.scale = scale
        self.shift = shift
        self.down_ratio = down_ratio
    def __call__(self, im, im_info):
        """
        Args:
            im (np.ndarray): image (np.ndarray)
            im_info (dict): info of image
        Returns:
            im (np.ndarray):  processed image (np.ndarray)
            im_info (dict): info of processed image
        """
        img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
        h, w = img.shape[:2]
        if self.keep_res:
            # True in detection eval/infer
            input_h = (h | self.pad) + 1
            input_w = (w | self.pad) + 1
            s = np.array([input_w, input_h], dtype=np.float32)
            c = np.array([w // 2, h // 2], dtype=np.float32)
        else:
            # False in centertrack eval_mot/eval_mot
            s = max(h, w) * 1.0
            input_h, input_w = self.input_h, self.input_w
            c = np.array([w / 2., h / 2.], dtype=np.float32)
        trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
        img = cv2.resize(img, (w, h))
        inp = cv2.warpAffine(
            img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
        if not self.keep_res:
            out_h = input_h // self.down_ratio
            out_w = input_w // self.down_ratio
            trans_output = get_affine_transform(c, s, 0, [out_w, out_h])
            im_info.update({
                'center': c,
                'scale': s,
                'out_height': out_h,
                'out_width': out_w,
                'inp_height': input_h,
                'inp_width': input_w,
                'trans_input': trans_input,
                'trans_output': trans_output,
            })
        return inp, im_info
 class CULaneResize(object):
    def __init__(self, img_h, img_w, cut_height, prob=0.5):
        super(CULaneResize, self).__init__()
        self.img_h = img_h
        self.img_w = img_w
        self.cut_height = cut_height
        self.prob = prob
    def __call__(self, im, im_info):
        # cut
        im = im[self.cut_height:, :, :]
        # resize
        transform = iaa.Sometimes(self.prob,
                                  iaa.Resize({
                                      "height": self.img_h,
                                      "width": self.img_w
                                  }))
        im = transform(image=im.copy().astype(np.uint8))
        im = im.astype(np.float32) / 255.
        # check transpose is need whether the func decode_image is equal to CULaneDataSet cv.imread
        im = im.transpose(2, 0, 1)
        return im, im_info
 def preprocess(im, preprocess_ops):
    # process image by preprocess_ops
    im_info = {
        'scale_factor': np.array(
            [1., 1.], dtype=np.float32),
        'im_shape': None,
    }
    im, im_info = decode_image(im, im_info)
    for operator in preprocess_ops:
        im, im_info = operator(im, im_info)
    return im, im_info
--- a/third-party/paddle-inference/tracker_config.yml
+++ b/third-party/paddle-inference/tracker_config.yml
@@ -0,0 +1,32 @@
 # config of tracker for MOT SDE Detector, use 'JDETracker' as default.
 # The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model.
 # Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking.
 type: JDETracker # 'JDETracker', 'DeepSORTTracker' or 'CenterTracker'
 # BYTETracker
 JDETracker:
  use_byte: True
  det_thresh: 0.3
  conf_thres: 0.6
  low_conf_thres: 0.1
  match_thres: 0.9
  min_box_area: 0
  vertical_ratio: 0 # 1.6 for pedestrian
 DeepSORTTracker:
  input_size: [64, 192]
  min_box_area: 0
  vertical_ratio: -1
  budget: 100
  max_age: 70
  n_init: 3
  metric_type: cosine
  matching_threshold: 0.2
  max_iou_distance: 0.9
 CenterTracker:
  min_box_area: -1
  vertical_ratio: -1
  track_thresh: 0.4
  pre_thresh: 0.5
--- a/third-party/paddle-inference/utils.py
+++ b/third-party/paddle-inference/utils.py
@@ -0,0 +1,551 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import time
 import os
 import ast
 import argparse
 import numpy as np
 def argsparser():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."),
        required=True)
    parser.add_argument(
        "--image_file", type=str, default=None, help="Path of image file.")
    parser.add_argument(
        "--image_dir",
        type=str,
        default=None,
        help="Dir of image file, `image_file` has a higher priority.")
    parser.add_argument(
        "--batch_size", type=int, default=1, help="batch_size for inference.")
    parser.add_argument(
        "--video_file",
        type=str,
        default=None,
        help="Path of video file, `video_file` or `camera_id` has a highest priority."
    )
    parser.add_argument(
        "--camera_id",
        type=int,
        default=-1,
        help="device id of camera to predict.")
    parser.add_argument(
        "--threshold", type=float, default=0.5, help="Threshold of score.")
    parser.add_argument(
        "--output_dir",
        type=str,
        default="output",
        help="Directory of output visualization files.")
    parser.add_argument(
        "--run_mode",
        type=str,
        default='paddle',
        help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
    parser.add_argument(
        "--device",
        type=str,
        default='cpu',
        help="Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU."
    )
    parser.add_argument(
        "--use_gpu",
        type=ast.literal_eval,
        default=False,
        help="Deprecated, please use `--device`.")
    parser.add_argument(
        "--run_benchmark",
        type=ast.literal_eval,
        default=False,
        help="Whether to predict a image_file repeatedly for benchmark")
    parser.add_argument(
        "--enable_mkldnn",
        type=ast.literal_eval,
        default=False,
        help="Whether use mkldnn with CPU.")
    parser.add_argument(
        "--enable_mkldnn_bfloat16",
        type=ast.literal_eval,
        default=False,
        help="Whether use mkldnn bfloat16 inference with CPU.")
    parser.add_argument(
        "--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
    parser.add_argument(
        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
    parser.add_argument(
        "--trt_max_shape",
        type=int,
        default=1280,
        help="max_shape for TensorRT.")
    parser.add_argument(
        "--trt_opt_shape",
        type=int,
        default=640,
        help="opt_shape for TensorRT.")
    parser.add_argument(
        "--trt_calib_mode",
        type=bool,
        default=False,
        help="If the model is produced by TRT offline quantitative "
        "calibration, trt_calib_mode need to set True.")
    parser.add_argument(
        '--save_images',
        type=ast.literal_eval,
        default=True,
        help='Save visualization image results.')
    parser.add_argument(
        '--save_mot_txts',
        action='store_true',
        help='Save tracking results (txt).')
    parser.add_argument(
        '--save_mot_txt_per_img',
        action='store_true',
        help='Save tracking results (txt) for each image.')
    parser.add_argument(
        '--scaled',
        type=bool,
        default=False,
        help="Whether coords after detector outputs are scaled, False in JDE YOLOv3 "
        "True in general detector.")
    parser.add_argument(
        "--tracker_config", type=str, default=None, help=("tracker donfig"))
    parser.add_argument(
        "--reid_model_dir",
        type=str,
        default=None,
        help=("Directory include:'model.pdiparams', 'model.pdmodel', "
              "'infer_cfg.yml', created by tools/export_model.py."))
    parser.add_argument(
        "--reid_batch_size",
        type=int,
        default=50,
        help="max batch_size for reid model inference.")
    parser.add_argument(
        '--use_dark',
        type=ast.literal_eval,
        default=True,
        help='whether to use darkpose to get better keypoint position predict ')
    parser.add_argument(
        "--action_file",
        type=str,
        default=None,
        help="Path of input file for action recognition.")
    parser.add_argument(
        "--window_size",
        type=int,
        default=50,
        help="Temporal size of skeleton feature for action recognition.")
    parser.add_argument(
        "--random_pad",
        type=ast.literal_eval,
        default=False,
        help="Whether do random padding for action recognition.")
    parser.add_argument(
        "--save_results",
        action='store_true',
        default=False,
        help="Whether save detection result to file using coco format")
    parser.add_argument(
        '--use_coco_category',
        action='store_true',
        default=False,
        help='Whether to use the coco format dictionary `clsid2catid`')
    parser.add_argument(
        "--slice_infer",
        action='store_true',
        help="Whether to slice the image and merge the inference results for small object detection."
    )
    parser.add_argument(
        '--slice_size',
        nargs='+',
        type=int,
        default=[640, 640],
        help="Height of the sliced image.")
    parser.add_argument(
        "--overlap_ratio",
        nargs='+',
        type=float,
        default=[0.25, 0.25],
        help="Overlap height ratio of the sliced image.")
    parser.add_argument(
        "--combine_method",
        type=str,
        default='nms',
        help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']."
    )
    parser.add_argument(
        "--match_threshold",
        type=float,
        default=0.6,
        help="Combine method matching threshold.")
    parser.add_argument(
        "--match_metric",
        type=str,
        default='ios',
        help="Combine method matching metric, choose in ['iou', 'ios'].")
    parser.add_argument(
        "--collect_trt_shape_info",
        action='store_true',
        default=False,
        help="Whether to collect dynamic shape before using tensorrt.")
    parser.add_argument(
        "--tuned_trt_shape_file",
        type=str,
        default="shape_range_info.pbtxt",
        help="Path of a dynamic shape file for tensorrt.")
    parser.add_argument("--use_fd_format", action="store_true")
    parser.add_argument(
        "--task_type",
        type=str,
        default='Detection',
        help="How to save the coco result, it only work with save_results==True.  Optional inputs are Rotate or Detection, default is Detection."
    )
    return parser
 class Times(object):
    def __init__(self):
        self.time = 0.
        # start time
        self.st = 0.
        # end time
        self.et = 0.
    def start(self):
        self.st = time.time()
    def end(self, repeats=1, accumulative=True):
        self.et = time.time()
        if accumulative:
            self.time += (self.et - self.st) / repeats
        else:
            self.time = (self.et - self.st) / repeats
    def reset(self):
        self.time = 0.
        self.st = 0.
        self.et = 0.
    def value(self):
        return round(self.time, 4)
 class Timer(Times):
    def __init__(self, with_tracker=False):
        super(Timer, self).__init__()
        self.with_tracker = with_tracker
        self.preprocess_time_s = Times()
        self.inference_time_s = Times()
        self.postprocess_time_s = Times()
        self.tracking_time_s = Times()
        self.img_num = 0
    def info(self, average=False):
        pre_time = self.preprocess_time_s.value()
        infer_time = self.inference_time_s.value()
        post_time = self.postprocess_time_s.value()
        track_time = self.tracking_time_s.value()
        total_time = pre_time + infer_time + post_time
        if self.with_tracker:
            total_time = total_time + track_time
        total_time = round(total_time, 4)
        print("------------------ Inference Time Info ----------------------")
        print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
                                                       self.img_num))
        preprocess_time = round(pre_time / max(1, self.img_num),
                                4) if average else pre_time
        postprocess_time = round(post_time / max(1, self.img_num),
                                 4) if average else post_time
        inference_time = round(infer_time / max(1, self.img_num),
                               4) if average else infer_time
        tracking_time = round(track_time / max(1, self.img_num),
                              4) if average else track_time
        average_latency = total_time / max(1, self.img_num)
        qps = 0
        if total_time > 0:
            qps = 1 / average_latency
        print("average latency time(ms): {:.2f}, QPS: {:2f}".format(
            average_latency * 1000, qps))
        if self.with_tracker:
            print(
                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}".
                format(preprocess_time * 1000, inference_time * 1000,
                       postprocess_time * 1000, tracking_time * 1000))
        else:
            print(
                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
                format(preprocess_time * 1000, inference_time * 1000,
                       postprocess_time * 1000))
    def report(self, average=False):
        dic = {}
        pre_time = self.preprocess_time_s.value()
        infer_time = self.inference_time_s.value()
        post_time = self.postprocess_time_s.value()
        track_time = self.tracking_time_s.value()
        dic['preprocess_time_s'] = round(pre_time / max(1, self.img_num),
                                         4) if average else pre_time
        dic['inference_time_s'] = round(infer_time / max(1, self.img_num),
                                        4) if average else infer_time
        dic['postprocess_time_s'] = round(post_time / max(1, self.img_num),
                                          4) if average else post_time
        dic['img_num'] = self.img_num
        total_time = pre_time + infer_time + post_time
        if self.with_tracker:
            dic['tracking_time_s'] = round(track_time / max(1, self.img_num),
                                           4) if average else track_time
            total_time = total_time + track_time
        dic['total_time_s'] = round(total_time, 4)
        return dic
 def get_current_memory_mb():
    """
    It is used to Obtain the memory usage of the CPU and GPU during the running of the program.
    And this function Current program is time-consuming.
    """
    import pynvml
    import psutil
    import GPUtil
    gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', 0))
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    cpu_mem = info.uss / 1024. / 1024.
    gpu_mem = 0
    gpu_percent = 0
    gpus = GPUtil.getGPUs()
    if gpu_id is not None and len(gpus) > 0:
        gpu_percent = gpus[gpu_id].load
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
        gpu_mem = meminfo.used / 1024. / 1024.
    return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
 def multiclass_nms(bboxs, num_classes, match_threshold=0.6, match_metric='iou'):
    final_boxes = []
    for c in range(num_classes):
        idxs = bboxs[:, 0] == c
        if np.count_nonzero(idxs) == 0: continue
        r = nms(bboxs[idxs, 1:], match_threshold, match_metric)
        final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1))
    return final_boxes
 def nms(dets, match_threshold=0.6, match_metric='iou'):
    """ Apply NMS to avoid detecting too many overlapping bounding boxes.
        Args:
            dets: shape [N, 5], [score, x1, y1, x2, y2]
            match_metric: 'iou' or 'ios'
            match_threshold: overlap thresh for match metric.
    """
    if dets.shape[0] == 0:
        return dets[[], :]
    scores = dets[:, 0]
    x1 = dets[:, 1]
    y1 = dets[:, 2]
    x2 = dets[:, 3]
    y2 = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    ndets = dets.shape[0]
    suppressed = np.zeros((ndets), dtype=np.int32)
    for _i in range(ndets):
        i = order[_i]
        if suppressed[i] == 1:
            continue
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = order[_j]
            if suppressed[j] == 1:
                continue
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            if match_metric == 'iou':
                union = iarea + areas[j] - inter
                match_value = inter / union
            elif match_metric == 'ios':
                smaller = min(iarea, areas[j])
                match_value = inter / smaller
            else:
                raise ValueError()
            if match_value >= match_threshold:
                suppressed[j] = 1
    keep = np.where(suppressed == 0)[0]
    dets = dets[keep, :]
    return dets
 coco_clsid2catid = {
    0: 1,
    1: 2,
    2: 3,
    3: 4,
    4: 5,
    5: 6,
    6: 7,
    7: 8,
    8: 9,
    9: 10,
    10: 11,
    11: 13,
    12: 14,
    13: 15,
    14: 16,
    15: 17,
    16: 18,
    17: 19,
    18: 20,
    19: 21,
    20: 22,
    21: 23,
    22: 24,
    23: 25,
    24: 27,
    25: 28,
    26: 31,
    27: 32,
    28: 33,
    29: 34,
    30: 35,
    31: 36,
    32: 37,
    33: 38,
    34: 39,
    35: 40,
    36: 41,
    37: 42,
    38: 43,
    39: 44,
    40: 46,
    41: 47,
    42: 48,
    43: 49,
    44: 50,
    45: 51,
    46: 52,
    47: 53,
    48: 54,
    49: 55,
    50: 56,
    51: 57,
    52: 58,
    53: 59,
    54: 60,
    55: 61,
    56: 62,
    57: 63,
    58: 64,
    59: 65,
    60: 67,
    61: 70,
    62: 72,
    63: 73,
    64: 74,
    65: 75,
    66: 76,
    67: 77,
    68: 78,
    69: 79,
    70: 80,
    71: 81,
    72: 82,
    73: 84,
    74: 85,
    75: 86,
    76: 87,
    77: 88,
    78: 89,
    79: 90
 }
 def gaussian_radius(bbox_size, min_overlap):
    height, width = bbox_size
    a1 = 1
    b1 = (height + width)
    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
    sq1 = np.sqrt(b1**2 - 4 * a1 * c1)
    radius1 = (b1 + sq1) / (2 * a1)
    a2 = 4
    b2 = 2 * (height + width)
    c2 = (1 - min_overlap) * width * height
    sq2 = np.sqrt(b2**2 - 4 * a2 * c2)
    radius2 = (b2 + sq2) / 2
    a3 = 4 * min_overlap
    b3 = -2 * min_overlap * (height + width)
    c3 = (min_overlap - 1) * width * height
    sq3 = np.sqrt(b3**2 - 4 * a3 * c3)
    radius3 = (b3 + sq3) / 2
    return min(radius1, radius2, radius3)
 def gaussian2D(shape, sigma_x=1, sigma_y=1):
    m, n = [(ss - 1.) / 2. for ss in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x / (2 * sigma_x * sigma_x) + y * y / (2 * sigma_y *
                                                            sigma_y)))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    return h
 def draw_umich_gaussian(heatmap, center, radius, k=1):
    """
    draw_umich_gaussian, refer to https://github.com/xingyizhou/CenterNet/blob/master/src/lib/utils/image.py#L126
    """
    diameter = 2 * radius + 1
    gaussian = gaussian2D(
        (diameter, diameter), sigma_x=diameter / 6, sigma_y=diameter / 6)
    x, y = int(center[0]), int(center[1])
    height, width = heatmap.shape[0:2]
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)
    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:
                               radius + right]
    if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:
        np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
    return heatmap
--- a/third-party/paddle-inference/visualize.py
+++ b/third-party/paddle-inference/visualize.py
@@ -0,0 +1,665 @@
 # Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 from __future__ import division
 import os
 import cv2
 import math
 import numpy as np
 import PIL
 from PIL import Image, ImageDraw, ImageFile
 ImageFile.LOAD_TRUNCATED_IMAGES = True
 def imagedraw_textsize_c(draw, text):
    if int(PIL.__version__.split('.')[0]) < 10:
        tw, th = draw.textsize(text)
    else:
        left, top, right, bottom = draw.textbbox((0, 0), text)
        tw, th = right - left, bottom - top
    return tw, th
 def visualize_box_mask(im, results, labels, threshold=0.5):
    """
    Args:
        im (str/np.ndarray): path of image/np.ndarray read by cv2
        results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
                        matix element:[class, score, x_min, y_min, x_max, y_max]
                        MaskRCNN's results include 'masks': np.ndarray:
                        shape:[N, im_h, im_w]
        labels (list): labels:['class1', ..., 'classn']
        threshold (float): Threshold of score.
    Returns:
        im (PIL.Image.Image): visualized image
    """
    if isinstance(im, str):
        im = Image.open(im).convert('RGB')
    elif isinstance(im, np.ndarray):
        im = Image.fromarray(im)
    if 'masks' in results and 'boxes' in results and len(results['boxes']) > 0:
        im = draw_mask(
            im, results['boxes'], results['masks'], labels, threshold=threshold)
    if 'boxes' in results and len(results['boxes']) > 0:
        im = draw_box(im, results['boxes'], labels, threshold=threshold)
    if 'segm' in results:
        im = draw_segm(
            im,
            results['segm'],
            results['label'],
            results['score'],
            labels,
            threshold=threshold)
    return im
 def get_color_map_list(num_classes):
    """
    Args:
        num_classes (int): number of class
    Returns:
        color_map (list): RGB color list
    """
    color_map = num_classes * [0, 0, 0]
    for i in range(0, num_classes):
        j = 0
        lab = i
        while lab:
            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
            j += 1
            lab >>= 3
    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
    return color_map
 def draw_mask(im, np_boxes, np_masks, labels, threshold=0.5):
    """
    Args:
        im (PIL.Image.Image): PIL image
        np_boxes (np.ndarray): shape:[N,6], N: number of box,
            matix element:[class, score, x_min, y_min, x_max, y_max]
        np_masks (np.ndarray): shape:[N, im_h, im_w]
        labels (list): labels:['class1', ..., 'classn']
        threshold (float): threshold of mask
    Returns:
        im (PIL.Image.Image): visualized image
    """
    color_list = get_color_map_list(len(labels))
    w_ratio = 0.4
    alpha = 0.7
    im = np.array(im).astype('float32')
    clsid2color = {}
    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
    np_boxes = np_boxes[expect_boxes, :]
    np_masks = np_masks[expect_boxes, :, :]
    im_h, im_w = im.shape[:2]
    np_masks = np_masks[:, :im_h, :im_w]
    for i in range(len(np_masks)):
        clsid, score = int(np_boxes[i][0]), np_boxes[i][1]
        mask = np_masks[i]
        if clsid not in clsid2color:
            clsid2color[clsid] = color_list[clsid]
        color_mask = clsid2color[clsid]
        for c in range(3):
            color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
        idx = np.nonzero(mask)
        color_mask = np.array(color_mask)
        im[idx[0], idx[1], :] *= 1.0 - alpha
        im[idx[0], idx[1], :] += alpha * color_mask
    return Image.fromarray(im.astype('uint8'))
 def draw_box(im, np_boxes, labels, threshold=0.5):
    """
    Args:
        im (PIL.Image.Image): PIL image
        np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matix element:[class, score, x_min, y_min, x_max, y_max]
        labels (list): labels:['class1', ..., 'classn']
        threshold (float): threshold of box
    Returns:
        im (PIL.Image.Image): visualized image
    """
    draw_thickness = min(im.size) // 320
    draw = ImageDraw.Draw(im)
    clsid2color = {}
    color_list = get_color_map_list(len(labels))
    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
    np_boxes = np_boxes[expect_boxes, :]
    vis_order = False
    if len(np_boxes) > 0 and len(np_boxes[0]) == 7:
        np_boxes = sorted(np_boxes, key=lambda x: x[6])
        vis_order = True
    centers = []
    for dt in np_boxes:
        if len(dt) == 7:
            clsid, bbox, score, read_order = int(dt[0]), dt[2:6], dt[1], int(dt[6])
        else:
            clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
        if clsid not in clsid2color:
            clsid2color[clsid] = color_list[clsid]
        color = tuple(clsid2color[clsid])
        if len(bbox) == 4:
            xmin, ymin, xmax, ymax = bbox
            print('class_id:{:d}, confidence:{:.4f}, left_top:[{:.2f},{:.2f}],'
                  'right_bottom:[{:.2f},{:.2f}]'.format(
                      int(clsid), score, xmin, ymin, xmax, ymax))
            # draw bbox
            draw.line(
                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
                 (xmin, ymin)],
                width=draw_thickness,
                fill=color)
            cx, cy = int((xmin + xmax)/2), int((ymin + ymax)/2)
            centers.append((cx, cy))
        elif len(bbox) == 8:
            x1, y1, x2, y2, x3, y3, x4, y4 = bbox
            draw.line(
                [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
                width=2,
                fill=color)
            xmin = min(x1, x2, x3, x4)
            ymin = min(y1, y2, y3, y4)
        # draw label
        text = "{} {:.4f}".format(labels[clsid], score)
        tw, th = imagedraw_textsize_c(draw, text)
        draw.rectangle(
            [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
        draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
    if vis_order:
        for i in range(len(centers)-1):
            draw.line([centers[i], centers[i+1]], fill=(255, 0, 0), width=2)
    return im
 def draw_segm(im,
              np_segms,
              np_label,
              np_score,
              labels,
              threshold=0.5,
              alpha=0.7):
    """
    Draw segmentation on image
    """
    mask_color_id = 0
    w_ratio = .4
    color_list = get_color_map_list(len(labels))
    im = np.array(im).astype('float32')
    clsid2color = {}
    np_segms = np_segms.astype(np.uint8)
    for i in range(np_segms.shape[0]):
        mask, score, clsid = np_segms[i], np_score[i], np_label[i]
        if score < threshold:
            continue
        if clsid not in clsid2color:
            clsid2color[clsid] = color_list[clsid]
        color_mask = clsid2color[clsid]
        for c in range(3):
            color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
        idx = np.nonzero(mask)
        color_mask = np.array(color_mask)
        idx0 = np.minimum(idx[0], im.shape[0] - 1)
        idx1 = np.minimum(idx[1], im.shape[1] - 1)
        im[idx0, idx1, :] *= 1.0 - alpha
        im[idx0, idx1, :] += alpha * color_mask
        sum_x = np.sum(mask, axis=0)
        x = np.where(sum_x > 0.5)[0]
        sum_y = np.sum(mask, axis=1)
        y = np.where(sum_y > 0.5)[0]
        x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1]
        cv2.rectangle(im, (x0, y0), (x1, y1),
                      tuple(color_mask.astype('int32').tolist()), 1)
        bbox_text = '%s %.2f' % (labels[clsid], score)
        t_size = cv2.getTextSize(bbox_text, 0, 0.3, thickness=1)[0]
        cv2.rectangle(im, (x0, y0), (x0 + t_size[0], y0 - t_size[1] - 3),
                      tuple(color_mask.astype('int32').tolist()), -1)
        cv2.putText(
            im,
            bbox_text, (x0, y0 - 2),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.3, (0, 0, 0),
            1,
            lineType=cv2.LINE_AA)
    return Image.fromarray(im.astype('uint8'))
 def get_color(idx):
    idx = idx * 3
    color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
    return color
 def visualize_pose(imgfile,
                   results,
                   visual_thresh=0.6,
                   save_name='pose.jpg',
                   save_dir='output',
                   returnimg=False,
                   ids=None):
    try:
        import matplotlib.pyplot as plt
        import matplotlib
        plt.switch_backend('agg')
    except Exception as e:
        print('Matplotlib not found, please install matplotlib.'
              'for example: `pip install matplotlib`.')
        raise e
    skeletons, scores = results['keypoint']
    skeletons = np.array(skeletons)
    kpt_nums = 17
    if len(skeletons) > 0:
        kpt_nums = skeletons.shape[1]
    if kpt_nums == 17:  #plot coco keypoint
        EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
                 (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
                 (13, 15), (14, 16), (11, 12)]
    else:  #plot mpii keypoint
        EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7), (7, 8),
                 (8, 9), (10, 11), (11, 12), (13, 14), (14, 15), (8, 12),
                 (8, 13)]
    NUM_EDGES = len(EDGES)
    colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
            [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
            [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
    cmap = matplotlib.cm.get_cmap('hsv')
    plt.figure()
    img = cv2.imread(imgfile) if type(imgfile) == str else imgfile
    color_set = results['colors'] if 'colors' in results else None
    if 'bbox' in results and ids is None:
        bboxs = results['bbox']
        for j, rect in enumerate(bboxs):
            xmin, ymin, xmax, ymax = rect
            color = colors[0] if color_set is None else colors[color_set[j] %
                                                               len(colors)]
            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
    canvas = img.copy()
    for i in range(kpt_nums):
        for j in range(len(skeletons)):
            if skeletons[j][i, 2] < visual_thresh:
                continue
            if ids is None:
                color = colors[i] if color_set is None else colors[color_set[j]
                                                                   %
                                                                   len(colors)]
            else:
                color = get_color(ids[j])
            cv2.circle(
                canvas,
                tuple(skeletons[j][i, 0:2].astype('int32')),
                2,
                color,
                thickness=-1)
    to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
    fig = matplotlib.pyplot.gcf()
    stickwidth = 2
    for i in range(NUM_EDGES):
        for j in range(len(skeletons)):
            edge = EDGES[i]
            if skeletons[j][edge[0], 2] < visual_thresh or skeletons[j][edge[
                    1], 2] < visual_thresh:
                continue
            cur_canvas = canvas.copy()
            X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
            Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
            mX = np.mean(X)
            mY = np.mean(Y)
            length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
            angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
            polygon = cv2.ellipse2Poly((int(mY), int(mX)),
                                       (int(length / 2), stickwidth),
                                       int(angle), 0, 360, 1)
            if ids is None:
                color = colors[i] if color_set is None else colors[color_set[j]
                                                                   %
                                                                   len(colors)]
            else:
                color = get_color(ids[j])
            cv2.fillConvexPoly(cur_canvas, polygon, color)
            canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
    if returnimg:
        return canvas
    save_name = os.path.join(
        save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg')
    plt.imsave(save_name, canvas[:, :, ::-1])
    print("keypoint visualize image saved to: " + save_name)
    plt.close()
 def visualize_attr(im, results, boxes=None, is_mtmct=False):
    if isinstance(im, str):
        im = Image.open(im)
        im = np.ascontiguousarray(np.copy(im))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
    else:
        im = np.ascontiguousarray(np.copy(im))
    im_h, im_w = im.shape[:2]
    text_scale = max(0.5, im.shape[0] / 3000.)
    text_thickness = 1
    line_inter = im.shape[0] / 40.
    for i, res in enumerate(results):
        if boxes is None:
            text_w = 3
            text_h = 1
        elif is_mtmct:
            box = boxes[i]  # multi camera, bbox shape is x,y, w,h
            text_w = int(box[0]) + 3
            text_h = int(box[1])
        else:
            box = boxes[i]  # single camera, bbox shape is 0, 0, x,y, w,h
            text_w = int(box[2]) + 3
            text_h = int(box[3])
        for text in res:
            text_h += int(line_inter)
            text_loc = (text_w, text_h)
            cv2.putText(
                im,
                text,
                text_loc,
                cv2.FONT_ITALIC,
                text_scale, (0, 255, 255),
                thickness=text_thickness)
    return im
 def visualize_action(im,
                     mot_boxes,
                     action_visual_collector=None,
                     action_text="",
                     video_action_score=None,
                     video_action_text=""):
    im = cv2.imread(im) if isinstance(im, str) else im
    im_h, im_w = im.shape[:2]
    text_scale = max(1, im.shape[1] / 400.)
    text_thickness = 2
    if action_visual_collector:
        id_action_dict = {}
        for collector, action_type in zip(action_visual_collector, action_text):
            id_detected = collector.get_visualize_ids()
            for pid in id_detected:
                id_action_dict[pid] = id_action_dict.get(pid, [])
                id_action_dict[pid].append(action_type)
        for mot_box in mot_boxes:
            # mot_box is a format with [mot_id, class, score, xmin, ymin, w, h] 
            if mot_box[0] in id_action_dict:
                text_position = (int(mot_box[3] + mot_box[5] * 0.75),
                                 int(mot_box[4] - 10))
                display_text = ', '.join(id_action_dict[mot_box[0]])
                cv2.putText(im, display_text, text_position,
                            cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), 2)
    if video_action_score:
        cv2.putText(
            im,
            video_action_text + ': %.2f' % video_action_score,
            (int(im_w / 2), int(15 * text_scale) + 5),
            cv2.FONT_ITALIC,
            text_scale, (0, 0, 255),
            thickness=text_thickness)
    return im
 def visualize_vehicleplate(im, results, boxes=None):
    if isinstance(im, str):
        im = Image.open(im)
        im = np.ascontiguousarray(np.copy(im))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
    else:
        im = np.ascontiguousarray(np.copy(im))
    im_h, im_w = im.shape[:2]
    text_scale = max(1.0, im.shape[0] / 400.)
    text_thickness = 2
    line_inter = im.shape[0] / 40.
    for i, res in enumerate(results):
        if boxes is None:
            text_w = 3
            text_h = 1
        else:
            box = boxes[i]
            text = res
            if text == "":
                continue
            text_w = int(box[2])
            text_h = int(box[5] + box[3])
            text_loc = (text_w, text_h)
            cv2.putText(
                im,
                "LP: " + text,
                text_loc,
                cv2.FONT_ITALIC,
                text_scale, (0, 255, 255),
                thickness=text_thickness)
    return im
 def draw_press_box_lanes(im, np_boxes, labels, threshold=0.5):
    """
    Args:
        im (PIL.Image.Image): PIL image
        np_boxes (np.ndarray): shape:[N,6], N: number of box,
                               matix element:[class, score, x_min, y_min, x_max, y_max]
        labels (list): labels:['class1', ..., 'classn']
        threshold (float): threshold of box
    Returns:
        im (PIL.Image.Image): visualized image
    """
    if isinstance(im, str):
        im = Image.open(im).convert('RGB')
    elif isinstance(im, np.ndarray):
        im = Image.fromarray(im)
    draw_thickness = min(im.size) // 320
    draw = ImageDraw.Draw(im)
    clsid2color = {}
    color_list = get_color_map_list(len(labels))
    if np_boxes.shape[1] == 7:
        np_boxes = np_boxes[:, 1:]
    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
    np_boxes = np_boxes[expect_boxes, :]
    for dt in np_boxes:
        clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
        if clsid not in clsid2color:
            clsid2color[clsid] = color_list[clsid]
        color = tuple(clsid2color[clsid])
        if len(bbox) == 4:
            xmin, ymin, xmax, ymax = bbox
            # draw bbox
            draw.line(
                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
                 (xmin, ymin)],
                width=draw_thickness,
                fill=(0, 0, 255))
        elif len(bbox) == 8:
            x1, y1, x2, y2, x3, y3, x4, y4 = bbox
            draw.line(
                [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
                width=2,
                fill=color)
            xmin = min(x1, x2, x3, x4)
            ymin = min(y1, y2, y3, y4)
        # draw label
        text = "{}".format(labels[clsid])
        tw, th = imagedraw_textsize_c(draw, text)
        draw.rectangle(
            [(xmin + 1, ymax - th), (xmin + tw + 1, ymax)], fill=color)
        draw.text((xmin + 1, ymax - th), text, fill=(0, 0, 255))
    return im
 def visualize_vehiclepress(im, results, threshold=0.5):
    results = np.array(results)
    labels = ['violation']
    im = draw_press_box_lanes(im, results, labels, threshold=threshold)
    return im
 def visualize_lane(im, lanes):
    if isinstance(im, str):
        im = Image.open(im).convert('RGB')
    elif isinstance(im, np.ndarray):
        im = Image.fromarray(im)
    draw_thickness = min(im.size) // 320
    draw = ImageDraw.Draw(im)
    if len(lanes) > 0:
        for lane in lanes:
            draw.line(
                [(lane[0], lane[1]), (lane[2], lane[3])],
                width=draw_thickness,
                fill=(0, 0, 255))
    return im
 def visualize_vehicle_retrograde(im, mot_res, vehicle_retrograde_res):
    if isinstance(im, str):
        im = Image.open(im).convert('RGB')
    elif isinstance(im, np.ndarray):
        im = Image.fromarray(im)
    draw_thickness = min(im.size) // 320
    draw = ImageDraw.Draw(im)
    lane = vehicle_retrograde_res['fence_line']
    if lane is not None:
        draw.line(
            [(lane[0], lane[1]), (lane[2], lane[3])],
            width=draw_thickness,
            fill=(0, 0, 0))
    mot_id = vehicle_retrograde_res['output']
    if mot_id is None or len(mot_id) == 0:
        return im
    if mot_res is None:
        return im
    np_boxes = mot_res['boxes']
    if np_boxes is not None:
        for dt in np_boxes:
            if dt[0] not in mot_id:
                continue
            bbox = dt[3:]
            if len(bbox) == 4:
                xmin, ymin, xmax, ymax = bbox
                # draw bbox
                draw.line(
                    [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
                     (xmin, ymin)],
                    width=draw_thickness,
                    fill=(0, 255, 0))
            # draw label
            text = "retrograde"
            tw, th = imagedraw_textsize_c(draw, text)
            draw.rectangle(
                [(xmax + 1, ymin - th), (xmax + tw + 1, ymin)],
                fill=(0, 255, 0))
            draw.text((xmax + 1, ymin - th), text, fill=(0, 255, 0))
    return im
 COLORS = [
    (255, 0, 0),
    (0, 255, 0),
    (0, 0, 255),
    (255, 255, 0),
    (255, 0, 255),
    (0, 255, 255),
    (128, 255, 0),
    (255, 128, 0),
    (128, 0, 255),
    (255, 0, 128),
    (0, 128, 255),
    (0, 255, 128),
    (128, 255, 255),
    (255, 128, 255),
    (255, 255, 128),
    (60, 180, 0),
    (180, 60, 0),
    (0, 60, 180),
    (0, 180, 60),
    (60, 0, 180),
    (180, 0, 60),
    (255, 0, 0),
    (0, 255, 0),
    (0, 0, 255),
    (255, 255, 0),
    (255, 0, 255),
    (0, 255, 255),
    (128, 255, 0),
    (255, 128, 0),
    (128, 0, 255),
 ]
 def imshow_lanes(img, lanes, show=False, out_file=None, width=4):
    lanes_xys = []
    for _, lane in enumerate(lanes):
        xys = []
        for x, y in lane:
            if x <= 0 or y <= 0:
                continue
            x, y = int(x), int(y)
            xys.append((x, y))
        lanes_xys.append(xys)
    lanes_xys.sort(key=lambda xys: xys[0][0] if len(xys) > 0 else 0)
    for idx, xys in enumerate(lanes_xys):
        for i in range(1, len(xys)):
            cv2.line(img, xys[i - 1], xys[i], COLORS[idx], thickness=width)
    if show:
        cv2.imshow('view', img)
        cv2.waitKey(0)
    if out_file:
        if not os.path.exists(os.path.dirname(out_file)):
            os.makedirs(os.path.dirname(out_file))
        cv2.imwrite(out_file, img)