feat: add PaddlePaddle detection support and restructure the project architecture

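1. Add the concurrently dependency for starting services in parallel
2. Add a server startup script that centrally manages environment variables and the virtual environment
3. Add the PaddlePaddle inference engine and supporting utility code
4. Add Paddle model support for smoking detection and improve model management
5. Refactor the development startup script to improve the development experience
6. Update .gitignore to exclude unneeded external directories and caches
7. Improve the documentation and add a PaddlePaddle deployment guide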
2026-05-21 10:39:26 +08:00
parent 7aa71c5f83
commit e97bd503ec
31 changed files with 8759 additions and 199 deletions
--- a/third-party/paddle-inference/README.md
+++ b/third-party/paddle-inference/README.md
@@ -0,0 +1,104 @@
+# Python Inference Deployment
+
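+In PaddlePaddle, the inference engine and the training engine use different low-level optimizations. The inference engine, AnalysisPredictor, is optimized specifically for inference and is the Python interface to the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html). It applies a range of graph optimizations to the model and reduces unnecessary memory copies. For users with high performance requirements when deploying trained models, we provide inference scripts that are independent of PaddleDetection and can be integrated and deployed directly.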
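A minimal sketch of driving the predictor directly through the `paddle.inference` API, assuming a Paddle 2.x export that writes `model.pdmodel` and `model.pdiparams` into the model directory. The helpers `build_predictor` and `run_once` are hypothetical and do not reproduce the exact setup in `deploy/python/infer.py`; feed names and shapes depend on the exported model and can be listed with `predictor.get_input_names()`.

```python
from pathlib import Path


def build_predictor(model_dir, use_gpu=False, cpu_threads=1):
    """Create a Paddle Inference predictor for an exported model directory.

    Hypothetical helper; assumes the Paddle 2.x export layout
    (model.pdmodel + model.pdiparams).
    """
    model_dir = Path(model_dir)
    model_file = model_dir / "model.pdmodel"
    params_file = model_dir / "model.pdiparams"
    # Fail early with a clear message before importing paddle.
    for f in (model_file, params_file):
        if not f.is_file():
            raise FileNotFoundError(f"missing exported file: {f}")

    # Imported lazily so this module can be loaded without paddle installed.
    from paddle.inference import Config, create_predictor

    config = Config(str(model_file), str(params_file))
    if use_gpu:
        config.enable_use_gpu(200, 0)  # 200 MB initial memory pool on GPU 0
    else:
        config.disable_gpu()
        config.set_cpu_math_library_num_threads(cpu_threads)
    config.switch_ir_optim(True)  # enable IR graph optimization passes
    config.enable_memory_optim()  # reuse memory of intermediate tensors
    return create_predictor(config)


def run_once(predictor, feeds):
    """Feed {input_name: numpy array} and return the list of output arrays."""
    for name in predictor.get_input_names():
        handle = predictor.get_input_handle(name)
        handle.reshape(feeds[name].shape)
        handle.copy_from_cpu(feeds[name])
    predictor.run()
    return [predictor.get_output_handle(n).copy_to_cpu()
            for n in predictor.get_output_names()]
```

The file check runs before the import so a wrong `--model_dir` is reported clearly even on machines where Paddle is not installed.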
+
+
+Python inference deployment consists of two steps:
+- Export the inference model
+- Run inference with Python
+
+## 1. Export the Inference Model
+
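+During training, a PaddleDetection model holds both the network's forward parameters and optimizer-related parameters; deployment only needs the forward parameters. For details, see [Export Model](../EXPORT_MODEL.md). For example: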
+
+```bash
+# Export the YOLOv3 detection model
+python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./output_inference \
+ -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+
+# Export the HigherHRNet (bottom-up) keypoint detection model
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams
+
+# Export the HRNet (top-down) keypoint detection model
+python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams
+
+# Export the FairMOT multi-object tracking model
+python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+
+# Export the ByteTrack multi-object tracking model (equivalent to exporting only the detector)
+python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
+```
+
+After export, the output directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
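Before passing a directory to `--model_dir`, it can help to confirm that the export actually produced these four files. A small sketch of such a check; `missing_export_files` is a hypothetical helper, not part of PaddleDetection.

```python
from pathlib import Path

# The four files listed above for an exported PaddleDetection model.
EXPECTED_EXPORT_FILES = (
    "infer_cfg.yml",
    "model.pdiparams",
    "model.pdiparams.info",
    "model.pdmodel",
)


def missing_export_files(model_dir):
    """Return the expected export files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_EXPORT_FILES
            if not (model_dir / name).is_file()]
```

An empty list means the directory has the expected layout; anything else usually points at a wrong `--output_dir` during export.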
+
+
+## 2. Python Inference
+
+### 2.1 General Object Detection
+Run the following command in a terminal:
+```bash
+python deploy/python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco --image_file=./demo/000000014439.jpg --device=GPU
+```
+
+### 2.2 Keypoint Detection
+Run the following commands in a terminal:
+```bash
+# Standalone keypoint inference with top-down (HRNet) / bottom-up (HigherHRNet) models; in this mode the top-down HRNet model only supports single-person crops
+python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=GPU --threshold=0.5
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=GPU --threshold=0.5
+
+# Joint deployment of a detector and a top-down keypoint model (joint inference only supports top-down keypoint models)
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/yolov3_darknet53_270e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4  --device=GPU
+```
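+**Notes:**
+ - For keypoint model export and inference, see [keypoint](../../configs/keypoint/README.md); usage details are in each model's own documentation.
+ - The keypoint deployment in this directory only covers basic forward inference; for more keypoint features, use the PP-Human project, see [pipeline](../pipeline/README.md).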
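In top-down joint inference the detector runs first, each detected person is cropped, the keypoint model runs on the crops, and the keypoints are mapped back to the full frame. A minimal sketch of that last step, assuming keypoints come back as `[x, y, conf]` in pixel coordinates of the unscaled crop and each crop is described by its top-left corner in the original image. `to_original_coords` is hypothetical and is not the `translate_to_ori_images` function the script imports.

```python
def to_original_coords(crop_keypoints, crop_origins):
    """Shift per-crop keypoints back into original-image coordinates.

    crop_keypoints: one list per person of [x, y, conf] triples in crop pixels.
    crop_origins:   one (xmin, ymin) per person, the crop's top-left corner
                    in the original image (hypothetical record format).
    """
    if len(crop_keypoints) != len(crop_origins):
        raise ValueError("need exactly one crop origin per person")
    result = []
    for person, (x0, y0) in zip(crop_keypoints, crop_origins):
        # Pure translation: confidences are left unchanged.
        result.append([[x + x0, y + y0, conf] for x, y, conf in person])
    return result
```

For example, a joint at `(10, 20)` inside a crop whose corner is `(100, 50)` lands at `(110, 70)` in the original frame.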
+
+
+### 2.3 Multi-Object Tracking
+Run the following commands in a terminal:
+```bash
+# FairMOT tracking
+python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU
+
+# ByteTrack tracking
+python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True
+
+# FairMOT multi-object tracking combined with HRNet keypoint detection (joint inference only supports top-down keypoint models)
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
+```
+
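+**Notes:**
+ - For multi-object tracking model export and inference, see [mot](../../configs/mot/README.md); usage details are in each model's own documentation.
+ - The tracking deployment in this directory covers basic forward inference plus joint keypoint deployment; for more tracking features, use the PP-Human project, see [pipeline](../pipeline/README.md), or the PP-Tracking project (trajectory drawing, entrance/exit flow counting), see [pptracking](../pptracking/README.md).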
+
+
+Parameters:
+
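+| Parameter | Required | Description |
+|-------|-------|---------------------------------------------------------------------------------------------|
+| --model_dir | Yes | Path to the model exported above |
+| --image_file | Optional | Image to run inference on |
+| --image_dir | Optional | Directory of images to run inference on |
+| --video_file | Optional | Video to run inference on |
+| --camera_id | Optional | Camera ID for inference; default -1 (camera disabled), valid values 0 to (number of cameras - 1). During inference, press `q` in the visualization window to quit; results are saved to output/output.mp4 |
+| --device | Optional | Device to run on: `CPU`, `GPU`, or `XPU`; default `CPU` |
+| --run_mode | Optional | Run mode when using GPU: `paddle`, `trt_fp32`, `trt_fp16`, or `trt_int8`; default `paddle` |
+| --batch_size | Optional | Batch size for inference; only takes effect with `--image_dir`; default 1 |
+| --threshold | Optional | Score threshold for predictions; default 0.5 |
+| --output_dir | Optional | Root directory for saved visualization results; default `output/` |
+| --run_benchmark | Optional | Whether to run the benchmark; requires `--image_file` or `--image_dir`; default False |
+| --enable_mkldnn | Optional | Whether to enable MKLDNN acceleration for CPU inference; default False |
+| --cpu_threads | Optional | Number of CPU threads; default 1 |
+| --trt_calib_mode | Optional | Whether TensorRT uses calibration; default False. Set to True for TensorRT int8 inference, and to False for models already quantized with PaddleSlim |
+| --save_images | Optional | Whether to save visualization results |
+| --save_results | Optional | Whether to save each image's predictions as JSON in the output directory |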
+
+
+Notes:
+
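+- Parameter priority: `camera_id` > `video_file` > `image_dir` > `image_file`.
+- `run_mode`: `paddle` runs AnalysisPredictor at float32 precision; the other values run AnalysisPredictor with TensorRT at the corresponding precision.
+- If the installed PaddlePaddle does not support TensorRT-based inference, you need to compile it yourself; see the [inference library compilation guide](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).
+- If `--run_benchmark` is set to True, install the extra dependencies with `pip install pynvml psutil GPUtil`.
+- To evaluate an exported model on the COCO dataset, add the `--save_results` and `--use_coco_category` arguments at inference time to save the JSON files that COCO evaluation requires.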
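The input-priority rule and several of the flags in the table can be sketched with `argparse`. This mirrors only a subset of the documented flags and is not the actual parser in `deploy/python/infer.py`; `build_parser` and `resolve_input` are hypothetical names, and accepting lowercase device names via `type=str.upper` is an assumption of the sketch.

```python
import argparse


def build_parser():
    # Subset of the documented flags; defaults follow the table above.
    p = argparse.ArgumentParser(description="Paddle inference (sketch)")
    p.add_argument("--model_dir", required=True)
    p.add_argument("--image_file", default=None)
    p.add_argument("--image_dir", default=None)
    p.add_argument("--video_file", default=None)
    p.add_argument("--camera_id", type=int, default=-1)
    # type runs before the choices check, so "gpu" is accepted as "GPU".
    p.add_argument("--device", type=str.upper, default="CPU",
                   choices=["CPU", "GPU", "XPU"])
    p.add_argument("--run_mode", default="paddle",
                   choices=["paddle", "trt_fp32", "trt_fp16", "trt_int8"])
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--threshold", type=float, default=0.5)
    p.add_argument("--output_dir", default="output")
    return p


def resolve_input(args):
    """Pick one source: camera_id > video_file > image_dir > image_file."""
    if args.camera_id != -1:
        return ("camera", args.camera_id)
    if args.video_file:
        return ("video", args.video_file)
    if args.image_dir:
        return ("image_dir", args.image_dir)
    if args.image_file:
        return ("image", args.image_file)
    raise ValueError("no input given: set --camera_id, --video_file, "
                     "--image_dir, or --image_file")
```

When several inputs are given, only the highest-priority one is used; for instance, `--image_file a.jpg --video_file v.mp4` resolves to the video.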
--- a/third-party/paddle-inference/benchmark_utils.py
+++ b/third-party/paddle-inference/benchmark_utils.py
@@ -0,0 +1,289 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import logging
+
+import paddle
+import paddle.inference as paddle_infer
+
+from pathlib import Path
+
+CUR_DIR = os.path.dirname(os.path.abspath(__file__))
+LOG_PATH_ROOT = f"{CUR_DIR}/../../output"
+
+
+class PaddleInferBenchmark(object):
+    def __init__(self,
+                 config,
+                 model_info: dict={},
+                 data_info: dict={},
+                 perf_info: dict={},
+                 resource_info: dict={},
+                 **kwargs):
+        """
+        Construct PaddleInferBenchmark Class to format logs.
+        args:
+            config(paddle.inference.Config): paddle inference config
+            model_info(dict): basic model info
+                {'model_name': 'resnet50',
+                 'precision': 'fp32'}
+            data_info(dict): input data info
+                {'batch_size': 1,
+                 'shape': '3,224,224',
+                 'data_num': 1000}
+            perf_info(dict): performance result
+                {'preprocess_time_s': 1.0,
+                 'inference_time_s': 2.0,
+                 'postprocess_time_s': 1.0,
+                 'total_time_s': 4.0}
+            resource_info(dict):
+                cpu and gpu resources
+                {'cpu_rss_mb': 100,
+                 'gpu_rss_mb': 100,
+                 'gpu_util': 60}
+        """
+        # PaddleInferBenchmark Log Version
+        self.log_version = "1.0.3"
+
+        # Paddle Version
+        self.paddle_version = paddle.__version__
+        self.paddle_commit = paddle.__git_commit__
+        paddle_infer_info = paddle_infer.get_version()
+        self.paddle_branch = paddle_infer_info.strip().split(': ')[-1]
+
+        # model info
+        self.model_info = model_info
+
+        # data info
+        self.data_info = data_info
+
+        # perf info
+        self.perf_info = perf_info
+
+        try:
+            # required value
+            self.model_name = model_info['model_name']
+            self.precision = model_info['precision']
+
+            self.batch_size = data_info['batch_size']
+            self.shape = data_info['shape']
+            self.data_num = data_info['data_num']
+
+            self.inference_time_s = round(perf_info['inference_time_s'], 4)
+        except (KeyError, TypeError):
+            self.print_help()
+            raise ValueError(
+                "Set argument wrong, please check input argument and its type")
+
+        self.preprocess_time_s = perf_info.get('preprocess_time_s', 0)
+        self.postprocess_time_s = perf_info.get('postprocess_time_s', 0)
+        self.with_tracker = True if 'tracking_time_s' in perf_info else False
+        self.tracking_time_s = perf_info.get('tracking_time_s', 0)
+        self.total_time_s = perf_info.get('total_time_s', 0)
+
+        self.inference_time_s_90 = perf_info.get("inference_time_s_90", "")
+        self.inference_time_s_99 = perf_info.get("inference_time_s_99", "")
+        self.succ_rate = perf_info.get("succ_rate", "")
+        self.qps = perf_info.get("qps", "")
+
+        # conf info
+        self.config_status = self.parse_config(config)
+
+        # mem info
+        if isinstance(resource_info, dict):
+            self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0))
+            self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0))
+            self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0))
+            self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0))
+            self.cpu_util = round(resource_info.get('cpu_util', 0), 2)
+
+            self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0))
+            self.gpu_util = round(resource_info.get('gpu_util', 0), 2)
+            self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2)
+        else:
+            self.cpu_rss_mb = 0
+            self.cpu_vms_mb = 0
+            self.cpu_shared_mb = 0
+            self.cpu_dirty_mb = 0
+            self.cpu_util = 0
+
+            self.gpu_rss_mb = 0
+            self.gpu_util = 0
+            self.gpu_mem_util = 0
+
+        # init benchmark logger
+        self.benchmark_logger()
+
+    def benchmark_logger(self):
+        """
+        benchmark logger
+        """
+        # remove other logging handler
+        for handler in logging.root.handlers[:]:
+            logging.root.removeHandler(handler)
+
+        # Init logger
+        FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+        log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log"
+        Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True)
+        logging.basicConfig(
+            level=logging.INFO,
+            format=FORMAT,
+            handlers=[
+                logging.FileHandler(
+                    filename=log_output, mode='w'),
+                logging.StreamHandler(),
+            ])
+        self.logger = logging.getLogger(__name__)
+        self.logger.info(
+            f"Paddle Inference benchmark log will be saved to {log_output}")
+
+    def parse_config(self, config) -> dict:
+        """
+        parse paddle predictor config
+        args:
+            config(paddle.inference.Config): paddle inference config
+        return:
+            config_status(dict): dict style config info
+        """
+        config_status = {}
+        if isinstance(config, paddle_infer.Config):
+            config_status['runtime_device'] = "gpu" if config.use_gpu(
+            ) else "cpu"
+            config_status['ir_optim'] = config.ir_optim()
+            config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
+            config_status['precision'] = self.precision
+            config_status['enable_mkldnn'] = config.mkldnn_enabled()
+            config_status[
+                'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
+                )
+        elif isinstance(config, dict):
+            config_status['runtime_device'] = config.get('runtime_device', "")
+            config_status['ir_optim'] = config.get('ir_optim', "")
+            config_status['enable_tensorrt'] = config.get('enable_tensorrt', "")
+            config_status['precision'] = config.get('precision', "")
+            config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
+            config_status['cpu_math_library_num_threads'] = config.get(
+                'cpu_math_library_num_threads', "")
+        else:
+            self.print_help()
+            raise ValueError(
+                "Set argument config wrong, please check input argument and its type"
+            )
+        return config_status
+
+    def report(self, identifier=None):
+        """
+        print log report
+        args:
+            identifier(string): identify log
+        """
+        if identifier:
+            identifier = f"[{identifier}]"
+        else:
+            identifier = ""
+
+        self.logger.info("\n")
+        self.logger.info(
+            "---------------------- Paddle info ----------------------")
+        self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
+        self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
+        self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
+        self.logger.info(f"{identifier} log_api_version: {self.log_version}")
+        self.logger.info(
+            "----------------------- Conf info -----------------------")
+        self.logger.info(
+            f"{identifier} runtime_device: {self.config_status['runtime_device']}"
+        )
+        self.logger.info(
+            f"{identifier} ir_optim: {self.config_status['ir_optim']}")
+        self.logger.info(f"{identifier} enable_memory_optim: {True}")
+        self.logger.info(
+            f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
+        )
+        self.logger.info(
+            f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}")
+        self.logger.info(
+            f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
+        )
+        self.logger.info(
+            "----------------------- Model info ----------------------")
+        self.logger.info(f"{identifier} model_name: {self.model_name}")
+        self.logger.info(f"{identifier} precision: {self.precision}")
+        self.logger.info(
+            "----------------------- Data info -----------------------")
+        self.logger.info(f"{identifier} batch_size: {self.batch_size}")
+        self.logger.info(f"{identifier} input_shape: {self.shape}")
+        self.logger.info(f"{identifier} data_num: {self.data_num}")
+        self.logger.info(
+            "----------------------- Perf info -----------------------")
+        self.logger.info(
+            f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
+        )
+        self.logger.info(
+            f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
+        )
+        self.logger.info(
+            f"{identifier} total time spent(s): {self.total_time_s}")
+
+        if self.with_tracker:
+            self.logger.info(
+                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
+                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
+                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, "
+                f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}")
+        else:
+            self.logger.info(
+                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
+                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
+                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
+            )
+        if self.inference_time_s_90:
+            self.logger.info(
+                f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
+            )
+        if self.qps:
+            self.logger.info(f"{identifier} QPS: {self.qps}")
+
+    def print_help(self):
+        """
+        print function help
+        """
+        print("""Usage: 
+            ==== Print inference benchmark logs. ====
+            config = paddle.inference.Config()
+            model_info = {'model_name': 'resnet50',
+                          'precision': 'fp32'}
+            data_info = {'batch_size': 1,
+                         'shape': '3,224,224',
+                         'data_num': 1000}
+            perf_info = {'preprocess_time_s': 1.0,
+                         'inference_time_s': 2.0,
+                         'postprocess_time_s': 1.0,
+                         'total_time_s': 4.0}
+            resource_info = {'cpu_rss_mb': 100,
+                             'gpu_rss_mb': 100,
+                             'gpu_util': 60}
+            log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
+            log('Test')
+            """)
+
+    def __call__(self, identifier=None):
+        """
+        __call__
+        args:
+            identifier(string): identify log
+        """
+        self.report(identifier)
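`PaddleInferBenchmark` expects timings already aggregated into `perf_info` (in seconds, since `report` multiplies by 1000 to print milliseconds). A sketch of producing that dict from raw per-sample timings follows. `summarize_timings` is hypothetical, the percentiles use the nearest-rank method, and defining `qps` as samples per second of total time is an assumption, since the class only logs whatever value it receives.

```python
import math
from statistics import mean


def _nearest_rank(sorted_vals, pct):
    # Smallest value such that at least pct% of samples are <= it.
    k = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[k - 1]


def summarize_timings(pre, infer, post, num_samples):
    """Build a perf_info dict (seconds) from per-sample timing lists."""
    if not infer:
        raise ValueError("need at least one inference timing")
    ordered = sorted(infer)
    total = sum(pre) + sum(infer) + sum(post)
    return {
        "preprocess_time_s": mean(pre) if pre else 0,
        "inference_time_s": mean(infer),
        "postprocess_time_s": mean(post) if post else 0,
        "total_time_s": total,
        "inference_time_s_90": _nearest_rank(ordered, 90),
        "inference_time_s_99": _nearest_rank(ordered, 99),
        # Assumed definition: processed samples per second of total time.
        "qps": round(num_samples / total, 2) if total > 0 else "",
    }
```

The keys match those read in `PaddleInferBenchmark.__init__`, so the result can be passed as `perf_info` directly.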
--- a/third-party/paddle-inference/clrnet_postprocess.py
+++ b/third-party/paddle-inference/clrnet_postprocess.py
@@ -0,0 +1,262 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+import paddle
+import paddle.nn as nn
+from scipy.special import softmax
+from scipy.interpolate import InterpolatedUnivariateSpline
+
+
+def line_iou(pred, target, img_w, length=15, aligned=True):
+    '''
+    Calculate the line iou value between predictions and targets
+    Args:
+        pred: lane predictions, shape: (num_pred, 72)
+        target: ground truth, shape: (num_target, 72)
+        img_w: image width
+        length: extended radius
+        aligned: True for iou loss calculation, False for pair-wise ious in assign
+    '''
+    px1 = pred - length
+    px2 = pred + length
+    tx1 = target - length
+    tx2 = target + length
+
+    if aligned:
+        invalid_mask = target
+        ovr = paddle.minimum(px2, tx2) - paddle.maximum(px1, tx1)
+        union = paddle.maximum(px2, tx2) - paddle.minimum(px1, tx1)
+    else:
+        num_pred = pred.shape[0]
+        invalid_mask = target.tile([num_pred, 1, 1])
+
+        ovr = (paddle.minimum(px2[:, None, :], tx2[None, ...]) - paddle.maximum(
+            px1[:, None, :], tx1[None, ...]))
+        union = (paddle.maximum(px2[:, None, :], tx2[None, ...]) -
+                 paddle.minimum(px1[:, None, :], tx1[None, ...]))
+
+    invalid_masks = (invalid_mask < 0) | (invalid_mask >= img_w)
+
+    ovr[invalid_masks] = 0.
+    union[invalid_masks] = 0.
+    iou = ovr.sum(axis=-1) / (union.sum(axis=-1) + 1e-9)
+    return iou
+
+
+class Lane:
+    def __init__(self, points=None, invalid_value=-2., metadata=None):
+        super(Lane, self).__init__()
+        self.curr_iter = 0
+        self.points = points
+        self.invalid_value = invalid_value
+        self.function = InterpolatedUnivariateSpline(
+            points[:, 1], points[:, 0], k=min(3, len(points) - 1))
+        self.min_y = points[:, 1].min() - 0.01
+        self.max_y = points[:, 1].max() + 0.01
+        self.metadata = metadata or {}
+
+    def __repr__(self):
+        return '[Lane]\n' + str(self.points) + '\n[/Lane]'
+
+    def __call__(self, lane_ys):
+        lane_xs = self.function(lane_ys)
+
+        lane_xs[(lane_ys < self.min_y) | (lane_ys > self.max_y
+                                          )] = self.invalid_value
+        return lane_xs
+
+    def to_array(self, sample_y_range, img_w, img_h):
+        self.sample_y = range(sample_y_range[0], sample_y_range[1],
+                              sample_y_range[2])
+        sample_y = self.sample_y
+        # Normalize sample rows to [0, 1] before evaluating the spline
+        ys = np.array(sample_y) / float(img_h)
+        xs = self(ys)
+        valid_mask = (xs >= 0) & (xs < 1)
+        lane_xs = xs[valid_mask] * img_w
+        lane_ys = ys[valid_mask] * img_h
+        lane = np.concatenate(
+            (lane_xs.reshape(-1, 1), lane_ys.reshape(-1, 1)), axis=1)
+        return lane
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        if self.curr_iter < len(self.points):
+            self.curr_iter += 1
+            return self.points[self.curr_iter - 1]
+        self.curr_iter = 0
+        raise StopIteration
+
+
+class CLRNetPostProcess(object):
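+    """
+    Args:
+        img_w (int): network input width, used to scale lane x offsets to pixels
+        ori_img_h (int): original image height
+        cut_height (int): number of rows cropped from the top of the image
+        conf_threshold (float): minimum lane score to keep a prediction
+        nms_thres (float): threshold used by lane NMS
+        max_lanes (int): maximum number of lanes kept per image
+        num_points (int): number of sampled points (x offsets) per lane
+    """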
+
+    def __init__(self, img_w, ori_img_h, cut_height, conf_threshold, nms_thres,
+                 max_lanes, num_points):
+        self.img_w = img_w
+        self.conf_threshold = conf_threshold
+        self.nms_thres = nms_thres
+        self.max_lanes = max_lanes
+        self.num_points = num_points
+        self.n_strips = num_points - 1
+        self.n_offsets = num_points
+        self.ori_img_h = ori_img_h
+        self.cut_height = cut_height
+
+        self.prior_ys = paddle.linspace(
+            start=1, stop=0, num=self.n_offsets).astype('float64')
+
+    def predictions_to_pred(self, predictions):
+        """
+        Convert predictions to internal Lane structure for evaluation.
+        """
+        lanes = []
+        for lane in predictions:
+            lane_xs = lane[6:].clone()
+            start = min(
+                max(0, int(round(lane[2].item() * self.n_strips))),
+                self.n_strips)
+            length = int(round(lane[5].item()))
+            end = start + length - 1
+            end = min(end, len(self.prior_ys) - 1)
+            if start > 0:
+                mask = ((lane_xs[:start] >= 0.) &
+                        (lane_xs[:start] <= 1.)).cpu().detach().numpy()[::-1]
+                mask = ~((mask.cumprod()[::-1]).astype(np.bool_))
+                lane_xs[:start][mask] = -2
+            if end < len(self.prior_ys) - 1:
+                lane_xs[end + 1:] = -2
+
+            lane_ys = self.prior_ys[lane_xs >= 0].clone()
+            lane_xs = lane_xs[lane_xs >= 0]
+            lane_xs = lane_xs.flip(axis=0).astype('float64')
+            lane_ys = lane_ys.flip(axis=0)
+
+            lane_ys = (lane_ys *
+                       (self.ori_img_h - self.cut_height) + self.cut_height
+                       ) / self.ori_img_h
+            if len(lane_xs) <= 1:
+                continue
+            points = paddle.stack(
+                x=(lane_xs.reshape([-1, 1]), lane_ys.reshape([-1, 1])),
+                axis=1).squeeze(axis=2)
+            lane = Lane(
+                points=points.cpu().numpy(),
+                metadata={
+                    'start_x': lane[3],
+                    'start_y': lane[2],
+                    'conf': lane[1]
+                })
+            lanes.append(lane)
+        return lanes
+
+    def lane_nms(self, predictions, scores, nms_overlap_thresh, top_k):
+        """
+        NMS for lane detection.
+        predictions: paddle.Tensor [num_lanes, num_offsets], per-lane x coordinates in pixels
+        scores: paddle.Tensor [num_lanes]
+        nms_overlap_thresh: float
+        top_k: int
+        """
+        # sort by scores to get idx
+        idx = scores.argsort(descending=True)
+        keep = []
+
+        candidates = predictions.clone()
+        candidates = candidates.index_select(idx)
+
+        while len(candidates) > 0:
+            keep.append(idx[0])
+            if len(keep) >= top_k or len(candidates) == 1:
+                break
+
+            ious = []
+            for i in range(1, len(candidates)):
+                ious.append(1 - line_iou(
+                    candidates[i].unsqueeze(0),
+                    candidates[0].unsqueeze(0),
+                    img_w=self.img_w,
+                    length=15))
+            ious = paddle.to_tensor(ious)
+            # Entries are 1 - IoU; keep only candidates that differ enough from the top lane
+            mask = ious <= nms_overlap_thresh
+            remain = paddle.where(mask == False)[0]
+
+            if remain.shape[0] == 0:
+                break
+            candidates = candidates[1:].index_select(remain)
+            idx = idx[1:].index_select(remain)
+        keep = paddle.stack(keep)
+
+        return keep
+
+    def get_lanes(self, output, as_lanes=True):
+        """
+        Convert model output to lanes.
+        """
+        softmax = nn.Softmax(axis=1)
+        decoded = []
+
+        for predictions in output:
+            if len(predictions) == 0:
+                decoded.append([])
+                continue
+            threshold = self.conf_threshold
+            scores = softmax(predictions[:, :2])[:, 1]
+            keep_inds = scores >= threshold
+            predictions = predictions[keep_inds]
+            scores = scores[keep_inds]
+
+            if predictions.shape[0] == 0:
+                decoded.append([])
+                continue
+            nms_predictions = predictions.detach().clone()
+            nms_predictions = paddle.concat(
+                x=[nms_predictions[..., :4], nms_predictions[..., 5:]], axis=-1)
+
+            nms_predictions[..., 4] = nms_predictions[..., 4] * self.n_strips
+            nms_predictions[..., 5:] = nms_predictions[..., 5:] * (
+                self.img_w - 1)
+
+            keep = self.lane_nms(
+                nms_predictions[..., 5:],
+                scores,
+                nms_overlap_thresh=self.nms_thres,
+                top_k=self.max_lanes)
+
+            predictions = predictions.index_select(keep)
+
+            if predictions.shape[0] == 0:
+                decoded.append([])
+                continue
+            predictions[:, 5] = paddle.round(predictions[:, 5] * self.n_strips)
+            if as_lanes:
+                pred = self.predictions_to_pred(predictions)
+            else:
+                pred = predictions
+            decoded.append(pred)
+        return decoded
+
+    def __call__(self, lanes_list):
+        lanes = self.get_lanes(lanes_list)
+        return lanes
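`line_iou` widens every lane point into a horizontal segment of half-width `length` and compares segments row by row, skipping rows where the target lane is outside `[0, img_w)`; `lane_nms` then greedily keeps the highest-scoring lane and drops candidates whose `1 - IoU` to it is at or below the threshold. A dependency-free sketch of the same two steps for lanes given as lists of x coordinates sampled at the same rows. The function names are hypothetical, and, as in the Paddle code, overlaps are not clamped at zero, so distant lanes can get a negative IoU.

```python
def lane_iou(pred, target, img_w, length=15):
    """Row-wise segment IoU between two lanes (lists of x coordinates)."""
    if len(pred) != len(target):
        raise ValueError("lanes must be sampled at the same rows")
    ovr_sum = 0.0
    union_sum = 0.0
    for p, t in zip(pred, target):
        if t < 0 or t >= img_w:
            continue  # row invalid for the target lane: contributes nothing
        ovr_sum += min(p + length, t + length) - max(p - length, t - length)
        union_sum += max(p + length, t + length) - min(p - length, t - length)
    return ovr_sum / (union_sum + 1e-9)


def greedy_lane_nms(lanes, scores, nms_overlap_thresh, top_k, img_w, length=15):
    """Return indices of kept lanes, highest score first."""
    order = sorted(range(len(lanes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < top_k:
        best = order[0]
        keep.append(best)
        # Keep only candidates whose distance (1 - IoU) to the kept lane
        # exceeds the threshold; the kept lane plays the `target` role.
        order = [i for i in order[1:]
                 if 1 - lane_iou(lanes[i], lanes[best], img_w, length)
                 > nms_overlap_thresh]
    return keep
```

With `length=15`, two lanes 10 pixels apart on every valid row overlap by 20 pixels out of a 40-pixel union, giving an IoU of 0.5.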
--- a/third-party/paddle-inference/det_keypoint_unite_infer.py
+++ b/third-party/paddle-inference/det_keypoint_unite_infer.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import cv2
+import math
+import numpy as np
+import paddle
+import yaml
+
+from det_keypoint_unite_utils import argsparser
+from preprocess import decode_image
+from infer import Detector, DetectorPicoDet, PredictConfig, print_arguments, get_test_images, bench_log
+from keypoint_infer import KeyPointDetector, PredictConfig_KeyPoint
+from visualize import visualize_pose
+from benchmark_utils import PaddleInferBenchmark
+from utils import get_current_memory_mb
+from keypoint_postprocess import translate_to_ori_images
+
+KEYPOINT_SUPPORT_MODELS = {
+    'HigherHRNet': 'keypoint_bottomup',
+    'HRNet': 'keypoint_topdown'
+}
+
+
+def predict_with_given_det(image, det_res, keypoint_detector,
+                           keypoint_batch_size, run_benchmark):
+    keypoint_res = {}
+
+    rec_images, records, det_rects = keypoint_detector.get_person_from_rect(
+        image, det_res)
+
+    if len(det_rects) == 0:
+        keypoint_res['keypoint'] = [[], []]
+        return keypoint_res
+
+    keypoint_vector = []
+    score_vector = []
+
+    rect_vector = det_rects
+    keypoint_results = keypoint_detector.predict_image(
+        rec_images, run_benchmark, repeats=10, visual=False)
+    keypoint_vector, score_vector = translate_to_ori_images(keypoint_results,
+                                                            np.array(records))
+    keypoint_res['keypoint'] = [
+        keypoint_vector.tolist(), score_vector.tolist()
+    ] if len(keypoint_vector) > 0 else [[], []]
+    keypoint_res['bbox'] = rect_vector
+    return keypoint_res
+
+
+def topdown_unite_predict(detector,
+                          topdown_keypoint_detector,
+                          image_list,
+                          keypoint_batch_size=1,
+                          save_res=False):
+    det_timer = detector.get_timer()
+    store_res = []
+    for i, img_file in enumerate(image_list):
+        # Decode image in advance in det + pose prediction
+        det_timer.preprocess_time_s.start()
+        image, _ = decode_image(img_file, {})
+        det_timer.preprocess_time_s.end()
+
+        if FLAGS.run_benchmark:
+            results = detector.predict_image(
+                [image], run_benchmark=True, repeats=10)
+
+            cm, gm, gu = get_current_memory_mb()
+            detector.cpu_mem += cm
+            detector.gpu_mem += gm
+            detector.gpu_util += gu
+        else:
+            results = detector.predict_image([image], visual=False)
+        results = detector.filter_box(results, FLAGS.det_threshold)
+        if results['boxes_num'] > 0:
+            keypoint_res = predict_with_given_det(
+                image, results, topdown_keypoint_detector, keypoint_batch_size,
+                FLAGS.run_benchmark)
+
+            if save_res:
+                save_name = img_file if isinstance(img_file, str) else i
+                store_res.append([
+                    save_name, keypoint_res['bbox'],
+                    [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
+                ])
+        else:
+            results["keypoint"] = [[], []]
+            keypoint_res = results
+        if FLAGS.run_benchmark:
+            cm, gm, gu = get_current_memory_mb()
+            topdown_keypoint_detector.cpu_mem += cm
+            topdown_keypoint_detector.gpu_mem += gm
+            topdown_keypoint_detector.gpu_util += gu
+        else:
+            if not os.path.exists(FLAGS.output_dir):
+                os.makedirs(FLAGS.output_dir)
+            visualize_pose(
+                img_file,
+                keypoint_res,
+                visual_thresh=FLAGS.keypoint_threshold,
+                save_dir=FLAGS.output_dir)
+    if save_res:
+        """
+        1) store_res: a list of image_data
+        2) image_data: [imageid, rects, [keypoints, scores]]
+        3) rects: list of rect [xmin, ymin, xmax, ymax]
+        4) keypoints: 17 joints * [x, y, conf], 51 values in total
+        5) scores: mean confidence over all joints
+        """
+        with open("det_keypoint_unite_image_results.json", 'w') as wf:
+            json.dump(store_res, wf, indent=4)
+
+
+def topdown_unite_predict_video(detector,
+                                topdown_keypoint_detector,
+                                camera_id,
+                                keypoint_batch_size=1,
+                                save_res=False):
+    video_name = 'output.mp4'
+    if camera_id != -1:
+        capture = cv2.VideoCapture(camera_id)
+    else:
+        capture = cv2.VideoCapture(FLAGS.video_file)
+        video_name = os.path.split(FLAGS.video_file)[-1]
+    # Get Video info : resolution, fps, frame count
+    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+    fps = int(capture.get(cv2.CAP_PROP_FPS))
+    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+    print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+    if not os.path.exists(FLAGS.output_dir):
+        os.makedirs(FLAGS.output_dir)
+    out_path = os.path.join(FLAGS.output_dir, video_name)
+    fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
+    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+    index = 0
+    store_res = []
+    keypoint_smoothing = KeypointSmoothing(
+        width, height, filter_type=FLAGS.filter_type, beta=0.05)
+
+    while True:
+        ret, frame = capture.read()
+        if not ret:
+            break
+        index += 1
+        print('detect frame: %d' % (index))
+
+        frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+
+        results = detector.predict_image([frame2], visual=False)
+        results = detector.filter_box(results, FLAGS.det_threshold)
+        if results['boxes_num'] == 0:
+            writer.write(frame)
+            continue
+
+        keypoint_res = predict_with_given_det(
+            frame2, results, topdown_keypoint_detector, keypoint_batch_size,
+            FLAGS.run_benchmark)
+
+        if FLAGS.smooth and len(keypoint_res['keypoint'][0]) == 1:
+            current_keypoints = np.array(keypoint_res['keypoint'][0][0])
+            smooth_keypoints = keypoint_smoothing.smooth_process(
+                current_keypoints)
+
+            keypoint_res['keypoint'][0][0] = smooth_keypoints.tolist()
+
+        im = visualize_pose(
+            frame,
+            keypoint_res,
+            visual_thresh=FLAGS.keypoint_threshold,
+            returnimg=True)
+
+        if save_res:
+            store_res.append([
+                index, keypoint_res['bbox'],
+                [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
+            ])
+
+        writer.write(im)
+        if camera_id != -1:
+            cv2.imshow('Keypoint Detection', im)
+            if cv2.waitKey(1) & 0xFF == ord('q'):
+                break
+    writer.release()
+    print('output_video saved to: {}'.format(out_path))
+    if save_res:
+        """
+        1) store_res: a list of frame_data
+        2) frame_data: [frameid, rects, [keypoints, scores]]
+        3) rects: list of rect [xmin, ymin, xmax, ymax]
+        4) keypoints: 17 joints * [x, y, conf], 51 values in total
+        5) scores: mean confidence over all joints
+        """
+        with open("det_keypoint_unite_video_results.json", 'w') as wf:
+            json.dump(store_res, wf, indent=4)
+
+
+class KeypointSmoothing(object):
+    # The following code is modified from:
+    # https://github.com/jaantollander/OneEuroFilter
+
+    def __init__(self,
+                 width,
+                 height,
+                 filter_type,
+                 alpha=0.5,
+                 fc_d=0.1,
+                 fc_min=0.1,
+                 beta=0.1,
+                 thres_mult=0.3):
+        super(KeypointSmoothing, self).__init__()
+        self.image_width = width
+        self.image_height = height
+        self.threshold = np.array([
+            0.005, 0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.01, 0.01,
+            0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01
+        ]) * thres_mult
+        self.filter_type = filter_type
+        self.alpha = alpha
+        self.dx_prev_hat = None
+        self.x_prev_hat = None
+        self.fc_d = fc_d
+        self.fc_min = fc_min
+        self.beta = beta
+
+        if self.filter_type == 'OneEuro':
+            self.smooth_func = self.one_euro_filter
+        elif self.filter_type == 'EMA':
+            self.smooth_func = self.ema_filter
+        else:
+            raise ValueError("filter_type must be 'OneEuro' or 'EMA'")
+
+    def smooth_process(self, current_keypoints):
+        if self.x_prev_hat is None:
+            self.x_prev_hat = current_keypoints[:, :2]
+            self.dx_prev_hat = np.zeros(current_keypoints[:, :2].shape)
+            return current_keypoints
+        else:
+            result = current_keypoints
+            num_keypoints = len(current_keypoints)
+            for i in range(num_keypoints):
+                result[i, :2] = self.smooth(current_keypoints[i, :2],
+                                            self.threshold[i], i)
+            return result
+
+    def smooth(self, current_keypoint, threshold, index):
+        distance = np.sqrt(
+            np.square((current_keypoint[0] - self.x_prev_hat[index][0]) /
+                      self.image_width) + np.square((current_keypoint[
+                          1] - self.x_prev_hat[index][1]) / self.image_height))
+        if distance < threshold:
+            result = self.x_prev_hat[index]
+        else:
+            result = self.smooth_func(current_keypoint, self.x_prev_hat[index],
+                                      index)
+
+        return result
+
+    def one_euro_filter(self, x_cur, x_pre, index):
+        te = 1
+        self.alpha = self.smoothing_factor(te, self.fc_d)
+        dx_cur = (x_cur - x_pre) / te
+        dx_cur_hat = self.exponential_smoothing(dx_cur, self.dx_prev_hat[index])
+
+        fc = self.fc_min + self.beta * np.abs(dx_cur_hat)
+        self.alpha = self.smoothing_factor(te, fc)
+        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
+        self.dx_prev_hat[index] = dx_cur_hat
+        self.x_prev_hat[index] = x_cur_hat
+        return x_cur_hat
+
+    def ema_filter(self, x_cur, x_pre, index):
+        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
+        self.x_prev_hat[index] = x_cur_hat
+        return x_cur_hat
+
+    def smoothing_factor(self, te, fc):
+        r = 2 * math.pi * fc * te
+        return r / (r + 1)
+
+    def exponential_smoothing(self, x_cur, x_pre, index=0):
+        return self.alpha * x_cur + (1 - self.alpha) * x_pre
+
+
+def main():
+    deploy_file = os.path.join(FLAGS.det_model_dir, 'infer_cfg.yml')
+    with open(deploy_file) as f:
+        yml_conf = yaml.safe_load(f)
+    arch = yml_conf['arch']
+    detector_cls = Detector
+    if arch == 'PicoDet':
+        detector_cls = DetectorPicoDet
+
+    detector = detector_cls(FLAGS.det_model_dir,
+                            device=FLAGS.device,
+                            run_mode=FLAGS.run_mode,
+                            trt_min_shape=FLAGS.trt_min_shape,
+                            trt_max_shape=FLAGS.trt_max_shape,
+                            trt_opt_shape=FLAGS.trt_opt_shape,
+                            trt_calib_mode=FLAGS.trt_calib_mode,
+                            cpu_threads=FLAGS.cpu_threads,
+                            enable_mkldnn=FLAGS.enable_mkldnn,
+                            threshold=FLAGS.det_threshold)
+
+    topdown_keypoint_detector = KeyPointDetector(
+        FLAGS.keypoint_model_dir,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=FLAGS.keypoint_batch_size,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        use_dark=FLAGS.use_dark)
+    keypoint_arch = topdown_keypoint_detector.pred_config.arch
+    assert KEYPOINT_SUPPORT_MODELS[
+        keypoint_arch] == 'keypoint_topdown', 'Detection-Keypoint unite inference only supports topdown models.'
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        topdown_unite_predict_video(detector, topdown_keypoint_detector,
+                                    FLAGS.camera_id, FLAGS.keypoint_batch_size,
+                                    FLAGS.save_res)
+    else:
+        # predict from image
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        topdown_unite_predict(detector, topdown_keypoint_detector, img_list,
+                              FLAGS.keypoint_batch_size, FLAGS.save_res)
+        if not FLAGS.run_benchmark:
+            detector.det_times.info(average=True)
+            topdown_keypoint_detector.det_times.info(average=True)
+        else:
+            mode = FLAGS.run_mode
+            det_model_dir = FLAGS.det_model_dir
+            det_model_info = {
+                'model_name': det_model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(detector, img_list, det_model_info, name='Det')
+            keypoint_model_dir = FLAGS.keypoint_model_dir
+            keypoint_model_info = {
+                'model_name': keypoint_model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
+                      FLAGS.keypoint_batch_size, 'KeyPoint')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU'
+                            ], "device should be CPU, GPU or XPU"
+
+    main()
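`KeypointSmoothing.one_euro_filter` above follows the One Euro filter: it first low-pass filters the keypoint velocity with a fixed cutoff `fc_d`, then sets the position cutoff to `fc_min + beta * |velocity|`. Slow keypoints are therefore smoothed heavily (less jitter) and fast ones lightly (less lag). The sketch below is a hypothetical scalar helper, `OneEuro1D`, for a single coordinate. It uses the same `smoothing_factor` formula, a unit sample period (`te = 1`, as in the class), and the `beta=0.05` that `topdown_unite_predict_video` passes in. Unlike the class, it omits the per-joint stillness threshold and keeps the smoothing factors in locals instead of the shared `self.alpha`.

```python
import math


def smoothing_factor(te, fc):
    # Same form as KeypointSmoothing.smoothing_factor: alpha = r / (r + 1), r = 2*pi*fc*te
    r = 2 * math.pi * fc * te
    return r / (r + 1)


class OneEuro1D:
    """Hypothetical scalar One Euro filter for one keypoint coordinate."""

    def __init__(self, fc_min=0.1, beta=0.05, fc_d=0.1, te=1.0):
        self.fc_min = fc_min  # minimum cutoff frequency for the position
        self.beta = beta      # how strongly speed raises the cutoff
        self.fc_d = fc_d      # fixed cutoff for the derivative
        self.te = te          # sample period (1 frame)
        self.x_prev = None    # last filtered position
        self.dx_prev = 0.0    # last filtered derivative

    def __call__(self, x):
        if self.x_prev is None:
            # The first sample passes through unchanged, like smooth_process()
            self.x_prev = x
            return x
        # 1) Exponentially smooth the derivative with the fixed cutoff fc_d
        a_d = smoothing_factor(self.te, self.fc_d)
        dx = (x - self.x_prev) / self.te
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # 2) Raise the position cutoff with speed: fast motion -> less lag
        fc = self.fc_min + self.beta * abs(dx_hat)
        a = smoothing_factor(self.te, fc)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

Feeding the same filter a jump from 10.0 to 12.0 gives an output between the two values, and a larger `beta` moves it closer to 12.0. That trade-off is what the `beta` argument of `KeypointSmoothing` tunes.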
--- a/third-party/paddle-inference/det_keypoint_unite_utils.py
+++ b/third-party/paddle-inference/det_keypoint_unite_utils.py
@@ -0,0 +1,141 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import ast
+import argparse
+
+
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--det_model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
+              "'infer_cfg.yml', created by tools/export_model.py."),
+        required=True)
+    parser.add_argument(
+        "--keypoint_model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
+              "'infer_cfg.yml', created by tools/export_model.py."),
+        required=True)
+    parser.add_argument(
+        "--image_file", type=str, default=None, help="Path of image file.")
+    parser.add_argument(
+        "--image_dir",
+        type=str,
+        default=None,
+        help="Directory of image files; `image_file` takes priority.")
+    parser.add_argument(
+        "--keypoint_batch_size",
+        type=int,
+        default=8,
+        help=("Batch size for keypoint inference. In detection-keypoint unite "
+              "inference, the detection batch size is 1; detection results are "
+              "then collated into batches for keypoint inference."))
+    parser.add_argument(
+        "--video_file",
+        type=str,
+        default=None,
+        help="Path of video file; `video_file` or `camera_id` has the highest priority."
+    )
+    parser.add_argument(
+        "--camera_id",
+        type=int,
+        default=-1,
+        help="Device ID of the camera to predict from.")
+    parser.add_argument(
+        "--det_threshold", type=float, default=0.5, help="Score threshold for detection.")
+    parser.add_argument(
+        "--keypoint_threshold",
+        type=float,
+        default=0.5,
+        help="Score threshold for keypoint visualization.")
+    parser.add_argument(
+        "--output_dir",
+        type=str,
+        default="output",
+        help="Directory of output visualization files.")
+    parser.add_argument(
+        "--run_mode",
+        type=str,
+        default='paddle',
+        help="Running mode (paddle/trt_fp32/trt_fp16/trt_int8).")
+    parser.add_argument(
+        "--device",
+        type=str,
+        default='cpu',
+        help="Device to run on: CPU/GPU/XPU; default is CPU."
+    )
+    parser.add_argument(
+        "--run_benchmark",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to predict an image file repeatedly for benchmarking.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use MKLDNN on CPU.")
+    parser.add_argument(
+        "--cpu_threads", type=int, default=1, help="Number of CPU threads.")
+    parser.add_argument(
+        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_max_shape",
+        type=int,
+        default=1280,
+        help="max_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_opt_shape",
+        type=int,
+        default=640,
+        help="opt_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_calib_mode",
+        type=ast.literal_eval,
+        default=False,
+        help="Set to True if the model was produced by TensorRT offline "
+        "quantization calibration.")
+    parser.add_argument(
+        '--use_dark',
+        type=ast.literal_eval,
+        default=True,
+        help='Whether to use DarkPose post-processing for more accurate keypoint positions.')
+    parser.add_argument(
+        '--save_res',
+        type=ast.literal_eval,
+        default=False,
+        help=(
+            "Whether to save prediction results to a JSON file. "
+            "1) store_res: a list of image_data; "
+            "2) image_data: [imageid, rects, [keypoints, scores]]; "
+            "3) rects: list of rect [xmin, ymin, xmax, ymax]; "
+            "4) keypoints: 17 joints * [x, y, conf], 51 values in total; "
+            "5) scores: mean confidence over all joints."))
+    parser.add_argument(
+        '--smooth',
+        type=ast.literal_eval,
+        default=False,
+        help='Whether to smooth keypoints across frames so that newly predicted keypoints are more stable.'
+    )
+    parser.add_argument(
+        '--filter_type',
+        type=str,
+        default='OneEuro',
+        help='Filter type used when --smooth is True: OneEuro or EMA.'
+    )
+    return parser
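argparse applies `type` to the raw command-line string, so a flag declared with `type=bool` turns the argument `False` into `True`, because `bool()` of any non-empty string is true. The other switches in this parser, such as `--run_benchmark`, use `ast.literal_eval` instead, which maps the strings `True`/`False` to the Python literals. The flag names in this sketch are hypothetical.

```python
import argparse
import ast


def build_parser():
    parser = argparse.ArgumentParser()
    # type=bool calls bool("False"), and any non-empty string is truthy
    parser.add_argument("--flag_bool", type=bool, default=False)
    # ast.literal_eval turns the string "False" into the literal False
    parser.add_argument("--flag_literal", type=ast.literal_eval, default=False)
    return parser


args = build_parser().parse_args(
    ["--flag_bool", "False", "--flag_literal", "False"])
print(args.flag_bool, args.flag_literal)  # True False
```

Note that `ast.literal_eval` accepts only the capitalized Python literals: passing `false` makes argparse reject the value with an error rather than silently treating it as true.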
--- a/third-party/paddle-inference/infer.py
+++ b/third-party/paddle-inference/infer.py
--- a/third-party/paddle-inference/keypoint_infer.py
+++ b/third-party/paddle-inference/keypoint_infer.py
@@ -0,0 +1,433 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import yaml
+import glob
+from functools import reduce
+
+from PIL import Image
+import cv2
+import math
+import numpy as np
+import paddle
+
+import sys
+# add deploy path of PaddleDetection to sys.path
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'])))
+sys.path.insert(0, parent_path)
+
+from preprocess import preprocess, NormalizeImage, Permute
+from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop
+from keypoint_postprocess import HrHRNetPostProcess, HRNetPostProcess
+from visualize import visualize_pose
+from paddle.inference import Config
+from paddle.inference import create_predictor
+from utils import argsparser, Timer, get_current_memory_mb
+from benchmark_utils import PaddleInferBenchmark
+from infer import Detector, get_test_images, print_arguments
+
+# Global dictionary
+KEYPOINT_SUPPORT_MODELS = {
+    'HigherHRNet': 'keypoint_bottomup',
+    'HRNet': 'keypoint_topdown'
+}
+
+
+class KeyPointDetector(Detector):
+    """
+    Args:
+        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
+        device (str): device to run on: CPU/GPU/XPU/NPU, default is CPU
+        run_mode (str): running mode (paddle/trt_fp32/trt_fp16)
+        batch_size (int): batch size for inference
+        trt_min_shape (int): min shape for dynamic shape in trt
+        trt_max_shape (int): max shape for dynamic shape in trt
+        trt_opt_shape (int): opt shape for dynamic shape in trt
+        trt_calib_mode (bool): set to True if the model was produced by TensorRT
+            offline quantization calibration
+        cpu_threads (int): number of CPU threads
+        enable_mkldnn (bool): whether to enable MKLDNN
+        use_dark (bool): whether to use DarkPose post-processing
+    """
+
+    def __init__(self,
+                 model_dir,
+                 device='CPU',
+                 run_mode='paddle',
+                 batch_size=1,
+                 trt_min_shape=1,
+                 trt_max_shape=1280,
+                 trt_opt_shape=640,
+                 trt_calib_mode=False,
+                 cpu_threads=1,
+                 enable_mkldnn=False,
+                 output_dir='output',
+                 threshold=0.5,
+                 use_dark=True,
+                 use_fd_format=False):
+        super(KeyPointDetector, self).__init__(
+            model_dir=model_dir,
+            device=device,
+            run_mode=run_mode,
+            batch_size=batch_size,
+            trt_min_shape=trt_min_shape,
+            trt_max_shape=trt_max_shape,
+            trt_opt_shape=trt_opt_shape,
+            trt_calib_mode=trt_calib_mode,
+            cpu_threads=cpu_threads,
+            enable_mkldnn=enable_mkldnn,
+            output_dir=output_dir,
+            threshold=threshold,
+            use_fd_format=use_fd_format)
+        self.use_dark = use_dark
+
+    def set_config(self, model_dir, use_fd_format):
+        return PredictConfig_KeyPoint(model_dir, use_fd_format=use_fd_format)
+
+    def get_person_from_rect(self, image, results):
+        # crop the person result from image
+        self.det_times.preprocess_time_s.start()
+        valid_rects = results['boxes']
+        rect_images = []
+        new_rects = []
+        org_rects = []
+        for rect in valid_rects:
+            rect_image, new_rect, org_rect = expand_crop(image, rect)
+            if rect_image is None or rect_image.size == 0:
+                continue
+            rect_images.append(rect_image)
+            new_rects.append(new_rect)
+            org_rects.append(org_rect)
+        self.det_times.preprocess_time_s.end()
+        return rect_images, new_rects, org_rects
+
+    def postprocess(self, inputs, result):
+        np_heatmap = result['heatmap']
+        np_masks = result['masks']
+        # postprocess output of predictor
+        if KEYPOINT_SUPPORT_MODELS[
+                self.pred_config.arch] == 'keypoint_bottomup':
+            results = {}
+            h, w = inputs['im_shape'][0]
+            preds = [np_heatmap]
+            if np_masks is not None:
+                preds += np_masks
+            preds += [h, w]
+            keypoint_postprocess = HrHRNetPostProcess()
+            kpts, scores = keypoint_postprocess(*preds)
+            results['keypoint'] = kpts
+            results['score'] = scores
+            return results
+        elif KEYPOINT_SUPPORT_MODELS[
+                self.pred_config.arch] == 'keypoint_topdown':
+            results = {}
+            imshape = inputs['im_shape'][:, ::-1]
+            center = np.round(imshape / 2.)
+            scale = imshape / 200.
+            keypoint_postprocess = HRNetPostProcess(use_dark=self.use_dark)
+            kpts, scores = keypoint_postprocess(np_heatmap, center, scale)
+            results['keypoint'] = kpts
+            results['score'] = scores
+            return results
+        else:
+            raise ValueError("Unsupported arch: {}, expect {}".format(
+                self.pred_config.arch, KEYPOINT_SUPPORT_MODELS))
+
+    def predict(self, repeats=1):
+        '''
+        Args:
+            repeats (int): repeat number for prediction
+        Returns:
+            results (dict): include 'heatmap': np.ndarray, the first predictor
+                            output, and 'masks': for bottom-up (tagmap) models a
+                            list of np.ndarray [masks, heat_k, inds_k], otherwise
+                            None
+        '''
+        # model prediction
+        np_heatmap, np_masks = None, None
+        for i in range(repeats):
+            self.predictor.run()
+            output_names = self.predictor.get_output_names()
+            heatmap_tensor = self.predictor.get_output_handle(output_names[0])
+            np_heatmap = heatmap_tensor.copy_to_cpu()
+            if self.pred_config.tagmap:
+                masks_tensor = self.predictor.get_output_handle(output_names[1])
+                heat_k = self.predictor.get_output_handle(output_names[2])
+                inds_k = self.predictor.get_output_handle(output_names[3])
+                np_masks = [
+                    masks_tensor.copy_to_cpu(), heat_k.copy_to_cpu(),
+                    inds_k.copy_to_cpu()
+                ]
+        result = dict(heatmap=np_heatmap, masks=np_masks)
+        return result
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True):
+        results = []
+        batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size)
+        for i in range(batch_loop_cnt):
+            start_index = i * self.batch_size
+            end_index = min((i + 1) * self.batch_size, len(image_list))
+            batch_image_list = image_list[start_index:end_index]
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result_warmup = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+                self.det_times.img_num += len(batch_image_list)
+
+                cm, gm, gu = get_current_memory_mb()
+                self.cpu_mem += cm
+                self.gpu_mem += gm
+                self.gpu_util += gu
+
+            else:
+                # preprocess
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                self.det_times.inference_time_s.start()
+                result = self.predict()
+                self.det_times.inference_time_s.end()
+
+                # postprocess
+                self.det_times.postprocess_time_s.start()
+                result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+                self.det_times.img_num += len(batch_image_list)
+
+                if visual:
+                    if not os.path.exists(self.output_dir):
+                        os.makedirs(self.output_dir)
+                    visualize(
+                        batch_image_list,
+                        result,
+                        visual_thresh=self.threshold,
+                        save_dir=self.output_dir)
+
+            results.append(result)
+            if visual:
+                print('Test iter {}'.format(i))
+        results = self.merge_batch_result(results)
+        return results
+
+    def predict_video(self, video_file, camera_id):
+        video_name = 'output.mp4'
+        if camera_id != -1:
+            capture = cv2.VideoCapture(camera_id)
+        else:
+            capture = cv2.VideoCapture(video_file)
+            video_name = os.path.split(video_file)[-1]
+        # Get Video info : resolution, fps, frame count
+        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        fps = int(capture.get(cv2.CAP_PROP_FPS))
+        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+        print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
+        out_path = os.path.join(self.output_dir, video_name)
+        fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
+        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+        index = 1
+        while True:
+            ret, frame = capture.read()
+            if not ret:
+                break
+            print('detect frame: %d' % (index))
+            index += 1
+            results = self.predict_image([frame[:, :, ::-1]], visual=False)
+            im_results = {}
+            im_results['keypoint'] = [results['keypoint'], results['score']]
+            im = visualize_pose(
+                frame, im_results, visual_thresh=self.threshold, returnimg=True)
+            writer.write(im)
+            if camera_id != -1:
+                cv2.imshow('Keypoint Detection', im)
+                if cv2.waitKey(1) & 0xFF == ord('q'):
+                    break
+        writer.release()
+
+
+def create_inputs(imgs, im_info):
+    """generate input for different model type
+    Args:
+        imgs (list(numpy)): list of image (np.ndarray)
+        im_info (list(dict)): list of image info
+    Returns:
+        inputs (dict): input of model
+    """
+    inputs = {}
+    inputs['image'] = np.stack(imgs, axis=0).astype('float32')
+    im_shape = []
+    for e in im_info:
+        im_shape.append(np.array((e['im_shape'])).astype('float32'))
+    inputs['im_shape'] = np.stack(im_shape, axis=0)
+    return inputs
+
+
+class PredictConfig_KeyPoint():
+    """set config of preprocess, postprocess and visualize
+    Args:
+        model_dir (str): root path of infer_cfg.yml (inference.yml if use_fd_format)
+    """
+
+    def __init__(self, model_dir, use_fd_format=False):
+        # parsing Yaml config for Preprocess
+        fd_deploy_file = os.path.join(model_dir, 'inference.yml')
+        ppdet_deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
+        if use_fd_format:
+            if not os.path.exists(fd_deploy_file) and os.path.exists(
+                    ppdet_deploy_file):
+                raise RuntimeError(
+                    "Non-FD format model detected. Please set `use_fd_format` to False."
+                )
+            deploy_file = fd_deploy_file
+        else:
+            if not os.path.exists(ppdet_deploy_file) and os.path.exists(
+                    fd_deploy_file):
+                raise RuntimeError(
+                    "FD format model detected. Please set `use_fd_format` to True."
+                )
+            deploy_file = ppdet_deploy_file
+        with open(deploy_file) as f:
+            yml_conf = yaml.safe_load(f)
+        self.check_model(yml_conf)
+        self.arch = yml_conf['arch']
+        self.archcls = KEYPOINT_SUPPORT_MODELS[yml_conf['arch']]
+        self.preprocess_infos = yml_conf['Preprocess']
+        self.min_subgraph_size = yml_conf['min_subgraph_size']
+        self.labels = yml_conf['label_list']
+        self.tagmap = False
+        self.use_dynamic_shape = yml_conf['use_dynamic_shape']
+        if 'keypoint_bottomup' == self.archcls:
+            self.tagmap = True
+        self.print_config()
+
+    def check_model(self, yml_conf):
+        """
+        Raises:
+            ValueError: if the loaded model is not a supported model type
+        """
+        for support_model in KEYPOINT_SUPPORT_MODELS:
+            if support_model in yml_conf['arch']:
+                return True
+        raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[
+            'arch'], KEYPOINT_SUPPORT_MODELS))
+
+    def print_config(self):
+        print('-----------  Model Configuration -----------')
+        print('%s: %s' % ('Model Arch', self.arch))
+        print('%s: ' % ('Transform Order'))
+        for op_info in self.preprocess_infos:
+            print('--%s: %s' % ('transform op', op_info['type']))
+        print('--------------------------------------------')
+
+
+def visualize(image_list, results, visual_thresh=0.6, save_dir='output'):
+    im_results = {}
+    for i, image_file in enumerate(image_list):
+        skeletons = results['keypoint']
+        scores = results['score']
+        skeleton = skeletons[i:i + 1]
+        score = scores[i:i + 1]
+        im_results['keypoint'] = [skeleton, score]
+        visualize_pose(
+            image_file,
+            im_results,
+            visual_thresh=visual_thresh,
+            save_dir=save_dir)
+
+
+def main():
+    detector = KeyPointDetector(
+        FLAGS.model_dir,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=FLAGS.batch_size,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        threshold=FLAGS.threshold,
+        output_dir=FLAGS.output_dir,
+        use_dark=FLAGS.use_dark,
+        use_fd_format=FLAGS.use_fd_format)
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
+    else:
+        # predict from image
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
+        if not FLAGS.run_benchmark:
+            detector.det_times.info(average=True)
+        else:
+            mems = {
+                'cpu_rss_mb': detector.cpu_mem / len(img_list),
+                'gpu_rss_mb': detector.gpu_mem / len(img_list),
+                'gpu_util': detector.gpu_util * 100 / len(img_list)
+            }
+            perf_info = detector.det_times.report(average=True)
+            model_dir = FLAGS.model_dir
+            mode = FLAGS.run_mode
+            model_info = {
+                'model_name': model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            data_info = {
+                'batch_size': 1,
+                'shape': "dynamic_shape",
+                'data_num': perf_info['img_num']
+            }
+            det_log = PaddleInferBenchmark(detector.config, model_info,
+                                           data_info, perf_info, mems)
+            det_log('KeyPoint')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
+                            ], "device should be CPU, GPU, XPU or NPU"
+    assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device"
+
+    main()
--- a/third-party/paddle-inference/keypoint_postprocess.py
+++ b/third-party/paddle-inference/keypoint_postprocess.py
@@ -0,0 +1,369 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. 
+#   
+# Licensed under the Apache License, Version 2.0 (the "License");   
+# you may not use this file except in compliance with the License.  
+# You may obtain a copy of the License at   
+#   
+#     http://www.apache.org/licenses/LICENSE-2.0    
+#   
+# Unless required by applicable law or agreed to in writing, software   
+# distributed under the License is distributed on an "AS IS" BASIS, 
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
+# See the License for the specific language governing permissions and   
+# limitations under the License.
+
+from scipy.optimize import linear_sum_assignment
+from collections import abc, defaultdict
+import cv2
+import numpy as np
+import math
+import paddle
+import paddle.nn as nn
+from keypoint_preprocess import get_affine_mat_kernel, get_affine_transform
+
+
+class HrHRNetPostProcess(object):
+    """
+    HrHRNet postprocess contains the following steps:
+        1) get the top-k keypoints in the output heatmap
+        2) sample the tagmap value at each of the top-k coordinates
+        3) group joints into people with the Hungarian algorithm
+        4) adjust each coordinate by +-0.25 to decrease the error std
+        5) salvage missing joints by checking the positivity of heatmap - tagdiff_norm
+    Args:
+        max_num_people (int): max number of people supported in postprocess
+        heat_thresh (float): top-k values below this threshold are ignored
+        tag_thresh (float): a joint whose tag is closer than this distance to a person's mean tag is merged into that person
+
+        heatmap, tagmap, heat_k, inds_k (np.ndarray): model outputs for batch size 1; heat_k and inds_k are the top-k values and flattened indices taken from the max-pooled heatmap
+        original_height, original_width (float): the original image size
+    """
+
+    def __init__(self, max_num_people=30, heat_thresh=0.2, tag_thresh=1.):
+        self.max_num_people = max_num_people
+        self.heat_thresh = heat_thresh
+        self.tag_thresh = tag_thresh
+
+    def lerp(self, j, y, x, heatmap):
+        H, W = heatmap.shape[-2:]
+        left = np.clip(x - 1, 0, W - 1)
+        right = np.clip(x + 1, 0, W - 1)
+        up = np.clip(y - 1, 0, H - 1)
+        down = np.clip(y + 1, 0, H - 1)
+        offset_y = np.where(heatmap[j, down, x] > heatmap[j, up, x], 0.25,
+                            -0.25)
+        offset_x = np.where(heatmap[j, y, right] > heatmap[j, y, left], 0.25,
+                            -0.25)
+        return offset_y + 0.5, offset_x + 0.5
+
+    def __call__(self, heatmap, tagmap, heat_k, inds_k, original_height,
+                 original_width):
+
+        N, J, H, W = heatmap.shape
+        assert N == 1, "only support batch size 1"
+        heatmap = heatmap[0]
+        tagmap = tagmap[0]
+        heats = heat_k[0]
+        inds_np = inds_k[0]
+        y = inds_np // W
+        x = inds_np % W
+        tags = tagmap[np.arange(J)[None, :].repeat(self.max_num_people),
+                      y.flatten(), x.flatten()].reshape(J, -1, tagmap.shape[-1])
+        coords = np.stack((y, x), axis=2)
+        # threshold
+        mask = heats > self.heat_thresh
+        # cluster
+        cluster = defaultdict(lambda: {
+            'coords': np.zeros((J, 2), dtype=np.float32),
+            'scores': np.zeros(J, dtype=np.float32),
+            'tags': []
+        })
+        for jid, m in enumerate(mask):
+            num_valid = m.sum()
+            if num_valid == 0:
+                continue
+            valid_inds = np.where(m)[0]
+            valid_tags = tags[jid, m, :]
+            if len(cluster) == 0:  # initialize
+                for i in valid_inds:
+                    tag = tags[jid, i]
+                    key = tag[0]
+                    cluster[key]['tags'].append(tag)
+                    cluster[key]['scores'][jid] = heats[jid, i]
+                    cluster[key]['coords'][jid] = coords[jid, i]
+                continue
+            candidates = list(cluster.keys())[:self.max_num_people]
+            centroids = [
+                np.mean(
+                    cluster[k]['tags'], axis=0) for k in candidates
+            ]
+            num_clusters = len(centroids)
+            # shape is (num_valid, num_clusters, tag_dim)
+            dist = valid_tags[:, None, :] - np.array(centroids)[None, ...]
+            l2_dist = np.linalg.norm(dist, ord=2, axis=2)
+            # modulate dist with heat value, see `use_detection_val`
+            cost = np.round(l2_dist) * 100 - heats[jid, m, None]
+            # pad the cost matrix, otherwise new poses are ignored
+            if num_valid > num_clusters:
+                cost = np.pad(cost, ((0, 0), (0, num_valid - num_clusters)),
+                              'constant',
+                              constant_values=((0, 0), (0, 1e-10)))
+            rows, cols = linear_sum_assignment(cost)
+            for y, x in zip(rows, cols):
+                tag = tags[jid, y]
+                if y < num_valid and x < num_clusters and \
+                   l2_dist[y, x] < self.tag_thresh:
+                    key = candidates[x]  # merge to cluster
+                else:
+                    key = tag[0]  # initialize new cluster
+                cluster[key]['tags'].append(tag)
+                cluster[key]['scores'][jid] = heats[jid, y]
+                cluster[key]['coords'][jid] = coords[jid, y]
+
+        # shape is [k, J, 2] and [k, J]
+        pose_tags = np.array([cluster[k]['tags'] for k in cluster])
+        pose_coords = np.array([cluster[k]['coords'] for k in cluster])
+        pose_scores = np.array([cluster[k]['scores'] for k in cluster])
+        valid = pose_scores > 0
+
+        pose_kpts = np.zeros((pose_scores.shape[0], J, 3), dtype=np.float32)
+        if valid.sum() == 0:
+            return pose_kpts, pose_kpts
+
+        # refine coords
+        valid_coords = pose_coords[valid].astype(np.int32)
+        y = valid_coords[..., 0].flatten()
+        x = valid_coords[..., 1].flatten()
+        _, j = np.nonzero(valid)
+        offsets = self.lerp(j, y, x, heatmap)
+        pose_coords[valid, 0] += offsets[0]
+        pose_coords[valid, 1] += offsets[1]
+
+        # mean score before salvage
+        mean_score = pose_scores.mean(axis=1)
+        pose_kpts[valid, 2] = pose_scores[valid]
+
+        # salvage missing joints
+        if True:
+            for pid, coords in enumerate(pose_coords):
+                tag_mean = np.array(pose_tags[pid]).mean(axis=0)
+                norm = np.sum((tagmap - tag_mean)**2, axis=3)**0.5
+                score = heatmap - np.round(norm)  # (J, H, W)
+                flat_score = score.reshape(J, -1)
+                max_inds = np.argmax(flat_score, axis=1)
+                max_scores = np.max(flat_score, axis=1)
+                salvage_joints = (pose_scores[pid] == 0) & (max_scores > 0)
+                if salvage_joints.sum() == 0:
+                    continue
+                y = max_inds[salvage_joints] // W
+                x = max_inds[salvage_joints] % W
+                offsets = self.lerp(salvage_joints.nonzero()[0], y, x, heatmap)
+                y = y.astype(np.float32) + offsets[0]
+                x = x.astype(np.float32) + offsets[1]
+                pose_coords[pid][salvage_joints, 0] = y
+                pose_coords[pid][salvage_joints, 1] = x
+                pose_kpts[pid][salvage_joints, 2] = max_scores[salvage_joints]
+        pose_kpts[..., :2] = transpred(pose_coords[..., :2][..., ::-1],
+                                       original_height, original_width,
+                                       min(H, W))
+        return pose_kpts, mean_score
+
+
+def transpred(kpts, h, w, s):
+    trans, _ = get_affine_mat_kernel(h, w, s, inv=True)
+
+    return warp_affine_joints(kpts[..., :2].copy(), trans)
+
+
+def warp_affine_joints(joints, mat):
+    """Apply affine transformation defined by the transform matrix on the
+    joints.
+
+    Args:
+        joints (np.ndarray[..., 2]): Origin coordinate of joints.
+        mat (np.ndarray[2, 3]): The affine matrix.
+
+    Returns:
+        matrix (np.ndarray[..., 2]): Result coordinate of joints.
+    """
+    joints = np.array(joints)
+    shape = joints.shape
+    joints = joints.reshape(-1, 2)
+    homogeneous = np.concatenate((joints, np.ones_like(joints[:, :1])), axis=1)
+    # [N, 3] x [3, 2] -> [N, 2]: mat applied to every [x, y, 1] row
+    return np.dot(homogeneous, mat.T).reshape(shape)
+
+
+class HRNetPostProcess(object):
+    def __init__(self, use_dark=True):
+        self.use_dark = use_dark
+
+    def flip_back(self, output_flipped, matched_parts):
+        assert output_flipped.ndim == 4,\
+                'output_flipped should be [batch_size, num_joints, height, width]'
+
+        output_flipped = output_flipped[:, :, :, ::-1]
+
+        for pair in matched_parts:
+            tmp = output_flipped[:, pair[0], :, :].copy()
+            output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
+            output_flipped[:, pair[1], :, :] = tmp
+
+        return output_flipped
+
+    def get_max_preds(self, heatmaps):
+        """get predictions from score maps
+
+        Args:
+            heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
+
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        """
+        assert isinstance(heatmaps,
+                          np.ndarray), 'heatmaps should be numpy.ndarray'
+        assert heatmaps.ndim == 4, 'heatmaps should be 4-ndim'
+
+        batch_size = heatmaps.shape[0]
+        num_joints = heatmaps.shape[1]
+        width = heatmaps.shape[3]
+        heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
+        idx = np.argmax(heatmaps_reshaped, 2)
+        maxvals = np.amax(heatmaps_reshaped, 2)
+
+        maxvals = maxvals.reshape((batch_size, num_joints, 1))
+        idx = idx.reshape((batch_size, num_joints, 1))
+
+        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
+
+        preds[:, :, 0] = (preds[:, :, 0]) % width
+        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
+
+        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
+        pred_mask = pred_mask.astype(np.float32)
+
+        preds *= pred_mask
+
+        return preds, maxvals
+
+    def gaussian_blur(self, heatmap, kernel):
+        border = (kernel - 1) // 2
+        batch_size = heatmap.shape[0]
+        num_joints = heatmap.shape[1]
+        height = heatmap.shape[2]
+        width = heatmap.shape[3]
+        for i in range(batch_size):
+            for j in range(num_joints):
+                origin_max = np.max(heatmap[i, j])
+                dr = np.zeros((height + 2 * border, width + 2 * border))
+                dr[border:-border, border:-border] = heatmap[i, j].copy()
+                dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
+                heatmap[i, j] = dr[border:-border, border:-border].copy()
+                heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
+        return heatmap
+
+    def dark_parse(self, hm, coord):
+        heatmap_height = hm.shape[0]
+        heatmap_width = hm.shape[1]
+        px = int(coord[0])
+        py = int(coord[1])
+        if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
+            dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
+            dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
+            dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
+            dxy = 0.25 * (hm[py + 1][px + 1] - hm[py - 1][px + 1] -
+                          hm[py + 1][px - 1] + hm[py - 1][px - 1])
+            dyy = 0.25 * (
+                hm[py + 2][px] - 2 * hm[py][px] + hm[py - 2][px])
+            derivative = np.array([dx, dy])
+            hessian = np.array([[dxx, dxy], [dxy, dyy]])
+            if dxx * dyy - dxy**2 != 0:
+                # Newton step on the log-heatmap: offset = -H^-1 * gradient,
+                # solved directly instead of via the discouraged np.matrix
+                offset = -np.linalg.solve(hessian, derivative)
+                coord += offset
+        return coord
+
+    def dark_postprocess(self, hm, coords, kernelsize):
+        """
+        refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
+
+        """
+        hm = self.gaussian_blur(hm, kernelsize)
+        hm = np.maximum(hm, 1e-10)
+        hm = np.log(hm)
+        for n in range(coords.shape[0]):
+            for p in range(coords.shape[1]):
+                coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
+        return coords
+
+    def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
+        """Take the highest heat-value location and, unless use_dark is set, shift
+        it a quarter pixel toward the higher neighboring response on each axis.
+
+        Args:
+            heatmaps (numpy.ndarray): The predicted heatmaps
+            center (numpy.ndarray): The boxes center
+            scale (numpy.ndarray): The scale factor
+
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        """
+
+        coords, maxvals = self.get_max_preds(heatmaps)
+
+        heatmap_height = heatmaps.shape[2]
+        heatmap_width = heatmaps.shape[3]
+
+        if self.use_dark:
+            coords = self.dark_postprocess(heatmaps, coords, kernelsize)
+        else:
+            for n in range(coords.shape[0]):
+                for p in range(coords.shape[1]):
+                    hm = heatmaps[n][p]
+                    px = int(math.floor(coords[n][p][0] + 0.5))
+                    py = int(math.floor(coords[n][p][1] + 0.5))
+                    if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
+                        diff = np.array([
+                            hm[py][px + 1] - hm[py][px - 1],
+                            hm[py + 1][px] - hm[py - 1][px]
+                        ])
+                        coords[n][p] += np.sign(diff) * .25
+        preds = coords.copy()
+
+        # Transform back
+        for i in range(coords.shape[0]):
+            preds[i] = transform_preds(coords[i], center[i], scale[i],
+                                       [heatmap_width, heatmap_height])
+
+        return preds, maxvals
+
+    def __call__(self, output, center, scale):
+        preds, maxvals = self.get_final_preds(output, center, scale)
+        keypoints = np.concatenate((preds, maxvals), axis=-1)
+        # per-instance score: mean confidence over all joints
+        return keypoints, np.mean(maxvals, axis=1)
+
+
+def transform_preds(coords, center, scale, output_size):
+    target_coords = np.zeros(coords.shape)
+    trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
+    for p in range(coords.shape[0]):
+        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
+    return target_coords
+
+
+def affine_transform(pt, t):
+    new_pt = np.array([pt[0], pt[1], 1.]).T
+    new_pt = np.dot(t, new_pt)
+    return new_pt[:2]
+
+
+def translate_to_ori_images(keypoint_result, batch_records):
+    kpts = keypoint_result['keypoint']
+    scores = keypoint_result['score']
+    kpts[..., 0] += batch_records[:, 0:1]
+    kpts[..., 1] += batch_records[:, 1:2]
+    return kpts, scores
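The grouping loop in `HrHRNetPostProcess.__call__` does three things. It scores each confident joint against every existing person's mean tag with a cost of `round(l2_dist) * 100 - heat`. It pads the cost matrix with near-zero columns when there are more joints than people. It then lets a joint join a person only if the tag distance is below `tag_thresh`.

The sketch below reproduces that rule under simplifying assumptions. Tags are scalars rather than vectors. A brute-force search over permutations stands in for `scipy.optimize.linear_sum_assignment`, which is acceptable only for a handful of joints. `assign_joints` is a hypothetical name, not part of this file.

```python
import itertools


def assign_joints(valid_tags, heats, centroids, tag_thresh=1.0):
    """Assign each confident joint to an existing person or to a new one.

    Hypothetical, simplified version of the matching step in
    HrHRNetPostProcess.__call__: scalar tags, brute-force assignment.
    Returns one entry per joint: a cluster index, or None for a new person.
    """
    n, k = len(valid_tags), len(centroids)
    if n == 0:
        return []
    dist = [[abs(t - c) for c in centroids] for t in valid_tags]
    # round(dist) * 100 dominates the cost; subtracting the heat breaks ties
    cost = [[round(d) * 100 - h for d in row] for row, h in zip(dist, heats)]
    if n > k:
        # padded columns let surplus joints start new people
        cost = [row + [1e-10] * (n - k) for row in cost]
    cols = min(
        itertools.permutations(range(len(cost[0])), n),
        key=lambda perm: sum(cost[r][c] for r, c in enumerate(perm)))
    result = []
    for r, c in enumerate(cols):
        if c < k and dist[r][c] < tag_thresh:
            result.append(c)  # merge into an existing person
        else:
            result.append(None)  # too far away, or matched to padding
    return result
```

For example, `assign_joints([0.2, 5.1, 9.0], [0.9, 0.8, 0.7], [0.0, 5.0])` returns `[0, 1, None]`. The first two joints merge into the existing people, and the third, matched to a padding column, starts a new one.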
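The non-DARK branch of `HRNetPostProcess.get_final_preds` and the Newton step in `dark_parse` are easier to follow on a toy heatmap. The sketch below re-implements both for a single joint using plain lists, so it runs without Paddle, NumPy, or OpenCV.

It makes a few assumptions:
- `decode_heatmap` and `dark_refine` are hypothetical names.
- The Gaussian blur and log transform that `dark_postprocess` applies first are assumed to have been done already.
- The affine mapping back to image coordinates (`transform_preds`) is omitted.

```python
import math


def decode_heatmap(hm):
    """Return (x, y, score) for one joint heatmap given as a list of rows.

    Mirrors get_max_preds plus the quarter-offset branch of
    get_final_preds (use_dark=False); hypothetical helper.
    """
    height, width = len(hm), len(hm[0])
    flat = [v for row in hm for v in row]
    # max() keeps the first maximum on ties, like np.argmax
    idx = max(range(len(flat)), key=flat.__getitem__)
    score = flat[idx]
    if score <= 0.0:
        # get_max_preds zeroes the coordinates of non-positive peaks
        return 0.0, 0.0, score
    x, y = float(idx % width), float(idx // width)
    px, py = int(math.floor(x + 0.5)), int(math.floor(y + 0.5))
    # same interior-only test as the original loop
    if 1 < px < width - 1 and 1 < py < height - 1:
        dx = hm[py][px + 1] - hm[py][px - 1]
        dy = hm[py + 1][px] - hm[py - 1][px]
        # np.sign semantics: a zero difference leaves the coordinate unchanged
        x += 0.25 * ((dx > 0) - (dx < 0))
        y += 0.25 * ((dy > 0) - (dy < 0))
    return x, y, score


def dark_refine(log_hm, x, y):
    """One Newton step on an already blurred log-heatmap, as in dark_parse."""
    height, width = len(log_hm), len(log_hm[0])
    px, py = int(x), int(y)
    if not (1 < px < width - 2 and 1 < py < height - 2):
        return x, y
    m = log_hm
    dx = 0.5 * (m[py][px + 1] - m[py][px - 1])
    dy = 0.5 * (m[py + 1][px] - m[py - 1][px])
    dxx = 0.25 * (m[py][px + 2] - 2 * m[py][px] + m[py][px - 2])
    dyy = 0.25 * (m[py + 2][px] - 2 * m[py][px] + m[py - 2][px])
    dxy = 0.25 * (m[py + 1][px + 1] - m[py - 1][px + 1] -
                  m[py + 1][px - 1] + m[py - 1][px - 1])
    det = dxx * dyy - dxy * dxy
    if det == 0:
        return x, y
    # offset = -H^-1 * g, using the closed-form inverse of a 2x2 Hessian
    ox = -(dyy * dx - dxy * dy) / det
    oy = -(dxx * dy - dxy * dx) / det
    return x + ox, y + oy
```

Take a log-heatmap that is exactly quadratic, such as `f(x, y) = -0.5 * ((x - 3.3) ** 2 + (y - 2.8) ** 2)` sampled on a 7x7 grid. One `dark_refine` step from the integer peak `(3, 3)` lands on `(3.3, 2.8)`. The quarter-offset rule, by contrast, can move a peak by at most 0.25 per axis.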
--- a/third-party/paddle-inference/keypoint_preprocess.py
+++ b/third-party/paddle-inference/keypoint_preprocess.py
@@ -0,0 +1,243 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This code is based on https://github.com/open-mmlab/mmpose/mmpose/core/post_processing/post_transforms.py
+"""
+import cv2
+import numpy as np
+
+
+class EvalAffine(object):
+    def __init__(self, size, stride=64):
+        super(EvalAffine, self).__init__()
+        self.size = size
+        self.stride = stride
+
+    def __call__(self, image, im_info):
+        s = self.size
+        h, w, _ = image.shape
+        trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False)
+        image_resized = cv2.warpAffine(image, trans, size_resized)
+        return image_resized, im_info
+
+
+def get_affine_mat_kernel(h, w, s, inv=False):
+    if w < h:
+        w_ = s
+        h_ = int(np.ceil((s / w * h) / 64.) * 64)
+        scale_w = w
+        scale_h = h_ / w_ * w
+
+    else:
+        h_ = s
+        w_ = int(np.ceil((s / h * w) / 64.) * 64)
+        scale_h = h
+        scale_w = w_ / h_ * h
+
+    center = np.array([np.round(w / 2.), np.round(h / 2.)])
+
+    size_resized = (w_, h_)
+    trans = get_affine_transform(
+        center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)
+
+    return trans, size_resized
+
+
+def get_affine_transform(center,
+                         input_size,
+                         rot,
+                         output_size,
+                         shift=(0., 0.),
+                         inv=False):
+    """Get the affine transform matrix, given the center/scale/rot/output_size.
+
+    Args:
+        center (np.ndarray[2, ]): Center of the bounding box (x, y).
+        input_size (np.ndarray[2, ] | float): Size of the source region
+            wrt [width, height]; a scalar is used for both sides.
+        rot (float): Rotation angle (degree).
+        output_size (np.ndarray[2, ]): Size of the destination heatmaps.
+        shift (0-100%): Shift translation ratio wrt the width/height.
+            Default (0., 0.).
+        inv (bool): Option to inverse the affine transform direction.
+            (inv=False: src->dst or inv=True: dst->src)
+
+    Returns:
+        np.ndarray: The transform matrix.
+    """
+    assert len(center) == 2
+    assert len(output_size) == 2
+    assert len(shift) == 2
+    if not isinstance(input_size, (np.ndarray, list)):
+        input_size = np.array([input_size, input_size], dtype=np.float32)
+    scale_tmp = input_size
+
+    shift = np.array(shift)
+    src_w = scale_tmp[0]
+    dst_w = output_size[0]
+    dst_h = output_size[1]
+
+    rot_rad = np.pi * rot / 180
+    src_dir = rotate_point([0., src_w * -0.5], rot_rad)
+    dst_dir = np.array([0., dst_w * -0.5])
+
+    src = np.zeros((3, 2), dtype=np.float32)
+    src[0, :] = center + scale_tmp * shift
+    src[1, :] = center + src_dir + scale_tmp * shift
+    src[2, :] = _get_3rd_point(src[0, :], src[1, :])
+
+    dst = np.zeros((3, 2), dtype=np.float32)
+    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
+    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
+    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])
+
+    if inv:
+        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
+    else:
+        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
+
+    return trans
+
+
+def get_warp_matrix(theta, size_input, size_dst, size_target):
+    """Calculate the transformation matrix under the unbiased constraint.
+
+    This code is based on
+        https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
+    Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
+    Data Processing for Human Pose Estimation (CVPR 2020).
+
+    Args:
+        theta (float): Rotation angle in degrees.
+        size_input (np.ndarray): Size of input image [w, h].
+        size_dst (np.ndarray): Size of output image [w, h].
+        size_target (np.ndarray): Size of ROI in input plane [w, h].
+
+    Returns:
+        matrix (np.ndarray): A matrix for transformation.
+    """
+    theta = np.deg2rad(theta)
+    matrix = np.zeros((2, 3), dtype=np.float32)
+    scale_x = size_dst[0] / size_target[0]
+    scale_y = size_dst[1] / size_target[1]
+    matrix[0, 0] = np.cos(theta) * scale_x
+    matrix[0, 1] = -np.sin(theta) * scale_x
+    matrix[0, 2] = scale_x * (
+        -0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
+        np.sin(theta) + 0.5 * size_target[0])
+    matrix[1, 0] = np.sin(theta) * scale_y
+    matrix[1, 1] = np.cos(theta) * scale_y
+    matrix[1, 2] = scale_y * (
+        -0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
+        np.cos(theta) + 0.5 * size_target[1])
+    return matrix
+
+
+def rotate_point(pt, angle_rad):
+    """Rotate a point by an angle.
+
+    Args:
+        pt (list[float]): 2 dimensional point to be rotated
+        angle_rad (float): rotation angle in radians
+
+    Returns:
+        list[float]: Rotated point.
+    """
+    assert len(pt) == 2
+    sn, cs = np.sin(angle_rad), np.cos(angle_rad)
+    new_x = pt[0] * cs - pt[1] * sn
+    new_y = pt[0] * sn + pt[1] * cs
+    rotated_pt = [new_x, new_y]
+
+    return rotated_pt
+
+
+def _get_3rd_point(a, b):
+    """To calculate the affine matrix, three pairs of points are required. This
+    function is used to get the 3rd point, given 2D points a & b.
+
+    The 3rd point is defined by rotating vector `a - b` by 90 degrees
+    anticlockwise, using b as the rotation center.
+
+    Args:
+        a (np.ndarray): point(x,y)
+        b (np.ndarray): point(x,y)
+
+    Returns:
+        np.ndarray: The 3rd point.
+    """
+    assert len(a) == 2
+    assert len(b) == 2
+    direction = a - b
+    third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
+
+    return third_pt
+
+
+class TopDownEvalAffine(object):
+    """apply affine transform to image and coords
+
+    Args:
+        trainsize (list): [w, h], the standard size used to train
+        use_udp (bool): whether to use Unbiased Data Processing.
+        image (np.ndarray), im_info (dict): call inputs; im_info holds im_shape and optionally center/scale
+
+    Returns:
+        image, im_info: the image warped to trainsize, and im_info unchanged
+
+    """
+
+    def __init__(self, trainsize, use_udp=False):
+        self.trainsize = trainsize
+        self.use_udp = use_udp
+
+    def __call__(self, image, im_info):
+        rot = 0
+        imshape = im_info['im_shape'][::-1]
+        center = im_info['center'] if 'center' in im_info else imshape / 2.
+        scale = im_info['scale'] if 'scale' in im_info else imshape
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot, center * 2.0,
+                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+        else:
+            trans = get_affine_transform(center, scale, rot, self.trainsize)
+            image = cv2.warpAffine(
+                image,
+                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
+                flags=cv2.INTER_LINEAR)
+
+        return image, im_info
+
+
+def expand_crop(images, rect, expand_ratio=0.3):
+    imgh, imgw, c = images.shape
+    label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()]
+    if label != 0:
+        return None, None, None
+    org_rect = [xmin, ymin, xmax, ymax]
+    h_half = (ymax - ymin) * (1 + expand_ratio) / 2.
+    w_half = (xmax - xmin) * (1 + expand_ratio) / 2.
+    if h_half > w_half * 4 / 3:
+        w_half = h_half * 0.75
+    center = [(ymin + ymax) / 2., (xmin + xmax) / 2.]
+    ymin = max(0, int(center[0] - h_half))
+    ymax = min(imgh - 1, int(center[0] + h_half))
+    xmin = max(0, int(center[1] - w_half))
+    xmax = min(imgw - 1, int(center[1] + w_half))
+    return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect
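`get_affine_transform` never writes the matrix down directly. It picks three point pairs: the box center, a point half the box width above it after rotation, and a third point from `_get_3rd_point`. It then lets `cv2.getAffineTransform` solve for the 2x3 matrix. With `shift=0`, only `input_size[0]` enters the construction, so the mapping is a uniform scale of `output_size[0] / input_size[0]` plus rotation and translation.

The sketch below mirrors that construction without OpenCV, solving the three correspondences with Cramer's rule. The helper names are hypothetical, and `shift` is fixed at zero.

```python
import math


def third_point(a, b):
    """Rotate a - b by 90 degrees anticlockwise around b (like _get_3rd_point)."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (b[0] - dy, b[1] + dx)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
            m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
            m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_points(src, dst):
    """2x3 matrix M with M @ [x, y, 1] == dst[i] for three point pairs.

    Same contract as cv2.getAffineTransform, solved with Cramer's rule.
    """
    a = [[x, y, 1.0] for x, y in src]
    d = det3(a)
    if d == 0:
        raise ValueError("source points are collinear")
    rows = []
    for axis in (0, 1):
        rhs = [p[axis] for p in dst]
        row = []
        for col in range(3):
            # replace column `col` with the right-hand side
            m = [r[:col] + [v] + r[col + 1:] for r, v in zip(a, rhs)]
            row.append(det3(m) / d)
        rows.append(row)
    return rows


def crop_transform(center, src_w, dst_size, rot_deg=0.0):
    """Hypothetical re-implementation of get_affine_transform with shift=0."""
    rot = math.radians(rot_deg)
    # rotate_point([0, -src_w / 2], rot)
    src_dir = (0.5 * src_w * math.sin(rot), -0.5 * src_w * math.cos(rot))
    dst_w, dst_h = dst_size
    src0 = (center[0], center[1])
    src1 = (center[0] + src_dir[0], center[1] + src_dir[1])
    dst0 = (0.5 * dst_w, 0.5 * dst_h)
    dst1 = (0.5 * dst_w, 0.5 * dst_h - 0.5 * dst_w)
    src = [src0, src1, third_point(src0, src1)]
    dst = [dst0, dst1, third_point(dst0, dst1)]
    return affine_from_points(src, dst)


def apply_affine(mat, pt):
    x, y = pt
    return (mat[0][0] * x + mat[0][1] * y + mat[0][2],
            mat[1][0] * x + mat[1][1] * y + mat[1][2])
```

Consider `center=(100, 50)`, `src_w=200` and `dst_size=(64, 48)`. The matrix comes out as `[[0.32, 0, 0], [0, 0.32, 8]]` up to floating-point error, so the box center lands at the canvas center `(32, 24)`.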
--- a/third-party/paddle-inference/mot_centertrack_infer.py
+++ b/third-party/paddle-inference/mot_centertrack_infer.py
@@ -0,0 +1,501 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import copy
+import math
+import time
+import yaml
+import cv2
+import numpy as np
+from collections import defaultdict
+import paddle
+
+from benchmark_utils import PaddleInferBenchmark
+from utils import gaussian_radius, gaussian2D, draw_umich_gaussian
+from preprocess import preprocess, decode_image, WarpAffine, NormalizeImage, Permute
+from utils import argsparser, Timer, get_current_memory_mb
+from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
+from keypoint_preprocess import get_affine_transform
+
+# add python path
+import sys
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+from pptracking.python.mot import CenterTracker
+from pptracking.python.mot.utils import MOTTimer, write_mot_results
+from pptracking.python.mot.visualize import plot_tracking
+
+
+def transform_preds_with_trans(coords, trans):
+    target_coords = np.ones((coords.shape[0], 3), np.float32)
+    target_coords[:, :2] = coords
+    target_coords = np.dot(trans, target_coords.transpose()).transpose()
+    return target_coords[:, :2]
+
+
+def affine_transform(pt, t):
+    new_pt = np.array([pt[0], pt[1], 1.]).T
+    new_pt = np.dot(t, new_pt)
+    return new_pt[:2]
+
+
+def affine_transform_bbox(bbox, trans, width, height):
+    bbox = np.array(copy.deepcopy(bbox), dtype=np.float32)
+    bbox[:2] = affine_transform(bbox[:2], trans)
+    bbox[2:] = affine_transform(bbox[2:], trans)
+    bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, width - 1)
+    bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, height - 1)
+    return bbox
+
+
+class CenterTrack(Detector):
+    """
+    Args:
+        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
+        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
+        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
+        batch_size (int): batch size for inference
+        trt_min_shape (int): min shape for dynamic shape in trt
+        trt_max_shape (int): max shape for dynamic shape in trt
+        trt_opt_shape (int): opt shape for dynamic shape in trt
+        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
+            calibration, trt_calib_mode must be set to True
+        cpu_threads (int): cpu threads
+        enable_mkldnn (bool): whether to enable MKLDNN
+        output_dir (string): The path of output, default as 'output'
+        threshold (float): Score threshold of the detected bbox, default as 0.5
+        save_images (bool): Whether to save visualization image results, default as False
+        save_mot_txts (bool): Whether to save tracking results (txt), default as False
+    """
+
+    def __init__(
+            self,
+            model_dir,
+            tracker_config=None,
+            device='CPU',
+            run_mode='paddle',
+            batch_size=1,
+            trt_min_shape=1,
+            trt_max_shape=960,
+            trt_opt_shape=544,
+            trt_calib_mode=False,
+            cpu_threads=1,
+            enable_mkldnn=False,
+            output_dir='output',
+            threshold=0.5,
+            save_images=False,
+            save_mot_txts=False, ):
+        super(CenterTrack, self).__init__(
+            model_dir=model_dir,
+            device=device,
+            run_mode=run_mode,
+            batch_size=batch_size,
+            trt_min_shape=trt_min_shape,
+            trt_max_shape=trt_max_shape,
+            trt_opt_shape=trt_opt_shape,
+            trt_calib_mode=trt_calib_mode,
+            cpu_threads=cpu_threads,
+            enable_mkldnn=enable_mkldnn,
+            output_dir=output_dir,
+            threshold=threshold, )
+        self.save_images = save_images
+        self.save_mot_txts = save_mot_txts
+        assert batch_size == 1, "MOT model only supports batch_size=1."
+        self.det_times = Timer(with_tracker=True)
+        self.num_classes = len(self.pred_config.labels)
+
+        # tracker config
+        cfg = self.pred_config.tracker
+        min_box_area = cfg.get('min_box_area', -1)
+        vertical_ratio = cfg.get('vertical_ratio', -1)
+        track_thresh = cfg.get('track_thresh', 0.4)
+        pre_thresh = cfg.get('pre_thresh', 0.5)
+
+        self.tracker = CenterTracker(
+            num_classes=self.num_classes,
+            min_box_area=min_box_area,
+            vertical_ratio=vertical_ratio,
+            track_thresh=track_thresh,
+            pre_thresh=pre_thresh)
+
+        self.pre_image = None
+
+    def get_additional_inputs(self, dets, meta, with_hm=True):
+        # Render input heatmap from previous trackings.
+        trans_input = meta['trans_input']
+        inp_width, inp_height = int(meta['inp_width']), int(meta['inp_height'])
+        input_hm = np.zeros((1, inp_height, inp_width), dtype=np.float32)
+
+        for det in dets:
+            if det['score'] < self.tracker.pre_thresh:
+                continue
+            bbox = affine_transform_bbox(det['bbox'], trans_input, inp_width,
+                                         inp_height)
+            h, w = bbox[3] - bbox[1], bbox[2] - bbox[0]
+            if h > 0 and w > 0:
+                radius = gaussian_radius(
+                    (math.ceil(h), math.ceil(w)), min_overlap=0.7)
+                radius = max(0, int(radius))
+                ct = np.array(
+                    [(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2],
+                    dtype=np.float32)
+                ct_int = ct.astype(np.int32)
+                if with_hm:
+                    input_hm[0] = draw_umich_gaussian(input_hm[0], ct_int,
+                                                      radius)
+        if with_hm:
+            input_hm = input_hm[np.newaxis]
+        return input_hm
+
+    def preprocess(self, image_list):
+        preprocess_ops = []
+        for op_info in self.pred_config.preprocess_infos:
+            new_op_info = op_info.copy()
+            op_type = new_op_info.pop('type')
+            preprocess_ops.append(eval(op_type)(**new_op_info))
+
+        assert len(image_list) == 1, 'MOT only supports batch_size=1'
+        im_path = image_list[0]
+        im, im_info = preprocess(im_path, preprocess_ops)
+        inputs = {}
+        inputs['image'] = np.array((im, )).astype('float32')
+        inputs['im_shape'] = np.array((im_info['im_shape'], )).astype('float32')
+        inputs['scale_factor'] = np.array(
+            (im_info['scale_factor'], )).astype('float32')
+
+        inputs['trans_input'] = im_info['trans_input']
+        inputs['inp_width'] = im_info['inp_width']
+        inputs['inp_height'] = im_info['inp_height']
+        inputs['center'] = im_info['center']
+        inputs['scale'] = im_info['scale']
+        inputs['out_height'] = im_info['out_height']
+        inputs['out_width'] = im_info['out_width']
+
+        if self.pre_image is None:
+            self.pre_image = inputs['image']
+            # initializing tracker for the first frame
+            self.tracker.init_track([])
+        inputs['pre_image'] = self.pre_image
+        self.pre_image = inputs['image']  # Note: update for next image
+
+        # render input heatmap from tracker status
+        pre_hm = self.get_additional_inputs(
+            self.tracker.tracks, inputs, with_hm=True)
+        inputs['pre_hm'] = pre_hm
+
+        # Feed every model input; the input named 'x' receives the image tensor.
+        for name in self.predictor.get_input_names():
+            input_tensor = self.predictor.get_input_handle(name)
+            if name == 'x':
+                input_tensor.copy_from_cpu(inputs['image'])
+            else:
+                input_tensor.copy_from_cpu(inputs[name])
+
+        return inputs
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        np_bboxes = result['bboxes']
+        if np_bboxes.shape[0] <= 0:
+            print('[WARNING] No object detected and tracked.')
+            result = {'bboxes': np.zeros([0, 6]), 'cts': None, 'tracking': None}
+            return result
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def centertrack_post_process(self, dets, meta, out_thresh):
+        if 'bboxes' not in dets:
+            return [{}]
+
+        preds = []
+        c, s = meta['center'], meta['scale']
+        h, w = meta['out_height'], meta['out_width']
+        trans = get_affine_transform(
+            center=c,
+            input_size=s,
+            rot=0,
+            output_size=[w, h],
+            shift=(0., 0.),
+            inv=True).astype(np.float32)
+        for i, dets_bbox in enumerate(dets['bboxes']):
+            if dets_bbox[1] < out_thresh:
+                break
+            item = {}
+            item['score'] = dets_bbox[1]
+            item['class'] = int(dets_bbox[0]) + 1
+            item['ct'] = transform_preds_with_trans(
+                dets['cts'][i].reshape([1, 2]), trans).reshape(2)
+
+            if 'tracking' in dets:
+                tracking = transform_preds_with_trans(
+                    (dets['tracking'][i] + dets['cts'][i]).reshape([1, 2]),
+                    trans).reshape(2)
+                item['tracking'] = tracking - item['ct']
+
+            if 'bboxes' in dets:
+                bbox = transform_preds_with_trans(
+                    dets_bbox[2:6].reshape([2, 2]), trans).reshape(4)
+                item['bbox'] = bbox
+
+            preds.append(item)
+        return preds
+
+    def tracking(self, inputs, det_results):
+        result = self.centertrack_post_process(det_results, inputs,
+                                               self.tracker.out_thresh)
+        online_targets = self.tracker.update(result)
+
+        online_tlwhs, online_scores, online_ids = [], [], []
+        for t in online_targets:
+            bbox = t['bbox']
+            tlwh = [bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]]
+            tscore = float(t['score'])
+            tid = int(t['tracking_id'])
+            if tlwh[2] * tlwh[3] > 0:
+                online_tlwhs.append(tlwh)
+                online_ids.append(tid)
+                online_scores.append(tscore)
+        return online_tlwhs, online_scores, online_ids
+
+    def predict(self, repeats=1):
+        '''
+        Args:
+            repeats (int): number of times to run the predictor
+        Returns:
+            result (dict): contains 'bboxes', 'cts' and 'tracking', np.ndarrays of
+                shape [N, 6], [N, 2] and [N, 2] respectively, N being the number of boxes
+        '''
+        # model prediction
+        np_bboxes, np_cts, np_tracking = None, None, None
+        for i in range(repeats):
+            self.predictor.run()
+            output_names = self.predictor.get_output_names()
+            bboxes_tensor = self.predictor.get_output_handle(output_names[0])
+            np_bboxes = bboxes_tensor.copy_to_cpu()
+            cts_tensor = self.predictor.get_output_handle(output_names[1])
+            np_cts = cts_tensor.copy_to_cpu()
+            tracking_tensor = self.predictor.get_output_handle(output_names[2])
+            np_tracking = tracking_tensor.copy_to_cpu()
+
+        result = dict(bboxes=np_bboxes, cts=np_cts, tracking=np_tracking)
+        return result
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True,
+                      seq_name=None):
+        mot_results = []
+        num_classes = self.num_classes
+        image_list.sort()
+        ids2names = self.pred_config.labels
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        for frame_id, img_file in enumerate(image_list):
+            batch_image_list = [img_file]  # bs=1 in MOT model
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result_warmup = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking
+                result_warmup = self.tracking(inputs, det_result)
+                self.det_times.tracking_time_s.start()
+                online_tlwhs, online_scores, online_ids = self.tracking(
+                    inputs, det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+                cm, gm, gu = get_current_memory_mb()
+                self.cpu_mem += cm
+                self.gpu_mem += gm
+                self.gpu_util += gu
+
+            else:
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                self.det_times.inference_time_s.start()
+                result = self.predict()
+                self.det_times.inference_time_s.end()
+
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking process
+                self.det_times.tracking_time_s.start()
+                online_tlwhs, online_scores, online_ids = self.tracking(
+                    inputs, det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+            if visual:
+                if len(image_list) > 1 and frame_id % 10 == 0:
+                    print('Tracking frame {}'.format(frame_id))
+                frame, _ = decode_image(img_file, {})
+
+                im = plot_tracking(
+                    frame,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    ids2names=ids2names)
+                if seq_name is None:
+                    seq_name = os.path.basename(os.path.dirname(image_list[0]))
+                save_dir = os.path.join(self.output_dir, seq_name)
+                if not os.path.exists(save_dir):
+                    os.makedirs(save_dir)
+                cv2.imwrite(
+                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
+
+            mot_results.append([online_tlwhs, online_scores, online_ids])
+        return mot_results
+
+    def predict_video(self, video_file, camera_id):
+        video_out_name = 'mot_output.mp4'
+        if camera_id != -1:
+            capture = cv2.VideoCapture(camera_id)
+        else:
+            capture = cv2.VideoCapture(video_file)
+            video_out_name = os.path.split(video_file)[-1]
+        # Get Video info : resolution, fps, frame count
+        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        fps = int(capture.get(cv2.CAP_PROP_FPS))
+        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+        print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
+        out_path = os.path.join(self.output_dir, video_out_name)
+        video_format = 'mp4v'
+        fourcc = cv2.VideoWriter_fourcc(*video_format)
+        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+
+        frame_id = 1
+        timer = MOTTimer()
+        results = defaultdict(list)  # CenterTrack only supports a single class
+        num_classes = self.num_classes
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        ids2names = self.pred_config.labels
+        while True:
+            ret, frame = capture.read()
+            if not ret:
+                break
+            if frame_id % 10 == 0:
+                print('Tracking frame: %d' % (frame_id))
+            frame_id += 1
+
+            timer.tic()
+            seq_name = video_out_name.split('.')[0]
+            mot_results = self.predict_image(
+                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
+            timer.toc()
+
+            fps = 1. / timer.duration
+            online_tlwhs, online_scores, online_ids = mot_results[0]
+            results[0].append(
+                (frame_id + 1, online_tlwhs, online_scores, online_ids))
+            im = plot_tracking(
+                frame,
+                online_tlwhs,
+                online_ids,
+                online_scores,
+                frame_id=frame_id,
+                fps=fps,
+                ids2names=ids2names)
+
+            writer.write(im)
+            if camera_id != -1:
+                cv2.imshow('MOT Tracking', im)
+                if cv2.waitKey(1) & 0xFF == ord('q'):
+                    break
+
+        if self.save_mot_txts:
+            result_filename = os.path.join(
+                self.output_dir, os.path.splitext(video_out_name)[0] + '.txt')
+
+            write_mot_results(result_filename, results, data_type, num_classes)
+
+        writer.release()
+
+
+def main():
+    detector = CenterTrack(
+        FLAGS.model_dir,
+        tracker_config=None,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=1,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        output_dir=FLAGS.output_dir,
+        threshold=FLAGS.threshold,
+        save_images=FLAGS.save_images,
+        save_mot_txts=FLAGS.save_mot_txts)
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
+    else:
+        # predict from image
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
+
+        if not FLAGS.run_benchmark:
+            detector.det_times.info(average=True)
+        else:
+            mode = FLAGS.run_mode
+            model_dir = FLAGS.model_dir
+            model_info = {
+                'model_name': model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(detector, img_list, model_info, name='MOT')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
+                            ], "device should be CPU, GPU, NPU or XPU"
+
+    main()
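`CenterTrack.get_additional_inputs` above renders the previous frame's tracks into a one-channel heatmap by splatting a 2D Gaussian at each box center through `gaussian_radius` and `draw_umich_gaussian`, whose bodies are not part of this diff. The sketch below illustrates the general CenterNet-style splatting idea in plain NumPy. It assumes a fixed integer radius, `sigma = diameter / 6`, and a center that lies inside the heatmap; `splat_gaussian` is a hypothetical helper, not PaddleDetection's implementation.

```python
import numpy as np


def splat_gaussian(heatmap, center, radius):
    """Draw a 2D Gaussian peak of height 1 at `center` (x, y) into `heatmap` in place.

    Hypothetical CenterNet-style helper: overlapping peaks are merged with an
    element-wise maximum rather than summed, so values stay in [0, 1].
    Assumes `center` lies inside the heatmap.
    """
    diameter = 2 * radius + 1
    sigma = diameter / 6.0  # assumption: sigma tied to the kernel size
    ys, xs = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(xs * xs + ys * ys) / (2 * sigma * sigma))

    x, y = int(center[0]), int(center[1])
    height, width = heatmap.shape
    # Clip the kernel window so it never extends past the heatmap border.
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)
    region = heatmap[y - top:y + bottom, x - left:x + right]
    patch = kernel[radius - top:radius + bottom, radius - left:radius + right]
    if region.size > 0 and patch.size > 0:
        # `region` is a view, so writing into it updates `heatmap`.
        np.maximum(region, patch, out=region)
    return heatmap


# Usage: render two previous-frame centers into a 1 x H x W input heatmap.
input_hm = np.zeros((1, 32, 32), dtype=np.float32)
for ct, r in [((8, 8), 3), ((20, 12), 2)]:
    splat_gaussian(input_hm[0], ct, r)
print(input_hm[0, 8, 8], input_hm[0, 12, 20])  # both peaks are 1.0
```

Taking the maximum instead of adding keeps nearby objects from producing peaks above 1, which matches how a "was there an object here" prior is usually encoded.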
--- a/third-party/paddle-inference/mot_jde_infer.py
+++ b/third-party/paddle-inference/mot_jde_infer.py
@@ -0,0 +1,381 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import yaml
+import cv2
+import numpy as np
+from collections import defaultdict
+import paddle
+
+from benchmark_utils import PaddleInferBenchmark
+from preprocess import decode_image
+from utils import argsparser, Timer, get_current_memory_mb
+from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
+
+# add python path
+import sys
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+from pptracking.python.mot import JDETracker
+from pptracking.python.mot.utils import MOTTimer, write_mot_results
+from pptracking.python.mot.visualize import plot_tracking_dict
+
+# Global dictionary
+MOT_JDE_SUPPORT_MODELS = {
+    'JDE',
+    'FairMOT',
+}
+
+
+class JDE_Detector(Detector):
+    """
+    Args:
+        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
+        tracker_config (None): unused; tracker settings are read from the exported
+            model's config (self.pred_config.tracker)
+        device (str): device to run on: CPU/GPU/XPU/NPU, default is CPU
+        run_mode (str): running mode (paddle/trt_fp32/trt_fp16)
+        batch_size (int): batch size for inference; MOT models only support 1
+        trt_min_shape (int): min shape for dynamic shape in trt
+        trt_max_shape (int): max shape for dynamic shape in trt
+        trt_opt_shape (int): opt shape for dynamic shape in trt
+        trt_calib_mode (bool): set to True if the model was produced by TensorRT
+            offline quantization calibration
+        cpu_threads (int): number of CPU threads
+        enable_mkldnn (bool): whether to enable MKLDNN
+        output_dir (string): The path of output, default as 'output'
+        threshold (float): Score threshold of the detected bbox, default as 0.5
+        save_images (bool): Whether to save visualization image results, default as False
+        save_mot_txts (bool): Whether to save tracking results (txt), default as False
+    """
+
+    def __init__(
+            self,
+            model_dir,
+            tracker_config=None,
+            device='CPU',
+            run_mode='paddle',
+            batch_size=1,
+            trt_min_shape=1,
+            trt_max_shape=1088,
+            trt_opt_shape=608,
+            trt_calib_mode=False,
+            cpu_threads=1,
+            enable_mkldnn=False,
+            output_dir='output',
+            threshold=0.5,
+            save_images=False,
+            save_mot_txts=False, ):
+        super(JDE_Detector, self).__init__(
+            model_dir=model_dir,
+            device=device,
+            run_mode=run_mode,
+            batch_size=batch_size,
+            trt_min_shape=trt_min_shape,
+            trt_max_shape=trt_max_shape,
+            trt_opt_shape=trt_opt_shape,
+            trt_calib_mode=trt_calib_mode,
+            cpu_threads=cpu_threads,
+            enable_mkldnn=enable_mkldnn,
+            output_dir=output_dir,
+            threshold=threshold, )
+        self.save_images = save_images
+        self.save_mot_txts = save_mot_txts
+        assert batch_size == 1, "MOT model only supports batch_size=1."
+        self.det_times = Timer(with_tracker=True)
+        self.num_classes = len(self.pred_config.labels)
+
+        # tracker config
+        assert self.pred_config.tracker, "The exported JDE model must include a tracker config."
+        cfg = self.pred_config.tracker
+        min_box_area = cfg.get('min_box_area', 0.0)
+        vertical_ratio = cfg.get('vertical_ratio', 0.0)
+        conf_thres = cfg.get('conf_thres', 0.0)
+        tracked_thresh = cfg.get('tracked_thresh', 0.7)
+        metric_type = cfg.get('metric_type', 'euclidean')
+
+        self.tracker = JDETracker(
+            num_classes=self.num_classes,
+            min_box_area=min_box_area,
+            vertical_ratio=vertical_ratio,
+            conf_thres=conf_thres,
+            tracked_thresh=tracked_thresh,
+            metric_type=metric_type)
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        np_boxes = result['pred_dets']
+        if np_boxes.shape[0] <= 0:
+            print('[WARNING] No object detected.')
+            result = {'pred_dets': np.zeros([0, 6]), 'pred_embs': None}
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def tracking(self, det_results):
+        pred_dets = det_results['pred_dets']  # cls_id, score, x0, y0, x1, y1
+        # postprocess() drops 'pred_embs' when nothing is detected
+        pred_embs = det_results.get('pred_embs')
+        online_targets_dict = self.tracker.update(pred_dets, pred_embs)
+
+        online_tlwhs = defaultdict(list)
+        online_scores = defaultdict(list)
+        online_ids = defaultdict(list)
+        for cls_id in range(self.num_classes):
+            online_targets = online_targets_dict[cls_id]
+            for t in online_targets:
+                tlwh = t.tlwh
+                tid = t.track_id
+                tscore = t.score
+                if tlwh[2] * tlwh[3] <= self.tracker.min_box_area: continue
+                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                        3] > self.tracker.vertical_ratio:
+                    continue
+                online_tlwhs[cls_id].append(tlwh)
+                online_ids[cls_id].append(tid)
+                online_scores[cls_id].append(tscore)
+        return online_tlwhs, online_scores, online_ids
+
+    def predict(self, repeats=1):
+        '''
+        Args:
+            repeats (int): number of times to run the predictor
+        Returns:
+            result (dict): 'pred_dets' is an np.ndarray of shape [N, 6], where N is
+                the number of boxes and each row is
+                [class, score, x_min, y_min, x_max, y_max]; FairMOT (JDE) results
+                also include 'pred_embs', an np.ndarray of shape [N, 128]
+        '''
+        # model prediction
+        np_pred_dets, np_pred_embs = None, None
+        for i in range(repeats):
+            self.predictor.run()
+            output_names = self.predictor.get_output_names()
+            boxes_tensor = self.predictor.get_output_handle(output_names[0])
+            np_pred_dets = boxes_tensor.copy_to_cpu()
+            embs_tensor = self.predictor.get_output_handle(output_names[1])
+            np_pred_embs = embs_tensor.copy_to_cpu()
+
+        result = dict(pred_dets=np_pred_dets, pred_embs=np_pred_embs)
+        return result
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True,
+                      seq_name=None):
+        mot_results = []
+        num_classes = self.num_classes
+        image_list.sort()
+        ids2names = self.pred_config.labels
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        for frame_id, img_file in enumerate(image_list):
+            batch_image_list = [img_file]  # bs=1 in MOT model
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result_warmup = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking
+                result_warmup = self.tracking(det_result)
+                self.det_times.tracking_time_s.start()
+                online_tlwhs, online_scores, online_ids = self.tracking(
+                    det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+                cm, gm, gu = get_current_memory_mb()
+                self.cpu_mem += cm
+                self.gpu_mem += gm
+                self.gpu_util += gu
+
+            else:
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                self.det_times.inference_time_s.start()
+                result = self.predict()
+                self.det_times.inference_time_s.end()
+
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking process
+                self.det_times.tracking_time_s.start()
+                online_tlwhs, online_scores, online_ids = self.tracking(
+                    det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+            if visual:
+                if len(image_list) > 1 and frame_id % 10 == 0:
+                    print('Tracking frame {}'.format(frame_id))
+                frame, _ = decode_image(img_file, {})
+
+                im = plot_tracking_dict(
+                    frame,
+                    num_classes,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    ids2names=ids2names)
+                if seq_name is None:
+                    seq_name = os.path.basename(os.path.dirname(image_list[0]))
+                save_dir = os.path.join(self.output_dir, seq_name)
+                if not os.path.exists(save_dir):
+                    os.makedirs(save_dir)
+                cv2.imwrite(
+                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
+
+            mot_results.append([online_tlwhs, online_scores, online_ids])
+        return mot_results
+
+    def predict_video(self, video_file, camera_id):
+        video_out_name = 'mot_output.mp4'
+        if camera_id != -1:
+            capture = cv2.VideoCapture(camera_id)
+        else:
+            capture = cv2.VideoCapture(video_file)
+            video_out_name = os.path.split(video_file)[-1]
+        # Get Video info : resolution, fps, frame count
+        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        fps = int(capture.get(cv2.CAP_PROP_FPS))
+        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+        print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
+        out_path = os.path.join(self.output_dir, video_out_name)
+        video_format = 'mp4v'
+        fourcc = cv2.VideoWriter_fourcc(*video_format)
+        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+
+        frame_id = 1
+        timer = MOTTimer()
+        results = defaultdict(list)  # support single class and multi classes
+        num_classes = self.num_classes
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        ids2names = self.pred_config.labels
+        while True:
+            ret, frame = capture.read()
+            if not ret:
+                break
+            if frame_id % 10 == 0:
+                print('Tracking frame: %d' % (frame_id))
+            frame_id += 1
+
+            timer.tic()
+            seq_name = video_out_name.split('.')[0]
+            mot_results = self.predict_image(
+                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
+            timer.toc()
+
+            online_tlwhs, online_scores, online_ids = mot_results[0]
+            for cls_id in range(num_classes):
+                results[cls_id].append(
+                    (frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id],
+                     online_ids[cls_id]))
+
+            fps = 1. / timer.duration
+            im = plot_tracking_dict(
+                frame,
+                num_classes,
+                online_tlwhs,
+                online_ids,
+                online_scores,
+                frame_id=frame_id,
+                fps=fps,
+                ids2names=ids2names)
+
+            writer.write(im)
+            if camera_id != -1:
+                cv2.imshow('MOT Tracking', im)
+                if cv2.waitKey(1) & 0xFF == ord('q'):
+                    break
+
+        if self.save_mot_txts:
+            result_filename = os.path.join(
+                self.output_dir, os.path.splitext(video_out_name)[0] + '.txt')
+
+            write_mot_results(result_filename, results, data_type, num_classes)
+
+        writer.release()
+
+
+def main():
+    detector = JDE_Detector(
+        FLAGS.model_dir,
+        tracker_config=None,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=1,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        output_dir=FLAGS.output_dir,
+        threshold=FLAGS.threshold,
+        save_images=FLAGS.save_images,
+        save_mot_txts=FLAGS.save_mot_txts)
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
+    else:
+        # predict from image
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
+
+        if not FLAGS.run_benchmark:
+            detector.det_times.info(average=True)
+        else:
+            mode = FLAGS.run_mode
+            model_dir = FLAGS.model_dir
+            model_info = {
+                'model_name': model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(detector, img_list, model_info, name='MOT')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
+                            ], "device should be CPU, GPU, NPU or XPU"
+
+    main()
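`JDE_Detector.tracking` keeps a track only if its box area is greater than `min_box_area` and, when `vertical_ratio > 0`, its width-to-height ratio does not exceed `vertical_ratio`. A standalone sketch of that filter on `[x, y, w, h]` boxes follows. `filter_tracks` is a hypothetical helper, and the extra `h <= 0` guard (which avoids dividing by zero when `min_box_area` is negative, as CenterTrack's `-1` default allows) is an addition not present in the code above.

```python
import numpy as np


def filter_tracks(tlwhs, ids, scores, min_box_area=0.0, vertical_ratio=0.0):
    """Keep tracks whose [x, y, w, h] box passes the area and aspect-ratio checks.

    Hypothetical helper mirroring JDE_Detector.tracking: drop boxes whose area
    w * h is <= min_box_area and, when vertical_ratio > 0, boxes whose w / h
    exceeds vertical_ratio (too wide to be an upright pedestrian).
    """
    kept_tlwhs, kept_ids, kept_scores = [], [], []
    for tlwh, tid, score in zip(tlwhs, ids, scores):
        w, h = float(tlwh[2]), float(tlwh[3])
        # Added guard: a zero-height box would otherwise divide by zero below.
        if h <= 0 or w * h <= min_box_area:
            continue
        if vertical_ratio > 0 and w / h > vertical_ratio:
            continue
        kept_tlwhs.append(tlwh)
        kept_ids.append(tid)
        kept_scores.append(score)
    return kept_tlwhs, kept_ids, kept_scores


# Usage: a normal box, a tiny box, and an overly wide box.
boxes = [np.array([0, 0, 10, 20]), np.array([0, 0, 1, 1]), np.array([0, 0, 30, 10])]
_, kept_ids, _ = filter_tracks(
    boxes, [1, 2, 3], [0.9, 0.8, 0.7], min_box_area=5, vertical_ratio=1.6)
print(kept_ids)  # [1]
```

With both thresholds left at `0.0`, only the area check is active and any box with positive area survives, which matches the defaults read from the tracker config above.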
--- a/third-party/paddle-inference/mot_keypoint_unite_infer.py
+++ b/third-party/paddle-inference/mot_keypoint_unite_infer.py
@@ -0,0 +1,301 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import json
+import cv2
+import math
+import numpy as np
+import paddle
+import yaml
+import copy
+from collections import defaultdict
+
+from mot_keypoint_unite_utils import argsparser
+from preprocess import decode_image
+from infer import print_arguments, get_test_images, bench_log
+from mot_sde_infer import SDE_Detector
+from mot_jde_infer import JDE_Detector, MOT_JDE_SUPPORT_MODELS
+from keypoint_infer import KeyPointDetector, KEYPOINT_SUPPORT_MODELS
+from det_keypoint_unite_infer import predict_with_given_det
+from visualize import visualize_pose
+from benchmark_utils import PaddleInferBenchmark
+from utils import get_current_memory_mb
+from keypoint_postprocess import translate_to_ori_images
+
+# add python path
+import sys
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
+from pptracking.python.mot.utils import MOTTimer as FPSTimer
+
+
+def convert_mot_to_det(tlwhs, scores):
+    results = {}
+    num_mot = len(tlwhs)
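+    xyxys = copy.deepcopy(tlwhs)
+    for xyxy in xyxys:
+        # convert [x, y, w, h] to [x1, y1, x2, y2] in place (each box is an np.ndarray)
+        xyxy[2:] = xyxy[2:] + xyxy[:2]
+    # support single class now; np.vstack raises on an empty list, so guard it
+    rows = [np.hstack([0, scores[i], xyxys[i]]) for i in range(num_mot)]
+    results['boxes'] = np.vstack(rows) if rows else np.zeros([0, 6])
+    results['boxes_num'] = np.array([num_mot])
+    return results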
+
+
+def mot_topdown_unite_predict(mot_detector,
+                              topdown_keypoint_detector,
+                              image_list,
+                              keypoint_batch_size=1,
+                              save_res=False):
+    det_timer = mot_detector.get_timer()
+    store_res = []
+    image_list.sort()
+    num_classes = mot_detector.num_classes
+    for i, img_file in enumerate(image_list):
+        # Decode image in advance in mot + pose prediction
+        det_timer.preprocess_time_s.start()
+        image, _ = decode_image(img_file, {})
+        det_timer.preprocess_time_s.end()
+
+        if FLAGS.run_benchmark:
+            mot_results = mot_detector.predict_image(
+                [image], run_benchmark=True, repeats=10)
+
+            cm, gm, gu = get_current_memory_mb()
+            mot_detector.cpu_mem += cm
+            mot_detector.gpu_mem += gm
+            mot_detector.gpu_util += gu
+        else:
+            mot_results = mot_detector.predict_image([image], visual=False)
+
+        # Only batch size 1 is supported by the MOT model.
+        online_tlwhs, online_scores, online_ids = mot_results[0]
+        # Only a single class is supported for MOT + pose.
+        results = convert_mot_to_det(online_tlwhs[0],
+                                     online_scores[0])
+        if results['boxes_num'] == 0:
+            continue
+
+        keypoint_res = predict_with_given_det(
+            image, results, topdown_keypoint_detector, keypoint_batch_size,
+            FLAGS.run_benchmark)
+
+        if save_res:
+            save_name = img_file if isinstance(img_file, str) else i
+            store_res.append([
+                save_name, keypoint_res['bbox'],
+                [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
+            ])
+        if FLAGS.run_benchmark:
+            cm, gm, gu = get_current_memory_mb()
+            topdown_keypoint_detector.cpu_mem += cm
+            topdown_keypoint_detector.gpu_mem += gm
+            topdown_keypoint_detector.gpu_util += gu
+        else:
+            if not os.path.exists(FLAGS.output_dir):
+                os.makedirs(FLAGS.output_dir)
+            visualize_pose(
+                img_file,
+                keypoint_res,
+                visual_thresh=FLAGS.keypoint_threshold,
+                save_dir=FLAGS.output_dir)
+
+    if save_res:
+        # Layout of the saved JSON:
+        # 1) store_res: a list of image_data
+        # 2) image_data: [image_id, rects, [keypoints, scores]]
+        # 3) rects: list of rects [xmin, ymin, xmax, ymax]
+        # 4) keypoints: 17 joints * [x, y, conf], 51 values per person
+        # 5) scores: mean confidence over all joints
+        # The file is written to the current working directory.
+        with open("det_keypoint_unite_image_results.json", 'w') as wf:
+            json.dump(store_res, wf, indent=4)
+
+
+def mot_topdown_unite_predict_video(mot_detector,
+                                    topdown_keypoint_detector,
+                                    camera_id,
+                                    keypoint_batch_size=1,
+                                    save_res=False):
+    video_name = 'output.mp4'
+    if camera_id != -1:
+        capture = cv2.VideoCapture(camera_id)
+    else:
+        capture = cv2.VideoCapture(FLAGS.video_file)
+        video_name = os.path.split(FLAGS.video_file)[-1]
+    # Get Video info : resolution, fps, frame count
+    width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+    height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+    fps = int(capture.get(cv2.CAP_PROP_FPS))
+    frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+    print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+    if not os.path.exists(FLAGS.output_dir):
+        os.makedirs(FLAGS.output_dir)
+    out_path = os.path.join(FLAGS.output_dir, video_name)
+    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
+    writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+    frame_id = 0
+    timer_mot, timer_kp, timer_mot_kp = FPSTimer(), FPSTimer(), FPSTimer()
+
+    num_classes = mot_detector.num_classes
+    assert num_classes == 1, 'Only one category mot model supported for uniting keypoint deploy.'
+    data_type = 'mot'
+
+    while True:
+        ret, frame = capture.read()
+        if not ret:
+            break
+        if frame_id % 10 == 0:
+            print('Tracking frame: %d' % (frame_id))
+        frame_id += 1
+        timer_mot_kp.tic()
+
+        # mot model
+        timer_mot.tic()
+
+        frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+
+        mot_results = mot_detector.predict_image([frame2], visual=False)
+        timer_mot.toc()
+        online_tlwhs, online_scores, online_ids = mot_results[0]
+        # Only a single class is supported for MOT + pose.
+        results = convert_mot_to_det(online_tlwhs[0], online_scores[0])
+        if results['boxes_num'] == 0:
+            writer.write(frame)  # keep frames without detections in the video
+            continue
+
+        # keypoint model
+        timer_kp.tic()
+        keypoint_res = predict_with_given_det(
+            frame2, results, topdown_keypoint_detector, keypoint_batch_size,
+            FLAGS.run_benchmark)
+        timer_kp.toc()
+        timer_mot_kp.toc()
+
+        kp_fps = 1. / timer_kp.duration
+        mot_kp_fps = 1. / timer_mot_kp.duration
+
+        im = visualize_pose(
+            frame,
+            keypoint_res,
+            visual_thresh=FLAGS.keypoint_threshold,
+            returnimg=True,
+            ids=online_ids[0])
+
+        im = plot_tracking_dict(
+            im,
+            num_classes,
+            online_tlwhs,
+            online_ids,
+            online_scores,
+            frame_id=frame_id,
+            fps=mot_kp_fps)
+
+        writer.write(im)
+        if camera_id != -1:
+            cv2.imshow('Tracking and keypoint results', im)
+            if cv2.waitKey(1) & 0xFF == ord('q'):
+                break
+
+    writer.release()
+    print('output_video saved to: {}'.format(out_path))
+
+
+def main():
+    deploy_file = os.path.join(FLAGS.mot_model_dir, 'infer_cfg.yml')
+    with open(deploy_file) as f:
+        yml_conf = yaml.safe_load(f)
+    arch = yml_conf['arch']
+    mot_detector_cls = SDE_Detector
+    if arch in MOT_JDE_SUPPORT_MODELS:
+        mot_detector_cls = JDE_Detector
+
+    mot_detector = mot_detector_cls(FLAGS.mot_model_dir,
+                                    FLAGS.tracker_config,
+                                    device=FLAGS.device,
+                                    run_mode=FLAGS.run_mode,
+                                    batch_size=1,
+                                    trt_min_shape=FLAGS.trt_min_shape,
+                                    trt_max_shape=FLAGS.trt_max_shape,
+                                    trt_opt_shape=FLAGS.trt_opt_shape,
+                                    trt_calib_mode=FLAGS.trt_calib_mode,
+                                    cpu_threads=FLAGS.cpu_threads,
+                                    enable_mkldnn=FLAGS.enable_mkldnn,
+                                    threshold=FLAGS.mot_threshold,
+                                    output_dir=FLAGS.output_dir)
+
+    topdown_keypoint_detector = KeyPointDetector(
+        FLAGS.keypoint_model_dir,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=FLAGS.keypoint_batch_size,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        threshold=FLAGS.keypoint_threshold,
+        output_dir=FLAGS.output_dir,
+        use_dark=FLAGS.use_dark)
+    keypoint_arch = topdown_keypoint_detector.pred_config.arch
+    assert KEYPOINT_SUPPORT_MODELS[keypoint_arch] == 'keypoint_topdown', \
+        'MOT-Keypoint unite inference only supports top-down models.'
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        mot_topdown_unite_predict_video(
+            mot_detector, topdown_keypoint_detector, FLAGS.camera_id,
+            FLAGS.keypoint_batch_size, FLAGS.save_res)
+    else:
+        # predict from image
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        mot_topdown_unite_predict(mot_detector, topdown_keypoint_detector,
+                                  img_list, FLAGS.keypoint_batch_size,
+                                  FLAGS.save_res)
+        if not FLAGS.run_benchmark:
+            mot_detector.det_times.info(average=True)
+            topdown_keypoint_detector.det_times.info(average=True)
+        else:
+            mode = FLAGS.run_mode
+            mot_model_dir = FLAGS.mot_model_dir
+            mot_model_info = {
+                'model_name': mot_model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(mot_detector, img_list, mot_model_info, name='MOT')
+
+            keypoint_model_dir = FLAGS.keypoint_model_dir
+            keypoint_model_info = {
+                'model_name': keypoint_model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
+                      FLAGS.keypoint_batch_size, 'KeyPoint')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'], \
+        "device should be CPU, GPU, XPU or NPU"
+
+    main()
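The tlwh-to-detection conversion that `convert_mot_to_det` performs can be exercised on its own. Below is a minimal standalone sketch, assuming track boxes arrive as `[x, y, w, h]` (top-left corner plus width and height) and that every track belongs to class 0, as the single-class comment in the source states. The helper name `tlwhs_to_det_boxes` is hypothetical and not part of PaddleDetection. It assembles the rows column-wise with `np.hstack`, so an empty track list yields a `(0, 6)` array rather than the `ValueError` that `np.vstack` raises when given an empty list.

```python
import numpy as np


def tlwhs_to_det_boxes(tlwhs, scores, cls_id=0):
    """Hypothetical helper: tracker boxes [x, y, w, h] -> detector rows
    [cls_id, score, x1, y1, x2, y2]."""
    # Copy so the in-place update below never mutates the caller's boxes.
    xyxys = np.asarray(tlwhs, dtype=np.float64).reshape(-1, 4).copy()
    xyxys[:, 2:] += xyxys[:, :2]  # x2 = x1 + w, y2 = y1 + h
    scores = np.asarray(scores, dtype=np.float64).reshape(-1, 1)
    cls_col = np.full((len(xyxys), 1), cls_id, dtype=np.float64)
    # Column-wise hstack keeps a (0, 6) shape when there are no tracks.
    boxes = np.hstack([cls_col, scores, xyxys])
    return {'boxes': boxes, 'boxes_num': np.array([len(boxes)])}


res = tlwhs_to_det_boxes([[10, 20, 30, 40]], [0.9])
print(res['boxes'])  # one row: class 0, score 0.9, box (10, 20) -> (40, 60)
print(tlwhs_to_det_boxes([], [])['boxes'].shape)  # (0, 6)
```

Copying the input before the in-place `+=` keeps the caller's track boxes unchanged, which is what the `copy.deepcopy` in the source guards against.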
--- a/third-party/paddle-inference/mot_keypoint_unite_utils.py
+++ b/third-party/paddle-inference/mot_keypoint_unite_utils.py
@@ -0,0 +1,139 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import ast
+import argparse
+
+
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--mot_model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel', "
+              "'infer_cfg.yml', created by tools/export_model.py."),
+        required=True)
+    parser.add_argument(
+        "--keypoint_model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel', "
+              "'infer_cfg.yml', created by tools/export_model.py."),
+        required=True)
+    parser.add_argument(
+        "--image_file", type=str, default=None, help="Path of image file.")
+    parser.add_argument(
+        "--image_dir",
+        type=str,
+        default=None,
+        help="Directory of image files; `image_file` takes priority.")
+    parser.add_argument(
+        "--keypoint_batch_size",
+        type=int,
+        default=1,
+        help=("Batch size for keypoint inference. In detection-keypoint unite "
+              "inference, the detection batch size is 1; detection results "
+              "are then collated into batches for keypoint inference."))
+    parser.add_argument(
+        "--video_file",
+        type=str,
+        default=None,
+        help="Path of video file; `video_file` or `camera_id` takes the highest priority."
+    )
+    parser.add_argument(
+        "--camera_id",
+        type=int,
+        default=-1,
+        help="Device ID of the camera to predict from.")
+    parser.add_argument(
+        "--mot_threshold", type=float, default=0.5, help="MOT score threshold.")
+    parser.add_argument(
+        "--keypoint_threshold",
+        type=float,
+        default=0.5,
+        help="Keypoint score threshold.")
+    parser.add_argument(
+        "--output_dir",
+        type=str,
+        default="output",
+        help="Directory of output visualization files.")
+    parser.add_argument(
+        "--run_mode",
+        type=str,
+        default='paddle',
+        help="Running mode (paddle/trt_fp32/trt_fp16/trt_int8).")
+    parser.add_argument(
+        "--device",
+        type=str,
+        default='cpu',
+        help="Device to run on: CPU, GPU, XPU or NPU. Default is CPU."
+    )
+    parser.add_argument(
+        "--run_benchmark",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to predict an image file repeatedly for benchmarking.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use MKLDNN on CPU.")
+    parser.add_argument(
+        "--cpu_threads", type=int, default=1, help="Number of CPU threads.")
+    parser.add_argument(
+        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_max_shape",
+        type=int,
+        default=1088,
+        help="max_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_opt_shape",
+        type=int,
+        default=608,
+        help="opt_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_calib_mode",
+        type=ast.literal_eval,
+        default=False,
+        help="Set to True if the model was produced by TensorRT offline "
+        "quantization calibration.")
+    parser.add_argument(
+        '--save_images',
+        action='store_true',
+        help='Save visualization image results.')
+    parser.add_argument(
+        '--save_mot_txts',
+        action='store_true',
+        help='Save tracking results (txt).')
+    parser.add_argument(
+        '--use_dark',
+        type=ast.literal_eval,
+        default=True,
+        help='Whether to use DarkPose to refine keypoint position predictions.')
+    parser.add_argument(
+        '--save_res',
+        type=ast.literal_eval,
+        default=False,
+        help=(
+            "Whether to save prediction results to a JSON file. "
+            "1) store_res: a list of image_data; "
+            "2) image_data: [image_id, rects, [keypoints, scores]]; "
+            "3) rects: list of rects [xmin, ymin, xmax, ymax]; "
+            "4) keypoints: 17 joints * [x, y, conf], 51 values per person; "
+            "5) scores: mean confidence over all joints."))
+    parser.add_argument(
+        "--tracker_config", type=str, default=None, help="Tracker config YAML path.")
+    return parser
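`argparse` passes the raw command-line string to the `type` callable, and `bool('False')` is `True` because every non-empty string is truthy; an option declared with `type=bool` therefore cannot be turned off from the command line. The minimal sketch below contrasts that with `ast.literal_eval`, which this file already uses for `--run_benchmark` and `--enable_mkldnn`. The option names `--naive_flag` and `--safe_flag` are illustrative only.

```python
import argparse
import ast


def build_parser():
    parser = argparse.ArgumentParser()
    # Pitfall: bool("False") is True, so this flag can never be disabled.
    parser.add_argument("--naive_flag", type=bool, default=False)
    # literal_eval turns the strings "True"/"False" into real booleans.
    parser.add_argument("--safe_flag", type=ast.literal_eval, default=False)
    return parser


args = build_parser().parse_args(
    ["--naive_flag", "False", "--safe_flag", "False"])
print(args.naive_flag, args.safe_flag)  # True False
```

`ast.literal_eval` accepts any Python literal, so `--safe_flag 1` yields the integer `1` rather than a strict boolean; a converter that only accepts `True`/`False` would be stricter if that matters.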
--- a/third-party/paddle-inference/mot_sde_infer.py
+++ b/third-party/paddle-inference/mot_sde_infer.py
@@ -0,0 +1,522 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import time
+import yaml
+import cv2
+import numpy as np
+from collections import defaultdict
+import paddle
+
+from benchmark_utils import PaddleInferBenchmark
+from preprocess import decode_image
+from utils import argsparser, Timer, get_current_memory_mb
+from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor
+
+# add python path
+import sys
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+sys.path.insert(0, parent_path)
+
+from pptracking.python.mot import JDETracker, DeepSORTTracker
+from pptracking.python.mot.utils import MOTTimer, write_mot_results, get_crops, clip_box
+from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
+
+
+class SDE_Detector(Detector):
+    """
+    Args:
+        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
+        tracker_config (str): tracker config path
+        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
+        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
+        batch_size (int): batch size for inference
+        trt_min_shape (int): min shape for dynamic shape in trt
+        trt_max_shape (int): max shape for dynamic shape in trt
+        trt_opt_shape (int): opt shape for dynamic shape in trt
+        trt_calib_mode (bool): set to True if the model was produced by
+            TensorRT offline quantization calibration
+        cpu_threads (int): number of CPU threads
+        enable_mkldnn (bool): whether to enable MKLDNN
+        output_dir (str): output directory, default 'output'
+        threshold (float): score threshold of the detected bboxes, default 0.5
+        save_images (bool): whether to save visualization image results, default False
+        save_mot_txts (bool): whether to save tracking results (txt), default False
+        reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT
+    """
+
+    def __init__(self,
+                 model_dir,
+                 tracker_config,
+                 device='CPU',
+                 run_mode='paddle',
+                 batch_size=1,
+                 trt_min_shape=1,
+                 trt_max_shape=1280,
+                 trt_opt_shape=640,
+                 trt_calib_mode=False,
+                 cpu_threads=1,
+                 enable_mkldnn=False,
+                 output_dir='output',
+                 threshold=0.5,
+                 save_images=False,
+                 save_mot_txts=False,
+                 reid_model_dir=None):
+        super(SDE_Detector, self).__init__(
+            model_dir=model_dir,
+            device=device,
+            run_mode=run_mode,
+            batch_size=batch_size,
+            trt_min_shape=trt_min_shape,
+            trt_max_shape=trt_max_shape,
+            trt_opt_shape=trt_opt_shape,
+            trt_calib_mode=trt_calib_mode,
+            cpu_threads=cpu_threads,
+            enable_mkldnn=enable_mkldnn,
+            output_dir=output_dir,
+            threshold=threshold, )
+        self.save_images = save_images
+        self.save_mot_txts = save_mot_txts
+        assert batch_size == 1, "MOT model only supports batch_size=1."
+        self.det_times = Timer(with_tracker=True)
+        self.num_classes = len(self.pred_config.labels)
+
+        # reid config
+        self.use_reid = reid_model_dir is not None
+        if self.use_reid:
+            self.reid_pred_config = self.set_config(reid_model_dir)
+            self.reid_predictor, self.reid_config = load_predictor(
+                reid_model_dir,
+                run_mode=run_mode,
+                batch_size=50,  # reid_batch_size
+                min_subgraph_size=self.reid_pred_config.min_subgraph_size,
+                device=device,
+                use_dynamic_shape=self.reid_pred_config.use_dynamic_shape,
+                trt_min_shape=trt_min_shape,
+                trt_max_shape=trt_max_shape,
+                trt_opt_shape=trt_opt_shape,
+                trt_calib_mode=trt_calib_mode,
+                cpu_threads=cpu_threads,
+                enable_mkldnn=enable_mkldnn)
+        else:
+            self.reid_pred_config = None
+            self.reid_predictor = None
+
+        assert tracker_config is not None, 'tracker_config must be set.'
+        self.tracker_config = tracker_config
+        with open(self.tracker_config) as f:
+            tracker_cfg = yaml.safe_load(f)
+        cfg = tracker_cfg[tracker_cfg['type']]
+        # tracker config
+        self.use_deepsort_tracker = (
+            tracker_cfg['type'] == 'DeepSORTTracker')
+        if self.use_deepsort_tracker:
+            # use DeepSORTTracker
+            if self.reid_pred_config is not None and hasattr(
+                    self.reid_pred_config, 'tracker'):
+                cfg = self.reid_pred_config.tracker
+            budget = cfg.get('budget', 100)
+            max_age = cfg.get('max_age', 30)
+            max_iou_distance = cfg.get('max_iou_distance', 0.7)
+            matching_threshold = cfg.get('matching_threshold', 0.2)
+            min_box_area = cfg.get('min_box_area', 0)
+            vertical_ratio = cfg.get('vertical_ratio', 0)
+
+            self.tracker = DeepSORTTracker(
+                budget=budget,
+                max_age=max_age,
+                max_iou_distance=max_iou_distance,
+                matching_threshold=matching_threshold,
+                min_box_area=min_box_area,
+                vertical_ratio=vertical_ratio, )
+        else:
+            # use ByteTracker
+            use_byte = cfg.get('use_byte', False)
+            det_thresh = cfg.get('det_thresh', 0.3)
+            min_box_area = cfg.get('min_box_area', 0)
+            vertical_ratio = cfg.get('vertical_ratio', 0)
+            match_thres = cfg.get('match_thres', 0.9)
+            conf_thres = cfg.get('conf_thres', 0.6)
+            low_conf_thres = cfg.get('low_conf_thres', 0.1)
+
+            self.tracker = JDETracker(
+                use_byte=use_byte,
+                det_thresh=det_thresh,
+                num_classes=self.num_classes,
+                min_box_area=min_box_area,
+                vertical_ratio=vertical_ratio,
+                match_thres=match_thres,
+                conf_thres=conf_thres,
+                low_conf_thres=low_conf_thres, )
+
+    def postprocess(self, inputs, result):
+        # postprocess output of predictor
+        np_boxes_num = result['boxes_num']
+        if np_boxes_num[0] <= 0:
+            print('[WARNING] No object detected.')
+            result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]}
+        result = {k: v for k, v in result.items() if v is not None}
+        return result
+
+    def reidprocess(self, det_results, repeats=1):
+        pred_dets = det_results['boxes']
+        pred_xyxys = pred_dets[:, 2:6]
+
+        ori_image = det_results['ori_image']
+        ori_image_shape = ori_image.shape[:2]
+        pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape)
+
+        if len(keep_idx[0]) == 0:
+            det_results['boxes'] = np.zeros((1, 6), dtype=np.float32)
+            det_results['embeddings'] = None
+            return det_results
+
+        pred_dets = pred_dets[keep_idx[0]]
+        pred_xyxys = pred_dets[:, 2:6]
+
+        w, h = self.tracker.input_size
+        crops = get_crops(pred_xyxys, ori_image, w, h)
+
+        # to keep fast speed, only use topk crops
+        crops = crops[:50]  # reid_batch_size
+        det_results['crops'] = np.array(crops).astype('float32')
+        det_results['boxes'] = pred_dets[:50]
+
+        input_names = self.reid_predictor.get_input_names()
+        for i in range(len(input_names)):
+            input_tensor = self.reid_predictor.get_input_handle(input_names[i])
+            input_tensor.copy_from_cpu(det_results[input_names[i]])
+
+        # model prediction
+        for i in range(repeats):
+            self.reid_predictor.run()
+            output_names = self.reid_predictor.get_output_names()
+            feature_tensor = self.reid_predictor.get_output_handle(
+                output_names[0])
+            pred_embs = feature_tensor.copy_to_cpu()
+
+        det_results['embeddings'] = pred_embs
+        return det_results
+
+    def tracking(self, det_results):
+        pred_dets = det_results['boxes']  # 'cls_id, score, x0, y0, x1, y1'
+        pred_embs = det_results.get('embeddings', None)
+
+        if self.use_deepsort_tracker:
+            # use DeepSORTTracker, which only supports a single class
+            self.tracker.predict()
+            online_targets = self.tracker.update(pred_dets, pred_embs)
+            online_tlwhs, online_scores, online_ids = [], [], []
+            for t in online_targets:
+                if not t.is_confirmed() or t.time_since_update > 1:
+                    continue
+                tlwh = t.to_tlwh()
+                tscore = t.score
+                tid = t.track_id
+                if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                        3] > self.tracker.vertical_ratio:
+                    continue
+                online_tlwhs.append(tlwh)
+                online_scores.append(tscore)
+                online_ids.append(tid)
+
+            tracking_outs = {
+                'online_tlwhs': online_tlwhs,
+                'online_scores': online_scores,
+                'online_ids': online_ids,
+            }
+            return tracking_outs
+        else:
+            # use ByteTracker, which supports multiple classes
+            online_tlwhs = defaultdict(list)
+            online_scores = defaultdict(list)
+            online_ids = defaultdict(list)
+            online_targets_dict = self.tracker.update(pred_dets, pred_embs)
+            for cls_id in range(self.num_classes):
+                online_targets = online_targets_dict[cls_id]
+                for t in online_targets:
+                    tlwh = t.tlwh
+                    tid = t.track_id
+                    tscore = t.score
+                    if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
+                        continue
+                    if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
+                            3] > self.tracker.vertical_ratio:
+                        continue
+                    online_tlwhs[cls_id].append(tlwh)
+                    online_ids[cls_id].append(tid)
+                    online_scores[cls_id].append(tscore)
+
+            tracking_outs = {
+                'online_tlwhs': online_tlwhs,
+                'online_scores': online_scores,
+                'online_ids': online_ids,
+            }
+            return tracking_outs
+
+    def predict_image(self,
+                      image_list,
+                      run_benchmark=False,
+                      repeats=1,
+                      visual=True,
+                      seq_name=None):
+        num_classes = self.num_classes
+        image_list.sort()
+        ids2names = self.pred_config.labels
+        mot_results = []
+        for frame_id, img_file in enumerate(image_list):
+            batch_image_list = [img_file]  # bs=1 in MOT model
+            frame, _ = decode_image(img_file, {})
+            if run_benchmark:
+                # preprocess
+                inputs = self.preprocess(batch_image_list)  # warmup
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                # model prediction
+                result_warmup = self.predict(repeats=repeats)  # warmup
+                self.det_times.inference_time_s.start()
+                result = self.predict(repeats=repeats)
+                self.det_times.inference_time_s.end(repeats=repeats)
+
+                # postprocess
+                result_warmup = self.postprocess(inputs, result)  # warmup
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking
+                if self.use_reid:
+                    det_result['frame_id'] = frame_id
+                    det_result['seq_name'] = seq_name
+                    det_result['ori_image'] = frame
+                    det_result = self.reidprocess(det_result)
+                result_warmup = self.tracking(det_result)
+                self.det_times.tracking_time_s.start()
+                if self.use_reid:
+                    det_result = self.reidprocess(det_result)
+                tracking_outs = self.tracking(det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+                cm, gm, gu = get_current_memory_mb()
+                self.cpu_mem += cm
+                self.gpu_mem += gm
+                self.gpu_util += gu
+
+            else:
+                self.det_times.preprocess_time_s.start()
+                inputs = self.preprocess(batch_image_list)
+                self.det_times.preprocess_time_s.end()
+
+                self.det_times.inference_time_s.start()
+                result = self.predict()
+                self.det_times.inference_time_s.end()
+
+                self.det_times.postprocess_time_s.start()
+                det_result = self.postprocess(inputs, result)
+                self.det_times.postprocess_time_s.end()
+
+                # tracking process
+                self.det_times.tracking_time_s.start()
+                if self.use_reid:
+                    det_result['frame_id'] = frame_id
+                    det_result['seq_name'] = seq_name
+                    det_result['ori_image'] = frame
+                    det_result = self.reidprocess(det_result)
+                tracking_outs = self.tracking(det_result)
+                self.det_times.tracking_time_s.end()
+                self.det_times.img_num += 1
+
+            online_tlwhs = tracking_outs['online_tlwhs']
+            online_scores = tracking_outs['online_scores']
+            online_ids = tracking_outs['online_ids']
+
+            mot_results.append([online_tlwhs, online_scores, online_ids])
+
+            if visual:
+                if len(image_list) > 1 and frame_id % 10 == 0:
+                    print('Tracking frame {}'.format(frame_id))
+                frame, _ = decode_image(img_file, {})
+                if isinstance(online_tlwhs, defaultdict):
+                    im = plot_tracking_dict(
+                        frame,
+                        num_classes,
+                        online_tlwhs,
+                        online_ids,
+                        online_scores,
+                        frame_id=frame_id,
+                        ids2names=ids2names)
+                else:
+                    im = plot_tracking(
+                        frame,
+                        online_tlwhs,
+                        online_ids,
+                        online_scores,
+                        frame_id=frame_id,
+                        ids2names=ids2names)
+                save_dir = os.path.join(self.output_dir, seq_name or '')
+                if not os.path.exists(save_dir):
+                    os.makedirs(save_dir)
+                cv2.imwrite(
+                    os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
+
+        return mot_results
+
+    def predict_video(self, video_file, camera_id):
+        video_out_name = 'output.mp4'
+        if camera_id != -1:
+            capture = cv2.VideoCapture(camera_id)
+        else:
+            capture = cv2.VideoCapture(video_file)
+            video_out_name = os.path.split(video_file)[-1]
+        # Get Video info : resolution, fps, frame count
+        width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+        height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+        fps = int(capture.get(cv2.CAP_PROP_FPS))
+        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
+        print("fps: %d, frame_count: %d" % (fps, frame_count))
+
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
+        out_path = os.path.join(self.output_dir, video_out_name)
+        video_format = 'mp4v'
+        fourcc = cv2.VideoWriter_fourcc(*video_format)
+        writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
+
+        frame_id = 1
+        timer = MOTTimer()
+        results = defaultdict(list)
+        num_classes = self.num_classes
+        data_type = 'mcmot' if num_classes > 1 else 'mot'
+        ids2names = self.pred_config.labels
+
+        while True:
+            ret, frame = capture.read()
+            if not ret:
+                break
+            if frame_id % 10 == 0:
+                print('Tracking frame: %d' % (frame_id))
+            frame_id += 1
+
+            timer.tic()
+            seq_name = video_out_name.split('.')[0]
+            mot_results = self.predict_image(
+                [frame[:, :, ::-1]], visual=False, seq_name=seq_name)
+            timer.toc()
+
+            # batch size is always 1 for MOT models
+            online_tlwhs, online_scores, online_ids = mot_results[0]
+
+            fps = 1. / timer.duration
+            if self.use_deepsort_tracker:
+                # DeepSORTTracker only supports a single class
+                results[0].append(
+                    (frame_id + 1, online_tlwhs, online_scores, online_ids))
+                im = plot_tracking(
+                    frame,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    fps=fps,
+                    ids2names=ids2names)
+            else:
+                # ByteTracker supports multiple classes
+                for cls_id in range(num_classes):
+                    results[cls_id].append(
+                        (frame_id + 1, online_tlwhs[cls_id],
+                         online_scores[cls_id], online_ids[cls_id]))
+                im = plot_tracking_dict(
+                    frame,
+                    num_classes,
+                    online_tlwhs,
+                    online_ids,
+                    online_scores,
+                    frame_id=frame_id,
+                    fps=fps,
+                    ids2names=ids2names)
+
+            writer.write(im)
+            if camera_id != -1:
+                cv2.imshow('MOT Tracking', im)
+                if cv2.waitKey(1) & 0xFF == ord('q'):
+                    break
+
+        if self.save_mot_txts:
+            result_filename = os.path.join(
+                self.output_dir, os.path.splitext(video_out_name)[0] + '.txt')
+            write_mot_results(result_filename, results)
+        writer.release()
+        capture.release()
+
+
+def main():
+    deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml')
+    with open(deploy_file) as f:
+        yml_conf = yaml.safe_load(f)
+    arch = yml_conf['arch']
+    detector = SDE_Detector(
+        FLAGS.model_dir,
+        tracker_config=FLAGS.tracker_config,
+        device=FLAGS.device,
+        run_mode=FLAGS.run_mode,
+        batch_size=1,
+        trt_min_shape=FLAGS.trt_min_shape,
+        trt_max_shape=FLAGS.trt_max_shape,
+        trt_opt_shape=FLAGS.trt_opt_shape,
+        trt_calib_mode=FLAGS.trt_calib_mode,
+        cpu_threads=FLAGS.cpu_threads,
+        enable_mkldnn=FLAGS.enable_mkldnn,
+        output_dir=FLAGS.output_dir,
+        threshold=FLAGS.threshold,
+        save_images=FLAGS.save_images,
+        save_mot_txts=FLAGS.save_mot_txts, )
+
+    # predict from video file or camera video stream
+    if FLAGS.video_file is not None or FLAGS.camera_id != -1:
+        detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
+    else:
+        # predict from an image directory or a single image file
+        if FLAGS.image_dir is None and FLAGS.image_file is not None:
+            assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT models."
+        img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
+        seq_name = os.path.basename(os.path.normpath(FLAGS.image_dir or FLAGS.image_file))
+        detector.predict_image(
+            img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name)
+
+        if not FLAGS.run_benchmark:
+            detector.det_times.info(average=True)
+        else:
+            mode = FLAGS.run_mode
+            model_dir = FLAGS.model_dir
+            model_info = {
+                'model_name': model_dir.strip('/').split('/')[-1],
+                'precision': mode.split('_')[-1]
+            }
+            bench_log(detector, img_list, model_info, name='MOT')
+
+
+if __name__ == '__main__':
+    paddle.enable_static()
+    parser = argsparser()
+    FLAGS = parser.parse_args()
+    print_arguments(FLAGS)
+    FLAGS.device = FLAGS.device.upper()
+    assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
+                            ], "device should be CPU, GPU, NPU or XPU"
+
+    main()
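In `predict_video` above, `results` is a `defaultdict(list)` keyed by class id, and each entry is a `(frame_id, tlwhs, scores, ids)` tuple; the DeepSORT branch only ever fills key 0, while the ByteTrack branch fills one key per class. The sketch below flattens that structure into one row per tracked box to make the layout concrete. The helper name `flatten_mot_results` and the row layout are hypothetical, chosen for illustration only; this is not necessarily the format that `write_mot_results` writes to disk.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One per-frame record, as appended in predict_video: (frame_id, tlwhs, scores, track_ids).
Record = Tuple[int, Sequence[Sequence[float]], Sequence[float], Sequence[int]]
# Hypothetical output row: (frame_id, cls_id, track_id, x, y, w, h, score).
Row = Tuple[int, int, int, float, float, float, float, float]


def flatten_mot_results(results: Dict[int, List[Record]]) -> List[Row]:
    """Flatten per-class MOT records into one row per tracked box (illustrative layout)."""
    rows: List[Row] = []
    for cls_id in sorted(results):
        for frame_id, tlwhs, scores, track_ids in results[cls_id]:
            # tlwh = (top-left x, top-left y, width, height)
            for (x, y, w, h), score, track_id in zip(tlwhs, scores, track_ids):
                rows.append((frame_id, cls_id, track_id, x, y, w, h, score))
    return rows


results = defaultdict(list)
results[0].append((1, [(10.0, 20.0, 30.0, 40.0)], [0.9], [7]))
results[1].append((1, [], [], []))  # a class with no tracks in this frame
print(flatten_mot_results(results))  # [(1, 0, 7, 10.0, 20.0, 30.0, 40.0, 0.9)]
```

Keeping the records grouped by class until the very end mirrors how the ByteTrack branch stores them, so single-class and multi-class runs share one code path.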
--- a/third-party/paddle-inference/picodet_postprocess.py
+++ b/third-party/paddle-inference/picodet_postprocess.py
@@ -0,0 +1,227 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+from scipy.special import softmax
+
+
+def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
+    """
+    Args:
+        box_scores (N, 5): boxes in corner-form and probabilities.
+        iou_threshold: intersection over union threshold.
+        top_k: keep top_k results. If k <= 0, keep all the results.
+        candidate_size: only consider the candidates with the highest scores.
+    Returns:
+        picked (K, 5): the rows of box_scores kept after NMS.
+    """
+    scores = box_scores[:, -1]
+    boxes = box_scores[:, :-1]
+    picked = []
+    indexes = np.argsort(scores)
+    indexes = indexes[-candidate_size:]
+    while len(indexes) > 0:
+        current = indexes[-1]
+        picked.append(current)
+        if 0 < top_k == len(picked) or len(indexes) == 1:
+            break
+        current_box = boxes[current, :]
+        indexes = indexes[:-1]
+        rest_boxes = boxes[indexes, :]
+        iou = iou_of(
+            rest_boxes,
+            np.expand_dims(
+                current_box, axis=0), )
+        indexes = indexes[iou <= iou_threshold]
+
+    return box_scores[picked, :]
+
+
+def iou_of(boxes0, boxes1, eps=1e-5):
+    """Return intersection-over-union (Jaccard index) of boxes.
+    Args:
+        boxes0 (N, 4): ground truth boxes.
+        boxes1 (N or 1, 4): predicted boxes.
+        eps: a small number to avoid 0 as denominator.
+    Returns:
+        iou (N): IoU values.
+    """
+    overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2])
+    overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
+
+    overlap_area = area_of(overlap_left_top, overlap_right_bottom)
+    area0 = area_of(boxes0[..., :2], boxes0[..., 2:])
+    area1 = area_of(boxes1[..., :2], boxes1[..., 2:])
+    return overlap_area / (area0 + area1 - overlap_area + eps)
+
+
+def area_of(left_top, right_bottom):
+    """Compute the areas of rectangles given two corners.
+    Args:
+        left_top (N, 2): left top corner.
+        right_bottom (N, 2): right bottom corner.
+    Returns:
+        area (N): return the area.
+    """
+    hw = np.clip(right_bottom - left_top, 0.0, None)
+    return hw[..., 0] * hw[..., 1]
+
+
+class PicoDetPostProcess(object):
+    """
+    Args:
+        input_shape (list[int]): network input size [h, w]
+        ori_shape (np.ndarray): per-image shape [h, w] before padding
+        scale_factor (np.ndarray): per-image resize factors [scale_y, scale_x]
+        strides, score_threshold, nms_threshold, nms_top_k, keep_top_k: head strides and NMS settings
+    """
+
+    def __init__(self,
+                 input_shape,
+                 ori_shape,
+                 scale_factor,
+                 strides=[8, 16, 32, 64],
+                 score_threshold=0.4,
+                 nms_threshold=0.5,
+                 nms_top_k=1000,
+                 keep_top_k=100):
+        self.ori_shape = ori_shape
+        self.input_shape = input_shape
+        self.scale_factor = scale_factor
+        self.strides = strides
+        self.score_threshold = score_threshold
+        self.nms_threshold = nms_threshold
+        self.nms_top_k = nms_top_k
+        self.keep_top_k = keep_top_k
+
+    def warp_boxes(self, boxes, ori_shape):
+        """Clip boxes to the image bounds given by ori_shape [h, w]
+        (the affine transform M is disabled, so no warping is applied)."""
+        width, height = ori_shape[1], ori_shape[0]
+        n = len(boxes)
+        if n:
+            # warp points
+            xy = np.ones((n * 4, 3))
+            xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(
+                n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
+            # xy = xy @ M.T  # transform
+            xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8)  # rescale
+            # create new boxes
+            x = xy[:, [0, 2, 4, 6]]
+            y = xy[:, [1, 3, 5, 7]]
+            xy = np.concatenate(
+                (x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
+            # clip boxes
+            xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
+            xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
+            return xy.astype(np.float32)
+        else:
+            return boxes
+
+    def __call__(self, scores, raw_boxes):
+        batch_size = raw_boxes[0].shape[0]
+        reg_max = int(raw_boxes[0].shape[-1] / 4 - 1)
+        out_boxes_num = []
+        out_boxes_list = []
+        for batch_id in range(batch_size):
+            # generate centers
+            decode_boxes = []
+            select_scores = []
+            for stride, box_distribute, score in zip(self.strides, raw_boxes,
+                                                     scores):
+                box_distribute = box_distribute[batch_id]
+                score = score[batch_id]
+                # centers
+                fm_h = self.input_shape[0] / stride
+                fm_w = self.input_shape[1] / stride
+                h_range = np.arange(fm_h)
+                w_range = np.arange(fm_w)
+                ww, hh = np.meshgrid(w_range, h_range)
+                ct_row = (hh.flatten() + 0.5) * stride
+                ct_col = (ww.flatten() + 0.5) * stride
+                center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1)
+
+                # box distribution to distance
+                reg_range = np.arange(reg_max + 1)
+                box_distance = box_distribute.reshape((-1, reg_max + 1))
+                box_distance = softmax(box_distance, axis=1)
+                box_distance = box_distance * np.expand_dims(reg_range, axis=0)
+                box_distance = np.sum(box_distance, axis=1).reshape((-1, 4))
+                box_distance = box_distance * stride
+
+                # keep the top-K candidates ranked by their highest class score
+                topk_idx = np.argsort(score.max(axis=1))[::-1]
+                topk_idx = topk_idx[:self.nms_top_k]
+                center = center[topk_idx]
+                score = score[topk_idx]
+                box_distance = box_distance[topk_idx]
+
+                # decode box
+                decode_box = center + [-1, -1, 1, 1] * box_distance
+
+                select_scores.append(score)
+                decode_boxes.append(decode_box)
+
+            # nms
+            bboxes = np.concatenate(decode_boxes, axis=0)
+            confidences = np.concatenate(select_scores, axis=0)
+            picked_box_probs = []
+            picked_labels = []
+            for class_index in range(0, confidences.shape[1]):
+                probs = confidences[:, class_index]
+                mask = probs > self.score_threshold
+                probs = probs[mask]
+                if probs.shape[0] == 0:
+                    continue
+                subset_boxes = bboxes[mask, :]
+                box_probs = np.concatenate(
+                    [subset_boxes, probs.reshape(-1, 1)], axis=1)
+                box_probs = hard_nms(
+                    box_probs,
+                    iou_threshold=self.nms_threshold,
+                    top_k=self.keep_top_k, )
+                picked_box_probs.append(box_probs)
+                picked_labels.extend([class_index] * box_probs.shape[0])
+
+            if len(picked_box_probs) == 0:
+                out_boxes_list.append(np.empty((0, 6)))
+                out_boxes_num.append(0)
+
+            else:
+                picked_box_probs = np.concatenate(picked_box_probs)
+
+                # resize output boxes
+                picked_box_probs[:, :4] = self.warp_boxes(
+                    picked_box_probs[:, :4], self.ori_shape[batch_id])
+                im_scale = np.concatenate([
+                    self.scale_factor[batch_id][::-1],
+                    self.scale_factor[batch_id][::-1]
+                ])
+                picked_box_probs[:, :4] /= im_scale
+                # each output row: [class_id, score, x1, y1, x2, y2]
+                out_boxes_list.append(
+                    np.concatenate(
+                        [
+                            np.expand_dims(
+                                np.array(picked_labels),
+                                axis=-1), np.expand_dims(
+                                    picked_box_probs[:, 4], axis=-1),
+                            picked_box_probs[:, :4]
+                        ],
+                        axis=1))
+                out_boxes_num.append(len(picked_labels))
+
+        out_boxes_list = np.concatenate(out_boxes_list, axis=0)
+        out_boxes_num = np.asarray(out_boxes_num).astype(np.int32)
+        return out_boxes_list, out_boxes_num
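`PicoDetPostProcess.__call__` turns each side's `reg_max + 1` logits into a distance by taking the softmax-weighted mean bin index and multiplying it by the stride, then offsets the anchor center by `[-1, -1, 1, 1]` times those distances. The sketch below isolates that decode step for a batch of anchors. It assumes the four sides are ordered left, top, right, bottom (the order implied by the `[-1, -1, 1, 1]` sign pattern above) and replaces `scipy.special.softmax` with a small NumPy softmax to avoid a SciPy dependency; `decode_dfl` is a hypothetical helper name.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax, standing in for scipy.special.softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def decode_dfl(box_logits: np.ndarray, centers: np.ndarray, stride: int) -> np.ndarray:
    """Decode distribution-style box outputs into [x1, y1, x2, y2] boxes.

    box_logits: (N, 4 * (reg_max + 1)) raw outputs, assumed ordered left, top, right, bottom.
    centers:    (N, 2) anchor centers (x, y) in network-input pixels.
    """
    reg_max = box_logits.shape[-1] // 4 - 1
    probs = softmax(box_logits.reshape(-1, reg_max + 1), axis=1)
    # Expected bin index sum_k k * p_k, converted from grid units to pixels.
    dist = (probs @ np.arange(reg_max + 1)).reshape(-1, 4) * stride
    cx, cy = centers[:, 0], centers[:, 1]
    return np.stack(
        [cx - dist[:, 0], cy - dist[:, 1], cx + dist[:, 2], cy + dist[:, 3]], axis=1)


# reg_max = 7 (8 bins per side); every side peaks sharply at bin 2.
logits = np.zeros((1, 32))
logits[0, [2, 10, 18, 26]] = 50.0
boxes = decode_dfl(logits, np.array([[32.0, 32.0]]), stride=8)
print(np.round(boxes, 3))  # approximately [[16. 16. 48. 48.]]
```

Using the expectation over bins, rather than the arg-max bin, is what lets the distance land between grid units when the predicted distribution is spread over neighbouring bins.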
--- a/third-party/paddle-inference/preprocess.py
+++ b/third-party/paddle-inference/preprocess.py
@@ -0,0 +1,549 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import numpy as np
+import imgaug.augmenters as iaa
+from keypoint_preprocess import get_affine_transform
+from PIL import Image
+
+
+def decode_image(im_file, im_info):
+    """read rgb image
+    Args:
+        im_file (str|np.ndarray): input can be image path or np.ndarray
+        im_info (dict): info of image
+    Returns:
+        im (np.ndarray):  processed image (np.ndarray)
+        im_info (dict): info of processed image
+    """
+    if isinstance(im_file, str):
+        with open(im_file, 'rb') as f:
+            im_read = f.read()
+        data = np.frombuffer(im_read, dtype='uint8')
+        im = cv2.imdecode(data, 1)  # BGR mode, but need RGB mode
+        im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
+    else:
+        im = im_file
+    im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
+    im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
+    return im, im_info
+
+
+class Resize_Mult32(object):
+    """resize image so that both sides are multiples of 32
+    Args:
+        limit_side_len (int): length the limited side is scaled to before rounding to a multiple of 32
+        limit_type (str): side to limit: 'max' / 'resize_long' (longer side) or 'min' (shorter side)
+        interp (int): method of resize
+    """
+
+    def __init__(self, limit_side_len, limit_type, interp=cv2.INTER_LINEAR):
+        self.limit_side_len = limit_side_len
+        self.limit_type = limit_type
+        self.interp = interp
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        im_channel = im.shape[2]
+        im_scale_y, im_scale_x = self.generate_scale(im)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp)
+        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
+        im_info['scale_factor'] = np.array(
+            [im_scale_y, im_scale_x]).astype('float32')
+        return im, im_info
+
+    def generate_scale(self, img):
+        """
+        Args:
+            img (np.ndarray): image (np.ndarray)
+        Returns:
+            im_scale_x: the resize ratio of X
+            im_scale_y: the resize ratio of Y
+        """
+        limit_side_len = self.limit_side_len
+        h, w, c = img.shape
+
+        # limit the max side
+        if self.limit_type == 'max':
+            if h > w:
+                ratio = float(limit_side_len) / h
+            else:
+                ratio = float(limit_side_len) / w
+        elif self.limit_type == 'min':
+            if h < w:
+                ratio = float(limit_side_len) / h
+            else:
+                ratio = float(limit_side_len) / w
+        elif self.limit_type == 'resize_long':
+            ratio = float(limit_side_len) / max(h, w)
+        else:
+            raise Exception('not supported limit type: {}'.format(self.limit_type))
+        resize_h = int(h * ratio)
+        resize_w = int(w * ratio)
+
+        resize_h = max(int(round(resize_h / 32) * 32), 32)
+        resize_w = max(int(round(resize_w / 32) * 32), 32)
+
+        im_scale_y = resize_h / float(h)
+        im_scale_x = resize_w / float(w)
+        return im_scale_y, im_scale_x
+
+
+class Resize(object):
+    """resize image by target_size, keeping the aspect ratio by default
+    Args:
+        target_size (int): the target size of image
+        keep_ratio (bool): whether keep_ratio or not, default true
+        interp (int): method of resize
+    """
+
+    def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
+        if isinstance(target_size, int):
+            target_size = [target_size, target_size]
+        self.target_size = target_size
+        self.keep_ratio = keep_ratio
+        self.interp = interp
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        assert len(self.target_size) == 2
+        assert self.target_size[0] > 0 and self.target_size[1] > 0
+        im_channel = im.shape[2]
+        im_scale_y, im_scale_x = self.generate_scale(im)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=self.interp)
+        im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
+        im_info['scale_factor'] = np.array(
+            [im_scale_y, im_scale_x]).astype('float32')
+        return im, im_info
+
+    def generate_scale(self, im):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+        Returns:
+            im_scale_x: the resize ratio of X
+            im_scale_y: the resize ratio of Y
+        """
+        origin_shape = im.shape[:2]
+        im_c = im.shape[2]
+        if self.keep_ratio:
+            im_size_min = np.min(origin_shape)
+            im_size_max = np.max(origin_shape)
+            target_size_min = np.min(self.target_size)
+            target_size_max = np.max(self.target_size)
+            im_scale = float(target_size_min) / float(im_size_min)
+            if np.round(im_scale * im_size_max) > target_size_max:
+                im_scale = float(target_size_max) / float(im_size_max)
+            im_scale_x = im_scale
+            im_scale_y = im_scale
+        else:
+            resize_h, resize_w = self.target_size
+            im_scale_y = resize_h / float(origin_shape[0])
+            im_scale_x = resize_w / float(origin_shape[1])
+        return im_scale_y, im_scale_x
+
+
+class ShortSizeScale(object):
+    """Scale images by short size.
+    Args:
+        short_size(float | int): Short size of an image will be scaled to the short_size.
+        fixed_ratio(bool): Whether to set the long side to short_size * 4 / 3. default: True
+        keep_ratio(bool): Whether to keep the aspect ratio when fixed_ratio is False. default: None
+        do_round(bool): Whether to round to the nearest integer when computing the output size. default: False
+        backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow'
+    """
+
+    def __init__(self,
+                 short_size,
+                 fixed_ratio=True,
+                 keep_ratio=None,
+                 do_round=False,
+                 backend='pillow'):
+        self.short_size = short_size
+        assert (fixed_ratio and not keep_ratio) or (
+            not fixed_ratio
+        ), "fixed_ratio and keep_ratio cannot be true at the same time"
+        self.fixed_ratio = fixed_ratio
+        self.keep_ratio = keep_ratio
+        self.do_round = do_round
+
+        assert backend in [
+            'pillow', 'cv2'
+        ], f"Scale's backend must be pillow or cv2, but got {backend}"
+
+        self.backend = backend
+
+    def __call__(self, img):
+        """
+        Performs resize operations.
+        Args:
+            img (PIL.Image | np.ndarray): an RGB image.
+        Returns:
+            resized_img: the scaled image.
+        """
+
+        result_img = None
+
+        if isinstance(img, np.ndarray):
+            h, w, _ = img.shape
+        elif isinstance(img, Image.Image):
+            w, h = img.size
+        else:
+            raise NotImplementedError
+
+        if w <= h:
+            ow = self.short_size
+            if self.fixed_ratio:  # default is True
+                oh = int(self.short_size * 4.0 / 3.0)
+            elif not self.keep_ratio:  # square output: short_size x short_size
+                oh = self.short_size
+            else:  # keep aspect ratio: the short side (w) becomes short_size
+                scale_factor = self.short_size / w
+                oh = int(h * float(scale_factor) +
+                         0.5) if self.do_round else int(h * self.short_size / w)
+                ow = int(w * float(scale_factor) +
+                         0.5) if self.do_round else self.short_size
+        else:
+            oh = self.short_size
+            if self.fixed_ratio:
+                ow = int(self.short_size * 4.0 / 3.0)
+            elif not self.keep_ratio:  # square output: short_size x short_size
+                ow = self.short_size
+            else:  # keep aspect ratio: the short side (h) becomes short_size
+                scale_factor = self.short_size / h
+                oh = int(h * float(scale_factor) +
+                         0.5) if self.do_round else self.short_size
+                ow = int(w * float(scale_factor) +
+                         0.5) if self.do_round else int(w * self.short_size / h)
+
+        if type(img) == np.ndarray:
+            img = Image.fromarray(img, mode='RGB')
+
+        if self.backend == 'pillow':
+            result_img = img.resize((ow, oh), Image.BILINEAR)
+        elif self.backend == 'cv2' and (self.keep_ratio is not None):
+            result_img = cv2.resize(
+                np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR)
+        else:
+            result_img = Image.fromarray(
+                cv2.resize(
+                    np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR))
+
+        return result_img
+
+
+class NormalizeImage(object):
+    """normalize image
+    Args:
+        mean (list): im - mean
+        std (list): im / std
+        is_scale (bool): whether to scale im by 1 / 255 first
+        norm_type (str): type in ['mean_std', 'none']
+    """
+
+    def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
+        self.mean = mean
+        self.std = std
+        self.is_scale = is_scale
+        self.norm_type = norm_type
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        im = im.astype(np.float32, copy=False)
+        if self.is_scale:
+            scale = 1.0 / 255.0
+            im *= scale
+
+        if self.norm_type == 'mean_std':
+            mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
+            std = np.array(self.std)[np.newaxis, np.newaxis, :]
+            im -= mean
+            im /= std
+        return im, im_info
+
+
+class Permute(object):
+    """permute image
+    Transpose the image from HWC to CHW layout.
+    The channel order (RGB/BGR) is left unchanged,
+    and the transform takes no arguments.
+    """
+
+    def __init__(self, ):
+        super(Permute, self).__init__()
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        im = im.transpose((2, 0, 1)).copy()
+        return im, im_info
+
+
+class PadStride(object):
+    """pad image for models with FPN, replacing PadBatch(pad_to_stride) in the original config
+    Args:
+        stride (int): models with FPN need image shape % stride == 0
+    """
+
+    def __init__(self, stride=0):
+        self.coarsest_stride = stride
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        coarsest_stride = self.coarsest_stride
+        if coarsest_stride <= 0:
+            return im, im_info
+        im_c, im_h, im_w = im.shape
+        pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
+        pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
+        padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
+        padding_im[:, :im_h, :im_w] = im
+        return padding_im, im_info
+
+
+class LetterBoxResize(object):
+    def __init__(self, target_size):
+        """
+        Resize image to target size while keeping the aspect ratio, then pad
+        the borders with a constant color (letterbox).
+        Args:
+            target_size (int|list): image target size.
+        """
+        super(LetterBoxResize, self).__init__()
+        if isinstance(target_size, int):
+            target_size = [target_size, target_size]
+        self.target_size = target_size
+
+    def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)):
+        # letterbox: resize a rectangular image to a padded rectangular image
+        shape = img.shape[:2]  # [height, width]
+        ratio_h = float(height) / shape[0]
+        ratio_w = float(width) / shape[1]
+        ratio = min(ratio_h, ratio_w)
+        new_shape = (round(shape[1] * ratio),
+                     round(shape[0] * ratio))  # [width, height]
+        padw = (width - new_shape[0]) / 2
+        padh = (height - new_shape[1]) / 2
+        top, bottom = round(padh - 0.1), round(padh + 0.1)
+        left, right = round(padw - 0.1), round(padw + 0.1)
+
+        img = cv2.resize(
+            img, new_shape, interpolation=cv2.INTER_AREA)  # resized, no border
+        img = cv2.copyMakeBorder(
+            img, top, bottom, left, right, cv2.BORDER_CONSTANT,
+            value=color)  # padded rectangular
+        return img, ratio, padw, padh
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        assert len(self.target_size) == 2
+        assert self.target_size[0] > 0 and self.target_size[1] > 0
+        height, width = self.target_size
+        h, w = im.shape[:2]
+        im, ratio, padw, padh = self.letterbox(im, height=height, width=width)
+
+        new_shape = [round(h * ratio), round(w * ratio)]
+        im_info['im_shape'] = np.array(new_shape, dtype=np.float32)
+        im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32)
+        return im, im_info
+
+
+class Pad(object):
+    def __init__(self, size, fill_value=[114.0, 114.0, 114.0]):
+        """
+        Pad image to a specified size.
+        Args:
+            size (list[int]): image target size
+            fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0)
+        """
+        super(Pad, self).__init__()
+        if isinstance(size, int):
+            size = [size, size]
+        self.size = size
+        self.fill_value = fill_value
+
+    def __call__(self, im, im_info):
+        im_h, im_w = im.shape[:2]
+        h, w = self.size
+        if h == im_h and w == im_w:
+            im = im.astype(np.float32)
+            return im, im_info
+
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array(self.fill_value, dtype=np.float32)
+        canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
+        im = canvas
+        return im, im_info
+
+
+class WarpAffine(object):
+    """Warp affine the image
+    """
+
+    def __init__(self,
+                 keep_res=False,
+                 pad=31,
+                 input_h=512,
+                 input_w=512,
+                 scale=0.4,
+                 shift=0.1,
+                 down_ratio=4):
+        self.keep_res = keep_res
+        self.pad = pad
+        self.input_h = input_h
+        self.input_w = input_w
+        self.scale = scale
+        self.shift = shift
+        self.down_ratio = down_ratio
+
+    def __call__(self, im, im_info):
+        """
+        Args:
+            im (np.ndarray): image (np.ndarray)
+            im_info (dict): info of image
+        Returns:
+            im (np.ndarray):  processed image (np.ndarray)
+            im_info (dict): info of processed image
+        """
+        img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
+
+        h, w = img.shape[:2]
+
+        if self.keep_res:
+            # True in detection eval/infer
+            input_h = (h | self.pad) + 1
+            input_w = (w | self.pad) + 1
+            s = np.array([input_w, input_h], dtype=np.float32)
+            c = np.array([w // 2, h // 2], dtype=np.float32)
+
+        else:
+            # False in centertrack eval_mot/eval_mot
+            s = max(h, w) * 1.0
+            input_h, input_w = self.input_h, self.input_w
+            c = np.array([w / 2., h / 2.], dtype=np.float32)
+
+        trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
+        img = cv2.resize(img, (w, h))
+        inp = cv2.warpAffine(
+            img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
+
+        if not self.keep_res:
+            out_h = input_h // self.down_ratio
+            out_w = input_w // self.down_ratio
+            trans_output = get_affine_transform(c, s, 0, [out_w, out_h])
+
+            im_info.update({
+                'center': c,
+                'scale': s,
+                'out_height': out_h,
+                'out_width': out_w,
+                'inp_height': input_h,
+                'inp_width': input_w,
+                'trans_input': trans_input,
+                'trans_output': trans_output,
+            })
+        return inp, im_info
+
+
+class CULaneResize(object):
+    def __init__(self, img_h, img_w, cut_height, prob=0.5):
+        super(CULaneResize, self).__init__()
+        self.img_h = img_h
+        self.img_w = img_w
+        self.cut_height = cut_height
+        self.prob = prob
+
+    def __call__(self, im, im_info):
+        # cut
+        im = im[self.cut_height:, :, :]
+        # resize
+        transform = iaa.Sometimes(self.prob,
+                                  iaa.Resize({
+                                      "height": self.img_h,
+                                      "width": self.img_w
+                                  }))
+        im = transform(image=im.copy().astype(np.uint8))
+
+        im = im.astype(np.float32) / 255.
+        # check whether this transpose is needed, i.e. whether decode_image matches the cv.imread used by CULaneDataSet
+        im = im.transpose(2, 0, 1)
+
+        return im, im_info
+
+
+def preprocess(im, preprocess_ops):
+    # process image by preprocess_ops
+    im_info = {
+        'scale_factor': np.array(
+            [1., 1.], dtype=np.float32),
+        'im_shape': None,
+    }
+    im, im_info = decode_image(im, im_info)
+    for operator in preprocess_ops:
+        im, im_info = operator(im, im_info)
+    return im, im_info
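In the `keep_res` branch of the affine preprocessing above, `(h | self.pad) + 1` rounds each side up to a padded input size. The sketch below isolates that arithmetic. It assumes `pad + 1` is a power of two (31 is a value commonly used by CenterNet-style models, but the real `pad` comes from the exported config), and `padded_size` is a hypothetical helper, not part of this file.

```python
def padded_size(size: int, pad: int = 31) -> int:
    """Round `size` up to the next multiple of (pad + 1), as the
    keep_res branch does with `(h | self.pad) + 1`.

    Assumes pad + 1 is a power of two: OR-ing with pad sets all the low
    bits, and adding 1 carries into the next multiple. A size that is
    already a multiple still moves up one step (512 -> 544).
    """
    if pad < 0 or (pad & (pad + 1)) != 0:
        raise ValueError("pad + 1 must be a power of two")
    return (size | pad) + 1


for h in (480, 511, 512, 720):
    # the bit trick agrees with the plain-arithmetic form (h // 32 + 1) * 32
    print(h, "->", padded_size(h), (h // 32 + 1) * 32)
# 480 -> 512 512
# 511 -> 512 512
# 512 -> 544 544
# 720 -> 736 736
```

Because `s` is set to the padded size itself, the affine transform in that branch does not rescale the image; it only centers it on `c` inside a canvas whose sides are divisible by `pad + 1`.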
--- a/third-party/paddle-inference/tracker_config.yml
+++ b/third-party/paddle-inference/tracker_config.yml
@@ -0,0 +1,32 @@
+# Config of the tracker for MOT SDE detectors; 'JDETracker' is used by default.
+# The tracker of a MOT JDE detector (such as FairMOT) is exported together with the model.
+# 'min_box_area' and 'vertical_ratio' are set for pedestrians here; adjust them when tracking other objects.
+
+type: JDETracker # 'JDETracker', 'DeepSORTTracker' or 'CenterTracker'
+
+# BYTETracker (JDETracker with use_byte: True)
+JDETracker:
+  use_byte: True
+  det_thresh: 0.3
+  conf_thres: 0.6
+  low_conf_thres: 0.1
+  match_thres: 0.9
+  min_box_area: 0
+  vertical_ratio: 0 # 1.6 for pedestrian
+
+DeepSORTTracker:
+  input_size: [64, 192]
+  min_box_area: 0
+  vertical_ratio: -1
+  budget: 100
+  max_age: 70
+  n_init: 3
+  metric_type: cosine
+  matching_threshold: 0.2
+  max_iou_distance: 0.9
+
+CenterTracker:
+  min_box_area: -1
+  vertical_ratio: -1
+  track_thresh: 0.4
+  pre_thresh: 0.5
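The `min_box_area` and `vertical_ratio` fields above act as post-tracking box filters. The sketch below shows one plausible way they are applied. The exact comparisons are an assumption inferred from the comments in this config (a non-positive `vertical_ratio` disables the aspect check, and `min_box_area: -1` keeps every box), and `filter_tracks` is a hypothetical helper, not a PaddleDetection API.

```python
from typing import List, Tuple

# A track box in (x, y, w, h) form: top-left corner, width, height.
TLWH = Tuple[float, float, float, float]


def filter_tracks(tlwhs: List[TLWH],
                  min_box_area: float = 0,
                  vertical_ratio: float = 0) -> List[TLWH]:
    """Drop tracked boxes that are too small or too wide.

    Assumed semantics (hypothetical, mirroring tracker_config.yml):
    - boxes with w * h <= min_box_area are dropped;
    - when vertical_ratio > 0, boxes with w / h > vertical_ratio are
      dropped; a value <= 0 disables the aspect-ratio check.
    """
    kept = []
    for x, y, w, h in tlwhs:
        if w * h <= min_box_area:
            continue
        if vertical_ratio > 0 and w / h > vertical_ratio:
            continue
        kept.append((x, y, w, h))
    return kept


boxes = [(0, 0, 40, 100),   # upright, pedestrian-like
         (0, 0, 200, 100),  # wide box, w / h = 2.0
         (0, 0, 0, 50)]     # degenerate, zero area

print(filter_tracks(boxes))                      # JDETracker values: 0, 0
print(filter_tracks(boxes, vertical_ratio=1.6))  # pedestrian setting
print(filter_tracks(boxes, -1, -1))              # CenterTracker values
```

Under these assumptions, `min_box_area: 0` removes only zero-area boxes, while the suggested pedestrian value `vertical_ratio: 1.6` additionally discards boxes much wider than they are tall.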
--- a/third-party/paddle-inference/utils.py
+++ b/third-party/paddle-inference/utils.py
@@ -0,0 +1,551 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import time
+import os
+import ast
+import argparse
+import numpy as np
+
+
+def argsparser():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
+              "'infer_cfg.yml', created by tools/export_model.py."),
+        required=True)
+    parser.add_argument(
+        "--image_file", type=str, default=None, help="Path of image file.")
+    parser.add_argument(
+        "--image_dir",
+        type=str,
+        default=None,
+        help="Directory of image files; `image_file` takes priority.")
+    parser.add_argument(
+        "--batch_size", type=int, default=1, help="batch_size for inference.")
+    parser.add_argument(
+        "--video_file",
+        type=str,
+        default=None,
+        help="Path of video file; `video_file` or `camera_id` has the highest priority."
+    )
+    parser.add_argument(
+        "--camera_id",
+        type=int,
+        default=-1,
+        help="Device ID of the camera to predict from.")
+    parser.add_argument(
+        "--threshold", type=float, default=0.5, help="Threshold of score.")
+    parser.add_argument(
+        "--output_dir",
+        type=str,
+        default="output",
+        help="Directory of output visualization files.")
+    parser.add_argument(
+        "--run_mode",
+        type=str,
+        default='paddle',
+        help="Running mode: paddle/trt_fp32/trt_fp16/trt_int8.")
+    parser.add_argument(
+        "--device",
+        type=str,
+        default='cpu',
+        help="Device to run on: CPU/GPU/XPU/NPU; default is CPU."
+    )
+    parser.add_argument(
+        "--use_gpu",
+        type=ast.literal_eval,
+        default=False,
+        help="Deprecated, please use `--device`.")
+    parser.add_argument(
+        "--run_benchmark",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to predict an image_file repeatedly for benchmarking.")
+    parser.add_argument(
+        "--enable_mkldnn",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use mkldnn on CPU.")
+    parser.add_argument(
+        "--enable_mkldnn_bfloat16",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use mkldnn bfloat16 inference on CPU.")
+    parser.add_argument(
+        "--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
+    parser.add_argument(
+        "--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_max_shape",
+        type=int,
+        default=1280,
+        help="max_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_opt_shape",
+        type=int,
+        default=640,
+        help="opt_shape for TensorRT.")
+    parser.add_argument(
+        "--trt_calib_mode",
+        type=ast.literal_eval,
+        default=False,
+        help="If the model is produced by TRT offline quantization "
+        "calibration, set trt_calib_mode to True.")
+    parser.add_argument(
+        '--save_images',
+        type=ast.literal_eval,
+        default=True,
+        help='Save visualization image results.')
+    parser.add_argument(
+        '--save_mot_txts',
+        action='store_true',
+        help='Save tracking results (txt).')
+    parser.add_argument(
+        '--save_mot_txt_per_img',
+        action='store_true',
+        help='Save tracking results (txt) for each image.')
+    parser.add_argument(
+        '--scaled',
+        type=ast.literal_eval,
+        default=False,
+        help="Whether the coords output by the detector are scaled: False for "
+        "JDE YOLOv3, True for general detectors.")
+    parser.add_argument(
+        "--tracker_config", type=str, default=None, help=("Tracker config file."))
+    parser.add_argument(
+        "--reid_model_dir",
+        type=str,
+        default=None,
+        help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
+              "'infer_cfg.yml' of the ReID model, created by tools/export_model.py."))
+    parser.add_argument(
+        "--reid_batch_size",
+        type=int,
+        default=50,
+        help="max batch_size for reid model inference.")
+    parser.add_argument(
+        '--use_dark',
+        type=ast.literal_eval,
+        default=True,
+        help='Whether to use DarkPose for more accurate keypoint position prediction.')
+    parser.add_argument(
+        "--action_file",
+        type=str,
+        default=None,
+        help="Path of input file for action recognition.")
+    parser.add_argument(
+        "--window_size",
+        type=int,
+        default=50,
+        help="Temporal size of skeleton feature for action recognition.")
+    parser.add_argument(
+        "--random_pad",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to do random padding for action recognition.")
+    parser.add_argument(
+        "--save_results",
+        action='store_true',
+        default=False,
+        help="Whether to save detection results to a file in COCO format.")
+    parser.add_argument(
+        '--use_coco_category',
+        action='store_true',
+        default=False,
+        help='Whether to use the coco format dictionary `clsid2catid`')
+    parser.add_argument(
+        "--slice_infer",
+        action='store_true',
+        help="Whether to slice the image and merge the inference results for small object detection."
+    )
+    parser.add_argument(
+        '--slice_size',
+        nargs='+',
+        type=int,
+        default=[640, 640],
+        help="Height and width of the sliced image.")
+    parser.add_argument(
+        "--overlap_ratio",
+        nargs='+',
+        type=float,
+        default=[0.25, 0.25],
+        help="Overlap height and width ratios of the sliced images.")
+    parser.add_argument(
+        "--combine_method",
+        type=str,
+        default='nms',
+        help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']."
+    )
+    parser.add_argument(
+        "--match_threshold",
+        type=float,
+        default=0.6,
+        help="Combine method matching threshold.")
+    parser.add_argument(
+        "--match_metric",
+        type=str,
+        default='ios',
+        help="Combine method matching metric, choose in ['iou', 'ios'].")
+    parser.add_argument(
+        "--collect_trt_shape_info",
+        action='store_true',
+        default=False,
+        help="Whether to collect dynamic shape before using tensorrt.")
+    parser.add_argument(
+        "--tuned_trt_shape_file",
+        type=str,
+        default="shape_range_info.pbtxt",
+        help="Path of a dynamic shape file for tensorrt.")
+    parser.add_argument("--use_fd_format", action="store_true")
+    parser.add_argument(
+        "--task_type",
+        type=str,
+        default='Detection',
+        help="How to save the COCO result; only takes effect when save_results is True. Options are Rotate or Detection; default is Detection."
+    )
+    return parser
+
+
+class Times(object):
+    def __init__(self):
+        self.time = 0.
+        # start time
+        self.st = 0.
+        # end time
+        self.et = 0.
+
+    def start(self):
+        self.st = time.time()
+
+    def end(self, repeats=1, accumulative=True):
+        self.et = time.time()
+        if accumulative:
+            self.time += (self.et - self.st) / repeats
+        else:
+            self.time = (self.et - self.st) / repeats
+
+    def reset(self):
+        self.time = 0.
+        self.st = 0.
+        self.et = 0.
+
+    def value(self):
+        return round(self.time, 4)
+
+
+class Timer(Times):
+    def __init__(self, with_tracker=False):
+        super(Timer, self).__init__()
+        self.with_tracker = with_tracker
+        self.preprocess_time_s = Times()
+        self.inference_time_s = Times()
+        self.postprocess_time_s = Times()
+        self.tracking_time_s = Times()
+        self.img_num = 0
+
+    def info(self, average=False):
+        pre_time = self.preprocess_time_s.value()
+        infer_time = self.inference_time_s.value()
+        post_time = self.postprocess_time_s.value()
+        track_time = self.tracking_time_s.value()
+
+        total_time = pre_time + infer_time + post_time
+        if self.with_tracker:
+            total_time = total_time + track_time
+        total_time = round(total_time, 4)
+        print("------------------ Inference Time Info ----------------------")
+        print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
+                                                       self.img_num))
+        preprocess_time = round(pre_time / max(1, self.img_num),
+                                4) if average else pre_time
+        postprocess_time = round(post_time / max(1, self.img_num),
+                                 4) if average else post_time
+        inference_time = round(infer_time / max(1, self.img_num),
+                               4) if average else infer_time
+        tracking_time = round(track_time / max(1, self.img_num),
+                              4) if average else track_time
+
+        average_latency = total_time / max(1, self.img_num)
+        qps = 0
+        if total_time > 0:
+            qps = 1 / average_latency
+        print("average latency time(ms): {:.2f}, QPS: {:.2f}".format(
+            average_latency * 1000, qps))
+        if self.with_tracker:
+            print(
+                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}".
+                format(preprocess_time * 1000, inference_time * 1000,
+                       postprocess_time * 1000, tracking_time * 1000))
+        else:
+            print(
+                "preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
+                format(preprocess_time * 1000, inference_time * 1000,
+                       postprocess_time * 1000))
+
+    def report(self, average=False):
+        dic = {}
+        pre_time = self.preprocess_time_s.value()
+        infer_time = self.inference_time_s.value()
+        post_time = self.postprocess_time_s.value()
+        track_time = self.tracking_time_s.value()
+
+        dic['preprocess_time_s'] = round(pre_time / max(1, self.img_num),
+                                         4) if average else pre_time
+        dic['inference_time_s'] = round(infer_time / max(1, self.img_num),
+                                        4) if average else infer_time
+        dic['postprocess_time_s'] = round(post_time / max(1, self.img_num),
+                                          4) if average else post_time
+        dic['img_num'] = self.img_num
+        total_time = pre_time + infer_time + post_time
+        if self.with_tracker:
+            dic['tracking_time_s'] = round(track_time / max(1, self.img_num),
+                                           4) if average else track_time
+            total_time = total_time + track_time
+        dic['total_time_s'] = round(total_time, 4)
+        return dic
+
+
+def get_current_memory_mb():
+    """
+    Obtain the memory usage (in MB) of the current process on the CPU, and the
+    used memory (in MB) and load of the selected GPU. Note that this call is slow.
+    """
+    import pynvml
+    import psutil
+    import GPUtil
+    # take the first visible device, e.g. CUDA_VISIBLE_DEVICES="0,1" -> 0
+    gpu_id = int(os.environ.get('CUDA_VISIBLE_DEVICES', '0').split(',')[0])
+
+    pid = os.getpid()
+    p = psutil.Process(pid)
+    info = p.memory_full_info()
+    cpu_mem = info.uss / 1024. / 1024.
+    gpu_mem = 0
+    gpu_percent = 0
+    gpus = GPUtil.getGPUs()
+    if 0 <= gpu_id < len(gpus):
+        gpu_percent = gpus[gpu_id].load
+        pynvml.nvmlInit()
+        # query the same device whose load was read above
+        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_id)
+        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
+        gpu_mem = meminfo.used / 1024. / 1024.
+    return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
+
+
+def multiclass_nms(bboxs, num_classes, match_threshold=0.6, match_metric='iou'):
+    final_boxes = []
+    for c in range(num_classes):
+        idxs = bboxs[:, 0] == c
+        if np.count_nonzero(idxs) == 0: continue
+        r = nms(bboxs[idxs, 1:], match_threshold, match_metric)
+        final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1))
+    return final_boxes
+
+
+def nms(dets, match_threshold=0.6, match_metric='iou'):
+    """ Apply NMS to avoid detecting too many overlapping bounding boxes.
+        Args:
+            dets: shape [N, 5], [score, x1, y1, x2, y2]
+            match_metric: 'iou' or 'ios'
+            match_threshold: overlap thresh for match metric.
+    """
+    if dets.shape[0] == 0:
+        return dets[[], :]
+    scores = dets[:, 0]
+    x1 = dets[:, 1]
+    y1 = dets[:, 2]
+    x2 = dets[:, 3]
+    y2 = dets[:, 4]
+    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+    order = scores.argsort()[::-1]
+
+    ndets = dets.shape[0]
+    suppressed = np.zeros((ndets), dtype=np.int32)
+
+    for _i in range(ndets):
+        i = order[_i]
+        if suppressed[i] == 1:
+            continue
+        ix1 = x1[i]
+        iy1 = y1[i]
+        ix2 = x2[i]
+        iy2 = y2[i]
+        iarea = areas[i]
+        for _j in range(_i + 1, ndets):
+            j = order[_j]
+            if suppressed[j] == 1:
+                continue
+            xx1 = max(ix1, x1[j])
+            yy1 = max(iy1, y1[j])
+            xx2 = min(ix2, x2[j])
+            yy2 = min(iy2, y2[j])
+            w = max(0.0, xx2 - xx1 + 1)
+            h = max(0.0, yy2 - yy1 + 1)
+            inter = w * h
+            if match_metric == 'iou':
+                union = iarea + areas[j] - inter
+                match_value = inter / union
+            elif match_metric == 'ios':
+                smaller = min(iarea, areas[j])
+                match_value = inter / smaller
+            else:
+                raise ValueError("match_metric must be 'iou' or 'ios', got {}".format(match_metric))
+            if match_value >= match_threshold:
+                suppressed[j] = 1
+    keep = np.where(suppressed == 0)[0]
+    dets = dets[keep, :]
+    return dets
+
+
+coco_clsid2catid = {
+    0: 1,
+    1: 2,
+    2: 3,
+    3: 4,
+    4: 5,
+    5: 6,
+    6: 7,
+    7: 8,
+    8: 9,
+    9: 10,
+    10: 11,
+    11: 13,
+    12: 14,
+    13: 15,
+    14: 16,
+    15: 17,
+    16: 18,
+    17: 19,
+    18: 20,
+    19: 21,
+    20: 22,
+    21: 23,
+    22: 24,
+    23: 25,
+    24: 27,
+    25: 28,
+    26: 31,
+    27: 32,
+    28: 33,
+    29: 34,
+    30: 35,
+    31: 36,
+    32: 37,
+    33: 38,
+    34: 39,
+    35: 40,
+    36: 41,
+    37: 42,
+    38: 43,
+    39: 44,
+    40: 46,
+    41: 47,
+    42: 48,
+    43: 49,
+    44: 50,
+    45: 51,
+    46: 52,
+    47: 53,
+    48: 54,
+    49: 55,
+    50: 56,
+    51: 57,
+    52: 58,
+    53: 59,
+    54: 60,
+    55: 61,
+    56: 62,
+    57: 63,
+    58: 64,
+    59: 65,
+    60: 67,
+    61: 70,
+    62: 72,
+    63: 73,
+    64: 74,
+    65: 75,
+    66: 76,
+    67: 77,
+    68: 78,
+    69: 79,
+    70: 80,
+    71: 81,
+    72: 82,
+    73: 84,
+    74: 85,
+    75: 86,
+    76: 87,
+    77: 88,
+    78: 89,
+    79: 90
+}
+
+
+def gaussian_radius(bbox_size, min_overlap):
+    height, width = bbox_size
+
+    a1 = 1
+    b1 = (height + width)
+    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
+    sq1 = np.sqrt(b1**2 - 4 * a1 * c1)
+    radius1 = (b1 + sq1) / (2 * a1)
+
+    a2 = 4
+    b2 = 2 * (height + width)
+    c2 = (1 - min_overlap) * width * height
+    sq2 = np.sqrt(b2**2 - 4 * a2 * c2)
+    radius2 = (b2 + sq2) / 2
+
+    a3 = 4 * min_overlap
+    b3 = -2 * min_overlap * (height + width)
+    c3 = (min_overlap - 1) * width * height
+    sq3 = np.sqrt(b3**2 - 4 * a3 * c3)
+    radius3 = (b3 + sq3) / 2
+    return min(radius1, radius2, radius3)
+
+
+def gaussian2D(shape, sigma_x=1, sigma_y=1):
+    m, n = [(ss - 1.) / 2. for ss in shape]
+    y, x = np.ogrid[-m:m + 1, -n:n + 1]
+
+    h = np.exp(-(x * x / (2 * sigma_x * sigma_x) + y * y / (2 * sigma_y *
+                                                            sigma_y)))
+    h[h < np.finfo(h.dtype).eps * h.max()] = 0
+    return h
+
+
+def draw_umich_gaussian(heatmap, center, radius, k=1):
+    """
+    draw_umich_gaussian, refer to https://github.com/xingyizhou/CenterNet/blob/master/src/lib/utils/image.py#L126
+    """
+    diameter = 2 * radius + 1
+    gaussian = gaussian2D(
+        (diameter, diameter), sigma_x=diameter / 6, sigma_y=diameter / 6)
+
+    x, y = int(center[0]), int(center[1])
+
+    height, width = heatmap.shape[0:2]
+
+    left, right = min(x, radius), min(width - x, radius + 1)
+    top, bottom = min(y, radius), min(height - y, radius + 1)
+
+    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
+    masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:
+                               radius + right]
+    if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:
+        np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
+    return heatmap
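`nms` above supports two overlap metrics: IoU (intersection over union) and IoS (intersection over the smaller box). The vectorized sketch below re-expresses the same greedy algorithm with the same `[score, x1, y1, x2, y2]` layout and `+1` pixel convention, to illustrate why IoS is the default `--match_metric` for sliced inference: a box fully contained in a larger one has IoS 1.0 even when its IoU is small. `box_overlap` and `greedy_nms` are hypothetical names, and unlike `nms`, this version returns the kept rows in descending score order.

```python
import numpy as np


def box_overlap(box, others, metric="iou"):
    """Overlap of one [x1, y1, x2, y2] box with N others, using the
    same inclusive-pixel (+1) convention as utils.nms."""
    xx1 = np.maximum(box[0], others[:, 0])
    yy1 = np.maximum(box[1], others[:, 1])
    xx2 = np.minimum(box[2], others[:, 2])
    yy2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(xx2 - xx1 + 1, 0, None) * np.clip(yy2 - yy1 + 1, 0, None)
    area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (others[:, 2] - others[:, 0] + 1) * (others[:, 3] - others[:, 1] + 1)
    if metric == "iou":
        return inter / (area + areas - inter)
    if metric == "ios":
        return inter / np.minimum(area, areas)
    raise ValueError("unknown metric: {}".format(metric))


def greedy_nms(dets, thresh=0.6, metric="iou"):
    """dets: [N, 5] rows of [score, x1, y1, x2, y2]."""
    order = dets[:, 0].argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # drop every remaining box that overlaps the kept one too much
        ov = box_overlap(dets[i, 1:], dets[rest, 1:], metric)
        order = rest[ov < thresh]
    return dets[np.array(keep, dtype=int)]


dets = np.array([
    [0.9, 0, 0, 99, 99],        # large box
    [0.8, 10, 10, 49, 49],      # small box inside the large one
    [0.7, 200, 200, 259, 259],  # disjoint box
], dtype=float)

print(greedy_nms(dets, 0.6, "iou")[:, 0])  # IoU of the nested pair is 0.16: all kept
print(greedy_nms(dets, 0.6, "ios")[:, 0])  # IoS of the nested pair is 1.0: it is merged
```

When tiles overlap, an object cut at a tile border often yields a partial box contained in the full one, which IoS catches and IoU with the same threshold may miss.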
--- a/third-party/paddle-inference/visualize.py
+++ b/third-party/paddle-inference/visualize.py
@@ -0,0 +1,665 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import division
+
+import os
+import cv2
+import math
+import numpy as np
+import PIL
+from PIL import Image, ImageDraw, ImageFile
+ImageFile.LOAD_TRUNCATED_IMAGES = True
+
+def imagedraw_textsize_c(draw, text):
+    if int(PIL.__version__.split('.')[0]) < 10:
+        tw, th = draw.textsize(text)
+    else:
+        left, top, right, bottom = draw.textbbox((0, 0), text)
+        tw, th = right - left, bottom - top
+
+    return tw, th
+
+
+def visualize_box_mask(im, results, labels, threshold=0.5):
+    """
+    Args:
+        im (str/np.ndarray): path of image/np.ndarray read by cv2
+        results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
+                        matrix element:[class, score, x_min, y_min, x_max, y_max]
+                        MaskRCNN's results include 'masks': np.ndarray:
+                        shape:[N, im_h, im_w]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): Threshold of score.
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    elif isinstance(im, np.ndarray):
+        im = Image.fromarray(im)
+    if 'masks' in results and 'boxes' in results and len(results['boxes']) > 0:
+        im = draw_mask(
+            im, results['boxes'], results['masks'], labels, threshold=threshold)
+    if 'boxes' in results and len(results['boxes']) > 0:
+        im = draw_box(im, results['boxes'], labels, threshold=threshold)
+    if 'segm' in results:
+        im = draw_segm(
+            im,
+            results['segm'],
+            results['label'],
+            results['score'],
+            labels,
+            threshold=threshold)
+    return im
+
+
+def get_color_map_list(num_classes):
+    """
+    Args:
+        num_classes (int): number of class
+    Returns:
+        color_map (list): RGB color list
+    """
+    color_map = num_classes * [0, 0, 0]
+    for i in range(0, num_classes):
+        j = 0
+        lab = i
+        while lab:
+            color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
+            color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
+            color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
+            j += 1
+            lab >>= 3
+    color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
+    return color_map
+
+
+def draw_mask(im, np_boxes, np_masks, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
+        np_boxes (np.ndarray): shape:[N,6], N: number of box,
+            matrix element:[class, score, x_min, y_min, x_max, y_max]
+        np_masks (np.ndarray): shape:[N, im_h, im_w]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): threshold of mask
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    color_list = get_color_map_list(len(labels))
+    w_ratio = 0.4
+    alpha = 0.7
+    im = np.array(im).astype('float32')
+    clsid2color = {}
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+    np_masks = np_masks[expect_boxes, :, :]
+    im_h, im_w = im.shape[:2]
+    np_masks = np_masks[:, :im_h, :im_w]
+    for i in range(len(np_masks)):
+        clsid, score = int(np_boxes[i][0]), np_boxes[i][1]
+        mask = np_masks[i]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        # copy so that lightening the color below does not mutate the cached class color
+        color_mask = list(clsid2color[clsid])
+        for c in range(3):
+            color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
+        idx = np.nonzero(mask)
+        color_mask = np.array(color_mask)
+        im[idx[0], idx[1], :] *= 1.0 - alpha
+        im[idx[0], idx[1], :] += alpha * color_mask
+    return Image.fromarray(im.astype('uint8'))
+
+
+def draw_box(im, np_boxes, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
+        np_boxes (np.ndarray): shape:[N,6], N: number of box,
+                               matrix element:[class, score, x_min, y_min, x_max, y_max]
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): threshold of box
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+    # at least 1 px, otherwise boxes vanish on images smaller than 320 px
+    draw_thickness = max(1, min(im.size) // 320)
+    draw = ImageDraw.Draw(im)
+    clsid2color = {}
+    color_list = get_color_map_list(len(labels))
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+
+    vis_order = False
+    if len(np_boxes) > 0 and len(np_boxes[0]) == 7:
+        np_boxes = sorted(np_boxes, key=lambda x: x[6])
+        vis_order = True
+
+    centers = []
+    for dt in np_boxes:
+        if len(dt) == 7:
+            clsid, bbox, score, read_order = int(dt[0]), dt[2:6], dt[1], int(dt[6])
+        else:
+            clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        color = tuple(clsid2color[clsid])
+
+        if len(bbox) == 4:
+            xmin, ymin, xmax, ymax = bbox
+            print('class_id:{:d}, confidence:{:.4f}, left_top:[{:.2f},{:.2f}],'
+                  'right_bottom:[{:.2f},{:.2f}]'.format(
+                      int(clsid), score, xmin, ymin, xmax, ymax))
+            # draw bbox
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=draw_thickness,
+                fill=color)
+            cx, cy = int((xmin + xmax)/2), int((ymin + ymax)/2)
+            centers.append((cx, cy))
+        elif len(bbox) == 8:
+            x1, y1, x2, y2, x3, y3, x4, y4 = bbox
+            draw.line(
+                [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
+                width=2,
+                fill=color)
+            xmin = min(x1, x2, x3, x4)
+            ymin = min(y1, y2, y3, y4)
+
+        # draw label
+        text = "{} {:.4f}".format(labels[clsid], score)
+        tw, th = imagedraw_textsize_c(draw, text)
+        draw.rectangle(
+            [(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
+        draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
+
+    if vis_order:
+        for i in range(len(centers)-1):
+            draw.line([centers[i], centers[i+1]], fill=(255, 0, 0), width=2)
+
+    return im
+
+
+def draw_segm(im,
+              np_segms,
+              np_label,
+              np_score,
+              labels,
+              threshold=0.5,
+              alpha=0.7):
+    """
+    Draw segmentation on image
+    """
+    mask_color_id = 0
+    w_ratio = .4
+    color_list = get_color_map_list(len(labels))
+    im = np.array(im).astype('float32')
+    clsid2color = {}
+    np_segms = np_segms.astype(np.uint8)
+    for i in range(np_segms.shape[0]):
+        mask, score, clsid = np_segms[i], np_score[i], np_label[i]
+        if score < threshold:
+            continue
+
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        # copy so that lightening the color below does not mutate the cached class color
+        color_mask = list(clsid2color[clsid])
+        for c in range(3):
+            color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
+        idx = np.nonzero(mask)
+        color_mask = np.array(color_mask)
+        idx0 = np.minimum(idx[0], im.shape[0] - 1)
+        idx1 = np.minimum(idx[1], im.shape[1] - 1)
+        im[idx0, idx1, :] *= 1.0 - alpha
+        im[idx0, idx1, :] += alpha * color_mask
+        sum_x = np.sum(mask, axis=0)
+        x = np.where(sum_x > 0.5)[0]
+        sum_y = np.sum(mask, axis=1)
+        y = np.where(sum_y > 0.5)[0]
+        x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1]
+        cv2.rectangle(im, (x0, y0), (x1, y1),
+                      tuple(color_mask.astype('int32').tolist()), 1)
+        bbox_text = '%s %.2f' % (labels[clsid], score)
+        t_size = cv2.getTextSize(bbox_text, 0, 0.3, thickness=1)[0]
+        cv2.rectangle(im, (x0, y0), (x0 + t_size[0], y0 - t_size[1] - 3),
+                      tuple(color_mask.astype('int32').tolist()), -1)
+        cv2.putText(
+            im,
+            bbox_text, (x0, y0 - 2),
+            cv2.FONT_HERSHEY_SIMPLEX,
+            0.3, (0, 0, 0),
+            1,
+            lineType=cv2.LINE_AA)
+    return Image.fromarray(im.astype('uint8'))
+
+
+def get_color(idx):
+    idx = idx * 3
+    color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
+    return color
+
+
+def visualize_pose(imgfile,
+                   results,
+                   visual_thresh=0.6,
+                   save_name='pose.jpg',
+                   save_dir='output',
+                   returnimg=False,
+                   ids=None):
+    try:
+        import matplotlib.pyplot as plt
+        import matplotlib
+        plt.switch_backend('agg')
+    except Exception as e:
+        print('Matplotlib not found, please install matplotlib, '
+              'for example: `pip install matplotlib`.')
+        raise e
+    skeletons, scores = results['keypoint']
+    skeletons = np.array(skeletons)
+    kpt_nums = 17
+    if len(skeletons) > 0:
+        kpt_nums = skeletons.shape[1]
+    if kpt_nums == 17:  #plot coco keypoint
+        EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
+                 (7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
+                 (13, 15), (14, 16), (11, 12)]
+    else:  #plot mpii keypoint
+        EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7), (7, 8),
+                 (8, 9), (10, 11), (11, 12), (13, 14), (14, 15), (8, 12),
+                 (8, 13)]
+    NUM_EDGES = len(EDGES)
+
+    colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
+            [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
+            [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
+    plt.figure()
+
+    img = cv2.imread(imgfile) if type(imgfile) == str else imgfile
+
+    color_set = results['colors'] if 'colors' in results else None
+
+    if 'bbox' in results and ids is None:
+        bboxs = results['bbox']
+        for j, rect in enumerate(bboxs):
+            xmin, ymin, xmax, ymax = rect
+            color = colors[0] if color_set is None else colors[color_set[j] %
+                                                               len(colors)]
+            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
+
+    canvas = img.copy()
+    for i in range(kpt_nums):
+        for j in range(len(skeletons)):
+            if skeletons[j][i, 2] < visual_thresh:
+                continue
+            if ids is None:
+                color = colors[i] if color_set is None else colors[color_set[j]
+                                                                   %
+                                                                   len(colors)]
+            else:
+                color = get_color(ids[j])
+
+            cv2.circle(
+                canvas,
+                tuple(skeletons[j][i, 0:2].astype('int32')),
+                2,
+                color,
+                thickness=-1)
+
+    to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
+    fig = matplotlib.pyplot.gcf()
+
+    stickwidth = 2
+
+    for i in range(NUM_EDGES):
+        for j in range(len(skeletons)):
+            edge = EDGES[i]
+            # skip limbs unless both end points are confident
+            if (skeletons[j][edge[0], 2] < visual_thresh or
+                    skeletons[j][edge[1], 2] < visual_thresh):
+                continue
+
+            cur_canvas = canvas.copy()
+            x0, y0 = skeletons[j][edge[0], 0:2]
+            x1, y1 = skeletons[j][edge[1], 0:2]
+            # approximate the limb with an ellipse centred between the joints
+            center = (int((x0 + x1) / 2), int((y0 + y1) / 2))
+            length = math.hypot(x0 - x1, y0 - y1)
+            angle = math.degrees(math.atan2(y0 - y1, x0 - x1))
+            polygon = cv2.ellipse2Poly(center, (int(length / 2), stickwidth),
+                                       int(angle), 0, 360, 1)
+            if ids is None:
+                color = colors[i] if color_set is None else colors[
+                    color_set[j] % len(colors)]
+            else:
+                color = get_color(ids[j])
+            cv2.fillConvexPoly(cur_canvas, polygon, color)
+            # blend so that overlapping limbs stay semi-transparent
+            canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
+    if returnimg:
+        return canvas
+    save_name = os.path.join(
+        save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg')
+    plt.imsave(save_name, canvas[:, :, ::-1])
+    print("keypoint visualize image saved to: " + save_name)
+    plt.close()
+
+
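+# Hypothetical helper, a sketch rather than a PaddleDetection API: the limb
+# drawing in the keypoint visualizer above fills an ellipse centred on the
+# midpoint of two joints. The ellipse has a semi-major axis of half the joint
+# distance and is rotated to the joint-to-joint angle. Isolating that geometry
+# lets it be checked without OpenCV. Assumes joints are (x, y) pixel
+# coordinates and relies on the module-level `import math`.
+# Usage: cv2.ellipse2Poly(*limb_ellipse_params(p0, p1), 0, 360, 1)
+def limb_ellipse_params(p0, p1, stickwidth=2):
+    """Hypothetical: return (center, axes, angle_deg) for cv2.ellipse2Poly."""
+    x0, y0 = float(p0[0]), float(p0[1])
+    x1, y1 = float(p1[0]), float(p1[1])
+    center = (int((x0 + x1) / 2), int((y0 + y1) / 2))
+    # the ellipse spans the whole limb, so its semi-major axis is half the length
+    half_length = int(math.hypot(x0 - x1, y0 - y1) / 2)
+    # atan2(dy, dx) is the limb's rotation in image coordinates (degrees here)
+    angle = int(math.degrees(math.atan2(y0 - y1, x0 - x1)))
+    return center, (half_length, stickwidth), angle
+
+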
+def visualize_attr(im, results, boxes=None, is_mtmct=False):
+    if isinstance(im, str):
+        im = Image.open(im)
+        im = np.ascontiguousarray(np.copy(im))
+        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
+    else:
+        im = np.ascontiguousarray(np.copy(im))
+
+    im_h, im_w = im.shape[:2]
+    text_scale = max(0.5, im.shape[0] / 3000.)
+    text_thickness = 1
+
+    line_inter = im.shape[0] / 40.
+    for i, res in enumerate(results):
+        if boxes is None:
+            text_w = 3
+            text_h = 1
+        elif is_mtmct:
+            box = boxes[i]  # multi-camera: box is [x, y, w, h]
+            text_w = int(box[0]) + 3
+            text_h = int(box[1])
+        else:
+            box = boxes[i]  # single camera: box is [_, _, x, y, w, h]
+            text_w = int(box[2]) + 3
+            text_h = int(box[3])
+        for text in res:
+            text_h += int(line_inter)
+            text_loc = (text_w, text_h)
+            cv2.putText(
+                im,
+                text,
+                text_loc,
+                cv2.FONT_ITALIC,
+                text_scale, (0, 255, 255),
+                thickness=text_thickness)
+    return im
+
+
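+# Hypothetical helper, a sketch: visualize_attr starts each result's text block
+# at the box's top-left corner, or at (3, 1) when no boxes are given. It then
+# moves down by int(line_inter) pixels before every line. This helper returns
+# those anchor points so that the layout can be checked without drawing. It
+# assumes the box layouts used above: [x, y, w, h] for multi-camera and
+# [_, _, x, y, w, h] otherwise.
+def attr_text_positions(num_lines, line_inter, box=None, is_mtmct=False):
+    """Hypothetical: return the (x, y) anchor of each attribute text line."""
+    if box is None:
+        x, y = 3, 1
+    elif is_mtmct:
+        x, y = int(box[0]) + 3, int(box[1])
+    else:
+        x, y = int(box[2]) + 3, int(box[3])
+    positions = []
+    for _ in range(num_lines):
+        y += int(line_inter)  # same per-line step as visualize_attr
+        positions.append((x, y))
+    return positions
+
+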
+def visualize_action(im,
+                     mot_boxes,
+                     action_visual_collector=None,
+                     action_text="",
+                     video_action_score=None,
+                     video_action_text=""):
+    im = cv2.imread(im) if isinstance(im, str) else im
+    im_h, im_w = im.shape[:2]
+
+    text_scale = max(1, im.shape[1] / 400.)
+    text_thickness = 2
+
+    if action_visual_collector:
+        id_action_dict = {}
+        for collector, action_type in zip(action_visual_collector, action_text):
+            id_detected = collector.get_visualize_ids()
+            for pid in id_detected:
+                id_action_dict.setdefault(pid, []).append(action_type)
+        for mot_box in mot_boxes:
+            # mot_box format: [mot_id, class, score, xmin, ymin, w, h]
+            if mot_box[0] in id_action_dict:
+                text_position = (int(mot_box[3] + mot_box[5] * 0.75),
+                                 int(mot_box[4] - 10))
+                display_text = ', '.join(id_action_dict[mot_box[0]])
+                cv2.putText(im, display_text, text_position,
+                            cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), 2)
+
+    if video_action_score:
+        cv2.putText(
+            im,
+            video_action_text + ': %.2f' % video_action_score,
+            (int(im_w / 2), int(15 * text_scale) + 5),
+            cv2.FONT_ITALIC,
+            text_scale, (0, 0, 255),
+            thickness=text_thickness)
+
+    return im
+
+
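+# Hypothetical helper, a sketch: visualize_action zips each action collector
+# with its action name and builds {track_id: [action names]}. A person who shows
+# several actions therefore gets a single comma-joined label. Here the
+# collectors' get_visualize_ids() results are assumed to be passed in directly
+# as lists of track ids.
+def group_actions_by_id(ids_per_action, action_names):
+    """Hypothetical: map each track id to the list of actions detected on it."""
+    id_action_dict = {}
+    for ids, action in zip(ids_per_action, action_names):
+        for pid in ids:
+            id_action_dict.setdefault(pid, []).append(action)
+    return id_action_dict
+
+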
+def visualize_vehicleplate(im, results, boxes=None):
+    if isinstance(im, str):
+        im = Image.open(im)
+        im = np.ascontiguousarray(np.copy(im))
+        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
+    else:
+        im = np.ascontiguousarray(np.copy(im))
+
+    im_h, im_w = im.shape[:2]
+    text_scale = max(1.0, im.shape[0] / 400.)
+    text_thickness = 2
+
+    for i, res in enumerate(results):
+        # plate text is only drawn next to a box; empty results are skipped
+        if boxes is None or res == "":
+            continue
+        box = boxes[i]
+        text_loc = (int(box[2]), int(box[5] + box[3]))
+        cv2.putText(
+            im,
+            "LP: " + res,
+            text_loc,
+            cv2.FONT_ITALIC,
+            text_scale, (0, 255, 255),
+            thickness=text_thickness)
+    return im
+
+
+def draw_press_box_lanes(im, np_boxes, labels, threshold=0.5):
+    """
+    Args:
+        im (PIL.Image.Image): PIL image
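+        np_boxes (np.ndarray): shape [N, 6], or [N, 7] with an extra leading
+                               column that is dropped; N is the number of boxes.
+                               Row layout: [class, score, x_min, y_min, x_max, y_max],
+                               or [class, score, x1, y1, x2, y2, x3, y3, x4, y4]
+                               for quadrilaterals
+        labels (list): labels:['class1', ..., 'classn']
+        threshold (float): minimum score for a box to be drawn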
+    Returns:
+        im (PIL.Image.Image): visualized image
+    """
+
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    elif isinstance(im, np.ndarray):
+        im = Image.fromarray(im)
+
+    draw_thickness = min(im.size) // 320
+    draw = ImageDraw.Draw(im)
+    clsid2color = {}
+    color_list = get_color_map_list(len(labels))
+
+    if np_boxes.shape[1] == 7:
+        np_boxes = np_boxes[:, 1:]
+
+    expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    np_boxes = np_boxes[expect_boxes, :]
+
+    for dt in np_boxes:
+        clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
+        if clsid not in clsid2color:
+            clsid2color[clsid] = color_list[clsid]
+        color = tuple(clsid2color[clsid])
+
+        if len(bbox) == 4:
+            xmin, ymin, xmax, ymax = bbox
+            # draw bbox
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=draw_thickness,
+                fill=(0, 0, 255))
+        elif len(bbox) == 8:
+            x1, y1, x2, y2, x3, y3, x4, y4 = bbox
+            draw.line(
+                [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
+                width=2,
+                fill=color)
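+            xmin = min(x1, x2, x3, x4)
+            ymin = min(y1, y2, y3, y4)
+            # the label below is anchored at the bottom edge, so ymax is needed too
+            ymax = max(y1, y2, y3, y4)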
+
+        # draw label
+        text = "{}".format(labels[clsid])
+        tw, th = imagedraw_textsize_c(draw, text)
+        draw.rectangle(
+            [(xmin + 1, ymax - th), (xmin + tw + 1, ymax)], fill=color)
+        draw.text((xmin + 1, ymax - th), text, fill=(0, 0, 255))
+    return im
+
+
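+# Hypothetical helper, a sketch: draw_press_box_lanes drops the leading column
+# of 7-column boxes. It then keeps only rows whose score (column 1) exceeds
+# `threshold` and whose class id (column 0) is valid (> -1). This repeats that
+# filtering in isolation so that it can be tested without PIL. It relies on the
+# module-level `import numpy as np`.
+def filter_press_boxes(np_boxes, threshold=0.5):
+    """Hypothetical: return the rows draw_press_box_lanes would draw."""
+    np_boxes = np.asarray(np_boxes, dtype=float)
+    if np_boxes.shape[1] == 7:
+        np_boxes = np_boxes[:, 1:]
+    keep = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
+    return np_boxes[keep, :]
+
+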
+def visualize_vehiclepress(im, results, threshold=0.5):
+    results = np.array(results)
+    labels = ['violation']
+    im = draw_press_box_lanes(im, results, labels, threshold=threshold)
+    return im
+
+
+def visualize_lane(im, lanes):
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    elif isinstance(im, np.ndarray):
+        im = Image.fromarray(im)
+
+    draw_thickness = min(im.size) // 320
+    draw = ImageDraw.Draw(im)
+
+    if len(lanes) > 0:
+        for lane in lanes:
+            draw.line(
+                [(lane[0], lane[1]), (lane[2], lane[3])],
+                width=draw_thickness,
+                fill=(0, 0, 255))
+
+    return im
+
+
+def visualize_vehicle_retrograde(im, mot_res, vehicle_retrograde_res):
+    if isinstance(im, str):
+        im = Image.open(im).convert('RGB')
+    elif isinstance(im, np.ndarray):
+        im = Image.fromarray(im)
+
+    draw_thickness = min(im.size) // 320
+    draw = ImageDraw.Draw(im)
+
+    lane = vehicle_retrograde_res['fence_line']
+    if lane is not None:
+        draw.line(
+            [(lane[0], lane[1]), (lane[2], lane[3])],
+            width=draw_thickness,
+            fill=(0, 0, 0))
+
+    mot_id = vehicle_retrograde_res['output']
+    if mot_id is None or len(mot_id) == 0:
+        return im
+
+    if mot_res is None:
+        return im
+    np_boxes = mot_res['boxes']
+
+    if np_boxes is not None:
+        for dt in np_boxes:
+            if dt[0] not in mot_id:
+                continue
+            bbox = dt[3:]
+            # the label below needs xmax/ymin, so skip boxes in any other layout
+            if len(bbox) != 4:
+                continue
+            xmin, ymin, xmax, ymax = bbox
+            # draw bbox
+            draw.line(
+                [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+                 (xmin, ymin)],
+                width=draw_thickness,
+                fill=(0, 255, 0))
+
+            # draw label: black text, so that it stays readable on the green background
+            text = "retrograde"
+            tw, th = imagedraw_textsize_c(draw, text)
+            draw.rectangle(
+                [(xmax + 1, ymin - th), (xmax + tw + 1, ymin)],
+                fill=(0, 255, 0))
+            draw.text((xmax + 1, ymin - th), text, fill=(0, 0, 0))
+
+    return im
+
+
+COLORS = [
+    (255, 0, 0),
+    (0, 255, 0),
+    (0, 0, 255),
+    (255, 255, 0),
+    (255, 0, 255),
+    (0, 255, 255),
+    (128, 255, 0),
+    (255, 128, 0),
+    (128, 0, 255),
+    (255, 0, 128),
+    (0, 128, 255),
+    (0, 255, 128),
+    (128, 255, 255),
+    (255, 128, 255),
+    (255, 255, 128),
+    (60, 180, 0),
+    (180, 60, 0),
+    (0, 60, 180),
+    (0, 180, 60),
+    (60, 0, 180),
+    (180, 0, 60),
+    (255, 0, 0),
+    (0, 255, 0),
+    (0, 0, 255),
+    (255, 255, 0),
+    (255, 0, 255),
+    (0, 255, 255),
+    (128, 255, 0),
+    (255, 128, 0),
+    (128, 0, 255),
+]
+
+
+def imshow_lanes(img, lanes, show=False, out_file=None, width=4):
+    lanes_xys = []
+    for lane in lanes:
+        xys = []
+        for x, y in lane:
+            # points with a non-positive coordinate are skipped
+            if x <= 0 or y <= 0:
+                continue
+            xys.append((int(x), int(y)))
+        lanes_xys.append(xys)
+    # sort lanes left to right by their first point so that colours are stable
+    lanes_xys.sort(key=lambda xys: xys[0][0] if len(xys) > 0 else 0)
+
+    for idx, xys in enumerate(lanes_xys):
+        # cycle through the palette when there are more lanes than colours
+        color = COLORS[idx % len(COLORS)]
+        for i in range(1, len(xys)):
+            cv2.line(img, xys[i - 1], xys[i], color, thickness=width)
+
+    if show:
+        cv2.imshow('view', img)
+        cv2.waitKey(0)
+
+    if out_file:
+        out_dir = os.path.dirname(out_file)
+        # dirname is '' for a bare file name, and os.makedirs('') would raise
+        if out_dir:
+            os.makedirs(out_dir, exist_ok=True)
+        cv2.imwrite(out_file, img)
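+
+
+# Hypothetical helper, a sketch: imshow_lanes converts each lane to integer
+# (x, y) points and skips points with a non-positive coordinate. It then sorts
+# lanes by the x of their first kept point, so colours are assigned left to
+# right. This repeats that preprocessing without drawing so that it can be
+# tested on its own.
+def lanes_to_polylines(lanes):
+    """Hypothetical: return the sorted integer polylines imshow_lanes draws."""
+    polylines = []
+    for lane in lanes:
+        polylines.append(
+            [(int(x), int(y)) for x, y in lane if x > 0 and y > 0])
+    # empty lanes sort first (key 0), as in imshow_lanes
+    polylines.sort(key=lambda xys: xys[0][0] if xys else 0)
+    return polylines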