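feat: add PaddlePaddle detection support and restructure the project
1. Add the concurrently dependency to start services in parallel
2. Add a server startup script that manages environment variables and the virtual environment in one place
3. Add the PaddlePaddle inference engine and supporting utility code
4. Add Paddle model support for smoking detection and improve model management
5. Rework the development startup script for a better development experience
6. Update .gitignore to exclude unneeded external directories and caches
7. Improve the documentation and add a PaddlePaddle deployment guide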
104
third-party/paddle-inference/README.md
vendored
Normal file
@@ -0,0 +1,104 @@
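# Python Inference Deployment

In PaddlePaddle, the inference engine and the training engine are optimized differently under the hood. The inference engine uses AnalysisPredictor, which is optimized specifically for inference and is the Python interface to the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html). It applies a number of graph optimizations to the model and reduces unnecessary memory copies. For users who need high performance when deploying a trained model, we provide inference scripts that are independent of PaddleDetection and can be integrated directly.

Python inference deployment consists of two steps:
- Export the inference model
- Run inference with Python

## 1. Export the inference model

During training, a PaddleDetection model holds both the forward-pass parameters of the network and optimizer-related parameters, whereas deployment only needs the forward-pass parameters. See [Exporting models](../EXPORT_MODEL.md) for details. For example:
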
```bash
# Export the YOLOv3 detection model
python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --output_dir=./inference_model \
 -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams

# Export the HigherHRNet (bottom-up) keypoint detection model
python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams

# Export the HRNet (top-down) keypoint detection model
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_384x288.yml -o weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams

# Export the FairMOT multi-object tracking model
python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams

# Export the ByteTrack multi-object tracking model (this only exports the detector)
python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
```

After export, the output directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.

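Before pointing an inference script at an exported directory, it can help to confirm that the export finished. The sketch below checks for the four file names listed above; the helper name `missing_export_files` is hypothetical and is not part of PaddleDetection.

```python
from pathlib import Path

# File names taken from the export description above.
EXPECTED_FILES = (
    "infer_cfg.yml",
    "model.pdiparams",
    "model.pdiparams.info",
    "model.pdmodel",
)


def missing_export_files(model_dir):
    """Return the expected export files that are absent from model_dir.

    Hypothetical helper for illustration only; an empty list means the
    directory looks complete.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value suggests the directory is ready to pass as `--model_dir`; otherwise the list names what is still missing.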
## 2. Run inference with Python

### 2.1 General detection

Run the following command in a terminal:
```bash
python deploy/python/infer.py --model_dir=./output_inference/yolov3_darknet53_270e_coco --image_file=./demo/000000014439.jpg --device=GPU
```

### 2.2 Keypoint detection

Run the following commands in a terminal:
```bash
# Standalone inference with a top-down (HRNet) or bottom-up (HigherHRNet) keypoint model. In this mode the top-down HRNet model only supports cropped single-person images.
python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=GPU --threshold=0.5
python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=GPU --threshold=0.5

# Joint deployment of a detector and a top-down keypoint model (joint inference only supports top-down keypoint models)
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/yolov3_darknet53_270e_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
```
**Notes:**
- For details on exporting and running keypoint detection models, see [keypoint](../../configs/keypoint/README.md); specific usage is documented with each model.
- The keypoint deployment in this directory covers only basic forward inference. For more keypoint features, use the PP-Human project; see [pipeline](../pipeline/README.md).

### 2.3 Multi-object tracking

Run the following commands in a terminal:
```bash
# FairMOT tracking
python deploy/python/mot_jde_infer.py --model_dir=output_inference/fairmot_dla34_30e_1088x608 --video_file={your video name}.mp4 --device=GPU

# ByteTrack tracking
python deploy/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=deploy/python/tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True

# FairMOT multi-object tracking combined with HRNet keypoint detection (joint inference only supports top-down keypoint models)
python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file={your video name}.mp4 --device=GPU
```

**Notes:**
- For details on exporting and running multi-object tracking models, see [mot](../../configs/mot/README.md); specific usage is documented with each model.
- The tracking deployment in this directory covers basic forward inference and joint deployment with keypoint models. For more tracking features, use the PP-Human project (see [pipeline](../pipeline/README.md)) or the PP-Tracking project, which adds trajectory drawing and entrance/exit flow counting (see [pptracking](../pptracking/README.md)).

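The parameters are described below:

| Parameter | Required | Description |
|-------|-------|---------------------------------------------------------------------------------------------|
| --model_dir | Yes | Path to the exported model directory described above |
| --image_file | Optional | Image to run inference on |
| --image_dir | Optional | Directory of images to run inference on |
| --video_file | Optional | Video to run inference on |
| --camera_id | Optional | ID of the camera used for inference. Defaults to -1, meaning no camera is used; valid values are 0 to (number of cameras - 1). During inference, press `q` in the visualization window to quit; the result is written to output/output.mp4 |
| --device | Optional | Device to run on: `CPU`, `GPU`, or `XPU`. Defaults to `CPU` |
| --run_mode | Optional | When using a GPU, defaults to paddle; options are paddle/trt_fp32/trt_fp16/trt_int8 |
| --batch_size | Optional | Batch size for inference; only takes effect when `image_dir` is specified. Defaults to 1 |
| --threshold | Optional | Score threshold for predictions. Defaults to 0.5 |
| --output_dir | Optional | Root directory for saved visualization results. Defaults to output/ |
| --run_benchmark | Optional | Whether to run the benchmark; `--image_file` or `--image_dir` must also be specified. Defaults to False |
| --enable_mkldnn | Optional | Whether to enable MKLDNN acceleration for CPU inference. Defaults to False |
| --cpu_threads | Optional | Number of CPU threads. Defaults to 1 |
| --trt_calib_mode | Optional | Whether TensorRT uses calibration. Defaults to False. Set to True when using TensorRT int8; set to False when using a model quantized with PaddleSlim |
| --save_images | Optional | Whether to save the visualization results |
| --save_results | Optional | Whether to save the per-image prediction results as JSON in the output folder |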

Notes:

- Input priority: `camera_id` > `video_file` > `image_dir` > `image_file`.
- run_mode: `paddle` runs inference with AnalysisPredictor at float32 precision; the other values run AnalysisPredictor with TensorRT at the corresponding precision.
- If the installed PaddlePaddle build does not support TensorRT inference, you need to compile it yourself; see the [inference library build guide](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).
- If `--run_benchmark` is set to True, install the dependencies first: `pip install pynvml psutil GPUtil`.
- To evaluate an exported model on the COCO dataset, add the `--save_results` and `--use_coco_category` flags at inference time so that the JSON file needed for COCO evaluation is saved.
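
The input priority rule in the notes above can be read as a simple cascade. The following is a minimal sketch of that rule only; `select_input_source` is a hypothetical helper and does not reproduce how the deployment scripts are actually written.

```python
def select_input_source(camera_id=-1, video_file=None, image_dir=None,
                        image_file=None):
    """Pick one input using camera_id > video_file > image_dir > image_file.

    Hypothetical helper: returns a (kind, value) tuple for the
    highest-priority input that was supplied.
    """
    if camera_id != -1:  # -1 means "no camera", as in the parameter table
        return ("camera", camera_id)
    if video_file:
        return ("video", video_file)
    if image_dir:
        return ("image_dir", image_dir)
    if image_file:
        return ("image", image_file)
    raise ValueError(
        "no input given: set camera_id, video_file, image_dir or image_file")
```

For example, passing both a video file and an image file selects the video, because `video_file` ranks higher.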
289
third-party/paddle-inference/benchmark_utils.py
vendored
Normal file
@@ -0,0 +1,289 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import logging

import paddle
import paddle.inference as paddle_infer

from pathlib import Path

CUR_DIR = os.path.dirname(os.path.abspath(__file__))
LOG_PATH_ROOT = f"{CUR_DIR}/../../output"


class PaddleInferBenchmark(object):
    def __init__(self,
                 config,
                 model_info: dict={},
                 data_info: dict={},
                 perf_info: dict={},
                 resource_info: dict={},
                 **kwargs):
        """
        Construct PaddleInferBenchmark Class to format logs.
        args:
            config(paddle.inference.Config): paddle inference config
            model_info(dict): basic model info
                {'model_name': 'resnet50',
                 'precision': 'fp32'}
            data_info(dict): input data info
                {'batch_size': 1,
                 'shape': '3,224,224',
                 'data_num': 1000}
            perf_info(dict): performance result
                {'preprocess_time_s': 1.0,
                 'inference_time_s': 2.0,
                 'postprocess_time_s': 1.0,
                 'total_time_s': 4.0}
            resource_info(dict):
                cpu and gpu resources
                {'cpu_rss_mb': 100,
                 'gpu_rss_mb': 100,
                 'gpu_util': 60}
        """
        # PaddleInferBenchmark Log Version
        self.log_version = "1.0.3"

        # Paddle Version
        self.paddle_version = paddle.__version__
        self.paddle_commit = paddle.__git_commit__
        paddle_infer_info = paddle_infer.get_version()
        self.paddle_branch = paddle_infer_info.strip().split(': ')[-1]

        # model info
        self.model_info = model_info

        # data info
        self.data_info = data_info

        # perf info
        self.perf_info = perf_info

        try:
            # required value
            self.model_name = model_info['model_name']
            self.precision = model_info['precision']

            self.batch_size = data_info['batch_size']
            self.shape = data_info['shape']
            self.data_num = data_info['data_num']

            self.inference_time_s = round(perf_info['inference_time_s'], 4)
        except Exception:
            # A required key is missing or has the wrong type.
            self.print_help()
            raise ValueError(
                "Set argument wrong, please check input argument and its type")

        self.preprocess_time_s = perf_info.get('preprocess_time_s', 0)
        self.postprocess_time_s = perf_info.get('postprocess_time_s', 0)
        self.with_tracker = 'tracking_time_s' in perf_info
        self.tracking_time_s = perf_info.get('tracking_time_s', 0)
        self.total_time_s = perf_info.get('total_time_s', 0)

        self.inference_time_s_90 = perf_info.get("inference_time_s_90", "")
        self.inference_time_s_99 = perf_info.get("inference_time_s_99", "")
        self.succ_rate = perf_info.get("succ_rate", "")
        self.qps = perf_info.get("qps", "")

        # conf info
        self.config_status = self.parse_config(config)

        # mem info
        if isinstance(resource_info, dict):
            self.cpu_rss_mb = int(resource_info.get('cpu_rss_mb', 0))
            self.cpu_vms_mb = int(resource_info.get('cpu_vms_mb', 0))
            self.cpu_shared_mb = int(resource_info.get('cpu_shared_mb', 0))
            self.cpu_dirty_mb = int(resource_info.get('cpu_dirty_mb', 0))
            self.cpu_util = round(resource_info.get('cpu_util', 0), 2)

            self.gpu_rss_mb = int(resource_info.get('gpu_rss_mb', 0))
            self.gpu_util = round(resource_info.get('gpu_util', 0), 2)
            self.gpu_mem_util = round(resource_info.get('gpu_mem_util', 0), 2)
        else:
            self.cpu_rss_mb = 0
            self.cpu_vms_mb = 0
            self.cpu_shared_mb = 0
            self.cpu_dirty_mb = 0
            self.cpu_util = 0

            self.gpu_rss_mb = 0
            self.gpu_util = 0
            self.gpu_mem_util = 0

        # init benchmark logger
        self.benchmark_logger()

    def benchmark_logger(self):
        """
        benchmark logger
        """
        # remove other logging handler
        for handler in logging.root.handlers[:]:
            logging.root.removeHandler(handler)

        # Init logger
        FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        log_output = f"{LOG_PATH_ROOT}/{self.model_name}.log"
        Path(f"{LOG_PATH_ROOT}").mkdir(parents=True, exist_ok=True)
        logging.basicConfig(
            level=logging.INFO,
            format=FORMAT,
            handlers=[
                logging.FileHandler(
                    filename=log_output, mode='w'),
                logging.StreamHandler(),
            ])
        self.logger = logging.getLogger(__name__)
        self.logger.info(
            f"Paddle Inference benchmark log will be saved to {log_output}")

    def parse_config(self, config) -> dict:
        """
        parse paddle predictor config
        args:
            config(paddle.inference.Config): paddle inference config
        return:
            config_status(dict): dict style config info
        """
        # Initialize up front so that the dict branch can populate it too;
        # previously it was only defined in the paddle_infer.Config branch.
        config_status = {}
        if isinstance(config, paddle_infer.Config):
            config_status['runtime_device'] = "gpu" if config.use_gpu(
            ) else "cpu"
            config_status['ir_optim'] = config.ir_optim()
            config_status['enable_tensorrt'] = config.tensorrt_engine_enabled()
            config_status['precision'] = self.precision
            config_status['enable_mkldnn'] = config.mkldnn_enabled()
            config_status[
                'cpu_math_library_num_threads'] = config.cpu_math_library_num_threads(
                )
        elif isinstance(config, dict):
            config_status['runtime_device'] = config.get('runtime_device', "")
            config_status['ir_optim'] = config.get('ir_optim', "")
            config_status['enable_tensorrt'] = config.get('enable_tensorrt', "")
            config_status['precision'] = config.get('precision', "")
            config_status['enable_mkldnn'] = config.get('enable_mkldnn', "")
            config_status['cpu_math_library_num_threads'] = config.get(
                'cpu_math_library_num_threads', "")
        else:
            self.print_help()
            raise ValueError(
                "Set argument config wrong, please check input argument and its type"
            )
        return config_status

    def report(self, identifier=None):
        """
        print log report
        args:
            identifier(string): identify log
        """
        if identifier:
            identifier = f"[{identifier}]"
        else:
            identifier = ""

        self.logger.info("\n")
        self.logger.info(
            "---------------------- Paddle info ----------------------")
        self.logger.info(f"{identifier} paddle_version: {self.paddle_version}")
        self.logger.info(f"{identifier} paddle_commit: {self.paddle_commit}")
        self.logger.info(f"{identifier} paddle_branch: {self.paddle_branch}")
        self.logger.info(f"{identifier} log_api_version: {self.log_version}")
        self.logger.info(
            "----------------------- Conf info -----------------------")
        self.logger.info(
            f"{identifier} runtime_device: {self.config_status['runtime_device']}"
        )
        self.logger.info(
            f"{identifier} ir_optim: {self.config_status['ir_optim']}")
        self.logger.info(f"{identifier} enable_memory_optim: {True}")
        self.logger.info(
            f"{identifier} enable_tensorrt: {self.config_status['enable_tensorrt']}"
        )
        self.logger.info(
            f"{identifier} enable_mkldnn: {self.config_status['enable_mkldnn']}")
        self.logger.info(
            f"{identifier} cpu_math_library_num_threads: {self.config_status['cpu_math_library_num_threads']}"
        )
        self.logger.info(
            "----------------------- Model info ----------------------")
        self.logger.info(f"{identifier} model_name: {self.model_name}")
        self.logger.info(f"{identifier} precision: {self.precision}")
        self.logger.info(
            "----------------------- Data info -----------------------")
        self.logger.info(f"{identifier} batch_size: {self.batch_size}")
        self.logger.info(f"{identifier} input_shape: {self.shape}")
        self.logger.info(f"{identifier} data_num: {self.data_num}")
        self.logger.info(
            "----------------------- Perf info -----------------------")
        self.logger.info(
            f"{identifier} cpu_rss(MB): {self.cpu_rss_mb}, cpu_vms: {self.cpu_vms_mb}, cpu_shared_mb: {self.cpu_shared_mb}, cpu_dirty_mb: {self.cpu_dirty_mb}, cpu_util: {self.cpu_util}%"
        )
        self.logger.info(
            f"{identifier} gpu_rss(MB): {self.gpu_rss_mb}, gpu_util: {self.gpu_util}%, gpu_mem_util: {self.gpu_mem_util}%"
        )
        self.logger.info(
            f"{identifier} total time spent(s): {self.total_time_s}")

        if self.with_tracker:
            self.logger.info(
                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}, "
                f"tracking_time(ms): {round(self.tracking_time_s*1000, 1)}")
        else:
            self.logger.info(
                f"{identifier} preprocess_time(ms): {round(self.preprocess_time_s*1000, 1)}, "
                f"inference_time(ms): {round(self.inference_time_s*1000, 1)}, "
                f"postprocess_time(ms): {round(self.postprocess_time_s*1000, 1)}"
            )
        if self.inference_time_s_90:
            # Fixed: the attribute is self.logger (was misspelled "looger",
            # which raised AttributeError whenever percentiles were supplied).
            self.logger.info(
                f"{identifier} 90%_cost: {self.inference_time_s_90}, 99%_cost: {self.inference_time_s_99}, succ_rate: {self.succ_rate}"
            )
        if self.qps:
            self.logger.info(f"{identifier} QPS: {self.qps}")

    def print_help(self):
        """
        print function help
        """
        print("""Usage:
            ==== Print inference benchmark logs. ====
            config = paddle.inference.Config()
            model_info = {'model_name': 'resnet50',
                          'precision': 'fp32'}
            data_info = {'batch_size': 1,
                         'shape': '3,224,224',
                         'data_num': 1000}
            perf_info = {'preprocess_time_s': 1.0,
                         'inference_time_s': 2.0,
                         'postprocess_time_s': 1.0,
                         'total_time_s': 4.0}
            resource_info = {'cpu_rss_mb': 100,
                             'gpu_rss_mb': 100,
                             'gpu_util': 60}
            log = PaddleInferBenchmark(config, model_info, data_info, perf_info, resource_info)
            log('Test')
            """)

    def __call__(self, identifier=None):
        """
        __call__
        args:
            identifier(string): identify log
        """
        self.report(identifier)
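
`PaddleInferBenchmark` expects a `perf_info` dict whose `inference_time_s` key is required, while `preprocess_time_s`, `postprocess_time_s` and `total_time_s` are optional. A minimal sketch of how such a dict might be assembled with the standard library is shown below. The helpers `timed` and `build_perf_info` are hypothetical and not part of this file, and averaging the inference time over `repeats` runs is an assumption made for the example.

```python
import time


def timed(fn, *args, repeats=1):
    """Run fn(*args) `repeats` times; return (last_result, mean_seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    return result, (time.perf_counter() - start) / repeats


def build_perf_info(preprocess, infer, postprocess, sample, repeats=1):
    """Assemble a perf_info dict using the keys PaddleInferBenchmark reads.

    Hypothetical helper: the three callables stand in for the stages of a
    real pipeline, and only the inference stage is repeated.
    """
    x, pre_s = timed(preprocess, sample)
    y, inf_s = timed(infer, x, repeats=repeats)
    _, post_s = timed(postprocess, y)
    return {
        "preprocess_time_s": pre_s,
        "inference_time_s": inf_s,
        "postprocess_time_s": post_s,
        "total_time_s": pre_s + inf_s + post_s,
    }
```

The resulting dict could then be passed as the `perf_info` argument alongside `model_info` and `data_info`.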
262
third-party/paddle-inference/clrnet_postprocess.py
vendored
Normal file
@@ -0,0 +1,262 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import paddle
import paddle.nn as nn
from scipy.special import softmax
from scipy.interpolate import InterpolatedUnivariateSpline


def line_iou(pred, target, img_w, length=15, aligned=True):
    '''
    Calculate the line iou value between predictions and targets
    Args:
        pred: lane predictions, shape: (num_pred, 72)
        target: ground truth, shape: (num_target, 72)
        img_w: image width
        length: extended radius
        aligned: True for iou loss calculation, False for pair-wise ious in assign
    '''
    px1 = pred - length
    px2 = pred + length
    tx1 = target - length
    tx2 = target + length

    if aligned:
        invalid_mask = target
        ovr = paddle.minimum(px2, tx2) - paddle.maximum(px1, tx1)
        union = paddle.maximum(px2, tx2) - paddle.minimum(px1, tx1)
    else:
        num_pred = pred.shape[0]
        invalid_mask = target.tile([num_pred, 1, 1])

        ovr = (paddle.minimum(px2[:, None, :], tx2[None, ...]) - paddle.maximum(
            px1[:, None, :], tx1[None, ...]))
        union = (paddle.maximum(px2[:, None, :], tx2[None, ...]) -
                 paddle.minimum(px1[:, None, :], tx1[None, ...]))

    invalid_masks = (invalid_mask < 0) | (invalid_mask >= img_w)

    ovr[invalid_masks] = 0.
    union[invalid_masks] = 0.
    iou = ovr.sum(axis=-1) / (union.sum(axis=-1) + 1e-9)
    return iou


class Lane:
    def __init__(self, points=None, invalid_value=-2., metadata=None):
        super(Lane, self).__init__()
        self.curr_iter = 0
        self.points = points
        self.invalid_value = invalid_value
        self.function = InterpolatedUnivariateSpline(
            points[:, 1], points[:, 0], k=min(3, len(points) - 1))
        self.min_y = points[:, 1].min() - 0.01
        self.max_y = points[:, 1].max() + 0.01
        self.metadata = metadata or {}

    def __repr__(self):
        return '[Lane]\n' + str(self.points) + '\n[/Lane]'

    def __call__(self, lane_ys):
        lane_xs = self.function(lane_ys)

        lane_xs[(lane_ys < self.min_y) | (lane_ys > self.max_y
                                          )] = self.invalid_value
        return lane_xs

    def to_array(self, sample_y_range, img_w, img_h):
        self.sample_y = range(sample_y_range[0], sample_y_range[1],
                              sample_y_range[2])
        sample_y = self.sample_y
        ys = np.array(sample_y) / float(img_h)
        xs = self(ys)
        valid_mask = (xs >= 0) & (xs < 1)
        lane_xs = xs[valid_mask] * img_w
        lane_ys = ys[valid_mask] * img_h
        lane = np.concatenate(
            (lane_xs.reshape(-1, 1), lane_ys.reshape(-1, 1)), axis=1)
        return lane

    def __iter__(self):
        return self

    def __next__(self):
        if self.curr_iter < len(self.points):
            self.curr_iter += 1
            return self.points[self.curr_iter - 1]
        self.curr_iter = 0
        raise StopIteration


class CLRNetPostProcess(object):
    """
    Args:
        img_w (int): image width used to scale lane x offsets and to bound
            valid x coordinates in NMS
        ori_img_h (int): height of the original image
        cut_height (int): vertical offset, in original-image rows, applied
            when mapping lane y coordinates back to the original image
        conf_threshold (float): minimum lane confidence to keep a prediction
        nms_thres (float): threshold used by lane NMS
        max_lanes (int): maximum number of lanes kept after NMS
        num_points (int): number of sampled points (offsets) per lane
    """

    def __init__(self, img_w, ori_img_h, cut_height, conf_threshold, nms_thres,
                 max_lanes, num_points):
        self.img_w = img_w
        self.conf_threshold = conf_threshold
        self.nms_thres = nms_thres
        self.max_lanes = max_lanes
        self.num_points = num_points
        self.n_strips = num_points - 1
        self.n_offsets = num_points
        self.ori_img_h = ori_img_h
        self.cut_height = cut_height

        self.prior_ys = paddle.linspace(
            start=1, stop=0, num=self.n_offsets).astype('float64')

    def predictions_to_pred(self, predictions):
        """
        Convert predictions to internal Lane structure for evaluation.
        """
        lanes = []
        for lane in predictions:
            lane_xs = lane[6:].clone()
            start = min(
                max(0, int(round(lane[2].item() * self.n_strips))),
                self.n_strips)
            length = int(round(lane[5].item()))
            end = start + length - 1
            end = min(end, len(self.prior_ys) - 1)
            if start > 0:
                mask = ((lane_xs[:start] >= 0.) &
                        (lane_xs[:start] <= 1.)).cpu().detach().numpy()[::-1]
                mask = ~((mask.cumprod()[::-1]).astype(np.bool_))
                lane_xs[:start][mask] = -2
            if end < len(self.prior_ys) - 1:
                lane_xs[end + 1:] = -2

            lane_ys = self.prior_ys[lane_xs >= 0].clone()
            lane_xs = lane_xs[lane_xs >= 0]
            lane_xs = lane_xs.flip(axis=0).astype('float64')
            lane_ys = lane_ys.flip(axis=0)

            lane_ys = (lane_ys *
                       (self.ori_img_h - self.cut_height) + self.cut_height
                       ) / self.ori_img_h
            if len(lane_xs) <= 1:
                continue
            points = paddle.stack(
                x=(lane_xs.reshape([-1, 1]), lane_ys.reshape([-1, 1])),
                axis=1).squeeze(axis=2)
            lane = Lane(
                points=points.cpu().numpy(),
                metadata={
                    'start_x': lane[3],
                    'start_y': lane[2],
                    'conf': lane[1]
                })
            lanes.append(lane)
        return lanes

    def lane_nms(self, predictions, scores, nms_overlap_thresh, top_k):
        """
        NMS for lane detection.
        predictions: paddle.Tensor [num_lanes,conf,y,x,length,72offsets] [12,77]
        scores: paddle.Tensor [num_lanes]
        nms_overlap_thresh: float
        top_k: int
        """
        # sort by scores to get idx
        idx = scores.argsort(descending=True)
        keep = []

        candidates = predictions.clone()
        candidates = candidates.index_select(idx)

        while len(candidates) > 0:
            keep.append(idx[0])
            if len(keep) >= top_k or len(candidates) == 1:
                break

            # Note: despite the name, this list stores 1 - IoU against the
            # current best candidate.
            ious = []
            for i in range(1, len(candidates)):
                ious.append(1 - line_iou(
                    candidates[i].unsqueeze(0),
                    candidates[0].unsqueeze(0),
                    img_w=self.img_w,
                    length=15))
            ious = paddle.to_tensor(ious)

            mask = ious <= nms_overlap_thresh
            # Indices of the remaining candidates that survive this round.
            keep_ids = paddle.where(paddle.logical_not(mask))[0]

            if keep_ids.shape[0] == 0:
                break
            candidates = candidates[1:].index_select(keep_ids)
            idx = idx[1:].index_select(keep_ids)
        keep = paddle.stack(keep)

        return keep

    def get_lanes(self, output, as_lanes=True):
        """
        Convert model output to lanes.
        """
        softmax = nn.Softmax(axis=1)
        decoded = []

        for predictions in output:
            if len(predictions) == 0:
                decoded.append([])
                continue
            threshold = self.conf_threshold
            scores = softmax(predictions[:, :2])[:, 1]
            keep_inds = scores >= threshold
            predictions = predictions[keep_inds]
            scores = scores[keep_inds]

            if predictions.shape[0] == 0:
                decoded.append([])
                continue
            nms_predictions = predictions.detach().clone()
            nms_predictions = paddle.concat(
                x=[nms_predictions[..., :4], nms_predictions[..., 5:]], axis=-1)

            nms_predictions[..., 4] = nms_predictions[..., 4] * self.n_strips
            nms_predictions[..., 5:] = nms_predictions[..., 5:] * (
                self.img_w - 1)

            keep = self.lane_nms(
                nms_predictions[..., 5:],
                scores,
                nms_overlap_thresh=self.nms_thres,
                top_k=self.max_lanes)

            predictions = predictions.index_select(keep)

            if predictions.shape[0] == 0:
                decoded.append([])
                continue
            predictions[:, 5] = paddle.round(predictions[:, 5] * self.n_strips)
            if as_lanes:
                pred = self.predictions_to_pred(predictions)
            else:
                pred = predictions
            decoded.append(pred)
        return decoded

    def __call__(self, lanes_list):
        lanes = self.get_lanes(lanes_list)
        return lanes
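
The aligned branch of `line_iou` above can be followed more easily on plain Python lists. The sketch below is a dependency-free rendering of the same arithmetic for a single pair of lanes, where each lane is a list of per-row x coordinates. The name `line_iou_1d` is hypothetical, and the sketch is meant as a reading aid rather than a replacement for the tensor version.

```python
def line_iou_1d(pred_xs, target_xs, img_w, length=15):
    """Aligned line IoU between two lanes given as per-row x coordinates.

    Each point is widened into a segment of half-width `length`; the
    per-row overlaps and unions are summed before dividing. Overlap can be
    negative when segments are disjoint, as in the tensor version.
    """
    overlap = 0.0
    union = 0.0
    for px, tx in zip(pred_xs, target_xs):
        # Rows whose target x lies outside the image contribute nothing.
        if tx < 0 or tx >= img_w:
            continue
        overlap += min(px + length, tx + length) - max(px - length, tx - length)
        union += max(px + length, tx + length) - min(px - length, tx - length)
    return overlap / (union + 1e-9)
```

With `length=15`, two lanes that are 10 pixels apart on every row overlap by 20 of 40 pixels per row, which gives an IoU of about 0.5.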
374
third-party/paddle-inference/det_keypoint_unite_infer.py
vendored
Normal file
@@ -0,0 +1,374 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import json
import cv2
import math
import numpy as np
import paddle
import yaml

from det_keypoint_unite_utils import argsparser
from preprocess import decode_image
from infer import Detector, DetectorPicoDet, PredictConfig, print_arguments, get_test_images, bench_log
from keypoint_infer import KeyPointDetector, PredictConfig_KeyPoint
from visualize import visualize_pose
from benchmark_utils import PaddleInferBenchmark
from utils import get_current_memory_mb
from keypoint_postprocess import translate_to_ori_images

KEYPOINT_SUPPORT_MODELS = {
    'HigherHRNet': 'keypoint_bottomup',
    'HRNet': 'keypoint_topdown'
}


def predict_with_given_det(image, det_res, keypoint_detector,
                           keypoint_batch_size, run_benchmark):
    keypoint_res = {}

    rec_images, records, det_rects = keypoint_detector.get_person_from_rect(
        image, det_res)

    if len(det_rects) == 0:
        keypoint_res['keypoint'] = [[], []]
        return keypoint_res

    keypoint_vector = []
    score_vector = []

    rect_vector = det_rects
    keypoint_results = keypoint_detector.predict_image(
        rec_images, run_benchmark, repeats=10, visual=False)
    keypoint_vector, score_vector = translate_to_ori_images(keypoint_results,
                                                            np.array(records))
    keypoint_res['keypoint'] = [
        keypoint_vector.tolist(), score_vector.tolist()
    ] if len(keypoint_vector) > 0 else [[], []]
    keypoint_res['bbox'] = rect_vector
    return keypoint_res


def topdown_unite_predict(detector,
                          topdown_keypoint_detector,
                          image_list,
                          keypoint_batch_size=1,
                          save_res=False):
    det_timer = detector.get_timer()
    store_res = []
    for i, img_file in enumerate(image_list):
        # Decode image in advance in det + pose prediction
        det_timer.preprocess_time_s.start()
        image, _ = decode_image(img_file, {})
        det_timer.preprocess_time_s.end()

        if FLAGS.run_benchmark:
            results = detector.predict_image(
                [image], run_benchmark=True, repeats=10)

            cm, gm, gu = get_current_memory_mb()
            detector.cpu_mem += cm
            detector.gpu_mem += gm
            detector.gpu_util += gu
        else:
            results = detector.predict_image([image], visual=False)
        results = detector.filter_box(results, FLAGS.det_threshold)
        if results['boxes_num'] > 0:
            keypoint_res = predict_with_given_det(
                image, results, topdown_keypoint_detector, keypoint_batch_size,
                FLAGS.run_benchmark)

            if save_res:
                save_name = img_file if isinstance(img_file, str) else i
                store_res.append([
                    save_name, keypoint_res['bbox'],
                    [keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
                ])
        else:
            results["keypoint"] = [[], []]
            keypoint_res = results
        if FLAGS.run_benchmark:
            cm, gm, gu = get_current_memory_mb()
            topdown_keypoint_detector.cpu_mem += cm
            topdown_keypoint_detector.gpu_mem += gm
            topdown_keypoint_detector.gpu_util += gu
        else:
            if not os.path.exists(FLAGS.output_dir):
                os.makedirs(FLAGS.output_dir)
            visualize_pose(
                img_file,
                keypoint_res,
                visual_thresh=FLAGS.keypoint_threshold,
                save_dir=FLAGS.output_dir)
    if save_res:
        """
        1) store_res: a list of image_data
        2) image_data: [imageid, rects, [keypoints, scores]]
        3) rects: list of rect [xmin, ymin, xmax, ymax]
        4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
        5) scores: mean of all joint conf
        """
        with open("det_keypoint_unite_image_results.json", 'w') as wf:
            json.dump(store_res, wf, indent=4)


def topdown_unite_predict_video(detector,
|
||||
topdown_keypoint_detector,
|
||||
camera_id,
|
||||
keypoint_batch_size=1,
|
||||
save_res=False):
|
||||
video_name = 'output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(FLAGS.video_file)
|
||||
video_name = os.path.split(FLAGS.video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(FLAGS.output_dir):
|
||||
os.makedirs(FLAGS.output_dir)
|
||||
out_path = os.path.join(FLAGS.output_dir, video_name)
|
||||
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
index = 0
|
||||
store_res = []
|
||||
keypoint_smoothing = KeypointSmoothing(
|
||||
width, height, filter_type=FLAGS.filter_type, beta=0.05)
|
||||
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
index += 1
|
||||
print('detect frame: %d' % (index))
|
||||
|
||||
frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
|
||||
|
||||
results = detector.predict_image([frame2], visual=False)
|
||||
results = detector.filter_box(results, FLAGS.det_threshold)
|
||||
if results['boxes_num'] == 0:
|
||||
writer.write(frame)
|
||||
continue
|
||||
|
||||
keypoint_res = predict_with_given_det(
|
||||
frame2, results, topdown_keypoint_detector, keypoint_batch_size,
|
||||
FLAGS.run_benchmark)
|
||||
|
||||
if FLAGS.smooth and len(keypoint_res['keypoint'][0]) == 1:
|
||||
current_keypoints = np.array(keypoint_res['keypoint'][0][0])
|
||||
smooth_keypoints = keypoint_smoothing.smooth_process(
|
||||
current_keypoints)
|
||||
|
||||
keypoint_res['keypoint'][0][0] = smooth_keypoints.tolist()
|
||||
|
||||
im = visualize_pose(
|
||||
frame,
|
||||
keypoint_res,
|
||||
visual_thresh=FLAGS.keypoint_threshold,
|
||||
returnimg=True)
|
||||
|
||||
if save_res:
|
||||
store_res.append([
|
||||
index, keypoint_res['bbox'],
|
||||
[keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
|
||||
])
|
||||
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Keypoint Detection', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
writer.release()
|
||||
print('output_video saved to: {}'.format(out_path))
|
||||
if save_res:
|
||||
"""
|
||||
1) store_res: a list of frame_data
|
||||
2) frame_data: [frameid, rects, [keypoints, scores]]
|
||||
3) rects: list of rect [xmin, ymin, xmax, ymax]
|
||||
4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
|
||||
5) scores: mean of all joint conf
|
||||
"""
|
||||
with open("det_keypoint_unite_video_results.json", 'w') as wf:
|
||||
json.dump(store_res, wf, indent=4)
|
||||
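The docstring above describes the layout of the saved JSON. As a rough illustration of how such a file might be consumed, the sketch below builds one record in memory and walks it. All values are made up, and the exact nesting of `scores` is an assumption; only the `[frameid, rects, [keypoints, scores]]` outline comes from the docstring.

```python
import json

# One hypothetical frame with a single detected person and 17 joints.
rects = [[10.0, 20.0, 110.0, 220.0]]  # [xmin, ymin, xmax, ymax]
keypoints = [[[50.0 + j, 60.0 + j, 0.9] for j in range(17)]]  # per person: 17 * [x, y, conf]
scores = [[0.9]]  # per-person mean joint confidence (nesting is assumed)
store_res = [[1, rects, [keypoints, scores]]]  # [frameid, rects, [keypoints, scores]]

# Round-trip through JSON, as json.dump / json.load would do with a file.
loaded = json.loads(json.dumps(store_res, indent=4))

for frame_id, frame_rects, (frame_kpts, frame_scores) in loaded:
    for rect, person in zip(frame_rects, frame_kpts):
        flat = [v for joint in person for v in joint]  # 17 * 3 = 51 values
        print(frame_id, rect, len(flat))
```

Using an in-memory round trip keeps the sketch free of file paths; reading the real result file would replace `json.loads(...)` with `json.load(open_file)`.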
|
||||
|
||||
class KeypointSmoothing(object):
    # The following code is modified from:
    # https://github.com/jaantollander/OneEuroFilter

    def __init__(self,
                 width,
                 height,
                 filter_type,
                 alpha=0.5,
                 fc_d=0.1,
                 fc_min=0.1,
                 beta=0.1,
                 thres_mult=0.3):
        super(KeypointSmoothing, self).__init__()
        self.image_width = width
        self.image_height = height
        # Per-joint dead-zone thresholds (17 joints), relative to image size.
        self.threshold = np.array([
            0.005, 0.005, 0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.01, 0.01,
            0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01
        ]) * thres_mult
        self.filter_type = filter_type
        self.alpha = alpha
        self.dx_prev_hat = None
        self.x_prev_hat = None
        self.fc_d = fc_d
        self.fc_min = fc_min
        self.beta = beta

        if self.filter_type == 'OneEuro':
            self.smooth_func = self.one_euro_filter
        elif self.filter_type == 'EMA':
            self.smooth_func = self.ema_filter
        else:
            raise ValueError('filter type must be OneEuro or EMA')

    def smooth_process(self, current_keypoints):
        if self.x_prev_hat is None:
            # First frame: initialize the filter state and pass through.
            self.x_prev_hat = current_keypoints[:, :2]
            self.dx_prev_hat = np.zeros(current_keypoints[:, :2].shape)
            return current_keypoints
        else:
            result = current_keypoints
            num_keypoints = len(current_keypoints)
            for i in range(num_keypoints):
                result[i, :2] = self.smooth(current_keypoints[i, :2],
                                            self.threshold[i], i)
            return result

    def smooth(self, current_keypoint, threshold, index):
        # Normalized distance between the new point and the previous estimate.
        distance = np.sqrt(
            np.square((current_keypoint[0] - self.x_prev_hat[index][0]) /
                      self.image_width) + np.square((current_keypoint[
                          1] - self.x_prev_hat[index][1]) / self.image_height))
        if distance < threshold:
            # Small jitter: keep the previous estimate.
            result = self.x_prev_hat[index]
        else:
            result = self.smooth_func(current_keypoint, self.x_prev_hat[index],
                                      index)

        return result

    def one_euro_filter(self, x_cur, x_pre, index):
        te = 1
        self.alpha = self.smoothing_factor(te, self.fc_d)
        dx_cur = (x_cur - x_pre) / te
        dx_cur_hat = self.exponential_smoothing(dx_cur, self.dx_prev_hat[index])

        fc = self.fc_min + self.beta * np.abs(dx_cur_hat)
        self.alpha = self.smoothing_factor(te, fc)
        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
        self.dx_prev_hat[index] = dx_cur_hat
        self.x_prev_hat[index] = x_cur_hat
        return x_cur_hat

    def ema_filter(self, x_cur, x_pre, index):
        x_cur_hat = self.exponential_smoothing(x_cur, x_pre)
        self.x_prev_hat[index] = x_cur_hat
        return x_cur_hat

    def smoothing_factor(self, te, fc):
        r = 2 * math.pi * fc * te
        return r / (r + 1)

    def exponential_smoothing(self, x_cur, x_pre, index=0):
        return self.alpha * x_cur + (1 - self.alpha) * x_pre
|
||||
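To make the filter in `KeypointSmoothing` easier to follow, here is a minimal scalar sketch of the same One Euro idea, using only the standard library. The class name `ScalarOneEuro` and the sample values are hypothetical; the sketch omits the per-joint dead-zone threshold and is not the class above, only an illustration of how the smoothing factor adapts to speed.

```python
import math


def smoothing_factor(te, fc):
    """Map a cutoff frequency fc and time step te to an EMA weight in (0, 1)."""
    r = 2 * math.pi * fc * te
    return r / (r + 1)


def exponential_smoothing(alpha, x_cur, x_pre):
    """Blend the new sample with the previous estimate."""
    return alpha * x_cur + (1 - alpha) * x_pre


class ScalarOneEuro:
    """Hypothetical scalar One Euro filter with a time step of one frame."""

    def __init__(self, fc_min=0.1, fc_d=0.1, beta=0.05):
        self.fc_min, self.fc_d, self.beta = fc_min, fc_d, beta
        self.x_prev = None
        self.dx_prev = 0.0

    def __call__(self, x_cur, te=1.0):
        if self.x_prev is None:  # first sample passes through unchanged
            self.x_prev = x_cur
            return x_cur
        # Smooth the derivative with the fixed cutoff fc_d.
        dx = (x_cur - self.x_prev) / te
        dx_hat = exponential_smoothing(
            smoothing_factor(te, self.fc_d), dx, self.dx_prev)
        # Faster motion raises the cutoff, so the filter lags less.
        fc = self.fc_min + self.beta * abs(dx_hat)
        x_hat = exponential_smoothing(
            smoothing_factor(te, fc), x_cur, self.x_prev)
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat


noisy = [100.0, 100.4, 99.7, 100.2, 130.0, 131.0]
one_euro = ScalarOneEuro()
smoothed = [one_euro(v) for v in noisy]
print(smoothed)
```

Setting `beta=0` in this sketch reduces it to a plain exponential moving average with a fixed cutoff, which is roughly what the `EMA` filter type does with a constant `alpha`.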
|
||||
|
||||
def main():
|
||||
deploy_file = os.path.join(FLAGS.det_model_dir, 'infer_cfg.yml')
|
||||
with open(deploy_file) as f:
|
||||
yml_conf = yaml.safe_load(f)
|
||||
arch = yml_conf['arch']
|
||||
detector_func = 'Detector'
|
||||
if arch == 'PicoDet':
|
||||
detector_func = 'DetectorPicoDet'
|
||||
|
||||
detector = eval(detector_func)(FLAGS.det_model_dir,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
threshold=FLAGS.det_threshold)
|
||||
|
||||
topdown_keypoint_detector = KeyPointDetector(
|
||||
FLAGS.keypoint_model_dir,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=FLAGS.keypoint_batch_size,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
use_dark=FLAGS.use_dark)
|
||||
keypoint_arch = topdown_keypoint_detector.pred_config.arch
|
||||
assert KEYPOINT_SUPPORT_MODELS[
|
||||
keypoint_arch] == 'keypoint_topdown', 'Detection-Keypoint unite inference only supports topdown models.'
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
topdown_unite_predict_video(detector, topdown_keypoint_detector,
|
||||
FLAGS.camera_id, FLAGS.keypoint_batch_size,
|
||||
FLAGS.save_res)
|
||||
else:
|
||||
# predict from image
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
topdown_unite_predict(detector, topdown_keypoint_detector, img_list,
|
||||
FLAGS.keypoint_batch_size, FLAGS.save_res)
|
||||
if not FLAGS.run_benchmark:
|
||||
detector.det_times.info(average=True)
|
||||
topdown_keypoint_detector.det_times.info(average=True)
|
||||
else:
|
||||
mode = FLAGS.run_mode
|
||||
det_model_dir = FLAGS.det_model_dir
|
||||
det_model_info = {
|
||||
'model_name': det_model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(detector, img_list, det_model_info, name='Det')
|
||||
keypoint_model_dir = FLAGS.keypoint_model_dir
|
||||
keypoint_model_info = {
|
||||
'model_name': keypoint_model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
|
||||
FLAGS.keypoint_batch_size, 'KeyPoint')
|
||||
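The benchmark branch of `main()` derives a model name and a precision label from plain strings. A small sketch of that string handling follows; the helper name `model_info` and the example paths are hypothetical.

```python
def model_info(model_dir, run_mode):
    """Derive benchmark labels the same way main() does: last path part, last '_' field."""
    return {
        'model_name': model_dir.strip('/').split('/')[-1],
        'precision': run_mode.split('_')[-1],
    }


print(model_info('output_inference/hrnet_w32_384x288/', 'trt_fp16'))
print(model_info('output_inference/hrnet_w32_384x288', 'paddle'))
```

Note that for `run_mode='paddle'` there is no underscore, so the precision label is simply `'paddle'`. Paths using backslashes as separators would not be split by this logic.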
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU'
|
||||
], "device should be CPU, GPU or XPU"
|
||||
|
||||
main()
|
||||
141
third-party/paddle-inference/det_keypoint_unite_utils.py
vendored
Normal file
@@ -0,0 +1,141 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import ast
|
||||
import argparse
|
||||
|
||||
|
||||
def argsparser():
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument(
|
||||
"--det_model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."),
|
||||
required=True)
|
||||
parser.add_argument(
|
||||
"--keypoint_model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."),
|
||||
required=True)
|
||||
parser.add_argument(
|
||||
"--image_file", type=str, default=None, help="Path of image file.")
|
||||
parser.add_argument(
|
||||
"--image_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Directory of image files; `image_file` takes priority.")
|
||||
parser.add_argument(
|
||||
"--keypoint_batch_size",
|
||||
type=int,
|
||||
default=8,
|
||||
help=("batch_size for keypoint inference. In detection-keypoint unite "
|
||||
"inference, the batch size in detection is 1. Then collate det "
|
||||
"result in batch for keypoint inference."))
|
||||
parser.add_argument(
|
||||
"--video_file",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Path of video file; `video_file` or `camera_id` has the highest priority."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--camera_id",
|
||||
type=int,
|
||||
default=-1,
|
||||
help="device id of camera to predict.")
|
||||
parser.add_argument(
|
||||
"--det_threshold", type=float, default=0.5, help="Threshold of score.")
|
||||
parser.add_argument(
|
||||
"--keypoint_threshold",
|
||||
type=float,
|
||||
default=0.5,
|
||||
help="Threshold of score.")
|
||||
parser.add_argument(
|
||||
"--output_dir",
|
||||
type=str,
|
||||
default="output",
|
||||
help="Directory of output visualization files.")
|
||||
parser.add_argument(
|
||||
"--run_mode",
|
||||
type=str,
|
||||
default='paddle',
|
||||
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
|
||||
parser.add_argument(
|
||||
"--device",
|
||||
type=str,
|
||||
default='cpu',
|
||||
help="Choose the device you want to run, it can be: CPU/GPU/XPU, default is CPU."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--run_benchmark",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether to predict an image_file repeatedly for benchmark")
|
||||
parser.add_argument(
|
||||
"--enable_mkldnn",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether to use MKLDNN on CPU.")
|
||||
parser.add_argument(
|
||||
"--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
|
||||
parser.add_argument(
|
||||
"--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_max_shape",
|
||||
type=int,
|
||||
default=1280,
|
||||
help="max_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_opt_shape",
|
||||
type=int,
|
||||
default=640,
|
||||
help="opt_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_calib_mode",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode needs to be set to True.")
|
||||
parser.add_argument(
|
||||
'--use_dark',
|
||||
type=ast.literal_eval,
|
||||
default=True,
|
||||
help='whether to use DarkPose to get more accurate keypoint positions')
|
||||
parser.add_argument(
|
||||
'--save_res',
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help=(
"whether to save predict results to json file. "
"1) store_res: a list of image_data; "
"2) image_data: [imageid, rects, [keypoints, scores]]; "
"3) rects: list of rect [xmin, ymin, xmax, ymax]; "
"4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list; "
"5) scores: mean of all joint conf"))
|
||||
parser.add_argument(
|
||||
'--smooth',
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help='smoothing keypoints for each frame, new incoming keypoints will be more stable.'
|
||||
)
|
||||
parser.add_argument(
|
||||
'--filter_type',
|
||||
type=str,
|
||||
default='OneEuro',
|
||||
help='when --smooth is True, the filter type to use; it can be OneEuro or EMA.'
|
||||
)
|
||||
return parser
|
||||
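A note on boolean command-line flags: with `argparse`, `type=bool` converts any non-empty string to `True`, so passing the text `False` does not switch a flag off, whereas `ast.literal_eval` parses the literal. The sketch below demonstrates the difference; the flag names `--flag_bool` and `--flag_eval` are hypothetical and exist only for this illustration.

```python
import argparse
import ast

parser = argparse.ArgumentParser()
parser.add_argument("--flag_bool", type=bool, default=False)
parser.add_argument("--flag_eval", type=ast.literal_eval, default=False)

args = parser.parse_args(["--flag_bool", "False", "--flag_eval", "False"])

print(args.flag_bool)  # True: bool("False") is truthy because the string is non-empty
print(args.flag_eval)  # False: ast.literal_eval("False") yields the Python literal
```

An alternative design is `action="store_true"`, which avoids passing a value at all, at the cost of changing the command-line syntax.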
1278
third-party/paddle-inference/infer.py
vendored
Normal file
File diff suppressed because it is too large
433
third-party/paddle-inference/keypoint_infer.py
vendored
Normal file
@@ -0,0 +1,433 @@
|
||||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import time
|
||||
import yaml
|
||||
import glob
|
||||
from functools import reduce
|
||||
|
||||
from PIL import Image
|
||||
import cv2
|
||||
import math
|
||||
import numpy as np
|
||||
import paddle
|
||||
|
||||
import sys
|
||||
# add deploy path of PaddleDetection to sys.path
|
||||
parent_path = os.path.abspath(os.path.join(__file__, *(['..'])))
|
||||
sys.path.insert(0, parent_path)
|
||||
|
||||
from preprocess import preprocess, NormalizeImage, Permute
|
||||
from keypoint_preprocess import EvalAffine, TopDownEvalAffine, expand_crop
|
||||
from keypoint_postprocess import HrHRNetPostProcess, HRNetPostProcess
|
||||
from visualize import visualize_pose
|
||||
from paddle.inference import Config
|
||||
from paddle.inference import create_predictor
|
||||
from utils import argsparser, Timer, get_current_memory_mb
|
||||
from benchmark_utils import PaddleInferBenchmark
|
||||
from infer import Detector, get_test_images, print_arguments
|
||||
|
||||
# Global dictionary
|
||||
KEYPOINT_SUPPORT_MODELS = {
|
||||
'HigherHRNet': 'keypoint_bottomup',
|
||||
'HRNet': 'keypoint_topdown'
|
||||
}
|
||||
|
||||
|
||||
class KeyPointDetector(Detector):
|
||||
"""
|
||||
Args:
|
||||
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
|
||||
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
|
||||
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
|
||||
batch_size (int): size of each batch in inference
|
||||
trt_min_shape (int): min shape for dynamic shape in trt
|
||||
trt_max_shape (int): max shape for dynamic shape in trt
|
||||
trt_opt_shape (int): opt shape for dynamic shape in trt
|
||||
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
|
||||
calibration, trt_calib_mode needs to be set to True
|
||||
cpu_threads (int): cpu threads
|
||||
enable_mkldnn (bool): whether to enable MKLDNN
|
||||
use_dark (bool): whether to use DarkPose postprocessing
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
model_dir,
|
||||
device='CPU',
|
||||
run_mode='paddle',
|
||||
batch_size=1,
|
||||
trt_min_shape=1,
|
||||
trt_max_shape=1280,
|
||||
trt_opt_shape=640,
|
||||
trt_calib_mode=False,
|
||||
cpu_threads=1,
|
||||
enable_mkldnn=False,
|
||||
output_dir='output',
|
||||
threshold=0.5,
|
||||
use_dark=True,
|
||||
use_fd_format=False):
|
||||
super(KeyPointDetector, self).__init__(
|
||||
model_dir=model_dir,
|
||||
device=device,
|
||||
run_mode=run_mode,
|
||||
batch_size=batch_size,
|
||||
trt_min_shape=trt_min_shape,
|
||||
trt_max_shape=trt_max_shape,
|
||||
trt_opt_shape=trt_opt_shape,
|
||||
trt_calib_mode=trt_calib_mode,
|
||||
cpu_threads=cpu_threads,
|
||||
enable_mkldnn=enable_mkldnn,
|
||||
output_dir=output_dir,
|
||||
threshold=threshold,
|
||||
use_fd_format=use_fd_format)
|
||||
self.use_dark = use_dark
|
||||
|
||||
def set_config(self, model_dir, use_fd_format):
|
||||
return PredictConfig_KeyPoint(model_dir, use_fd_format=use_fd_format)
|
||||
|
||||
def get_person_from_rect(self, image, results):
|
||||
# crop the person result from image
|
||||
self.det_times.preprocess_time_s.start()
|
||||
valid_rects = results['boxes']
|
||||
rect_images = []
|
||||
new_rects = []
|
||||
org_rects = []
|
||||
for rect in valid_rects:
|
||||
rect_image, new_rect, org_rect = expand_crop(image, rect)
|
||||
if rect_image is None or rect_image.size == 0:
|
||||
continue
|
||||
rect_images.append(rect_image)
|
||||
new_rects.append(new_rect)
|
||||
org_rects.append(org_rect)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
return rect_images, new_rects, org_rects
|
||||
|
||||
def postprocess(self, inputs, result):
|
||||
np_heatmap = result['heatmap']
|
||||
np_masks = result['masks']
|
||||
# postprocess output of predictor
|
||||
if KEYPOINT_SUPPORT_MODELS[
|
||||
self.pred_config.arch] == 'keypoint_bottomup':
|
||||
results = {}
|
||||
h, w = inputs['im_shape'][0]
|
||||
preds = [np_heatmap]
|
||||
if np_masks is not None:
|
||||
preds += np_masks
|
||||
preds += [h, w]
|
||||
keypoint_postprocess = HrHRNetPostProcess()
|
||||
kpts, scores = keypoint_postprocess(*preds)
|
||||
results['keypoint'] = kpts
|
||||
results['score'] = scores
|
||||
return results
|
||||
elif KEYPOINT_SUPPORT_MODELS[
|
||||
self.pred_config.arch] == 'keypoint_topdown':
|
||||
results = {}
|
||||
imshape = inputs['im_shape'][:, ::-1]
|
||||
center = np.round(imshape / 2.)
|
||||
scale = imshape / 200.
|
||||
keypoint_postprocess = HRNetPostProcess(use_dark=self.use_dark)
|
||||
kpts, scores = keypoint_postprocess(np_heatmap, center, scale)
|
||||
results['keypoint'] = kpts
|
||||
results['score'] = scores
|
||||
return results
|
||||
else:
|
||||
raise ValueError("Unsupported arch: {}, expect {}".format(
|
||||
self.pred_config.arch, KEYPOINT_SUPPORT_MODELS))
|
||||
|
||||
def predict(self, repeats=1):
|
||||
'''
|
||||
Args:
|
||||
repeats (int): repeat number for prediction
|
||||
Returns:
|
||||
results (dict): include 'heatmap': np.ndarray, heatmap output of the model;
'masks': for bottom-up models, a list of the three remaining
output arrays, otherwise None
|
||||
'''
|
||||
# model prediction
|
||||
np_heatmap, np_masks = None, None
|
||||
for i in range(repeats):
|
||||
self.predictor.run()
|
||||
output_names = self.predictor.get_output_names()
|
||||
heatmap_tensor = self.predictor.get_output_handle(output_names[0])
|
||||
np_heatmap = heatmap_tensor.copy_to_cpu()
|
||||
if self.pred_config.tagmap:
|
||||
masks_tensor = self.predictor.get_output_handle(output_names[1])
|
||||
heat_k = self.predictor.get_output_handle(output_names[2])
|
||||
inds_k = self.predictor.get_output_handle(output_names[3])
|
||||
np_masks = [
|
||||
masks_tensor.copy_to_cpu(), heat_k.copy_to_cpu(),
|
||||
inds_k.copy_to_cpu()
|
||||
]
|
||||
result = dict(heatmap=np_heatmap, masks=np_masks)
|
||||
return result
|
||||
|
||||
def predict_image(self,
|
||||
image_list,
|
||||
run_benchmark=False,
|
||||
repeats=1,
|
||||
visual=True):
|
||||
results = []
|
||||
batch_loop_cnt = math.ceil(float(len(image_list)) / self.batch_size)
|
||||
for i in range(batch_loop_cnt):
|
||||
start_index = i * self.batch_size
|
||||
end_index = min((i + 1) * self.batch_size, len(image_list))
|
||||
batch_image_list = image_list[start_index:end_index]
|
||||
if run_benchmark:
|
||||
# preprocess
|
||||
inputs = self.preprocess(batch_image_list) # warmup
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
# model prediction
|
||||
result_warmup = self.predict(repeats=repeats) # warmup
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict(repeats=repeats)
|
||||
self.det_times.inference_time_s.end(repeats=repeats)
|
||||
|
||||
# postprocess
|
||||
result_warmup = self.postprocess(inputs, result) # warmup
|
||||
self.det_times.postprocess_time_s.start()
|
||||
result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
self.det_times.img_num += len(batch_image_list)
|
||||
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
self.cpu_mem += cm
|
||||
self.gpu_mem += gm
|
||||
self.gpu_util += gu
|
||||
|
||||
else:
|
||||
# preprocess
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
# model prediction
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict()
|
||||
self.det_times.inference_time_s.end()
|
||||
|
||||
# postprocess
|
||||
self.det_times.postprocess_time_s.start()
|
||||
result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
self.det_times.img_num += len(batch_image_list)
|
||||
|
||||
if visual:
|
||||
if not os.path.exists(self.output_dir):
|
||||
os.makedirs(self.output_dir)
|
||||
visualize(
|
||||
batch_image_list,
|
||||
result,
|
||||
visual_thresh=self.threshold,
|
||||
save_dir=self.output_dir)
|
||||
|
||||
results.append(result)
|
||||
if visual:
|
||||
print('Test iter {}'.format(i))
|
||||
results = self.merge_batch_result(results)
|
||||
return results
|
||||
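The batching arithmetic in `predict_image` can be isolated into a few lines. The sketch below reuses the same index computation on a plain list; the generator name `iter_batches` and the file names are hypothetical.

```python
import math


def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    batch_loop_cnt = math.ceil(float(len(items)) / batch_size)
    for i in range(batch_loop_cnt):
        start_index = i * batch_size
        end_index = min((i + 1) * batch_size, len(items))
        yield items[start_index:end_index]


batches = list(iter_batches(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2))
print(batches)
```

The last batch may be smaller than `batch_size`, and an empty input produces no batches at all.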
|
||||
def predict_video(self, video_file, camera_id):
|
||||
video_name = 'output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(video_file)
|
||||
video_name = os.path.split(video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(self.output_dir):
|
||||
os.makedirs(self.output_dir)
|
||||
out_path = os.path.join(self.output_dir, video_name)
|
||||
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
index = 1
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
print('detect frame: %d' % (index))
|
||||
index += 1
|
||||
results = self.predict_image([frame[:, :, ::-1]], visual=False)
|
||||
im_results = {}
|
||||
im_results['keypoint'] = [results['keypoint'], results['score']]
|
||||
im = visualize_pose(
|
||||
frame, im_results, visual_thresh=self.threshold, returnimg=True)
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Keypoint Detection', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
writer.release()
|
||||
|
||||
|
||||
def create_inputs(imgs, im_info):
    """generate input for different model type
    Args:
        imgs (list(numpy)): list of image (np.ndarray)
        im_info (list(dict)): list of image info
    Returns:
        inputs (dict): input of model
    """
    inputs = {}
    inputs['image'] = np.stack(imgs, axis=0).astype('float32')
    im_shape = []
    for e in im_info:
        im_shape.append(np.array((e['im_shape'])).astype('float32'))
    inputs['im_shape'] = np.stack(im_shape, axis=0)
    return inputs
|
||||
|
||||
|
||||
class PredictConfig_KeyPoint():
|
||||
"""set config of preprocess, postprocess and visualize
|
||||
Args:
|
||||
model_dir (str): root path of infer_cfg.yml (or inference.yml in FD format)
|
||||
"""
|
||||
|
||||
def __init__(self, model_dir, use_fd_format=False):
|
||||
# parsing Yaml config for Preprocess
|
||||
fd_deploy_file = os.path.join(model_dir, 'inference.yml')
|
||||
ppdet_deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
|
||||
if use_fd_format:
|
||||
if not os.path.exists(fd_deploy_file) and os.path.exists(
|
||||
ppdet_deploy_file):
|
||||
raise RuntimeError(
|
||||
"Non-FD format model detected. Please set `use_fd_format` to False."
|
||||
)
|
||||
deploy_file = fd_deploy_file
|
||||
else:
|
||||
if not os.path.exists(ppdet_deploy_file) and os.path.exists(
|
||||
fd_deploy_file):
|
||||
raise RuntimeError(
"FD format model detected. Please set `use_fd_format` to True."
|
||||
)
|
||||
deploy_file = ppdet_deploy_file
|
||||
with open(deploy_file) as f:
|
||||
yml_conf = yaml.safe_load(f)
|
||||
self.check_model(yml_conf)
|
||||
self.arch = yml_conf['arch']
|
||||
self.archcls = KEYPOINT_SUPPORT_MODELS[yml_conf['arch']]
|
||||
self.preprocess_infos = yml_conf['Preprocess']
|
||||
self.min_subgraph_size = yml_conf['min_subgraph_size']
|
||||
self.labels = yml_conf['label_list']
|
||||
self.tagmap = False
|
||||
self.use_dynamic_shape = yml_conf['use_dynamic_shape']
|
||||
if 'keypoint_bottomup' == self.archcls:
|
||||
self.tagmap = True
|
||||
self.print_config()
|
||||
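The config-file selection at the start of `__init__` can be tried out without a real model. The following sketch reproduces only the file-existence logic in a temporary directory; the helper name `pick_deploy_file` and its error wording are hypothetical, while the file names `inference.yml` and `infer_cfg.yml` come from the code above.

```python
import os
import tempfile


def pick_deploy_file(model_dir, use_fd_format=False):
    """Return the config path to load, or raise if the other format is present instead."""
    fd_file = os.path.join(model_dir, 'inference.yml')
    ppdet_file = os.path.join(model_dir, 'infer_cfg.yml')
    if use_fd_format:
        if not os.path.exists(fd_file) and os.path.exists(ppdet_file):
            raise RuntimeError("only infer_cfg.yml found; pass use_fd_format=False")
        return fd_file
    if not os.path.exists(ppdet_file) and os.path.exists(fd_file):
        raise RuntimeError("only inference.yml found; pass use_fd_format=True")
    return ppdet_file


with tempfile.TemporaryDirectory() as model_dir:
    # Simulate an export that only contains infer_cfg.yml.
    open(os.path.join(model_dir, 'infer_cfg.yml'), 'w').close()
    chosen = os.path.basename(pick_deploy_file(model_dir))
    try:
        pick_deploy_file(model_dir, use_fd_format=True)
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True

print(chosen, mismatch_detected)
```

If neither file exists, the sketch (like the original logic) returns a path anyway and leaves the failure to the later `open()` call.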
|
||||
def check_model(self, yml_conf):
|
||||
"""
|
||||
Raises:
|
||||
ValueError: loaded model not in supported model type
|
||||
"""
|
||||
for support_model in KEYPOINT_SUPPORT_MODELS:
|
||||
if support_model in yml_conf['arch']:
|
||||
return True
|
||||
raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[
|
||||
'arch'], KEYPOINT_SUPPORT_MODELS))
|
||||
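One detail worth illustrating: `check_model` accepts any `arch` string that contains a supported name, while the constructor then indexes `KEYPOINT_SUPPORT_MODELS` by the exact string. The sketch below shows the difference with standalone functions; `arch_class` and the arch value `'HRNet_custom'` are hypothetical.

```python
KEYPOINT_SUPPORT_MODELS = {
    'HigherHRNet': 'keypoint_bottomup',
    'HRNet': 'keypoint_topdown',
}


def check_model(arch):
    # Substring test, as in the method above.
    for support_model in KEYPOINT_SUPPORT_MODELS:
        if support_model in arch:
            return True
    raise ValueError("Unsupported arch: {}, expect {}".format(
        arch, KEYPOINT_SUPPORT_MODELS))


def arch_class(arch):
    # Exact-key lookup, as done after check_model in the constructor.
    check_model(arch)
    return KEYPOINT_SUPPORT_MODELS[arch]


print(arch_class('HRNet'))
print(arch_class('HigherHRNet'))
print(check_model('HRNet_custom'))  # passes the substring check
try:
    arch_class('HRNet_custom')
except KeyError as err:
    print('exact lookup failed for', err)
```

In practice this only matters if an exported `infer_cfg.yml` carries an `arch` value that is not exactly one of the dictionary keys.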
|
||||
def print_config(self):
|
||||
print('----------- Model Configuration -----------')
|
||||
print('%s: %s' % ('Model Arch', self.arch))
|
||||
print('%s: ' % ('Transform Order'))
|
||||
for op_info in self.preprocess_infos:
|
||||
print('--%s: %s' % ('transform op', op_info['type']))
|
||||
print('--------------------------------------------')
|
||||
|
||||
|
||||
def visualize(image_list, results, visual_thresh=0.6, save_dir='output'):
|
||||
im_results = {}
|
||||
for i, image_file in enumerate(image_list):
|
||||
skeletons = results['keypoint']
|
||||
scores = results['score']
|
||||
skeleton = skeletons[i:i + 1]
|
||||
score = scores[i:i + 1]
|
||||
im_results['keypoint'] = [skeleton, score]
|
||||
visualize_pose(
|
||||
image_file,
|
||||
im_results,
|
||||
visual_thresh=visual_thresh,
|
||||
save_dir=save_dir)
|
||||
|
||||
|
||||
def main():
|
||||
detector = KeyPointDetector(
|
||||
FLAGS.model_dir,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=FLAGS.batch_size,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
threshold=FLAGS.threshold,
|
||||
output_dir=FLAGS.output_dir,
|
||||
use_dark=FLAGS.use_dark,
|
||||
use_fd_format=FLAGS.use_fd_format)
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
|
||||
else:
|
||||
# predict from image
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
|
||||
if not FLAGS.run_benchmark:
|
||||
detector.det_times.info(average=True)
|
||||
else:
|
||||
mems = {
|
||||
'cpu_rss_mb': detector.cpu_mem / len(img_list),
|
||||
'gpu_rss_mb': detector.gpu_mem / len(img_list),
|
||||
'gpu_util': detector.gpu_util * 100 / len(img_list)
|
||||
}
|
||||
perf_info = detector.det_times.report(average=True)
|
||||
model_dir = FLAGS.model_dir
|
||||
mode = FLAGS.run_mode
|
||||
model_info = {
|
||||
'model_name': model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
data_info = {
|
||||
'batch_size': 1,
|
||||
'shape': "dynamic_shape",
|
||||
'data_num': perf_info['img_num']
|
||||
}
|
||||
det_log = PaddleInferBenchmark(detector.config, model_info,
|
||||
data_info, perf_info, mems)
|
||||
det_log('KeyPoint')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
|
||||
], "device should be CPU, GPU, XPU or NPU"
|
||||
assert not FLAGS.use_gpu, "use_gpu has been deprecated, please use --device"
|
||||
|
||||
main()
|
||||
369
third-party/paddle-inference/keypoint_postprocess.py
vendored
Normal file
@@ -0,0 +1,369 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from scipy.optimize import linear_sum_assignment
|
||||
from collections import abc, defaultdict
|
||||
import cv2
|
||||
import numpy as np
|
||||
import math
|
||||
import paddle
|
||||
import paddle.nn as nn
|
||||
from keypoint_preprocess import get_affine_mat_kernel, get_affine_transform
|
||||
|
||||
|
||||
class HrHRNetPostProcess(object):
|
||||
"""
|
||||
HrHRNet postprocess contains:
|
||||
1) get topk keypoints in the output heatmap
|
||||
2) sample the tagmap's value corresponding to each of the topk coordinate
|
||||
3) group different joints into people with the Hungarian algorithm
|
||||
4) adjust the coordinate by +-0.25 to decrease error std
|
||||
5) salvage missing joints by checking the positivity of heatmap - tagdiff_norm
|
||||
Args:
|
||||
max_num_people (int): max number of people supported in postprocess
|
||||
heat_thresh (float): topk values below this threshold will be ignored
|
||||
tag_thresh (float): values sampled in the tagmap below this threshold are treated as belonging to the same person at initialization
|
||||
|
||||
inputs(list[heatmap]): the output list of model, [heatmap, heatmap_maxpool, tagmap], heatmap_maxpool used to get topk
|
||||
original_height, original_width (float): the original image size
|
||||
"""
|
||||
|
||||
def __init__(self, max_num_people=30, heat_thresh=0.2, tag_thresh=1.):
|
||||
self.max_num_people = max_num_people
|
||||
self.heat_thresh = heat_thresh
|
||||
self.tag_thresh = tag_thresh
|
||||
|
||||
def lerp(self, j, y, x, heatmap):
|
||||
H, W = heatmap.shape[-2:]
|
||||
left = np.clip(x - 1, 0, W - 1)
|
||||
right = np.clip(x + 1, 0, W - 1)
|
||||
up = np.clip(y - 1, 0, H - 1)
|
||||
down = np.clip(y + 1, 0, H - 1)
|
||||
offset_y = np.where(heatmap[j, down, x] > heatmap[j, up, x], 0.25,
|
||||
-0.25)
|
||||
offset_x = np.where(heatmap[j, y, right] > heatmap[j, y, left], 0.25,
|
||||
-0.25)
|
||||
return offset_y + 0.5, offset_x + 0.5
|
||||
|
||||
def __call__(self, heatmap, tagmap, heat_k, inds_k, original_height,
|
||||
original_width):
|
||||
|
||||
N, J, H, W = heatmap.shape
|
||||
assert N == 1, "only support batch size 1"
|
||||
heatmap = heatmap[0]
|
||||
tagmap = tagmap[0]
|
||||
heats = heat_k[0]
|
||||
inds_np = inds_k[0]
|
||||
y = inds_np // W
|
||||
x = inds_np % W
|
||||
tags = tagmap[np.arange(J)[None, :].repeat(self.max_num_people),
|
||||
y.flatten(), x.flatten()].reshape(J, -1, tagmap.shape[-1])
|
||||
coords = np.stack((y, x), axis=2)
|
||||
# threshold
|
||||
mask = heats > self.heat_thresh
|
||||
# cluster
|
||||
cluster = defaultdict(lambda: {
|
||||
'coords': np.zeros((J, 2), dtype=np.float32),
|
||||
'scores': np.zeros(J, dtype=np.float32),
|
||||
'tags': []
|
||||
})
|
||||
for jid, m in enumerate(mask):
|
||||
num_valid = m.sum()
|
||||
if num_valid == 0:
|
||||
continue
|
||||
valid_inds = np.where(m)[0]
|
||||
valid_tags = tags[jid, m, :]
|
||||
if len(cluster) == 0: # initialize
|
||||
for i in valid_inds:
|
||||
tag = tags[jid, i]
|
||||
key = tag[0]
|
||||
cluster[key]['tags'].append(tag)
|
||||
cluster[key]['scores'][jid] = heats[jid, i]
|
||||
cluster[key]['coords'][jid] = coords[jid, i]
|
||||
continue
|
||||
candidates = list(cluster.keys())[:self.max_num_people]
|
||||
centroids = [
|
||||
np.mean(
|
||||
cluster[k]['tags'], axis=0) for k in candidates
|
||||
]
|
||||
num_clusters = len(centroids)
|
||||
# shape is (num_valid, num_clusters, tag_dim)
|
||||
dist = valid_tags[:, None, :] - np.array(centroids)[None, ...]
|
||||
l2_dist = np.linalg.norm(dist, ord=2, axis=2)
|
||||
# modulate dist with heat value, see `use_detection_val`
|
||||
cost = np.round(l2_dist) * 100 - heats[jid, m, None]
|
||||
# pad the cost matrix, otherwise new pose are ignored
|
||||
if num_valid > num_clusters:
|
||||
cost = np.pad(cost, ((0, 0), (0, num_valid - num_clusters)),
|
||||
'constant',
|
||||
constant_values=((0, 0), (0, 1e-10)))
|
||||
rows, cols = linear_sum_assignment(cost)
|
||||
for y, x in zip(rows, cols):
|
||||
tag = tags[jid, y]
|
||||
if y < num_valid and x < num_clusters and \
|
||||
l2_dist[y, x] < self.tag_thresh:
|
||||
key = candidates[x] # merge to cluster
|
||||
else:
|
||||
key = tag[0] # initialize new cluster
|
||||
cluster[key]['tags'].append(tag)
|
||||
cluster[key]['scores'][jid] = heats[jid, y]
|
||||
cluster[key]['coords'][jid] = coords[jid, y]
|
||||
|
||||
# shape is [k, J, 2] and [k, J]
|
||||
pose_tags = np.array([cluster[k]['tags'] for k in cluster])
|
||||
pose_coords = np.array([cluster[k]['coords'] for k in cluster])
|
||||
pose_scores = np.array([cluster[k]['scores'] for k in cluster])
|
||||
valid = pose_scores > 0
|
||||
|
||||
pose_kpts = np.zeros((pose_scores.shape[0], J, 3), dtype=np.float32)
|
||||
if valid.sum() == 0:
|
||||
return pose_kpts, pose_kpts
|
||||
|
||||
# refine coords
|
||||
valid_coords = pose_coords[valid].astype(np.int32)
|
||||
y = valid_coords[..., 0].flatten()
|
||||
x = valid_coords[..., 1].flatten()
|
||||
_, j = np.nonzero(valid)
|
||||
offsets = self.lerp(j, y, x, heatmap)
|
||||
pose_coords[valid, 0] += offsets[0]
|
||||
pose_coords[valid, 1] += offsets[1]
|
||||
|
||||
# mean score before salvage
|
||||
mean_score = pose_scores.mean(axis=1)
|
||||
pose_kpts[valid, 2] = pose_scores[valid]
|
||||
|
||||
# salvage missing joints
|
||||
if True:
|
||||
for pid, coords in enumerate(pose_coords):
|
||||
tag_mean = np.array(pose_tags[pid]).mean(axis=0)
|
||||
norm = np.sum((tagmap - tag_mean)**2, axis=3)**0.5
|
||||
score = heatmap - np.round(norm) # (J, H, W)
|
||||
flat_score = score.reshape(J, -1)
|
||||
max_inds = np.argmax(flat_score, axis=1)
|
||||
max_scores = np.max(flat_score, axis=1)
|
||||
salvage_joints = (pose_scores[pid] == 0) & (max_scores > 0)
|
||||
if salvage_joints.sum() == 0:
|
||||
continue
|
||||
y = max_inds[salvage_joints] // W
|
||||
x = max_inds[salvage_joints] % W
|
||||
offsets = self.lerp(salvage_joints.nonzero()[0], y, x, heatmap)
|
||||
y = y.astype(np.float32) + offsets[0]
|
||||
x = x.astype(np.float32) + offsets[1]
|
||||
pose_coords[pid][salvage_joints, 0] = y
|
||||
pose_coords[pid][salvage_joints, 1] = x
|
||||
pose_kpts[pid][salvage_joints, 2] = max_scores[salvage_joints]
|
||||
pose_kpts[..., :2] = transpred(pose_coords[..., :2][..., ::-1],
|
||||
original_height, original_width,
|
||||
min(H, W))
|
||||
return pose_kpts, mean_score
|
||||
|
||||
|
||||
def transpred(kpts, h, w, s):
    trans, _ = get_affine_mat_kernel(h, w, s, inv=True)

    return warp_affine_joints(kpts[..., :2].copy(), trans)


def warp_affine_joints(joints, mat):
    """Apply affine transformation defined by the transform matrix on the
    joints.

    Args:
        joints (np.ndarray[..., 2]): Origin coordinate of joints.
        mat (np.ndarray[2, 3]): The affine matrix.

    Returns:
        matrix (np.ndarray[..., 2]): Result coordinate of joints.
    """
    joints = np.array(joints)
    shape = joints.shape
    joints = joints.reshape(-1, 2)
    return np.dot(np.concatenate(
        (joints, joints[:, 0:1] * 0 + 1), axis=1),
                  mat.T).reshape(shape)
|
||||
|
||||
class HRNetPostProcess(object):
|
||||
def __init__(self, use_dark=True):
|
||||
self.use_dark = use_dark
|
||||
|
||||
def flip_back(self, output_flipped, matched_parts):
|
||||
assert output_flipped.ndim == 4,\
|
||||
'output_flipped should be [batch_size, num_joints, height, width]'
|
||||
|
||||
output_flipped = output_flipped[:, :, :, ::-1]
|
||||
|
||||
for pair in matched_parts:
|
||||
tmp = output_flipped[:, pair[0], :, :].copy()
|
||||
output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
|
||||
output_flipped[:, pair[1], :, :] = tmp
|
||||
|
||||
return output_flipped
|
||||
|
||||
    def get_max_preds(self, heatmaps):
        """get predictions from score maps

        Args:
            heatmaps: numpy.ndarray([batch_size, num_joints, height, width])

        Returns:
            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
        """
        assert isinstance(heatmaps,
                          np.ndarray), 'heatmaps should be numpy.ndarray'
        assert heatmaps.ndim == 4, 'batch_images should be 4-ndim'

        batch_size = heatmaps.shape[0]
        num_joints = heatmaps.shape[1]
        width = heatmaps.shape[3]
        heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
        idx = np.argmax(heatmaps_reshaped, 2)
        maxvals = np.amax(heatmaps_reshaped, 2)

        maxvals = maxvals.reshape((batch_size, num_joints, 1))
        idx = idx.reshape((batch_size, num_joints, 1))

        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)

        preds[:, :, 0] = (preds[:, :, 0]) % width
        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)

        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
        pred_mask = pred_mask.astype(np.float32)

        preds *= pred_mask

        return preds, maxvals
|
||||
def gaussian_blur(self, heatmap, kernel):
|
||||
border = (kernel - 1) // 2
|
||||
batch_size = heatmap.shape[0]
|
||||
num_joints = heatmap.shape[1]
|
||||
height = heatmap.shape[2]
|
||||
width = heatmap.shape[3]
|
||||
for i in range(batch_size):
|
||||
for j in range(num_joints):
|
||||
origin_max = np.max(heatmap[i, j])
|
||||
dr = np.zeros((height + 2 * border, width + 2 * border))
|
||||
dr[border:-border, border:-border] = heatmap[i, j].copy()
|
||||
dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
|
||||
heatmap[i, j] = dr[border:-border, border:-border].copy()
|
||||
heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
|
||||
return heatmap
|
||||
|
||||
def dark_parse(self, hm, coord):
|
||||
heatmap_height = hm.shape[0]
|
||||
heatmap_width = hm.shape[1]
|
||||
px = int(coord[0])
|
||||
py = int(coord[1])
|
||||
if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
|
||||
dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
|
||||
dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
|
||||
dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
|
||||
dxy = 0.25 * (hm[py+1][px+1] - hm[py-1][px+1] - hm[py+1][px-1] \
|
||||
+ hm[py-1][px-1])
|
||||
dyy = 0.25 * (
|
||||
hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
|
||||
derivative = np.matrix([[dx], [dy]])
|
||||
hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
|
||||
if dxx * dyy - dxy**2 != 0:
|
||||
hessianinv = hessian.I
|
||||
offset = -hessianinv * derivative
|
||||
offset = np.squeeze(np.array(offset.T), axis=0)
|
||||
coord += offset
|
||||
return coord
|
||||
|
||||
def dark_postprocess(self, hm, coords, kernelsize):
|
||||
"""
|
||||
refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
|
||||
|
||||
"""
|
||||
hm = self.gaussian_blur(hm, kernelsize)
|
||||
hm = np.maximum(hm, 1e-10)
|
||||
hm = np.log(hm)
|
||||
for n in range(coords.shape[0]):
|
||||
for p in range(coords.shape[1]):
|
||||
coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
|
||||
return coords
|
||||
|
||||
def get_final_preds(self, heatmaps, center, scale, kernelsize=3):
|
||||
"""the highest heatvalue location with a quarter offset in the
|
||||
direction from the highest response to the second highest response.
|
||||
|
||||
Args:
|
||||
heatmaps (numpy.ndarray): The predicted heatmaps
|
||||
center (numpy.ndarray): The boxes center
|
||||
scale (numpy.ndarray): The scale factor
|
||||
|
||||
Returns:
|
||||
preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
|
||||
maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
|
||||
"""
|
||||
|
||||
coords, maxvals = self.get_max_preds(heatmaps)
|
||||
|
||||
heatmap_height = heatmaps.shape[2]
|
||||
heatmap_width = heatmaps.shape[3]
|
||||
|
||||
if self.use_dark:
|
||||
coords = self.dark_postprocess(heatmaps, coords, kernelsize)
|
||||
else:
|
||||
for n in range(coords.shape[0]):
|
||||
for p in range(coords.shape[1]):
|
||||
hm = heatmaps[n][p]
|
||||
px = int(math.floor(coords[n][p][0] + 0.5))
|
||||
py = int(math.floor(coords[n][p][1] + 0.5))
|
||||
if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
|
||||
diff = np.array([
|
||||
hm[py][px + 1] - hm[py][px - 1],
|
||||
hm[py + 1][px] - hm[py - 1][px]
|
||||
])
|
||||
coords[n][p] += np.sign(diff) * .25
|
||||
preds = coords.copy()
|
||||
|
||||
# Transform back
|
||||
for i in range(coords.shape[0]):
|
||||
preds[i] = transform_preds(coords[i], center[i], scale[i],
|
||||
[heatmap_width, heatmap_height])
|
||||
|
||||
return preds, maxvals
|
||||
|
||||
def __call__(self, output, center, scale):
|
||||
preds, maxvals = self.get_final_preds(output, center, scale)
|
||||
return np.concatenate(
|
||||
(preds, maxvals), axis=-1), np.mean(
|
||||
maxvals, axis=1)
|
||||
|
||||
|
||||
def transform_preds(coords, center, scale, output_size):
|
||||
target_coords = np.zeros(coords.shape)
|
||||
trans = get_affine_transform(center, scale * 200, 0, output_size, inv=1)
|
||||
for p in range(coords.shape[0]):
|
||||
target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
|
||||
return target_coords
|
||||
|
||||
|
||||
def affine_transform(pt, t):
|
||||
new_pt = np.array([pt[0], pt[1], 1.]).T
|
||||
new_pt = np.dot(t, new_pt)
|
||||
return new_pt[:2]
|
||||
|
||||
|
||||
def translate_to_ori_images(keypoint_result, batch_records):
|
||||
kpts = keypoint_result['keypoint']
|
||||
scores = keypoint_result['score']
|
||||
kpts[..., 0] += batch_records[:, 0:1]
|
||||
kpts[..., 1] += batch_records[:, 1:2]
|
||||
return kpts, scores
|
||||
243
third-party/paddle-inference/keypoint_preprocess.py
vendored
Normal file
@@ -0,0 +1,243 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
"""
|
||||
this code is based on https://github.com/open-mmlab/mmpose/mmpose/core/post_processing/post_transforms.py
|
||||
"""
|
||||
import cv2
|
||||
import numpy as np
|
||||
|
||||
|
||||
class EvalAffine(object):
|
||||
def __init__(self, size, stride=64):
|
||||
super(EvalAffine, self).__init__()
|
||||
self.size = size
|
||||
self.stride = stride
|
||||
|
||||
def __call__(self, image, im_info):
|
||||
s = self.size
|
||||
h, w, _ = image.shape
|
||||
trans, size_resized = get_affine_mat_kernel(h, w, s, inv=False)
|
||||
image_resized = cv2.warpAffine(image, trans, size_resized)
|
||||
return image_resized, im_info
|
||||
|
||||
|
||||
def get_affine_mat_kernel(h, w, s, inv=False):
    if w < h:
        w_ = s
        h_ = int(np.ceil((s / w * h) / 64.) * 64)
        scale_w = w
        scale_h = h_ / w_ * w

    else:
        h_ = s
        w_ = int(np.ceil((s / h * w) / 64.) * 64)
        scale_h = h
        scale_w = w_ / h_ * h

    center = np.array([np.round(w / 2.), np.round(h / 2.)])

    size_resized = (w_, h_)
    trans = get_affine_transform(
        center, np.array([scale_w, scale_h]), 0, size_resized, inv=inv)

    return trans, size_resized
|
||||
|
||||
def get_affine_transform(center,
                         input_size,
                         rot,
                         output_size,
                         shift=(0., 0.),
                         inv=False):
    """Get the affine transform matrix, given the center/scale/rot/output_size.

    Args:
        center (np.ndarray[2, ]): Center of the bounding box (x, y).
        input_size (np.ndarray[2, ] or scalar): Scale of the bounding box
            wrt [width, height]. A scalar is used for both dimensions.
        rot (float): Rotation angle (degree).
        output_size (np.ndarray[2, ]): Size of the destination heatmaps.
        shift (0-100%): Shift translation ratio wrt the width/height.
            Default (0., 0.).
        inv (bool): Option to inverse the affine transform direction.
            (inv=False: src->dst or inv=True: dst->src)

    Returns:
        np.ndarray: The transform matrix.
    """
    assert len(center) == 2
    assert len(output_size) == 2
    assert len(shift) == 2
    if not isinstance(input_size, (np.ndarray, list)):
        input_size = np.array([input_size, input_size], dtype=np.float32)
    scale_tmp = input_size

    shift = np.array(shift)
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    rot_rad = np.pi * rot / 180
    src_dir = rotate_point([0., src_w * -0.5], rot_rad)
    dst_dir = np.array([0., dst_w * -0.5])

    src = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    src[2, :] = _get_3rd_point(src[0, :], src[1, :])

    dst = np.zeros((3, 2), dtype=np.float32)
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
    dst[2, :] = _get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans
|
||||
|
||||
def get_warp_matrix(theta, size_input, size_dst, size_target):
|
||||
"""This code is based on
|
||||
https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
|
||||
|
||||
Calculate the transformation matrix under the constraint of unbiased.
|
||||
Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
|
||||
Data Processing for Human Pose Estimation (CVPR 2020).
|
||||
|
||||
Args:
|
||||
theta (float): Rotation angle in degrees.
|
||||
size_input (np.ndarray): Size of input image [w, h].
|
||||
size_dst (np.ndarray): Size of output image [w, h].
|
||||
size_target (np.ndarray): Size of ROI in input plane [w, h].
|
||||
|
||||
Returns:
|
||||
matrix (np.ndarray): A matrix for transformation.
|
||||
"""
|
||||
theta = np.deg2rad(theta)
|
||||
matrix = np.zeros((2, 3), dtype=np.float32)
|
||||
scale_x = size_dst[0] / size_target[0]
|
||||
scale_y = size_dst[1] / size_target[1]
|
||||
matrix[0, 0] = np.cos(theta) * scale_x
|
||||
matrix[0, 1] = -np.sin(theta) * scale_x
|
||||
matrix[0, 2] = scale_x * (
|
||||
-0.5 * size_input[0] * np.cos(theta) + 0.5 * size_input[1] *
|
||||
np.sin(theta) + 0.5 * size_target[0])
|
||||
matrix[1, 0] = np.sin(theta) * scale_y
|
||||
matrix[1, 1] = np.cos(theta) * scale_y
|
||||
matrix[1, 2] = scale_y * (
|
||||
-0.5 * size_input[0] * np.sin(theta) - 0.5 * size_input[1] *
|
||||
np.cos(theta) + 0.5 * size_target[1])
|
||||
return matrix
|
||||
|
||||
|
||||
def rotate_point(pt, angle_rad):
|
||||
"""Rotate a point by an angle.
|
||||
|
||||
Args:
|
||||
pt (list[float]): 2 dimensional point to be rotated
|
||||
angle_rad (float): rotation angle by radian
|
||||
|
||||
Returns:
|
||||
list[float]: Rotated point.
|
||||
"""
|
||||
assert len(pt) == 2
|
||||
sn, cs = np.sin(angle_rad), np.cos(angle_rad)
|
||||
new_x = pt[0] * cs - pt[1] * sn
|
||||
new_y = pt[0] * sn + pt[1] * cs
|
||||
rotated_pt = [new_x, new_y]
|
||||
|
||||
return rotated_pt
|
||||
|
||||
|
||||
def _get_3rd_point(a, b):
|
||||
"""To calculate the affine matrix, three pairs of points are required. This
|
||||
function is used to get the 3rd point, given 2D points a & b.
|
||||
|
||||
The 3rd point is defined by rotating vector `a - b` by 90 degrees
|
||||
anticlockwise, using b as the rotation center.
|
||||
|
||||
Args:
|
||||
a (np.ndarray): point(x,y)
|
||||
b (np.ndarray): point(x,y)
|
||||
|
||||
Returns:
|
||||
np.ndarray: The 3rd point.
|
||||
"""
|
||||
assert len(a) == 2
|
||||
assert len(b) == 2
|
||||
direction = a - b
|
||||
third_pt = b + np.array([-direction[1], direction[0]], dtype=np.float32)
|
||||
|
||||
return third_pt
|
||||
|
||||
|
||||
class TopDownEvalAffine(object):
    """apply affine transform to image and coords

    Args:
        trainsize (list): [w, h], the standard size used to train
        use_udp (bool): whether to use Unbiased Data Processing.
        records(dict): the dict containing the image and coords

    Returns:
        records (dict): contains the image and coords after being transformed

    """

    def __init__(self, trainsize, use_udp=False):
        self.trainsize = trainsize
        self.use_udp = use_udp

    def __call__(self, image, im_info):
        rot = 0
        imshape = im_info['im_shape'][::-1]
        center = im_info['center'] if 'center' in im_info else imshape / 2.
        scale = im_info['scale'] if 'scale' in im_info else imshape
        if self.use_udp:
            trans = get_warp_matrix(
                rot, center * 2.0,
                [self.trainsize[0] - 1.0, self.trainsize[1] - 1.0], scale)
            image = cv2.warpAffine(
                image,
                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
                flags=cv2.INTER_LINEAR)
        else:
            trans = get_affine_transform(center, scale, rot, self.trainsize)
            image = cv2.warpAffine(
                image,
                trans, (int(self.trainsize[0]), int(self.trainsize[1])),
                flags=cv2.INTER_LINEAR)

        return image, im_info
|
||||
|
||||
def expand_crop(images, rect, expand_ratio=0.3):
|
||||
imgh, imgw, c = images.shape
|
||||
label, conf, xmin, ymin, xmax, ymax = [int(x) for x in rect.tolist()]
|
||||
if label != 0:
|
||||
return None, None, None
|
||||
org_rect = [xmin, ymin, xmax, ymax]
|
||||
h_half = (ymax - ymin) * (1 + expand_ratio) / 2.
|
||||
w_half = (xmax - xmin) * (1 + expand_ratio) / 2.
|
||||
if h_half > w_half * 4 / 3:
|
||||
w_half = h_half * 0.75
|
||||
center = [(ymin + ymax) / 2., (xmin + xmax) / 2.]
|
||||
ymin = max(0, int(center[0] - h_half))
|
||||
ymax = min(imgh - 1, int(center[0] + h_half))
|
||||
xmin = max(0, int(center[1] - w_half))
|
||||
xmax = min(imgw - 1, int(center[1] + w_half))
|
||||
return images[ymin:ymax, xmin:xmax, :], [xmin, ymin, xmax, ymax], org_rect
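
`get_affine_mat_kernel` above chooses the resized shape so that the short side equals `s` and the long side is rounded up to a multiple of 64. The sketch below isolates just that size computation, using only the standard library. `resized_size` is a hypothetical name, and the `multiple` parameter is an assumption that generalizes the hard-coded 64 in the source.

```python
import math


def resized_size(h, w, s, multiple=64):
    """Return (w_, h_) following the shape logic of get_affine_mat_kernel.

    The short side becomes `s`; the long side keeps the aspect ratio and
    is rounded up to the next multiple of `multiple`.
    """
    if w < h:
        w_ = s
        h_ = int(math.ceil((s / w * h) / multiple) * multiple)
    else:
        h_ = s
        w_ = int(math.ceil((s / h * w) / multiple) * multiple)
    return w_, h_


print(resized_size(480, 640, 512))  # landscape input -> (704, 512)
print(resized_size(640, 480, 512))  # portrait input  -> (512, 704)
```

Rounding the long side up rather than down means the image is never cropped by the resize, which presumably is why the source also stretches `scale_w`/`scale_h` to match the padded shape.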
|
||||
501
third-party/paddle-inference/mot_centertrack_infer.py
vendored
Normal file
@@ -0,0 +1,501 @@
|
||||
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import copy
|
||||
import math
|
||||
import time
|
||||
import yaml
|
||||
import cv2
|
||||
import numpy as np
|
||||
from collections import defaultdict
|
||||
import paddle
|
||||
|
||||
from benchmark_utils import PaddleInferBenchmark
|
||||
from utils import gaussian_radius, gaussian2D, draw_umich_gaussian
|
||||
from preprocess import preprocess, decode_image, WarpAffine, NormalizeImage, Permute
|
||||
from utils import argsparser, Timer, get_current_memory_mb
|
||||
from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
|
||||
from keypoint_preprocess import get_affine_transform
|
||||
|
||||
# add python path
|
||||
import sys
|
||||
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
|
||||
sys.path.insert(0, parent_path)
|
||||
|
||||
from pptracking.python.mot import CenterTracker
|
||||
from pptracking.python.mot.utils import MOTTimer, write_mot_results
|
||||
from pptracking.python.mot.visualize import plot_tracking
|
||||
|
||||
|
||||
def transform_preds_with_trans(coords, trans):
|
||||
target_coords = np.ones((coords.shape[0], 3), np.float32)
|
||||
target_coords[:, :2] = coords
|
||||
target_coords = np.dot(trans, target_coords.transpose()).transpose()
|
||||
return target_coords[:, :2]
|
||||
|
||||
|
||||
def affine_transform(pt, t):
|
||||
new_pt = np.array([pt[0], pt[1], 1.]).T
|
||||
new_pt = np.dot(t, new_pt)
|
||||
return new_pt[:2]
|
||||
|
||||
|
||||
def affine_transform_bbox(bbox, trans, width, height):
|
||||
bbox = np.array(copy.deepcopy(bbox), dtype=np.float32)
|
||||
bbox[:2] = affine_transform(bbox[:2], trans)
|
||||
bbox[2:] = affine_transform(bbox[2:], trans)
|
||||
bbox[[0, 2]] = np.clip(bbox[[0, 2]], 0, width - 1)
|
||||
bbox[[1, 3]] = np.clip(bbox[[1, 3]], 0, height - 1)
|
||||
return bbox
|
||||
|
||||
|
||||
class CenterTrack(Detector):
    """
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
        batch_size (int): batch size used in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): If the model is produced by TRT offline quantitative
            calibration, trt_calib_mode needs to be set to True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to enable MKLDNN
        output_dir (string): The path of output, default as 'output'
        threshold (float): Score threshold of the detected bbox, default as 0.5
        save_images (bool): Whether to save visualization image results, default as False
        save_mot_txts (bool): Whether to save tracking results (txt), default as False
    """
|
||||
def __init__(
|
||||
self,
|
||||
model_dir,
|
||||
tracker_config=None,
|
||||
device='CPU',
|
||||
run_mode='paddle',
|
||||
batch_size=1,
|
||||
trt_min_shape=1,
|
||||
trt_max_shape=960,
|
||||
trt_opt_shape=544,
|
||||
trt_calib_mode=False,
|
||||
cpu_threads=1,
|
||||
enable_mkldnn=False,
|
||||
output_dir='output',
|
||||
threshold=0.5,
|
||||
save_images=False,
|
||||
save_mot_txts=False, ):
|
||||
super(CenterTrack, self).__init__(
|
||||
model_dir=model_dir,
|
||||
device=device,
|
||||
run_mode=run_mode,
|
||||
batch_size=batch_size,
|
||||
trt_min_shape=trt_min_shape,
|
||||
trt_max_shape=trt_max_shape,
|
||||
trt_opt_shape=trt_opt_shape,
|
||||
trt_calib_mode=trt_calib_mode,
|
||||
cpu_threads=cpu_threads,
|
||||
enable_mkldnn=enable_mkldnn,
|
||||
output_dir=output_dir,
|
||||
threshold=threshold, )
|
||||
self.save_images = save_images
|
||||
self.save_mot_txts = save_mot_txts
|
||||
assert batch_size == 1, "MOT model only supports batch_size=1."
|
||||
self.det_times = Timer(with_tracker=True)
|
||||
self.num_classes = len(self.pred_config.labels)
|
||||
|
||||
# tracker config
|
||||
cfg = self.pred_config.tracker
|
||||
min_box_area = cfg.get('min_box_area', -1)
|
||||
vertical_ratio = cfg.get('vertical_ratio', -1)
|
||||
track_thresh = cfg.get('track_thresh', 0.4)
|
||||
pre_thresh = cfg.get('pre_thresh', 0.5)
|
||||
|
||||
self.tracker = CenterTracker(
|
||||
num_classes=self.num_classes,
|
||||
min_box_area=min_box_area,
|
||||
vertical_ratio=vertical_ratio,
|
||||
track_thresh=track_thresh,
|
||||
pre_thresh=pre_thresh)
|
||||
|
||||
self.pre_image = None
|
||||
|
||||
def get_additional_inputs(self, dets, meta, with_hm=True):
|
||||
# Render input heatmap from previous trackings.
|
||||
trans_input = meta['trans_input']
|
||||
inp_width, inp_height = int(meta['inp_width']), int(meta['inp_height'])
|
||||
input_hm = np.zeros((1, inp_height, inp_width), dtype=np.float32)
|
||||
|
||||
for det in dets:
|
||||
if det['score'] < self.tracker.pre_thresh:
|
||||
continue
|
||||
bbox = affine_transform_bbox(det['bbox'], trans_input, inp_width,
|
||||
inp_height)
|
||||
h, w = bbox[3] - bbox[1], bbox[2] - bbox[0]
|
||||
if (h > 0 and w > 0):
|
||||
radius = gaussian_radius(
|
||||
(math.ceil(h), math.ceil(w)), min_overlap=0.7)
|
||||
radius = max(0, int(radius))
|
||||
ct = np.array(
|
||||
[(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2],
|
||||
dtype=np.float32)
|
||||
ct_int = ct.astype(np.int32)
|
||||
if with_hm:
|
||||
input_hm[0] = draw_umich_gaussian(input_hm[0], ct_int,
|
||||
radius)
|
||||
if with_hm:
|
||||
input_hm = input_hm[np.newaxis]
|
||||
return input_hm
|
||||
|
||||
def preprocess(self, image_list):
|
||||
preprocess_ops = []
|
||||
for op_info in self.pred_config.preprocess_infos:
|
||||
new_op_info = op_info.copy()
|
||||
op_type = new_op_info.pop('type')
|
||||
preprocess_ops.append(eval(op_type)(**new_op_info))
|
||||
|
||||
assert len(image_list) == 1, 'MOT only support bs=1'
|
||||
im_path = image_list[0]
|
||||
im, im_info = preprocess(im_path, preprocess_ops)
|
||||
#inputs = create_inputs(im, im_info)
|
||||
inputs = {}
|
||||
inputs['image'] = np.array((im, )).astype('float32')
|
||||
inputs['im_shape'] = np.array((im_info['im_shape'], )).astype('float32')
|
||||
inputs['scale_factor'] = np.array(
|
||||
(im_info['scale_factor'], )).astype('float32')
|
||||
|
||||
inputs['trans_input'] = im_info['trans_input']
|
||||
inputs['inp_width'] = im_info['inp_width']
|
||||
inputs['inp_height'] = im_info['inp_height']
|
||||
inputs['center'] = im_info['center']
|
||||
inputs['scale'] = im_info['scale']
|
||||
inputs['out_height'] = im_info['out_height']
|
||||
inputs['out_width'] = im_info['out_width']
|
||||
|
||||
if self.pre_image is None:
|
||||
self.pre_image = inputs['image']
|
||||
# initializing tracker for the first frame
|
||||
self.tracker.init_track([])
|
||||
inputs['pre_image'] = self.pre_image
|
||||
self.pre_image = inputs['image'] # Note: update for next image
|
||||
|
||||
# render input heatmap from tracker status
|
||||
pre_hm = self.get_additional_inputs(
|
||||
self.tracker.tracks, inputs, with_hm=True)
|
||||
inputs['pre_hm'] = pre_hm #.to_tensor(pre_hm)
|
||||
|
||||
input_names = self.predictor.get_input_names()
|
||||
for i in range(len(input_names)):
|
||||
input_tensor = self.predictor.get_input_handle(input_names[i])
|
||||
if input_names[i] == 'x':
|
||||
input_tensor.copy_from_cpu(inputs['image'])
|
||||
else:
|
||||
input_tensor.copy_from_cpu(inputs[input_names[i]])
|
||||
|
||||
return inputs
|
||||
|
||||
def postprocess(self, inputs, result):
|
||||
# postprocess output of predictor
|
||||
np_bboxes = result['bboxes']
|
||||
if np_bboxes.shape[0] <= 0:
|
||||
print('[WARNING] No object detected and tracked.')
|
||||
result = {'bboxes': np.zeros([0, 6]), 'cts': None, 'tracking': None}
|
||||
return result
|
||||
result = {k: v for k, v in result.items() if v is not None}
|
||||
return result
|
||||
|
||||
def centertrack_post_process(self, dets, meta, out_thresh):
|
||||
if not ('bboxes' in dets):
|
||||
return [{}]
|
||||
|
||||
preds = []
|
||||
c, s = meta['center'], meta['scale']
|
||||
h, w = meta['out_height'], meta['out_width']
|
||||
trans = get_affine_transform(
|
||||
center=c,
|
||||
input_size=s,
|
||||
rot=0,
|
||||
output_size=[w, h],
|
||||
shift=(0., 0.),
|
||||
inv=True).astype(np.float32)
|
||||
for i, dets_bbox in enumerate(dets['bboxes']):
|
||||
if dets_bbox[1] < out_thresh:
|
||||
break
|
||||
item = {}
|
||||
item['score'] = dets_bbox[1]
|
||||
item['class'] = int(dets_bbox[0]) + 1
|
||||
item['ct'] = transform_preds_with_trans(
|
||||
dets['cts'][i].reshape([1, 2]), trans).reshape(2)
|
||||
|
||||
if 'tracking' in dets:
|
||||
tracking = transform_preds_with_trans(
|
||||
(dets['tracking'][i] + dets['cts'][i]).reshape([1, 2]),
|
||||
trans).reshape(2)
|
||||
item['tracking'] = tracking - item['ct']
|
||||
|
||||
if 'bboxes' in dets:
|
||||
bbox = transform_preds_with_trans(
|
||||
dets_bbox[2:6].reshape([2, 2]), trans).reshape(4)
|
||||
item['bbox'] = bbox
|
||||
|
||||
preds.append(item)
|
||||
return preds
|
||||
|
||||
def tracking(self, inputs, det_results):
|
||||
result = self.centertrack_post_process(det_results, inputs,
|
||||
self.tracker.out_thresh)
|
||||
online_targets = self.tracker.update(result)
|
||||
|
||||
online_tlwhs, online_scores, online_ids = [], [], []
|
||||
for t in online_targets:
|
||||
bbox = t['bbox']
|
||||
tlwh = [bbox[0], bbox[1], bbox[2] - bbox[0], bbox[3] - bbox[1]]
|
||||
tscore = float(t['score'])
|
||||
tid = int(t['tracking_id'])
|
||||
if tlwh[2] * tlwh[3] > 0:
|
||||
online_tlwhs.append(tlwh)
|
||||
online_ids.append(tid)
|
||||
online_scores.append(tscore)
|
||||
return online_tlwhs, online_scores, online_ids
|
||||
|
||||
    def predict(self, repeats=1):
        '''
        Args:
            repeats (int): number of times to repeat the prediction
        Returns:
            result (dict): includes 'bboxes', 'cts' and 'tracking':
                np.ndarray: shape: [N,6], [N,2] and [N,2], N: number of boxes
        '''
        # model prediction
        np_bboxes, np_cts, np_tracking = None, None, None
        for i in range(repeats):
            self.predictor.run()
            output_names = self.predictor.get_output_names()
            bboxes_tensor = self.predictor.get_output_handle(output_names[0])
            np_bboxes = bboxes_tensor.copy_to_cpu()
            cts_tensor = self.predictor.get_output_handle(output_names[1])
            np_cts = cts_tensor.copy_to_cpu()
            tracking_tensor = self.predictor.get_output_handle(output_names[2])
            np_tracking = tracking_tensor.copy_to_cpu()

        result = dict(bboxes=np_bboxes, cts=np_cts, tracking=np_tracking)
        return result
|
||||
def predict_image(self,
|
||||
image_list,
|
||||
run_benchmark=False,
|
||||
repeats=1,
|
||||
visual=True,
|
||||
seq_name=None):
|
||||
mot_results = []
|
||||
num_classes = self.num_classes
|
||||
image_list.sort()
|
||||
ids2names = self.pred_config.labels
|
||||
data_type = 'mcmot' if num_classes > 1 else 'mot'
|
||||
for frame_id, img_file in enumerate(image_list):
|
||||
batch_image_list = [img_file] # bs=1 in MOT model
|
||||
if run_benchmark:
|
||||
# preprocess
|
||||
inputs = self.preprocess(batch_image_list) # warmup
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
# model prediction
|
||||
result_warmup = self.predict(repeats=repeats) # warmup
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict(repeats=repeats)
|
||||
self.det_times.inference_time_s.end(repeats=repeats)
|
||||
|
||||
# postprocess
|
||||
result_warmup = self.postprocess(inputs, result) # warmup
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking
|
||||
result_warmup = self.tracking(inputs, det_result)
|
||||
self.det_times.tracking_time_s.start()
|
||||
online_tlwhs, online_scores, online_ids = self.tracking(
|
||||
inputs, det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
self.cpu_mem += cm
|
||||
self.gpu_mem += gm
|
||||
self.gpu_util += gu
|
||||
|
||||
else:
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict()
|
||||
self.det_times.inference_time_s.end()
|
||||
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking process
|
||||
self.det_times.tracking_time_s.start()
|
||||
online_tlwhs, online_scores, online_ids = self.tracking(
|
||||
inputs, det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
if visual:
|
||||
if len(image_list) > 1 and frame_id % 10 == 0:
|
||||
print('Tracking frame {}'.format(frame_id))
|
||||
frame, _ = decode_image(img_file, {})
|
||||
|
||||
im = plot_tracking(
|
||||
frame,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
ids2names=ids2names)
|
||||
if seq_name is None:
|
||||
seq_name = image_list[0].split('/')[-2]
|
||||
save_dir = os.path.join(self.output_dir, seq_name)
|
||||
if not os.path.exists(save_dir):
|
||||
os.makedirs(save_dir)
|
||||
cv2.imwrite(
|
||||
os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
|
||||
|
||||
mot_results.append([online_tlwhs, online_scores, online_ids])
|
||||
return mot_results
|
||||
|
||||
def predict_video(self, video_file, camera_id):
|
||||
video_out_name = 'mot_output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(video_file)
|
||||
video_out_name = os.path.split(video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(self.output_dir):
|
||||
os.makedirs(self.output_dir)
|
||||
out_path = os.path.join(self.output_dir, video_out_name)
|
||||
video_format = 'mp4v'
|
||||
fourcc = cv2.VideoWriter_fourcc(*video_format)
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
|
||||
frame_id = 1
|
||||
timer = MOTTimer()
|
||||
results = defaultdict(list) # CenterTrack only supports a single class
|
||||
num_classes = self.num_classes
|
||||
data_type = 'mcmot' if num_classes > 1 else 'mot'
|
||||
ids2names = self.pred_config.labels
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
if frame_id % 10 == 0:
|
||||
print('Tracking frame: %d' % (frame_id))
|
||||
frame_id += 1
|
||||
|
||||
timer.tic()
|
||||
seq_name = video_out_name.split('.')[0]
|
||||
mot_results = self.predict_image(
|
||||
[frame[:, :, ::-1]], visual=False, seq_name=seq_name)
|
||||
timer.toc()
|
||||
|
||||
fps = 1. / timer.duration
|
||||
online_tlwhs, online_scores, online_ids = mot_results[0]
|
||||
results[0].append(
|
||||
(frame_id + 1, online_tlwhs, online_scores, online_ids))
|
||||
im = plot_tracking(
|
||||
frame,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
fps=fps,
|
||||
ids2names=ids2names)
|
||||
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Tracking Detection', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
|
||||
if self.save_mot_txts:
|
||||
result_filename = os.path.join(
|
||||
self.output_dir, video_out_name.split('.')[-2] + '.txt')
|
||||
|
||||
write_mot_results(result_filename, results, data_type, num_classes)
|
||||
|
||||
writer.release()
|
||||
|
||||
|
||||
def main():
|
||||
detector = CenterTrack(
|
||||
FLAGS.model_dir,
|
||||
tracker_config=None,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=1,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
output_dir=FLAGS.output_dir,
|
||||
threshold=FLAGS.threshold,
|
||||
save_images=FLAGS.save_images,
|
||||
save_mot_txts=FLAGS.save_mot_txts)
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
|
||||
else:
|
||||
# predict from image
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
|
||||
|
||||
if not FLAGS.run_benchmark:
|
||||
detector.det_times.info(average=True)
|
||||
else:
|
||||
mode = FLAGS.run_mode
|
||||
model_dir = FLAGS.model_dir
|
||||
model_info = {
|
||||
'model_name': model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(detector, img_list, model_info, name='MOT')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
|
||||
], "device should be CPU, GPU, NPU or XPU"
|
||||
|
||||
main()
|
||||
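The `tracking` method above turns each tracker target's corner-format box into `[x, y, w, h]` and drops boxes whose width times height is not positive. A minimal standalone sketch of that step follows; the helper names `tlbr_to_tlwh` and `collect_tracks` and the sample `targets` are hypothetical and are not part of the vendored file.

```python
def tlbr_to_tlwh(bbox):
    """Convert [x0, y0, x1, y1] to [x, y, w, h]."""
    x0, y0, x1, y1 = bbox
    return [x0, y0, x1 - x0, y1 - y0]


def collect_tracks(online_targets):
    """Split tracker output into parallel lists, skipping boxes whose
    width * height is not positive (same check as in `tracking`)."""
    tlwhs, scores, ids = [], [], []
    for t in online_targets:
        tlwh = tlbr_to_tlwh(t['bbox'])
        if tlwh[2] * tlwh[3] > 0:
            tlwhs.append(tlwh)
            scores.append(float(t['score']))
            ids.append(int(t['tracking_id']))
    return tlwhs, scores, ids


# Hypothetical tracker output, for illustration only.
targets = [
    {'bbox': [10, 20, 50, 80], 'score': 0.9, 'tracking_id': 1},
    {'bbox': [5, 5, 5, 30], 'score': 0.8, 'tracking_id': 2},  # zero width
]
tlwhs, scores, ids = collect_tracks(targets)
print(tlwhs, scores, ids)
```

The second target has zero width, so only the first one survives the filter.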
381
third-party/paddle-inference/mot_jde_infer.py
vendored
Normal file
@@ -0,0 +1,381 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import time
|
||||
import yaml
|
||||
import cv2
|
||||
import numpy as np
|
||||
from collections import defaultdict
|
||||
import paddle
|
||||
|
||||
from benchmark_utils import PaddleInferBenchmark
|
||||
from preprocess import decode_image
|
||||
from utils import argsparser, Timer, get_current_memory_mb
|
||||
from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig
|
||||
|
||||
# add python path
|
||||
import sys
|
||||
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
|
||||
sys.path.insert(0, parent_path)
|
||||
|
||||
from pptracking.python.mot import JDETracker
|
||||
from pptracking.python.mot.utils import MOTTimer, write_mot_results
|
||||
from pptracking.python.mot.visualize import plot_tracking_dict
|
||||
|
||||
# Global dictionary
|
||||
MOT_JDE_SUPPORT_MODELS = {
|
||||
'JDE',
|
||||
'FairMOT',
|
||||
}
|
||||
|
||||
|
||||
class JDE_Detector(Detector):
|
||||
"""
|
||||
    Args:
        model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
        device (str): device to run on, one of CPU/GPU/XPU/NPU, default is CPU
        run_mode (str): mode of running (paddle/trt_fp32/trt_fp16)
        batch_size (int): size of each batch in inference
        trt_min_shape (int): min shape for dynamic shape in trt
        trt_max_shape (int): max shape for dynamic shape in trt
        trt_opt_shape (int): opt shape for dynamic shape in trt
        trt_calib_mode (bool): if the model is produced by TRT offline quantitative
            calibration, trt_calib_mode needs to be set to True
        cpu_threads (int): cpu threads
        enable_mkldnn (bool): whether to enable MKLDNN
        output_dir (string): the path of output, default as 'output'
        threshold (float): score threshold of the detected bbox, default as 0.5
        save_images (bool): whether to save visualization image results, default as False
        save_mot_txts (bool): whether to save tracking results (txt), default as False
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model_dir,
|
||||
tracker_config=None,
|
||||
device='CPU',
|
||||
run_mode='paddle',
|
||||
batch_size=1,
|
||||
trt_min_shape=1,
|
||||
trt_max_shape=1088,
|
||||
trt_opt_shape=608,
|
||||
trt_calib_mode=False,
|
||||
cpu_threads=1,
|
||||
enable_mkldnn=False,
|
||||
output_dir='output',
|
||||
threshold=0.5,
|
||||
save_images=False,
|
||||
save_mot_txts=False, ):
|
||||
super(JDE_Detector, self).__init__(
|
||||
model_dir=model_dir,
|
||||
device=device,
|
||||
run_mode=run_mode,
|
||||
batch_size=batch_size,
|
||||
trt_min_shape=trt_min_shape,
|
||||
trt_max_shape=trt_max_shape,
|
||||
trt_opt_shape=trt_opt_shape,
|
||||
trt_calib_mode=trt_calib_mode,
|
||||
cpu_threads=cpu_threads,
|
||||
enable_mkldnn=enable_mkldnn,
|
||||
output_dir=output_dir,
|
||||
threshold=threshold, )
|
||||
self.save_images = save_images
|
||||
self.save_mot_txts = save_mot_txts
|
||||
assert batch_size == 1, "MOT model only supports batch_size=1."
|
||||
self.det_times = Timer(with_tracker=True)
|
||||
self.num_classes = len(self.pred_config.labels)
|
||||
|
||||
# tracker config
|
||||
assert self.pred_config.tracker, "The exported JDE detector model should include a tracker config."
|
||||
cfg = self.pred_config.tracker
|
||||
min_box_area = cfg.get('min_box_area', 0.0)
|
||||
vertical_ratio = cfg.get('vertical_ratio', 0.0)
|
||||
conf_thres = cfg.get('conf_thres', 0.0)
|
||||
tracked_thresh = cfg.get('tracked_thresh', 0.7)
|
||||
metric_type = cfg.get('metric_type', 'euclidean')
|
||||
|
||||
self.tracker = JDETracker(
|
||||
num_classes=self.num_classes,
|
||||
min_box_area=min_box_area,
|
||||
vertical_ratio=vertical_ratio,
|
||||
conf_thres=conf_thres,
|
||||
tracked_thresh=tracked_thresh,
|
||||
metric_type=metric_type)
|
||||
|
||||
    def postprocess(self, inputs, result):
        # postprocess output of predictor
        np_boxes = result['pred_dets']
        if np_boxes.shape[0] <= 0:
            print('[WARNING] No object detected.')
            result = {'pred_dets': np.zeros([0, 6]), 'pred_embs': None}
        result = {k: v for k, v in result.items() if v is not None}
        return result
|
||||
    def tracking(self, det_results):
        pred_dets = det_results['pred_dets']  # cls_id, score, x0, y0, x1, y1
        pred_embs = det_results['pred_embs']
        online_targets_dict = self.tracker.update(pred_dets, pred_embs)

        online_tlwhs = defaultdict(list)
        online_scores = defaultdict(list)
        online_ids = defaultdict(list)
        for cls_id in range(self.num_classes):
            online_targets = online_targets_dict[cls_id]
            for t in online_targets:
                tlwh = t.tlwh
                tid = t.track_id
                tscore = t.score
                # skip boxes that are too small
                if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
                    continue
                # skip boxes whose width/height ratio exceeds the limit
                if (self.tracker.vertical_ratio > 0 and
                        tlwh[2] / tlwh[3] > self.tracker.vertical_ratio):
                    continue
                online_tlwhs[cls_id].append(tlwh)
                online_ids[cls_id].append(tid)
                online_scores[cls_id].append(tscore)
        return online_tlwhs, online_scores, online_ids
|
||||
    def predict(self, repeats=1):
        '''
        Args:
            repeats (int): number of times to repeat the prediction
        Returns:
            result (dict): includes 'pred_dets': np.ndarray: shape: [N,6], N: number of boxes,
                matrix element: [class, score, x_min, y_min, x_max, y_max]
                FairMOT(JDE)'s result includes 'pred_embs': np.ndarray:
                shape: [N, 128]
        '''
        # model prediction
        np_pred_dets, np_pred_embs = None, None
        for i in range(repeats):
            self.predictor.run()
            output_names = self.predictor.get_output_names()
            boxes_tensor = self.predictor.get_output_handle(output_names[0])
            np_pred_dets = boxes_tensor.copy_to_cpu()
            embs_tensor = self.predictor.get_output_handle(output_names[1])
            np_pred_embs = embs_tensor.copy_to_cpu()

        result = dict(pred_dets=np_pred_dets, pred_embs=np_pred_embs)
        return result
|
||||
def predict_image(self,
|
||||
image_list,
|
||||
run_benchmark=False,
|
||||
repeats=1,
|
||||
visual=True,
|
||||
seq_name=None):
|
||||
mot_results = []
|
||||
num_classes = self.num_classes
|
||||
image_list.sort()
|
||||
ids2names = self.pred_config.labels
|
||||
data_type = 'mcmot' if num_classes > 1 else 'mot'
|
||||
for frame_id, img_file in enumerate(image_list):
|
||||
batch_image_list = [img_file] # bs=1 in MOT model
|
||||
if run_benchmark:
|
||||
# preprocess
|
||||
inputs = self.preprocess(batch_image_list) # warmup
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
# model prediction
|
||||
result_warmup = self.predict(repeats=repeats) # warmup
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict(repeats=repeats)
|
||||
self.det_times.inference_time_s.end(repeats=repeats)
|
||||
|
||||
# postprocess
|
||||
result_warmup = self.postprocess(inputs, result) # warmup
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking
|
||||
result_warmup = self.tracking(det_result)
|
||||
self.det_times.tracking_time_s.start()
|
||||
online_tlwhs, online_scores, online_ids = self.tracking(
|
||||
det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
self.cpu_mem += cm
|
||||
self.gpu_mem += gm
|
||||
self.gpu_util += gu
|
||||
|
||||
else:
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict()
|
||||
self.det_times.inference_time_s.end()
|
||||
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking process
|
||||
self.det_times.tracking_time_s.start()
|
||||
online_tlwhs, online_scores, online_ids = self.tracking(
|
||||
det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
if visual:
|
||||
if len(image_list) > 1 and frame_id % 10 == 0:
|
||||
print('Tracking frame {}'.format(frame_id))
|
||||
frame, _ = decode_image(img_file, {})
|
||||
|
||||
im = plot_tracking_dict(
|
||||
frame,
|
||||
num_classes,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
ids2names=ids2names)
|
||||
if seq_name is None:
|
||||
seq_name = image_list[0].split('/')[-2]
|
||||
save_dir = os.path.join(self.output_dir, seq_name)
|
||||
if not os.path.exists(save_dir):
|
||||
os.makedirs(save_dir)
|
||||
cv2.imwrite(
|
||||
os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
|
||||
|
||||
mot_results.append([online_tlwhs, online_scores, online_ids])
|
||||
return mot_results
|
||||
|
||||
def predict_video(self, video_file, camera_id):
|
||||
video_out_name = 'mot_output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(video_file)
|
||||
video_out_name = os.path.split(video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(self.output_dir):
|
||||
os.makedirs(self.output_dir)
|
||||
out_path = os.path.join(self.output_dir, video_out_name)
|
||||
video_format = 'mp4v'
|
||||
fourcc = cv2.VideoWriter_fourcc(*video_format)
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
|
||||
frame_id = 1
|
||||
timer = MOTTimer()
|
||||
results = defaultdict(list) # support single class and multi classes
|
||||
num_classes = self.num_classes
|
||||
data_type = 'mcmot' if num_classes > 1 else 'mot'
|
||||
ids2names = self.pred_config.labels
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
if frame_id % 10 == 0:
|
||||
print('Tracking frame: %d' % (frame_id))
|
||||
frame_id += 1
|
||||
|
||||
timer.tic()
|
||||
seq_name = video_out_name.split('.')[0]
|
||||
mot_results = self.predict_image(
|
||||
[frame[:, :, ::-1]], visual=False, seq_name=seq_name)
|
||||
timer.toc()
|
||||
|
||||
online_tlwhs, online_scores, online_ids = mot_results[0]
|
||||
for cls_id in range(num_classes):
|
||||
results[cls_id].append(
|
||||
(frame_id + 1, online_tlwhs[cls_id], online_scores[cls_id],
|
||||
online_ids[cls_id]))
|
||||
|
||||
fps = 1. / timer.duration
|
||||
im = plot_tracking_dict(
|
||||
frame,
|
||||
num_classes,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
fps=fps,
|
||||
ids2names=ids2names)
|
||||
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Tracking Detection', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
|
||||
if self.save_mot_txts:
|
||||
result_filename = os.path.join(
|
||||
self.output_dir, video_out_name.split('.')[-2] + '.txt')
|
||||
|
||||
write_mot_results(result_filename, results, data_type, num_classes)
|
||||
|
||||
writer.release()
|
||||
|
||||
|
||||
def main():
|
||||
detector = JDE_Detector(
|
||||
FLAGS.model_dir,
|
||||
tracker_config=None,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=1,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
output_dir=FLAGS.output_dir,
|
||||
threshold=FLAGS.threshold,
|
||||
save_images=FLAGS.save_images,
|
||||
save_mot_txts=FLAGS.save_mot_txts)
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
|
||||
else:
|
||||
# predict from image
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
detector.predict_image(img_list, FLAGS.run_benchmark, repeats=10)
|
||||
|
||||
if not FLAGS.run_benchmark:
|
||||
detector.det_times.info(average=True)
|
||||
else:
|
||||
mode = FLAGS.run_mode
|
||||
model_dir = FLAGS.model_dir
|
||||
model_info = {
|
||||
'model_name': model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(detector, img_list, model_info, name='MOT')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
|
||||
], "device should be CPU, GPU, NPU or XPU"
|
||||
|
||||
main()
|
||||
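`JDE_Detector.tracking` keeps a track only if its box area exceeds `min_box_area` and, when `vertical_ratio` is positive, its width/height ratio does not exceed that limit. The sketch below reproduces this per-class filter in isolation. `Track` is a hypothetical stand-in for the tracker's target objects, and `filter_tracks` with its sample thresholds is illustrative only, not part of the vendored file.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in exposing the attributes read by `tracking`.
Track = namedtuple('Track', ['tlwh', 'track_id', 'score'])


def filter_tracks(targets_by_class, num_classes, min_box_area=0.0,
                  vertical_ratio=0.0):
    """Group surviving tracks by class id, mirroring the two checks above."""
    tlwhs, scores, ids = defaultdict(list), defaultdict(list), defaultdict(list)
    for cls_id in range(num_classes):
        for t in targets_by_class.get(cls_id, []):
            w, h = t.tlwh[2], t.tlwh[3]
            if w * h <= min_box_area:
                continue  # box too small
            if vertical_ratio > 0 and w / h > vertical_ratio:
                continue  # box too wide for its height
            tlwhs[cls_id].append(t.tlwh)
            ids[cls_id].append(t.track_id)
            scores[cls_id].append(t.score)
    return tlwhs, scores, ids


# Illustrative data: one tall box, one wide box, one tiny box.
targets = {
    0: [
        Track([0, 0, 20, 60], 1, 0.9),
        Track([0, 0, 80, 40], 2, 0.8),
        Track([0, 0, 2, 3], 3, 0.7),
    ]
}
tlwhs, scores, ids = filter_tracks(
    targets, num_classes=1, min_box_area=10, vertical_ratio=1.6)
print(ids[0])
```

With these sample thresholds, track 2 is rejected by the ratio check (80 / 40 = 2.0) and track 3 by the area check (2 * 3 = 6), so only track 1 remains.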
301
third-party/paddle-inference/mot_keypoint_unite_infer.py
vendored
Normal file
@@ -0,0 +1,301 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import json
|
||||
import cv2
|
||||
import math
|
||||
import numpy as np
|
||||
import paddle
|
||||
import yaml
|
||||
import copy
|
||||
from collections import defaultdict
|
||||
|
||||
from mot_keypoint_unite_utils import argsparser
|
||||
from preprocess import decode_image
|
||||
from infer import print_arguments, get_test_images, bench_log
|
||||
from mot_sde_infer import SDE_Detector
|
||||
from mot_jde_infer import JDE_Detector, MOT_JDE_SUPPORT_MODELS
|
||||
from keypoint_infer import KeyPointDetector, KEYPOINT_SUPPORT_MODELS
|
||||
from det_keypoint_unite_infer import predict_with_given_det
|
||||
from visualize import visualize_pose
|
||||
from benchmark_utils import PaddleInferBenchmark
|
||||
from utils import get_current_memory_mb
|
||||
from keypoint_postprocess import translate_to_ori_images
|
||||
|
||||
# add python path
|
||||
import sys
|
||||
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
|
||||
sys.path.insert(0, parent_path)
|
||||
|
||||
from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
|
||||
from pptracking.python.mot.utils import MOTTimer as FPSTimer
|
||||
|
||||
|
||||
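def convert_mot_to_det(tlwhs, scores):
    results = {}
    num_mot = len(tlwhs)
    if num_mot == 0:
        # np.vstack raises ValueError on an empty list, so return an
        # empty result that callers can skip via results['boxes_num'] == 0
        results['boxes'] = np.zeros((0, 6))
        results['boxes_num'] = np.array([0])
        return results
    xyxys = copy.deepcopy(tlwhs)
    # convert each [x, y, w, h] box to [x0, y0, x1, y1] in place
    for xyxy in xyxys:
        xyxy[2:] = xyxy[2:] + xyxy[:2]
    # support single class now
    results['boxes'] = np.vstack(
        [np.hstack([0, scores[i], xyxys[i]]) for i in range(num_mot)])
    results['boxes_num'] = np.array([num_mot])
    return results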
|
||||
|
||||
def mot_topdown_unite_predict(mot_detector,
|
||||
topdown_keypoint_detector,
|
||||
image_list,
|
||||
keypoint_batch_size=1,
|
||||
save_res=False):
|
||||
det_timer = mot_detector.get_timer()
|
||||
store_res = []
|
||||
image_list.sort()
|
||||
num_classes = mot_detector.num_classes
|
||||
for i, img_file in enumerate(image_list):
|
||||
# Decode image in advance in mot + pose prediction
|
||||
det_timer.preprocess_time_s.start()
|
||||
image, _ = decode_image(img_file, {})
|
||||
det_timer.preprocess_time_s.end()
|
||||
|
||||
if FLAGS.run_benchmark:
|
||||
mot_results = mot_detector.predict_image(
|
||||
[image], run_benchmark=True, repeats=10)
|
||||
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
mot_detector.cpu_mem += cm
|
||||
mot_detector.gpu_mem += gm
|
||||
mot_detector.gpu_util += gu
|
||||
else:
|
||||
mot_results = mot_detector.predict_image([image], visual=False)
|
||||
|
||||
online_tlwhs, online_scores, online_ids = mot_results[
|
||||
0] # only support bs=1 in MOT model
|
||||
results = convert_mot_to_det(
|
||||
online_tlwhs[0],
|
||||
online_scores[0]) # only support single class for mot + pose
|
||||
if results['boxes_num'] == 0:
|
||||
continue
|
||||
|
||||
keypoint_res = predict_with_given_det(
|
||||
image, results, topdown_keypoint_detector, keypoint_batch_size,
|
||||
FLAGS.run_benchmark)
|
||||
|
||||
if save_res:
|
||||
save_name = img_file if isinstance(img_file, str) else i
|
||||
store_res.append([
|
||||
save_name, keypoint_res['bbox'],
|
||||
[keypoint_res['keypoint'][0], keypoint_res['keypoint'][1]]
|
||||
])
|
||||
if FLAGS.run_benchmark:
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
topdown_keypoint_detector.cpu_mem += cm
|
||||
topdown_keypoint_detector.gpu_mem += gm
|
||||
topdown_keypoint_detector.gpu_util += gu
|
||||
else:
|
||||
if not os.path.exists(FLAGS.output_dir):
|
||||
os.makedirs(FLAGS.output_dir)
|
||||
visualize_pose(
|
||||
img_file,
|
||||
keypoint_res,
|
||||
visual_thresh=FLAGS.keypoint_threshold,
|
||||
save_dir=FLAGS.output_dir)
|
||||
|
||||
if save_res:
|
||||
"""
|
||||
1) store_res: a list of image_data
|
||||
2) image_data: [imageid, rects, [keypoints, scores]]
|
||||
3) rects: list of rect [xmin, ymin, xmax, ymax]
|
||||
4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list
|
||||
5) scores: mean of all joint conf
|
||||
"""
|
||||
with open("det_keypoint_unite_image_results.json", 'w') as wf:
|
||||
json.dump(store_res, wf, indent=4)
|
||||
|
||||
|
||||
def mot_topdown_unite_predict_video(mot_detector,
|
||||
topdown_keypoint_detector,
|
||||
camera_id,
|
||||
keypoint_batch_size=1,
|
||||
save_res=False):
|
||||
video_name = 'output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(FLAGS.video_file)
|
||||
video_name = os.path.split(FLAGS.video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(FLAGS.output_dir):
|
||||
os.makedirs(FLAGS.output_dir)
|
||||
out_path = os.path.join(FLAGS.output_dir, video_name)
|
||||
fourcc = cv2.VideoWriter_fourcc(* 'mp4v')
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
frame_id = 0
|
||||
timer_mot, timer_kp, timer_mot_kp = FPSTimer(), FPSTimer(), FPSTimer()
|
||||
|
||||
num_classes = mot_detector.num_classes
|
||||
assert num_classes == 1, 'Only single-class MOT models are supported for joint MOT and keypoint deployment.'
|
||||
data_type = 'mot'
|
||||
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
if frame_id % 10 == 0:
|
||||
print('Tracking frame: %d' % (frame_id))
|
||||
frame_id += 1
|
||||
timer_mot_kp.tic()
|
||||
|
||||
# mot model
|
||||
timer_mot.tic()
|
||||
|
||||
frame2 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
|
||||
|
||||
mot_results = mot_detector.predict_image([frame2], visual=False)
|
||||
timer_mot.toc()
|
||||
online_tlwhs, online_scores, online_ids = mot_results[0]
|
||||
results = convert_mot_to_det(
|
||||
online_tlwhs[0],
|
||||
online_scores[0]) # only support single class for mot + pose
|
||||
if results['boxes_num'] == 0:
|
||||
continue
|
||||
|
||||
# keypoint model
|
||||
timer_kp.tic()
|
||||
keypoint_res = predict_with_given_det(
|
||||
frame2, results, topdown_keypoint_detector, keypoint_batch_size,
|
||||
FLAGS.run_benchmark)
|
||||
timer_kp.toc()
|
||||
timer_mot_kp.toc()
|
||||
|
||||
kp_fps = 1. / timer_kp.duration
|
||||
mot_kp_fps = 1. / timer_mot_kp.duration
|
||||
|
||||
im = visualize_pose(
|
||||
frame,
|
||||
keypoint_res,
|
||||
visual_thresh=FLAGS.keypoint_threshold,
|
||||
returnimg=True,
|
||||
ids=online_ids[0])
|
||||
|
||||
im = plot_tracking_dict(
|
||||
im,
|
||||
num_classes,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
fps=mot_kp_fps)
|
||||
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Tracking and keypoint results', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
|
||||
writer.release()
|
||||
print('output_video saved to: {}'.format(out_path))
|
||||
|
||||
|
||||
def main():
|
||||
deploy_file = os.path.join(FLAGS.mot_model_dir, 'infer_cfg.yml')
|
||||
with open(deploy_file) as f:
|
||||
yml_conf = yaml.safe_load(f)
|
||||
arch = yml_conf['arch']
|
||||
mot_detector_func = 'SDE_Detector'
|
||||
if arch in MOT_JDE_SUPPORT_MODELS:
|
||||
mot_detector_func = 'JDE_Detector'
|
||||
|
||||
mot_detector = eval(mot_detector_func)(FLAGS.mot_model_dir,
|
||||
FLAGS.tracker_config,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=1,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
threshold=FLAGS.mot_threshold,
|
||||
output_dir=FLAGS.output_dir)
|
||||
|
||||
topdown_keypoint_detector = KeyPointDetector(
|
||||
FLAGS.keypoint_model_dir,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=FLAGS.keypoint_batch_size,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
threshold=FLAGS.keypoint_threshold,
|
||||
output_dir=FLAGS.output_dir,
|
||||
use_dark=FLAGS.use_dark)
|
||||
keypoint_arch = topdown_keypoint_detector.pred_config.arch
|
||||
assert KEYPOINT_SUPPORT_MODELS[
|
||||
keypoint_arch] == 'keypoint_topdown', 'MOT-Keypoint unite inference only supports topdown models.'
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
mot_topdown_unite_predict_video(
|
||||
mot_detector, topdown_keypoint_detector, FLAGS.camera_id,
|
||||
FLAGS.keypoint_batch_size, FLAGS.save_res)
|
||||
else:
|
||||
# predict from image
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
mot_topdown_unite_predict(mot_detector, topdown_keypoint_detector,
|
||||
img_list, FLAGS.keypoint_batch_size,
|
||||
FLAGS.save_res)
|
||||
if not FLAGS.run_benchmark:
|
||||
mot_detector.det_times.info(average=True)
|
||||
topdown_keypoint_detector.det_times.info(average=True)
|
||||
else:
|
||||
mode = FLAGS.run_mode
|
||||
mot_model_dir = FLAGS.mot_model_dir
|
||||
mot_model_info = {
|
||||
'model_name': mot_model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(mot_detector, img_list, mot_model_info, name='MOT')
|
||||
|
||||
keypoint_model_dir = FLAGS.keypoint_model_dir
|
||||
keypoint_model_info = {
|
||||
'model_name': keypoint_model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(topdown_keypoint_detector, img_list, keypoint_model_info,
|
||||
FLAGS.keypoint_batch_size, 'KeyPoint')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
|
||||
], "device should be CPU, GPU, NPU or XPU"
|
||||
|
||||
main()
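The benchmark branch above derives its log labels from plain string operations on the model directory and the run mode. The following is a minimal sketch of that derivation, not part of the vendored script; the helper name `describe_model` and the example paths are hypothetical.

```python
def describe_model(model_dir, run_mode):
    """Derive benchmark labels from a model directory and a run mode."""
    return {
        # Last path component, ignoring leading and trailing slashes.
        'model_name': model_dir.strip('/').split('/')[-1],
        # Text after the last underscore; a mode without one is kept whole.
        'precision': run_mode.split('_')[-1],
    }


# Hypothetical directories, used only to illustrate the string handling.
info_trt = describe_model('output_inference/fairmot_dla34_30e_1088x608/',
                          'trt_fp16')
info_paddle = describe_model('output_inference/hrnet_w32_384x288', 'paddle')
print(info_trt)
print(info_paddle)
```

Note that a run mode such as `paddle` has no underscore, so the reported precision is the mode name itself.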
|
||||
139
third-party/paddle-inference/mot_keypoint_unite_utils.py
vendored
Normal file
@@ -0,0 +1,139 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import ast
|
||||
import argparse
|
||||
|
||||
|
||||
def argsparser():
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument(
|
||||
"--mot_model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel', "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."),
|
||||
required=True)
|
||||
parser.add_argument(
|
||||
"--keypoint_model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel', "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."),
|
||||
required=True)
|
||||
parser.add_argument(
|
||||
"--image_file", type=str, default=None, help="Path of image file.")
|
||||
parser.add_argument(
|
||||
"--image_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Dir of image file, `image_file` has a higher priority.")
|
||||
parser.add_argument(
|
||||
"--keypoint_batch_size",
|
||||
type=int,
|
||||
default=1,
|
||||
help=("batch_size for keypoint inference. In detection-keypoint unite "
|
||||
"inference, the batch size in detection is 1. Then collate det "
|
||||
"result in batch for keypoint inference."))
|
||||
parser.add_argument(
|
||||
"--video_file",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Path of video file, `video_file` or `camera_id` has the highest priority."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--camera_id",
|
||||
type=int,
|
||||
default=-1,
|
||||
help="device id of camera to predict.")
|
||||
parser.add_argument(
|
||||
"--mot_threshold", type=float, default=0.5, help="Threshold of score.")
|
||||
parser.add_argument(
|
||||
"--keypoint_threshold",
|
||||
type=float,
|
||||
default=0.5,
|
||||
help="Threshold of score.")
|
||||
parser.add_argument(
|
||||
"--output_dir",
|
||||
type=str,
|
||||
default="output",
|
||||
help="Directory of output visualization files.")
|
||||
parser.add_argument(
|
||||
"--run_mode",
|
||||
type=str,
|
||||
default='paddle',
|
||||
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
|
||||
parser.add_argument(
|
||||
"--device",
|
||||
type=str,
|
||||
default='cpu',
|
||||
help="Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--run_benchmark",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether to predict an image_file repeatedly for benchmark")
|
||||
parser.add_argument(
|
||||
"--enable_mkldnn",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether to use MKLDNN on CPU.")
|
||||
parser.add_argument(
|
||||
"--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
|
||||
parser.add_argument(
|
||||
"--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_max_shape",
|
||||
type=int,
|
||||
default=1088,
|
||||
help="max_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_opt_shape",
|
||||
type=int,
|
||||
default=608,
|
||||
help="opt_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_calib_mode",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode needs to be set to True.")
|
||||
parser.add_argument(
|
||||
'--save_images',
|
||||
action='store_true',
|
||||
help='Save visualization image results.')
|
||||
parser.add_argument(
|
||||
'--save_mot_txts',
|
||||
action='store_true',
|
||||
help='Save tracking results (txt).')
|
||||
parser.add_argument(
|
||||
'--use_dark',
|
||||
type=ast.literal_eval,
|
||||
default=True,
|
||||
help='Whether to use DarkPose to get better keypoint position predictions.')
|
||||
parser.add_argument(
|
||||
'--save_res',
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help=(
"Whether to save prediction results to a JSON file. "
"1) store_res: a list of image_data; "
"2) image_data: [imageid, rects, [keypoints, scores]]; "
"3) rects: list of rect [xmin, ymin, xmax, ymax]; "
"4) keypoints: 17(joint numbers)*[x, y, conf], total 51 data in list; "
"5) scores: mean of all joint conf."))
|
||||
parser.add_argument(
"--tracker_config", type=str, default=None, help=("tracker config"))
|
||||
return parser
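Boolean-valued options in a parser like this one are sensitive to the `type` converter: argparse applies it to the raw command-line string, so `bool` turns any non-empty string (including `"False"`) into `True`, while `ast.literal_eval` evaluates the Python literal. The sketch below is a standalone illustration; the flag names `--flag_bool` and `--flag_literal` are hypothetical and do not exist in this module.

```python
import argparse
import ast


def build_demo_parser():
    """Build a tiny parser with two hypothetical boolean-valued flags."""
    parser = argparse.ArgumentParser()
    # bool("False") is True because the string is non-empty.
    parser.add_argument("--flag_bool", type=bool, default=False)
    # ast.literal_eval("False") evaluates the literal and yields False.
    parser.add_argument(
        "--flag_literal", type=ast.literal_eval, default=False)
    return parser


demo_args = build_demo_parser().parse_args(
    ["--flag_bool", "False", "--flag_literal", "False"])
print(demo_args.flag_bool, demo_args.flag_literal)
```

For flags that only ever switch a feature on, `action='store_true'` (as used for `--save_images`) avoids the conversion question entirely.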
|
||||
522
third-party/paddle-inference/mot_sde_infer.py
vendored
Normal file
@@ -0,0 +1,522 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import os
|
||||
import time
|
||||
import yaml
|
||||
import cv2
|
||||
import numpy as np
|
||||
from collections import defaultdict
|
||||
import paddle
|
||||
|
||||
from benchmark_utils import PaddleInferBenchmark
|
||||
from preprocess import decode_image
|
||||
from utils import argsparser, Timer, get_current_memory_mb
|
||||
from infer import Detector, get_test_images, print_arguments, bench_log, PredictConfig, load_predictor
|
||||
|
||||
# add python path
|
||||
import sys
|
||||
parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
|
||||
sys.path.insert(0, parent_path)
|
||||
|
||||
from pptracking.python.mot import JDETracker, DeepSORTTracker
|
||||
from pptracking.python.mot.utils import MOTTimer, write_mot_results, get_crops, clip_box
|
||||
from pptracking.python.mot.visualize import plot_tracking, plot_tracking_dict
|
||||
|
||||
|
||||
class SDE_Detector(Detector):
|
||||
"""
|
||||
Args:
|
||||
model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
|
||||
tracker_config (str): tracker config path
|
||||
device (str): Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU
|
||||
run_mode (str): mode of running(paddle/trt_fp32/trt_fp16)
|
||||
batch_size (int): size of each batch in inference
|
||||
trt_min_shape (int): min shape for dynamic shape in trt
|
||||
trt_max_shape (int): max shape for dynamic shape in trt
|
||||
trt_opt_shape (int): opt shape for dynamic shape in trt
|
||||
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
|
||||
calibration, trt_calib_mode needs to be set to True
|
||||
cpu_threads (int): cpu threads
|
||||
enable_mkldnn (bool): whether to open MKLDNN
|
||||
output_dir (string): The path of output, default as 'output'
|
||||
threshold (float): Score threshold of the detected bbox, default as 0.5
|
||||
save_images (bool): Whether to save visualization image results, default as False
|
||||
save_mot_txts (bool): Whether to save tracking results (txt), default as False
|
||||
reid_model_dir (str): reid model dir, default None for ByteTrack, but set for DeepSORT
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
model_dir,
|
||||
tracker_config,
|
||||
device='CPU',
|
||||
run_mode='paddle',
|
||||
batch_size=1,
|
||||
trt_min_shape=1,
|
||||
trt_max_shape=1280,
|
||||
trt_opt_shape=640,
|
||||
trt_calib_mode=False,
|
||||
cpu_threads=1,
|
||||
enable_mkldnn=False,
|
||||
output_dir='output',
|
||||
threshold=0.5,
|
||||
save_images=False,
|
||||
save_mot_txts=False,
|
||||
reid_model_dir=None):
|
||||
super(SDE_Detector, self).__init__(
|
||||
model_dir=model_dir,
|
||||
device=device,
|
||||
run_mode=run_mode,
|
||||
batch_size=batch_size,
|
||||
trt_min_shape=trt_min_shape,
|
||||
trt_max_shape=trt_max_shape,
|
||||
trt_opt_shape=trt_opt_shape,
|
||||
trt_calib_mode=trt_calib_mode,
|
||||
cpu_threads=cpu_threads,
|
||||
enable_mkldnn=enable_mkldnn,
|
||||
output_dir=output_dir,
|
||||
threshold=threshold, )
|
||||
self.save_images = save_images
|
||||
self.save_mot_txts = save_mot_txts
|
||||
assert batch_size == 1, "MOT model only supports batch_size=1."
|
||||
self.det_times = Timer(with_tracker=True)
|
||||
self.num_classes = len(self.pred_config.labels)
|
||||
|
||||
# reid config
|
||||
self.use_reid = reid_model_dir is not None
|
||||
if self.use_reid:
|
||||
self.reid_pred_config = self.set_config(reid_model_dir)
|
||||
self.reid_predictor, self.config = load_predictor(
|
||||
reid_model_dir,
|
||||
run_mode=run_mode,
|
||||
batch_size=50, # reid_batch_size
|
||||
min_subgraph_size=self.reid_pred_config.min_subgraph_size,
|
||||
device=device,
|
||||
use_dynamic_shape=self.reid_pred_config.use_dynamic_shape,
|
||||
trt_min_shape=trt_min_shape,
|
||||
trt_max_shape=trt_max_shape,
|
||||
trt_opt_shape=trt_opt_shape,
|
||||
trt_calib_mode=trt_calib_mode,
|
||||
cpu_threads=cpu_threads,
|
||||
enable_mkldnn=enable_mkldnn)
|
||||
else:
|
||||
self.reid_pred_config = None
|
||||
self.reid_predictor = None
|
||||
|
||||
assert tracker_config is not None, 'Note that tracker_config should be set.'
|
||||
self.tracker_config = tracker_config
|
||||
tracker_cfg = yaml.safe_load(open(self.tracker_config))
|
||||
cfg = tracker_cfg[tracker_cfg['type']]
|
||||
|
||||
# tracker config
|
||||
self.use_deepsort_tracker = True if tracker_cfg[
|
||||
'type'] == 'DeepSORTTracker' else False
|
||||
if self.use_deepsort_tracker:
|
||||
# use DeepSORTTracker
|
||||
if self.reid_pred_config is not None and hasattr(
|
||||
self.reid_pred_config, 'tracker'):
|
||||
cfg = self.reid_pred_config.tracker
|
||||
budget = cfg.get('budget', 100)
|
||||
max_age = cfg.get('max_age', 30)
|
||||
max_iou_distance = cfg.get('max_iou_distance', 0.7)
|
||||
matching_threshold = cfg.get('matching_threshold', 0.2)
|
||||
min_box_area = cfg.get('min_box_area', 0)
|
||||
vertical_ratio = cfg.get('vertical_ratio', 0)
|
||||
|
||||
self.tracker = DeepSORTTracker(
|
||||
budget=budget,
|
||||
max_age=max_age,
|
||||
max_iou_distance=max_iou_distance,
|
||||
matching_threshold=matching_threshold,
|
||||
min_box_area=min_box_area,
|
||||
vertical_ratio=vertical_ratio, )
|
||||
else:
|
||||
# use ByteTracker
|
||||
use_byte = cfg.get('use_byte', False)
|
||||
det_thresh = cfg.get('det_thresh', 0.3)
|
||||
min_box_area = cfg.get('min_box_area', 0)
|
||||
vertical_ratio = cfg.get('vertical_ratio', 0)
|
||||
match_thres = cfg.get('match_thres', 0.9)
|
||||
conf_thres = cfg.get('conf_thres', 0.6)
|
||||
low_conf_thres = cfg.get('low_conf_thres', 0.1)
|
||||
|
||||
self.tracker = JDETracker(
|
||||
use_byte=use_byte,
|
||||
det_thresh=det_thresh,
|
||||
num_classes=self.num_classes,
|
||||
min_box_area=min_box_area,
|
||||
vertical_ratio=vertical_ratio,
|
||||
match_thres=match_thres,
|
||||
conf_thres=conf_thres,
|
||||
low_conf_thres=low_conf_thres, )
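The constructor above reads the tracker section named by the top-level `type` key and falls back to defaults for missing entries. A minimal sketch of that selection logic follows, using a plain dictionary in place of the parsed YAML; the helper name `select_tracker_params` and the demo values are hypothetical, and only a subset of the options is shown.

```python
def select_tracker_params(tracker_cfg):
    """Pick the section named by 'type' and fill in fallback values."""
    tracker_type = tracker_cfg['type']
    cfg = tracker_cfg[tracker_type]
    use_deepsort = tracker_type == 'DeepSORTTracker'
    if use_deepsort:
        params = {
            'budget': cfg.get('budget', 100),
            'max_age': cfg.get('max_age', 30),
        }
    else:
        params = {
            'use_byte': cfg.get('use_byte', False),
            'conf_thres': cfg.get('conf_thres', 0.6),
            'low_conf_thres': cfg.get('low_conf_thres', 0.1),
        }
    return use_deepsort, params


# Hypothetical config: only two keys are set, the third uses its default.
demo_cfg = {
    'type': 'JDETracker',
    'JDETracker': {'use_byte': True, 'conf_thres': 0.5},
}
use_deepsort, params = select_tracker_params(demo_cfg)
print(use_deepsort, params)
```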
|
||||
|
||||
def postprocess(self, inputs, result):
|
||||
# postprocess output of predictor
|
||||
np_boxes_num = result['boxes_num']
|
||||
if np_boxes_num[0] <= 0:
|
||||
print('[WARNING] No object detected.')
|
||||
result = {'boxes': np.zeros([0, 6]), 'boxes_num': [0]}
|
||||
result = {k: v for k, v in result.items() if v is not None}
|
||||
return result
|
||||
|
||||
def reidprocess(self, det_results, repeats=1):
|
||||
pred_dets = det_results['boxes']
|
||||
pred_xyxys = pred_dets[:, 2:6]
|
||||
|
||||
ori_image = det_results['ori_image']
|
||||
ori_image_shape = ori_image.shape[:2]
|
||||
pred_xyxys, keep_idx = clip_box(pred_xyxys, ori_image_shape)
|
||||
|
||||
if len(keep_idx[0]) == 0:
|
||||
det_results['boxes'] = np.zeros((1, 6), dtype=np.float32)
|
||||
det_results['embeddings'] = None
|
||||
return det_results
|
||||
|
||||
pred_dets = pred_dets[keep_idx[0]]
|
||||
pred_xyxys = pred_dets[:, 2:6]
|
||||
|
||||
w, h = self.tracker.input_size
|
||||
crops = get_crops(pred_xyxys, ori_image, w, h)
|
||||
|
||||
# to keep fast speed, only use topk crops
|
||||
crops = crops[:50] # reid_batch_size
|
||||
det_results['crops'] = np.array(crops).astype('float32')
|
||||
det_results['boxes'] = pred_dets[:50]
|
||||
|
||||
input_names = self.reid_predictor.get_input_names()
|
||||
for i in range(len(input_names)):
|
||||
input_tensor = self.reid_predictor.get_input_handle(input_names[i])
|
||||
input_tensor.copy_from_cpu(det_results[input_names[i]])
|
||||
|
||||
# model prediction
|
||||
for i in range(repeats):
|
||||
self.reid_predictor.run()
|
||||
output_names = self.reid_predictor.get_output_names()
|
||||
feature_tensor = self.reid_predictor.get_output_handle(output_names[
|
||||
0])
|
||||
pred_embs = feature_tensor.copy_to_cpu()
|
||||
|
||||
det_results['embeddings'] = pred_embs
|
||||
return det_results
|
||||
|
||||
def tracking(self, det_results):
|
||||
pred_dets = det_results['boxes'] # 'cls_id, score, x0, y0, x1, y1'
|
||||
pred_embs = det_results.get('embeddings', None)
|
||||
|
||||
if self.use_deepsort_tracker:
|
||||
# use DeepSORTTracker, only support single class
|
||||
self.tracker.predict()
|
||||
online_targets = self.tracker.update(pred_dets, pred_embs)
|
||||
online_tlwhs, online_scores, online_ids = [], [], []
|
||||
for t in online_targets:
|
||||
if not t.is_confirmed() or t.time_since_update > 1:
|
||||
continue
|
||||
tlwh = t.to_tlwh()
|
||||
tscore = t.score
|
||||
tid = t.track_id
|
||||
if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
|
||||
3] > self.tracker.vertical_ratio:
|
||||
continue
|
||||
online_tlwhs.append(tlwh)
|
||||
online_scores.append(tscore)
|
||||
online_ids.append(tid)
|
||||
|
||||
tracking_outs = {
|
||||
'online_tlwhs': online_tlwhs,
|
||||
'online_scores': online_scores,
|
||||
'online_ids': online_ids,
|
||||
}
|
||||
return tracking_outs
|
||||
else:
|
||||
# use ByteTracker, support multiple class
|
||||
online_tlwhs = defaultdict(list)
|
||||
online_scores = defaultdict(list)
|
||||
online_ids = defaultdict(list)
|
||||
online_targets_dict = self.tracker.update(pred_dets, pred_embs)
|
||||
for cls_id in range(self.num_classes):
|
||||
online_targets = online_targets_dict[cls_id]
|
||||
for t in online_targets:
|
||||
tlwh = t.tlwh
|
||||
tid = t.track_id
|
||||
tscore = t.score
|
||||
if tlwh[2] * tlwh[3] <= self.tracker.min_box_area:
|
||||
continue
|
||||
if self.tracker.vertical_ratio > 0 and tlwh[2] / tlwh[
|
||||
3] > self.tracker.vertical_ratio:
|
||||
continue
|
||||
online_tlwhs[cls_id].append(tlwh)
|
||||
online_ids[cls_id].append(tid)
|
||||
online_scores[cls_id].append(tscore)
|
||||
|
||||
tracking_outs = {
|
||||
'online_tlwhs': online_tlwhs,
|
||||
'online_scores': online_scores,
|
||||
'online_ids': online_ids,
|
||||
}
|
||||
return tracking_outs
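The multi-class branch above drops tracks whose box area is too small or whose width-to-height ratio exceeds `vertical_ratio`. The filter can be sketched in isolation as follows; the helper name `keep_track` and the thresholds in the demo are hypothetical, and boxes are `(x, y, width, height)` tuples.

```python
def keep_track(tlwh, min_box_area=0, vertical_ratio=0):
    """Return True if a (x, y, w, h) box passes the area and ratio checks."""
    _, _, w, h = tlwh
    if w * h <= min_box_area:
        return False
    # A non-positive vertical_ratio disables the aspect-ratio check.
    if vertical_ratio > 0 and w / h > vertical_ratio:
        return False
    return True


demo_boxes = [
    (10, 20, 50, 120),   # upright box, w/h is about 0.42
    (10, 20, 200, 100),  # wide box, w/h is 2.0
    (10, 20, 3, 4),      # tiny box, area 12
]
kept = [
    box for box in demo_boxes
    if keep_track(box, min_box_area=100, vertical_ratio=1.6)
]
print(kept)
```

The DeepSORT branch applies only the ratio check, so the area test here corresponds to the ByteTrack path.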
|
||||
|
||||
def predict_image(self,
|
||||
image_list,
|
||||
run_benchmark=False,
|
||||
repeats=1,
|
||||
visual=True,
|
||||
seq_name=None):
|
||||
num_classes = self.num_classes
|
||||
image_list.sort()
|
||||
ids2names = self.pred_config.labels
|
||||
mot_results = []
|
||||
for frame_id, img_file in enumerate(image_list):
|
||||
batch_image_list = [img_file] # bs=1 in MOT model
|
||||
frame, _ = decode_image(img_file, {})
|
||||
if run_benchmark:
|
||||
# preprocess
|
||||
inputs = self.preprocess(batch_image_list) # warmup
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
# model prediction
|
||||
result_warmup = self.predict(repeats=repeats) # warmup
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict(repeats=repeats)
|
||||
self.det_times.inference_time_s.end(repeats=repeats)
|
||||
|
||||
# postprocess
|
||||
result_warmup = self.postprocess(inputs, result) # warmup
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking
|
||||
if self.use_reid:
|
||||
det_result['frame_id'] = frame_id
|
||||
det_result['seq_name'] = seq_name
|
||||
det_result['ori_image'] = frame
|
||||
det_result = self.reidprocess(det_result)
|
||||
result_warmup = self.tracking(det_result)
|
||||
self.det_times.tracking_time_s.start()
|
||||
if self.use_reid:
|
||||
det_result = self.reidprocess(det_result)
|
||||
tracking_outs = self.tracking(det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
cm, gm, gu = get_current_memory_mb()
|
||||
self.cpu_mem += cm
|
||||
self.gpu_mem += gm
|
||||
self.gpu_util += gu
|
||||
|
||||
else:
|
||||
self.det_times.preprocess_time_s.start()
|
||||
inputs = self.preprocess(batch_image_list)
|
||||
self.det_times.preprocess_time_s.end()
|
||||
|
||||
self.det_times.inference_time_s.start()
|
||||
result = self.predict()
|
||||
self.det_times.inference_time_s.end()
|
||||
|
||||
self.det_times.postprocess_time_s.start()
|
||||
det_result = self.postprocess(inputs, result)
|
||||
self.det_times.postprocess_time_s.end()
|
||||
|
||||
# tracking process
|
||||
self.det_times.tracking_time_s.start()
|
||||
if self.use_reid:
|
||||
det_result['frame_id'] = frame_id
|
||||
det_result['seq_name'] = seq_name
|
||||
det_result['ori_image'] = frame
|
||||
det_result = self.reidprocess(det_result)
|
||||
tracking_outs = self.tracking(det_result)
|
||||
self.det_times.tracking_time_s.end()
|
||||
self.det_times.img_num += 1
|
||||
|
||||
online_tlwhs = tracking_outs['online_tlwhs']
|
||||
online_scores = tracking_outs['online_scores']
|
||||
online_ids = tracking_outs['online_ids']
|
||||
|
||||
mot_results.append([online_tlwhs, online_scores, online_ids])
|
||||
|
||||
if visual:
|
||||
if len(image_list) > 1 and frame_id % 10 == 0:
|
||||
print('Tracking frame {}'.format(frame_id))
|
||||
frame, _ = decode_image(img_file, {})
|
||||
if isinstance(online_tlwhs, defaultdict):
|
||||
im = plot_tracking_dict(
|
||||
frame,
|
||||
num_classes,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
ids2names=ids2names)
|
||||
else:
|
||||
im = plot_tracking(
|
||||
frame,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
ids2names=ids2names)
|
||||
save_dir = os.path.join(self.output_dir, seq_name)
|
||||
if not os.path.exists(save_dir):
|
||||
os.makedirs(save_dir)
|
||||
cv2.imwrite(
|
||||
os.path.join(save_dir, '{:05d}.jpg'.format(frame_id)), im)
|
||||
|
||||
return mot_results
|
||||
|
||||
def predict_video(self, video_file, camera_id):
|
||||
video_out_name = 'output.mp4'
|
||||
if camera_id != -1:
|
||||
capture = cv2.VideoCapture(camera_id)
|
||||
else:
|
||||
capture = cv2.VideoCapture(video_file)
|
||||
video_out_name = os.path.split(video_file)[-1]
|
||||
# Get Video info : resolution, fps, frame count
|
||||
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
|
||||
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
|
||||
fps = int(capture.get(cv2.CAP_PROP_FPS))
|
||||
frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||
print("fps: %d, frame_count: %d" % (fps, frame_count))
|
||||
|
||||
if not os.path.exists(self.output_dir):
|
||||
os.makedirs(self.output_dir)
|
||||
out_path = os.path.join(self.output_dir, video_out_name)
|
||||
video_format = 'mp4v'
|
||||
fourcc = cv2.VideoWriter_fourcc(*video_format)
|
||||
writer = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
|
||||
|
||||
frame_id = 1
|
||||
timer = MOTTimer()
|
||||
results = defaultdict(list)
|
||||
num_classes = self.num_classes
|
||||
data_type = 'mcmot' if num_classes > 1 else 'mot'
|
||||
ids2names = self.pred_config.labels
|
||||
|
||||
while (1):
|
||||
ret, frame = capture.read()
|
||||
if not ret:
|
||||
break
|
||||
if frame_id % 10 == 0:
|
||||
print('Tracking frame: %d' % (frame_id))
|
||||
frame_id += 1
|
||||
|
||||
timer.tic()
|
||||
seq_name = video_out_name.split('.')[0]
|
||||
mot_results = self.predict_image(
|
||||
[frame[:, :, ::-1]], visual=False, seq_name=seq_name)
|
||||
timer.toc()
|
||||
|
||||
# bs=1 in MOT model
|
||||
online_tlwhs, online_scores, online_ids = mot_results[0]
|
||||
|
||||
fps = 1. / timer.duration
|
||||
if self.use_deepsort_tracker:
|
||||
# use DeepSORTTracker, only support single class
|
||||
results[0].append(
|
||||
(frame_id + 1, online_tlwhs, online_scores, online_ids))
|
||||
im = plot_tracking(
|
||||
frame,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
fps=fps,
|
||||
ids2names=ids2names)
|
||||
else:
|
||||
# use ByteTracker, support multiple class
|
||||
for cls_id in range(num_classes):
|
||||
results[cls_id].append(
|
||||
(frame_id + 1, online_tlwhs[cls_id],
|
||||
online_scores[cls_id], online_ids[cls_id]))
|
||||
im = plot_tracking_dict(
|
||||
frame,
|
||||
num_classes,
|
||||
online_tlwhs,
|
||||
online_ids,
|
||||
online_scores,
|
||||
frame_id=frame_id,
|
||||
fps=fps,
|
||||
ids2names=ids2names)
|
||||
|
||||
writer.write(im)
|
||||
if camera_id != -1:
|
||||
cv2.imshow('Tracking Detection', im)
|
||||
if cv2.waitKey(1) & 0xFF == ord('q'):
|
||||
break
|
||||
|
||||
if self.save_mot_txts:
|
||||
result_filename = os.path.join(
|
||||
self.output_dir, video_out_name.split('.')[-2] + '.txt')
|
||||
write_mot_results(result_filename, results)
|
||||
|
||||
writer.release()
|
||||
|
||||
|
||||
def main():
|
||||
deploy_file = os.path.join(FLAGS.model_dir, 'infer_cfg.yml')
|
||||
with open(deploy_file) as f:
|
||||
yml_conf = yaml.safe_load(f)
|
||||
arch = yml_conf['arch']
|
||||
detector = SDE_Detector(
|
||||
FLAGS.model_dir,
|
||||
tracker_config=FLAGS.tracker_config,
|
||||
device=FLAGS.device,
|
||||
run_mode=FLAGS.run_mode,
|
||||
batch_size=1,
|
||||
trt_min_shape=FLAGS.trt_min_shape,
|
||||
trt_max_shape=FLAGS.trt_max_shape,
|
||||
trt_opt_shape=FLAGS.trt_opt_shape,
|
||||
trt_calib_mode=FLAGS.trt_calib_mode,
|
||||
cpu_threads=FLAGS.cpu_threads,
|
||||
enable_mkldnn=FLAGS.enable_mkldnn,
|
||||
output_dir=FLAGS.output_dir,
|
||||
threshold=FLAGS.threshold,
|
||||
save_images=FLAGS.save_images,
|
||||
save_mot_txts=FLAGS.save_mot_txts, )
|
||||
|
||||
# predict from video file or camera video stream
|
||||
if FLAGS.video_file is not None or FLAGS.camera_id != -1:
|
||||
detector.predict_video(FLAGS.video_file, FLAGS.camera_id)
|
||||
else:
|
||||
# predict from image
|
||||
if FLAGS.image_dir is None and FLAGS.image_file is not None:
|
||||
assert FLAGS.batch_size == 1, "--batch_size should be 1 in MOT models."
|
||||
img_list = get_test_images(FLAGS.image_dir, FLAGS.image_file)
|
||||
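# image_dir is None when only --image_file is given; fall back to the image's stem
seq_name = (os.path.basename(os.path.normpath(FLAGS.image_dir)) if FLAGS.image_dir
else os.path.splitext(os.path.basename(FLAGS.image_file))[0])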
|
||||
detector.predict_image(
|
||||
img_list, FLAGS.run_benchmark, repeats=10, seq_name=seq_name)
|
||||
|
||||
if not FLAGS.run_benchmark:
|
||||
detector.det_times.info(average=True)
|
||||
else:
|
||||
mode = FLAGS.run_mode
|
||||
model_dir = FLAGS.model_dir
|
||||
model_info = {
|
||||
'model_name': model_dir.strip('/').split('/')[-1],
|
||||
'precision': mode.split('_')[-1]
|
||||
}
|
||||
bench_log(detector, img_list, model_info, name='MOT')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
paddle.enable_static()
|
||||
parser = argsparser()
|
||||
FLAGS = parser.parse_args()
|
||||
print_arguments(FLAGS)
|
||||
FLAGS.device = FLAGS.device.upper()
|
||||
assert FLAGS.device in ['CPU', 'GPU', 'XPU', 'NPU'
|
||||
], "device should be CPU, GPU, NPU or XPU"
|
||||
|
||||
main()
|
||||
227
third-party/paddle-inference/picodet_postprocess.py
vendored
Normal file
@@ -0,0 +1,227 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import numpy as np
|
||||
from scipy.special import softmax
|
||||
|
||||
|
||||
def hard_nms(box_scores, iou_threshold, top_k=-1, candidate_size=200):
|
||||
"""
|
||||
Args:
|
||||
box_scores (N, 5): boxes in corner-form and probabilities.
|
||||
iou_threshold: intersection over union threshold.
|
||||
top_k: keep top_k results. If k <= 0, keep all the results.
|
||||
candidate_size: only consider the candidates with the highest scores.
|
||||
Returns:
|
||||
picked: a list of indexes of the kept boxes
|
||||
"""
|
||||
scores = box_scores[:, -1]
|
||||
boxes = box_scores[:, :-1]
|
||||
picked = []
|
||||
indexes = np.argsort(scores)
|
||||
indexes = indexes[-candidate_size:]
|
||||
while len(indexes) > 0:
|
||||
current = indexes[-1]
|
||||
picked.append(current)
|
||||
if 0 < top_k == len(picked) or len(indexes) == 1:
|
||||
break
|
||||
current_box = boxes[current, :]
|
||||
indexes = indexes[:-1]
|
||||
rest_boxes = boxes[indexes, :]
|
||||
iou = iou_of(
|
||||
rest_boxes,
|
||||
np.expand_dims(
|
||||
current_box, axis=0), )
|
||||
indexes = indexes[iou <= iou_threshold]
|
||||
|
||||
return box_scores[picked, :]
|
||||
|
||||
|
||||
def iou_of(boxes0, boxes1, eps=1e-5):
|
||||
"""Return intersection-over-union (Jaccard index) of boxes.
|
||||
Args:
|
||||
boxes0 (N, 4): ground truth boxes.
|
||||
boxes1 (N or 1, 4): predicted boxes.
|
||||
eps: a small number to avoid 0 as denominator.
|
||||
Returns:
|
||||
iou (N): IoU values.
|
||||
"""
|
||||
overlap_left_top = np.maximum(boxes0[..., :2], boxes1[..., :2])
|
||||
overlap_right_bottom = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
|
||||
|
||||
overlap_area = area_of(overlap_left_top, overlap_right_bottom)
|
||||
area0 = area_of(boxes0[..., :2], boxes0[..., 2:])
|
||||
area1 = area_of(boxes1[..., :2], boxes1[..., 2:])
|
||||
return overlap_area / (area0 + area1 - overlap_area + eps)
|
||||
|
||||
|
||||
def area_of(left_top, right_bottom):
|
||||
"""Compute the areas of rectangles given two corners.
|
||||
Args:
|
||||
left_top (N, 2): left top corner.
|
||||
right_bottom (N, 2): right bottom corner.
|
||||
Returns:
|
||||
area (N): return the area.
|
||||
"""
|
||||
hw = np.clip(right_bottom - left_top, 0.0, None)
|
||||
return hw[..., 0] * hw[..., 1]
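To make the interplay of the three helpers above concrete, here is a compact, self-contained sketch of IoU-based greedy suppression on three boxes. It re-implements the same idea under the hypothetical names `box_area`, `box_iou` and `greedy_nms` (without the `top_k` and `candidate_size` options) rather than importing this module.

```python
import numpy as np


def box_area(left_top, right_bottom):
    """Area of boxes given their two corners; negative extents clip to 0."""
    wh = np.clip(right_bottom - left_top, 0.0, None)
    return wh[..., 0] * wh[..., 1]


def box_iou(boxes0, boxes1, eps=1e-5):
    """Intersection over union of corner-form boxes, with broadcasting."""
    lt = np.maximum(boxes0[..., :2], boxes1[..., :2])
    rb = np.minimum(boxes0[..., 2:], boxes1[..., 2:])
    inter = box_area(lt, rb)
    area0 = box_area(boxes0[..., :2], boxes0[..., 2:])
    area1 = box_area(boxes1[..., :2], boxes1[..., 2:])
    return inter / (area0 + area1 - inter + eps)


def greedy_nms(box_scores, iou_threshold):
    """Keep the best-scoring box, drop overlapping ones, and repeat."""
    order = np.argsort(box_scores[:, -1])  # ascending by score
    picked = []
    while len(order) > 0:
        current = order[-1]
        picked.append(current)
        order = order[:-1]
        if len(order) == 0:
            break
        iou = box_iou(box_scores[order, :4],
                      box_scores[current:current + 1, :4])
        order = order[iou <= iou_threshold]
    return box_scores[picked]


demo = np.array([
    [0., 0., 10., 10., 0.9],
    [1., 1., 11., 11., 0.8],    # overlaps the first box (IoU = 81/119)
    [20., 20., 30., 30., 0.7],  # disjoint from both
])
kept = greedy_nms(demo, iou_threshold=0.5)
print(kept)
```

With a threshold of 0.5 the second box is suppressed by the first, while the disjoint third box survives.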
|
||||
|
||||
|
||||
class PicoDetPostProcess(object):
|
||||
"""
|
||||
Args:
|
||||
input_shape (int): network input image size
|
||||
ori_shape (int): original image shape before padding
|
||||
scale_factor (float): scale factor of ori image
|
||||
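strides (list[int]): strides of the output feature levels, default [8, 16, 32, 64]
score_threshold (float): minimum class score to keep a box, default 0.4
nms_threshold (float): IoU threshold used in NMS, default 0.5
nms_top_k (int): number of top-scoring candidates kept per level before NMS, default 1000
keep_top_k (int): maximum number of boxes kept per class after NMS, default 100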
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
input_shape,
|
||||
ori_shape,
|
||||
scale_factor,
|
||||
strides=[8, 16, 32, 64],
|
||||
score_threshold=0.4,
|
||||
nms_threshold=0.5,
|
||||
nms_top_k=1000,
|
||||
keep_top_k=100):
|
||||
self.ori_shape = ori_shape
|
||||
self.input_shape = input_shape
|
||||
self.scale_factor = scale_factor
|
||||
self.strides = strides
|
||||
self.score_threshold = score_threshold
|
||||
self.nms_threshold = nms_threshold
|
||||
self.nms_top_k = nms_top_k
|
||||
self.keep_top_k = keep_top_k
|
||||
|
||||
def warp_boxes(self, boxes, ori_shape):
|
||||
"""Apply transform to boxes
|
||||
"""
|
||||
width, height = ori_shape[1], ori_shape[0]
|
||||
n = len(boxes)
|
||||
if n:
|
||||
# warp points
|
||||
xy = np.ones((n * 4, 3))
|
||||
xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(
|
||||
n * 4, 2) # x1y1, x2y2, x1y2, x2y1
|
||||
# xy = xy @ M.T # transform
|
||||
xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8) # rescale
|
||||
# create new boxes
|
||||
x = xy[:, [0, 2, 4, 6]]
|
||||
y = xy[:, [1, 3, 5, 7]]
|
||||
xy = np.concatenate(
|
||||
(x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
|
||||
# clip boxes
|
||||
xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
|
||||
xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
|
||||
return xy.astype(np.float32)
|
||||
else:
|
||||
return boxes
|
||||
|
||||
def __call__(self, scores, raw_boxes):
|
||||
batch_size = raw_boxes[0].shape[0]
|
||||
reg_max = int(raw_boxes[0].shape[-1] / 4 - 1)
|
||||
out_boxes_num = []
|
||||
out_boxes_list = []
|
||||
for batch_id in range(batch_size):
|
||||
# generate centers
|
||||
decode_boxes = []
|
||||
select_scores = []
|
||||
for stride, box_distribute, score in zip(self.strides, raw_boxes,
|
||||
scores):
|
||||
box_distribute = box_distribute[batch_id]
|
||||
score = score[batch_id]
|
||||
# centers
|
||||
fm_h = self.input_shape[0] / stride
|
||||
fm_w = self.input_shape[1] / stride
|
||||
h_range = np.arange(fm_h)
|
||||
w_range = np.arange(fm_w)
|
||||
ww, hh = np.meshgrid(w_range, h_range)
|
||||
ct_row = (hh.flatten() + 0.5) * stride
|
||||
ct_col = (ww.flatten() + 0.5) * stride
|
||||
center = np.stack((ct_col, ct_row, ct_col, ct_row), axis=1)
|
||||
|
||||
# box distribution to distance
|
||||
reg_range = np.arange(reg_max + 1)
|
||||
box_distance = box_distribute.reshape((-1, reg_max + 1))
|
||||
box_distance = softmax(box_distance, axis=1)
|
||||
box_distance = box_distance * np.expand_dims(reg_range, axis=0)
|
||||
box_distance = np.sum(box_distance, axis=1).reshape((-1, 4))
|
||||
box_distance = box_distance * stride
|
||||
|
||||
# top K candidate
|
||||
topk_idx = np.argsort(score.max(axis=1))[::-1]
|
||||
topk_idx = topk_idx[:self.nms_top_k]
|
||||
center = center[topk_idx]
|
||||
score = score[topk_idx]
|
||||
box_distance = box_distance[topk_idx]
|
||||
|
||||
# decode box
|
||||
decode_box = center + [-1, -1, 1, 1] * box_distance
|
||||
|
||||
select_scores.append(score)
|
||||
decode_boxes.append(decode_box)
|
||||
|
||||
# nms
|
||||
bboxes = np.concatenate(decode_boxes, axis=0)
|
||||
confidences = np.concatenate(select_scores, axis=0)
|
||||
picked_box_probs = []
|
||||
picked_labels = []
|
||||
for class_index in range(0, confidences.shape[1]):
|
||||
probs = confidences[:, class_index]
|
||||
mask = probs > self.score_threshold
|
||||
probs = probs[mask]
|
||||
if probs.shape[0] == 0:
|
||||
continue
|
||||
subset_boxes = bboxes[mask, :]
|
||||
box_probs = np.concatenate(
|
||||
[subset_boxes, probs.reshape(-1, 1)], axis=1)
|
||||
box_probs = hard_nms(
|
||||
box_probs,
|
||||
iou_threshold=self.nms_threshold,
|
||||
top_k=self.keep_top_k, )
|
||||
picked_box_probs.append(box_probs)
|
||||
picked_labels.extend([class_index] * box_probs.shape[0])
|
||||
|
||||
if len(picked_box_probs) == 0:
|
||||
out_boxes_list.append(np.empty((0, 4)))
|
||||
out_boxes_num.append(0)
|
||||
|
||||
else:
|
||||
picked_box_probs = np.concatenate(picked_box_probs)
|
||||
|
||||
# resize output boxes
|
||||
picked_box_probs[:, :4] = self.warp_boxes(
|
||||
picked_box_probs[:, :4], self.ori_shape[batch_id])
|
||||
im_scale = np.concatenate([
|
||||
self.scale_factor[batch_id][::-1],
|
||||
self.scale_factor[batch_id][::-1]
|
||||
])
|
||||
picked_box_probs[:, :4] /= im_scale
|
||||
# output layout per row: class, score, box
|
||||
out_boxes_list.append(
|
||||
np.concatenate(
|
||||
[
|
||||
np.expand_dims(
|
||||
np.array(picked_labels),
|
||||
axis=-1), np.expand_dims(
|
||||
picked_box_probs[:, 4], axis=-1),
|
||||
picked_box_probs[:, :4]
|
||||
],
|
||||
axis=1))
|
||||
out_boxes_num.append(len(picked_labels))
|
||||
|
||||
out_boxes_list = np.concatenate(out_boxes_list, axis=0)
|
||||
out_boxes_num = np.asarray(out_boxes_num).astype(np.int32)
|
||||
return out_boxes_list, out_boxes_num
|
||||
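The `__call__` method above turns each side's discrete distribution over `reg_max + 1` bins into an expected distance (softmax, then a weighted sum over bin indices), scales it by the stride, and offsets the anchor center. The following is a minimal sketch of that decoding step only, not the vendored implementation: the `softmax` helper, `decode_distances` and `decode_boxes` are names introduced here for illustration, and the `softmax` is assumed to behave like the one the file uses.

```python
import numpy as np


def softmax(x, axis=1):
    # Numerically stable softmax; assumed equivalent to the helper used above.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def decode_distances(box_distribute, reg_max, stride):
    """Turn per-side bin logits into (left, top, right, bottom) distances in pixels."""
    bins = np.arange(reg_max + 1)
    probs = softmax(box_distribute.reshape((-1, reg_max + 1)), axis=1)
    expected = (probs * bins[np.newaxis, :]).sum(axis=1)  # expected bin index
    return expected.reshape((-1, 4)) * stride


def decode_boxes(centers_xy, distances):
    """centers_xy: [N, 2] as (x, y); returns [N, 4] as (x1, y1, x2, y2)."""
    center = np.concatenate([centers_xy, centers_xy], axis=1)
    return center + np.array([-1, -1, 1, 1]) * distances


reg_max = 7
# One location with uniform logits: the expected bin index is reg_max / 2 = 3.5.
logits = np.zeros((1, 4 * (reg_max + 1)), dtype=np.float32)
dist = decode_distances(logits, reg_max, stride=8)
boxes = decode_boxes(np.array([[100.0, 60.0]]), dist)
print(dist, boxes)
```

With uniform logits every side decodes to 3.5 bins, so at stride 8 the box extends 28 pixels from the center in each direction.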
549
third-party/paddle-inference/preprocess.py
vendored
Normal file
@@ -0,0 +1,549 @@
|
||||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import cv2
|
||||
import numpy as np
|
||||
import imgaug.augmenters as iaa
|
||||
from keypoint_preprocess import get_affine_transform
|
||||
from PIL import Image
|
||||
|
||||
|
||||
def decode_image(im_file, im_info):
|
||||
"""read rgb image
|
||||
Args:
|
||||
im_file (str|np.ndarray): input can be image path or np.ndarray
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
if isinstance(im_file, str):
|
||||
with open(im_file, 'rb') as f:
|
||||
im_read = f.read()
|
||||
data = np.frombuffer(im_read, dtype='uint8')
|
||||
im = cv2.imdecode(data, 1) # BGR mode, but need RGB mode
|
||||
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
|
||||
else:
|
||||
im = im_file
|
||||
im_info['im_shape'] = np.array(im.shape[:2], dtype=np.float32)
|
||||
im_info['scale_factor'] = np.array([1., 1.], dtype=np.float32)
|
||||
return im, im_info
|
||||
|
||||
|
||||
class Resize_Mult32(object):
"""resize image so that both sides are multiples of 32
Args:
limit_side_len (int): side length used to compute the resize ratio
limit_type (str): 'max', 'min' or 'resize_long'; which side limit_side_len applies to
|
||||
interp (int): method of resize
|
||||
"""
|
||||
|
||||
def __init__(self, limit_side_len, limit_type, interp=cv2.INTER_LINEAR):
|
||||
self.limit_side_len = limit_side_len
|
||||
self.limit_type = limit_type
|
||||
self.interp = interp
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
im_channel = im.shape[2]
|
||||
im_scale_y, im_scale_x = self.generate_scale(im)
|
||||
im = cv2.resize(
|
||||
im,
|
||||
None,
|
||||
None,
|
||||
fx=im_scale_x,
|
||||
fy=im_scale_y,
|
||||
interpolation=self.interp)
|
||||
im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
|
||||
im_info['scale_factor'] = np.array(
|
||||
[im_scale_y, im_scale_x]).astype('float32')
|
||||
return im, im_info
|
||||
|
||||
def generate_scale(self, img):
|
||||
"""
|
||||
Args:
|
||||
img (np.ndarray): image (np.ndarray)
|
||||
Returns:
|
||||
im_scale_x: the resize ratio of X
|
||||
im_scale_y: the resize ratio of Y
|
||||
"""
|
||||
limit_side_len = self.limit_side_len
|
||||
h, w, c = img.shape
|
||||
|
||||
# limit the max side
|
||||
if self.limit_type == 'max':
|
||||
if h > w:
|
||||
ratio = float(limit_side_len) / h
|
||||
else:
|
||||
ratio = float(limit_side_len) / w
|
||||
elif self.limit_type == 'min':
|
||||
if h < w:
|
||||
ratio = float(limit_side_len) / h
|
||||
else:
|
||||
ratio = float(limit_side_len) / w
|
||||
elif self.limit_type == 'resize_long':
|
||||
ratio = float(limit_side_len) / max(h, w)
|
||||
else:
|
||||
raise Exception('unsupported limit_type: {}'.format(self.limit_type))
|
||||
resize_h = int(h * ratio)
|
||||
resize_w = int(w * ratio)
|
||||
|
||||
resize_h = max(int(round(resize_h / 32) * 32), 32)
|
||||
resize_w = max(int(round(resize_w / 32) * 32), 32)
|
||||
|
||||
im_scale_y = resize_h / float(h)
|
||||
im_scale_x = resize_w / float(w)
|
||||
return im_scale_y, im_scale_x
|
||||
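The scale computation in `generate_scale` can be followed with plain integers. The sketch below reimplements only the arithmetic for the `'max'` and `'min'` limit types under the assumption that it mirrors the method above; `mult32_scale` is a hypothetical helper name, not part of the vendored file.

```python
def mult32_scale(h, w, limit_side_len, limit_type="max"):
    """Return (im_scale_y, im_scale_x) so that the resized sides are multiples of 32."""
    if limit_type == "max":
        ratio = float(limit_side_len) / max(h, w)
    elif limit_type == "min":
        ratio = float(limit_side_len) / min(h, w)
    else:
        raise ValueError("unsupported limit_type: {}".format(limit_type))
    # Scale first, then snap each side to the nearest multiple of 32 (at least 32).
    resize_h = max(int(round(int(h * ratio) / 32) * 32), 32)
    resize_w = max(int(round(int(w * ratio) / 32) * 32), 32)
    return resize_h / float(h), resize_w / float(w)


# 720x1280 input, longest side limited to 960: 540x960 snaps to 544x960.
scale_y, scale_x = mult32_scale(h=720, w=1280, limit_side_len=960)
print(scale_y, scale_x)
```

Because each side is snapped independently, the two scale factors can differ slightly, which is why `im_info['scale_factor']` stores both.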
|
||||
|
||||
class Resize(object):
|
||||
"""resize image by target_size and max_size
|
||||
Args:
|
||||
target_size (int): the target size of image
|
||||
keep_ratio (bool): whether keep_ratio or not, default true
|
||||
interp (int): method of resize
|
||||
"""
|
||||
|
||||
def __init__(self, target_size, keep_ratio=True, interp=cv2.INTER_LINEAR):
|
||||
if isinstance(target_size, int):
|
||||
target_size = [target_size, target_size]
|
||||
self.target_size = target_size
|
||||
self.keep_ratio = keep_ratio
|
||||
self.interp = interp
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
assert len(self.target_size) == 2
|
||||
assert self.target_size[0] > 0 and self.target_size[1] > 0
|
||||
im_channel = im.shape[2]
|
||||
im_scale_y, im_scale_x = self.generate_scale(im)
|
||||
im = cv2.resize(
|
||||
im,
|
||||
None,
|
||||
None,
|
||||
fx=im_scale_x,
|
||||
fy=im_scale_y,
|
||||
interpolation=self.interp)
|
||||
im_info['im_shape'] = np.array(im.shape[:2]).astype('float32')
|
||||
im_info['scale_factor'] = np.array(
|
||||
[im_scale_y, im_scale_x]).astype('float32')
|
||||
return im, im_info
|
||||
|
||||
def generate_scale(self, im):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
Returns:
|
||||
im_scale_x: the resize ratio of X
|
||||
im_scale_y: the resize ratio of Y
|
||||
"""
|
||||
origin_shape = im.shape[:2]
|
||||
im_c = im.shape[2]
|
||||
if self.keep_ratio:
|
||||
im_size_min = np.min(origin_shape)
|
||||
im_size_max = np.max(origin_shape)
|
||||
target_size_min = np.min(self.target_size)
|
||||
target_size_max = np.max(self.target_size)
|
||||
im_scale = float(target_size_min) / float(im_size_min)
|
||||
if np.round(im_scale * im_size_max) > target_size_max:
|
||||
im_scale = float(target_size_max) / float(im_size_max)
|
||||
im_scale_x = im_scale
|
||||
im_scale_y = im_scale
|
||||
else:
|
||||
resize_h, resize_w = self.target_size
|
||||
im_scale_y = resize_h / float(origin_shape[0])
|
||||
im_scale_x = resize_w / float(origin_shape[1])
|
||||
return im_scale_y, im_scale_x
|
||||
|
||||
|
||||
class ShortSizeScale(object):
|
||||
"""
|
||||
Scale images by short size.
|
||||
Args:
|
||||
short_size(float | int): Short size of an image will be scaled to the short_size.
|
||||
fixed_ratio(bool): Set whether to zoom according to a fixed ratio. default: True
|
||||
do_round(bool): Whether to round up when calculating the zoom ratio. default: False
|
||||
backend(str): Choose pillow or cv2 as the graphics processing backend. default: 'pillow'
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
short_size,
|
||||
fixed_ratio=True,
|
||||
keep_ratio=None,
|
||||
do_round=False,
|
||||
backend='pillow'):
|
||||
self.short_size = short_size
|
||||
assert (fixed_ratio and not keep_ratio) or (
|
||||
not fixed_ratio
|
||||
), "fixed_ratio and keep_ratio cannot be true at the same time"
|
||||
self.fixed_ratio = fixed_ratio
|
||||
self.keep_ratio = keep_ratio
|
||||
self.do_round = do_round
|
||||
|
||||
assert backend in [
|
||||
'pillow', 'cv2'
|
||||
], "Scale's backend must be pillow or cv2, but got {}".format(backend)
|
||||
|
||||
self.backend = backend
|
||||
|
||||
def __call__(self, img):
|
||||
"""
|
||||
Performs resize operations.
|
||||
Args:
|
||||
img (PIL.Image): a PIL.Image.
|
||||
return:
|
||||
resized_img: a PIL.Image after scaling.
|
||||
"""
|
||||
|
||||
result_img = None
|
||||
|
||||
if isinstance(img, np.ndarray):
|
||||
h, w, _ = img.shape
|
||||
elif isinstance(img, Image.Image):
|
||||
w, h = img.size
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
||||
if w <= h:
|
||||
ow = self.short_size
|
||||
if self.fixed_ratio: # default is True
|
||||
oh = int(self.short_size * 4.0 / 3.0)
|
||||
elif not self.keep_ratio: # no
|
||||
oh = self.short_size
|
||||
else:
|
||||
scale_factor = self.short_size / w
|
||||
oh = int(h * float(scale_factor) +
|
||||
0.5) if self.do_round else int(h * self.short_size / w)
|
||||
ow = int(w * float(scale_factor) +
|
||||
0.5) if self.do_round else int(w * self.short_size / h)
|
||||
else:
|
||||
oh = self.short_size
|
||||
if self.fixed_ratio:
|
||||
ow = int(self.short_size * 4.0 / 3.0)
|
||||
elif not self.keep_ratio: # no
|
||||
ow = self.short_size
|
||||
else:
|
||||
scale_factor = self.short_size / h
|
||||
oh = int(h * float(scale_factor) +
|
||||
0.5) if self.do_round else int(h * self.short_size / w)
|
||||
ow = int(w * float(scale_factor) +
|
||||
0.5) if self.do_round else int(w * self.short_size / h)
|
||||
|
||||
if type(img) == np.ndarray:
|
||||
img = Image.fromarray(img, mode='RGB')
|
||||
|
||||
if self.backend == 'pillow':
|
||||
result_img = img.resize((ow, oh), Image.BILINEAR)
|
||||
elif self.backend == 'cv2' and (self.keep_ratio is not None):
|
||||
result_img = cv2.resize(
|
||||
img, (ow, oh), interpolation=cv2.INTER_LINEAR)
|
||||
else:
|
||||
result_img = Image.fromarray(
|
||||
cv2.resize(
|
||||
np.asarray(img), (ow, oh), interpolation=cv2.INTER_LINEAR))
|
||||
|
||||
return result_img
|
||||
|
||||
|
||||
class NormalizeImage(object):
|
||||
"""normalize image
|
||||
Args:
|
||||
mean (list): im - mean
|
||||
std (list): im / std
|
||||
is_scale (bool): whether need im / 255
|
||||
norm_type (str): type in ['mean_std', 'none']
|
||||
"""
|
||||
|
||||
def __init__(self, mean, std, is_scale=True, norm_type='mean_std'):
|
||||
self.mean = mean
|
||||
self.std = std
|
||||
self.is_scale = is_scale
|
||||
self.norm_type = norm_type
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
im = im.astype(np.float32, copy=False)
|
||||
if self.is_scale:
|
||||
scale = 1.0 / 255.0
|
||||
im *= scale
|
||||
|
||||
if self.norm_type == 'mean_std':
|
||||
mean = np.array(self.mean)[np.newaxis, np.newaxis, :]
|
||||
std = np.array(self.std)[np.newaxis, np.newaxis, :]
|
||||
im -= mean
|
||||
im /= std
|
||||
return im, im_info
|
||||
|
||||
|
||||
class Permute(object):
"""permute image from HWC to CHW (takes no constructor arguments)
|
||||
"""
|
||||
|
||||
def __init__(self, ):
|
||||
super(Permute, self).__init__()
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
im = im.transpose((2, 0, 1)).copy()
|
||||
return im, im_info
|
||||
|
||||
|
||||
class PadStride(object):
"""pad image for models with FPN; replaces PadBatch(pad_to_stride) in the original config
Args:
stride (int): models with FPN need image shape % stride == 0
|
||||
"""
|
||||
|
||||
def __init__(self, stride=0):
|
||||
self.coarsest_stride = stride
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
coarsest_stride = self.coarsest_stride
|
||||
if coarsest_stride <= 0:
|
||||
return im, im_info
|
||||
im_c, im_h, im_w = im.shape
|
||||
pad_h = int(np.ceil(float(im_h) / coarsest_stride) * coarsest_stride)
|
||||
pad_w = int(np.ceil(float(im_w) / coarsest_stride) * coarsest_stride)
|
||||
padding_im = np.zeros((im_c, pad_h, pad_w), dtype=np.float32)
|
||||
padding_im[:, :im_h, :im_w] = im
|
||||
return padding_im, im_info
|
||||
|
||||
|
||||
class LetterBoxResize(object):
|
||||
def __init__(self, target_size):
|
||||
"""
|
||||
Resize image to target size, convert normalized xywh to pixel xyxy
|
||||
format ([x_center, y_center, width, height] -> [x0, y0, x1, y1]).
|
||||
Args:
|
||||
target_size (int|list): image target size.
|
||||
"""
|
||||
super(LetterBoxResize, self).__init__()
|
||||
if isinstance(target_size, int):
|
||||
target_size = [target_size, target_size]
|
||||
self.target_size = target_size
|
||||
|
||||
def letterbox(self, img, height, width, color=(127.5, 127.5, 127.5)):
|
||||
# letterbox: resize a rectangular image to a padded rectangular
|
||||
shape = img.shape[:2] # [height, width]
|
||||
ratio_h = float(height) / shape[0]
|
||||
ratio_w = float(width) / shape[1]
|
||||
ratio = min(ratio_h, ratio_w)
|
||||
new_shape = (round(shape[1] * ratio),
|
||||
round(shape[0] * ratio)) # [width, height]
|
||||
padw = (width - new_shape[0]) / 2
|
||||
padh = (height - new_shape[1]) / 2
|
||||
top, bottom = round(padh - 0.1), round(padh + 0.1)
|
||||
left, right = round(padw - 0.1), round(padw + 0.1)
|
||||
|
||||
img = cv2.resize(
|
||||
img, new_shape, interpolation=cv2.INTER_AREA) # resized, no border
|
||||
img = cv2.copyMakeBorder(
|
||||
img, top, bottom, left, right, cv2.BORDER_CONSTANT,
|
||||
value=color) # padded rectangular
|
||||
return img, ratio, padw, padh
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
assert len(self.target_size) == 2
|
||||
assert self.target_size[0] > 0 and self.target_size[1] > 0
|
||||
height, width = self.target_size
|
||||
h, w = im.shape[:2]
|
||||
im, ratio, padw, padh = self.letterbox(im, height=height, width=width)
|
||||
|
||||
new_shape = [round(h * ratio), round(w * ratio)]
|
||||
im_info['im_shape'] = np.array(new_shape, dtype=np.float32)
|
||||
im_info['scale_factor'] = np.array([ratio, ratio], dtype=np.float32)
|
||||
return im, im_info
|
||||
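The padding arithmetic in `letterbox` is easier to check without an image. The sketch below repeats only the geometry (ratio, resized shape, border widths); `letterbox_geometry` is a hypothetical helper introduced here, and it assumes the same rounding rules as the method above.

```python
def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return (ratio, (new_h, new_w), (top, bottom, left, right)) for a letterbox resize."""
    ratio = min(float(dst_h) / src_h, float(dst_w) / src_w)
    new_w, new_h = round(src_w * ratio), round(src_h * ratio)
    padw = (dst_w - new_w) / 2
    padh = (dst_h - new_h) / 2
    # The -0.1 / +0.1 offsets split an odd total padding into floor and ceil halves.
    top, bottom = round(padh - 0.1), round(padh + 0.1)
    left, right = round(padw - 0.1), round(padw + 0.1)
    return ratio, (new_h, new_w), (top, bottom, left, right)


# 720x1280 frame into a 608x1088 canvas: 7 pixels of horizontal padding, split 3 + 4.
ratio, new_shape, borders = letterbox_geometry(720, 1280, 608, 1088)
print(ratio, new_shape, borders)
```

The resized width plus the left and right borders adds up to the target width, so the padded image has exactly the requested size.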
|
||||
|
||||
class Pad(object):
|
||||
def __init__(self, size, fill_value=[114.0, 114.0, 114.0]):
|
||||
"""
|
||||
Pad image to a specified size.
|
||||
Args:
|
||||
size (list[int]): image target size
|
||||
fill_value (list[float]): rgb value of pad area, default (114.0, 114.0, 114.0)
|
||||
"""
|
||||
super(Pad, self).__init__()
|
||||
if isinstance(size, int):
|
||||
size = [size, size]
|
||||
self.size = size
|
||||
self.fill_value = fill_value
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
im_h, im_w = im.shape[:2]
|
||||
h, w = self.size
|
||||
if h == im_h and w == im_w:
|
||||
im = im.astype(np.float32)
|
||||
return im, im_info
|
||||
|
||||
canvas = np.ones((h, w, 3), dtype=np.float32)
|
||||
canvas *= np.array(self.fill_value, dtype=np.float32)
|
||||
canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
|
||||
im = canvas
|
||||
return im, im_info
|
||||
|
||||
|
||||
class WarpAffine(object):
|
||||
"""Warp affine the image
|
||||
"""
|
||||
|
||||
def __init__(self,
|
||||
keep_res=False,
|
||||
pad=31,
|
||||
input_h=512,
|
||||
input_w=512,
|
||||
scale=0.4,
|
||||
shift=0.1,
|
||||
down_ratio=4):
|
||||
self.keep_res = keep_res
|
||||
self.pad = pad
|
||||
self.input_h = input_h
|
||||
self.input_w = input_w
|
||||
self.scale = scale
|
||||
self.shift = shift
|
||||
self.down_ratio = down_ratio
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
"""
|
||||
Args:
|
||||
im (np.ndarray): image (np.ndarray)
|
||||
im_info (dict): info of image
|
||||
Returns:
|
||||
im (np.ndarray): processed image (np.ndarray)
|
||||
im_info (dict): info of processed image
|
||||
"""
|
||||
img = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
|
||||
|
||||
h, w = img.shape[:2]
|
||||
|
||||
if self.keep_res:
|
||||
# True in detection eval/infer
|
||||
input_h = (h | self.pad) + 1
|
||||
input_w = (w | self.pad) + 1
|
||||
s = np.array([input_w, input_h], dtype=np.float32)
|
||||
c = np.array([w // 2, h // 2], dtype=np.float32)
|
||||
|
||||
else:
|
||||
# False in centertrack eval_mot
|
||||
s = max(h, w) * 1.0
|
||||
input_h, input_w = self.input_h, self.input_w
|
||||
c = np.array([w / 2., h / 2.], dtype=np.float32)
|
||||
|
||||
trans_input = get_affine_transform(c, s, 0, [input_w, input_h])
|
||||
img = cv2.resize(img, (w, h))
|
||||
inp = cv2.warpAffine(
|
||||
img, trans_input, (input_w, input_h), flags=cv2.INTER_LINEAR)
|
||||
|
||||
if not self.keep_res:
|
||||
out_h = input_h // self.down_ratio
|
||||
out_w = input_w // self.down_ratio
|
||||
trans_output = get_affine_transform(c, s, 0, [out_w, out_h])
|
||||
|
||||
im_info.update({
|
||||
'center': c,
|
||||
'scale': s,
|
||||
'out_height': out_h,
|
||||
'out_width': out_w,
|
||||
'inp_height': input_h,
|
||||
'inp_width': input_w,
|
||||
'trans_input': trans_input,
|
||||
'trans_output': trans_output,
|
||||
})
|
||||
return inp, im_info
|
||||
|
||||
|
||||
class CULaneResize(object):
|
||||
def __init__(self, img_h, img_w, cut_height, prob=0.5):
|
||||
super(CULaneResize, self).__init__()
|
||||
self.img_h = img_h
|
||||
self.img_w = img_w
|
||||
self.cut_height = cut_height
|
||||
self.prob = prob
|
||||
|
||||
def __call__(self, im, im_info):
|
||||
# cut
|
||||
im = im[self.cut_height:, :, :]
|
||||
# resize
|
||||
transform = iaa.Sometimes(self.prob,
|
||||
iaa.Resize({
|
||||
"height": self.img_h,
|
||||
"width": self.img_w
|
||||
}))
|
||||
im = transform(image=im.copy().astype(np.uint8))
|
||||
|
||||
im = im.astype(np.float32) / 255.
|
||||
# check whether this transpose is needed, i.e. whether decode_image matches the cv2.imread used by CULaneDataSet
|
||||
im = im.transpose(2, 0, 1)
|
||||
|
||||
return im, im_info
|
||||
|
||||
|
||||
def preprocess(im, preprocess_ops):
|
||||
# process image by preprocess_ops
|
||||
im_info = {
|
||||
'scale_factor': np.array(
|
||||
[1., 1.], dtype=np.float32),
|
||||
'im_shape': None,
|
||||
}
|
||||
im, im_info = decode_image(im, im_info)
|
||||
for operator in preprocess_ops:
|
||||
im, im_info = operator(im, im_info)
|
||||
return im, im_info
|
||||
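Every operator in this file follows the same contract: it is called with `(im, im_info)` and returns the updated pair, which is what lets `preprocess` chain them in a loop. The sketch below illustrates that contract with two simplified stand-in operators; `ScaleTo01`, `ToCHW` and `run_ops` are hypothetical names introduced here, and image decoding is skipped by passing an array directly.

```python
import numpy as np


class ScaleTo01(object):
    """Hypothetical operator: cast to float32 and scale pixel values to [0, 1]."""

    def __call__(self, im, im_info):
        return im.astype(np.float32) / 255.0, im_info


class ToCHW(object):
    """Hypothetical operator: HWC -> CHW, similar in spirit to Permute above."""

    def __call__(self, im, im_info):
        return im.transpose((2, 0, 1)).copy(), im_info


def run_ops(im, ops):
    # Same contract as preprocess(): each operator takes and returns (im, im_info).
    im_info = {
        'scale_factor': np.array([1., 1.], dtype=np.float32),
        'im_shape': np.array(im.shape[:2], dtype=np.float32),
    }
    for op in ops:
        im, im_info = op(im, im_info)
    return im, im_info


image = np.full((4, 6, 3), 255, dtype=np.uint8)  # dummy RGB image, H=4, W=6
out, info = run_ops(image, [ScaleTo01(), ToCHW()])
print(out.shape, info['im_shape'])
```

Keeping `im_info` as a plain dict lets resize operators record `scale_factor` and `im_shape` for later use when boxes are mapped back to the original image.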
32
third-party/paddle-inference/tracker_config.yml
vendored
Normal file
@@ -0,0 +1,32 @@
|
||||
# config of tracker for MOT SDE Detector, use 'JDETracker' as default.
|
||||
# The tracker of MOT JDE Detector (such as FairMOT) is exported together with the model.
|
||||
# Here 'min_box_area' and 'vertical_ratio' are set for pedestrian, you can modify for other objects tracking.
|
||||
|
||||
type: JDETracker # 'JDETracker', 'DeepSORTTracker' or 'CenterTracker'
|
||||
|
||||
# BYTETracker
|
||||
JDETracker:
|
||||
use_byte: True
|
||||
det_thresh: 0.3
|
||||
conf_thres: 0.6
|
||||
low_conf_thres: 0.1
|
||||
match_thres: 0.9
|
||||
min_box_area: 0
|
||||
vertical_ratio: 0 # 1.6 for pedestrian
|
||||
|
||||
DeepSORTTracker:
|
||||
input_size: [64, 192]
|
||||
min_box_area: 0
|
||||
vertical_ratio: -1
|
||||
budget: 100
|
||||
max_age: 70
|
||||
n_init: 3
|
||||
metric_type: cosine
|
||||
matching_threshold: 0.2
|
||||
max_iou_distance: 0.9
|
||||
|
||||
CenterTracker:
|
||||
min_box_area: -1
|
||||
vertical_ratio: -1
|
||||
track_thresh: 0.4
|
||||
pre_thresh: 0.5
|
||||
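The config above holds one section per tracker and a top-level `type` key that names the active one. One plausible way to consume it, shown as a sketch, is to look up the section whose name matches `type`. This is an assumption about usage rather than the loader the deployment scripts actually use; `select_tracker_cfg` is a hypothetical helper, and the dict stands in for the parsed YAML.

```python
def select_tracker_cfg(cfg):
    """Return (tracker_type, params) from an already-parsed tracker config dict."""
    tracker_type = cfg['type']
    if tracker_type not in cfg:
        raise KeyError("no section for tracker type '{}'".format(tracker_type))
    return tracker_type, cfg[tracker_type]


# Stand-in for the parsed tracker_config.yml (values copied from the file above).
cfg = {
    'type': 'JDETracker',
    'JDETracker': {
        'use_byte': True,
        'det_thresh': 0.3,
        'conf_thres': 0.6,
        'low_conf_thres': 0.1,
        'match_thres': 0.9,
        'min_box_area': 0,
        'vertical_ratio': 0,
    },
    'CenterTracker': {
        'min_box_area': -1,
        'vertical_ratio': -1,
        'track_thresh': 0.4,
        'pre_thresh': 0.5,
    },
}
name, params = select_tracker_cfg(cfg)
print(name, params['conf_thres'])
```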
551
third-party/paddle-inference/utils.py
vendored
Normal file
@@ -0,0 +1,551 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
import time
|
||||
import os
|
||||
import ast
|
||||
import argparse
|
||||
import numpy as np
|
||||
|
||||
|
||||
def argsparser():
|
||||
parser = argparse.ArgumentParser(description=__doc__)
|
||||
parser.add_argument(
|
||||
"--model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."),
|
||||
required=True)
|
||||
parser.add_argument(
|
||||
"--image_file", type=str, default=None, help="Path of image file.")
|
||||
parser.add_argument(
|
||||
"--image_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Dir of image file, `image_file` has a higher priority.")
|
||||
parser.add_argument(
|
||||
"--batch_size", type=int, default=1, help="batch_size for inference.")
|
||||
parser.add_argument(
|
||||
"--video_file",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Path of video file, `video_file` or `camera_id` has the highest priority."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--camera_id",
|
||||
type=int,
|
||||
default=-1,
|
||||
help="device id of camera to predict.")
|
||||
parser.add_argument(
|
||||
"--threshold", type=float, default=0.5, help="Threshold of score.")
|
||||
parser.add_argument(
|
||||
"--output_dir",
|
||||
type=str,
|
||||
default="output",
|
||||
help="Directory of output visualization files.")
|
||||
parser.add_argument(
|
||||
"--run_mode",
|
||||
type=str,
|
||||
default='paddle',
|
||||
help="mode of running(paddle/trt_fp32/trt_fp16/trt_int8)")
|
||||
parser.add_argument(
|
||||
"--device",
|
||||
type=str,
|
||||
default='cpu',
|
||||
help="Choose the device you want to run, it can be: CPU/GPU/XPU/NPU, default is CPU."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--use_gpu",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Deprecated, please use `--device`.")
|
||||
parser.add_argument(
|
||||
"--run_benchmark",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether to predict an image_file repeatedly for benchmark")
|
||||
parser.add_argument(
|
||||
"--enable_mkldnn",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether use mkldnn with CPU.")
|
||||
parser.add_argument(
|
||||
"--enable_mkldnn_bfloat16",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether use mkldnn bfloat16 inference with CPU.")
|
||||
parser.add_argument(
|
||||
"--cpu_threads", type=int, default=1, help="Num of threads with CPU.")
|
||||
parser.add_argument(
|
||||
"--trt_min_shape", type=int, default=1, help="min_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_max_shape",
|
||||
type=int,
|
||||
default=1280,
|
||||
help="max_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_opt_shape",
|
||||
type=int,
|
||||
default=640,
|
||||
help="opt_shape for TensorRT.")
|
||||
parser.add_argument(
|
||||
"--trt_calib_mode",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode needs to be set to True.")
|
||||
parser.add_argument(
|
||||
'--save_images',
|
||||
type=ast.literal_eval,
|
||||
default=True,
|
||||
help='Save visualization image results.')
|
||||
parser.add_argument(
|
||||
'--save_mot_txts',
|
||||
action='store_true',
|
||||
help='Save tracking results (txt).')
|
||||
parser.add_argument(
|
||||
'--save_mot_txt_per_img',
|
||||
action='store_true',
|
||||
help='Save tracking results (txt) for each image.')
|
||||
parser.add_argument(
|
||||
'--scaled',
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether coords after detector outputs are scaled: False in JDE YOLOv3, "
|
||||
"True in general detector.")
|
||||
parser.add_argument(
"--tracker_config", type=str, default=None, help="tracker config")
|
||||
parser.add_argument(
|
||||
"--reid_model_dir",
|
||||
type=str,
|
||||
default=None,
|
||||
help=("Directory containing 'model.pdiparams', 'model.pdmodel' and "
|
||||
"'infer_cfg.yml', created by tools/export_model.py."))
|
||||
parser.add_argument(
|
||||
"--reid_batch_size",
|
||||
type=int,
|
||||
default=50,
|
||||
help="max batch_size for reid model inference.")
|
||||
parser.add_argument(
|
||||
'--use_dark',
|
||||
type=ast.literal_eval,
|
||||
default=True,
|
||||
help='whether to use darkpose to get better keypoint position predictions')
|
||||
parser.add_argument(
|
||||
"--action_file",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Path of input file for action recognition.")
|
||||
parser.add_argument(
|
||||
"--window_size",
|
||||
type=int,
|
||||
default=50,
|
||||
help="Temporal size of skeleton feature for action recognition.")
|
||||
parser.add_argument(
|
||||
"--random_pad",
|
||||
type=ast.literal_eval,
|
||||
default=False,
|
||||
help="Whether do random padding for action recognition.")
|
||||
parser.add_argument(
|
||||
"--save_results",
|
||||
action='store_true',
|
||||
default=False,
|
||||
help="Whether save detection result to file using coco format")
|
||||
parser.add_argument(
|
||||
'--use_coco_category',
|
||||
action='store_true',
|
||||
default=False,
|
||||
help='Whether to use the coco format dictionary `clsid2catid`')
|
||||
parser.add_argument(
|
||||
"--slice_infer",
|
||||
action='store_true',
|
||||
help="Whether to slice the image and merge the inference results for small object detection."
|
||||
)
|
||||
parser.add_argument(
|
||||
'--slice_size',
|
||||
nargs='+',
|
||||
type=int,
|
||||
default=[640, 640],
|
||||
help="Height and width of the sliced image.")
|
||||
parser.add_argument(
|
||||
"--overlap_ratio",
|
||||
nargs='+',
|
||||
type=float,
|
||||
default=[0.25, 0.25],
|
||||
help="Overlap height and width ratios of the sliced image.")
|
||||
parser.add_argument(
|
||||
"--combine_method",
|
||||
type=str,
|
||||
default='nms',
|
||||
help="Combine method of the sliced images' detection results, choose in ['nms', 'nmm', 'concat']."
|
||||
)
|
||||
parser.add_argument(
|
||||
"--match_threshold",
|
||||
type=float,
|
||||
default=0.6,
|
||||
help="Combine method matching threshold.")
|
||||
parser.add_argument(
|
||||
"--match_metric",
|
||||
type=str,
|
||||
default='ios',
|
||||
help="Combine method matching metric, choose in ['iou', 'ios'].")
|
||||
parser.add_argument(
|
||||
"--collect_trt_shape_info",
|
||||
action='store_true',
|
||||
default=False,
|
||||
help="Whether to collect dynamic shape before using tensorrt.")
|
||||
parser.add_argument(
|
||||
"--tuned_trt_shape_file",
|
||||
type=str,
|
||||
default="shape_range_info.pbtxt",
|
||||
help="Path of a dynamic shape file for tensorrt.")
|
||||
parser.add_argument("--use_fd_format", action="store_true")
|
||||
parser.add_argument(
|
||||
"--task_type",
|
||||
type=str,
|
||||
default='Detection',
|
||||
help="How to save the coco result; it only works with save_results==True. Options are Rotate or Detection, default is Detection."
|
||||
)
|
||||
return parser
|
||||
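Most boolean options in `argsparser` are declared with `type=ast.literal_eval` instead of `type=bool`. The short sketch below shows why that matters: `argparse` passes the raw command-line string to the `type` callable, and `bool()` of any non-empty string is `True`. The flag names `--flag_bool` and `--flag_eval` are made up for this illustration.

```python
import argparse
import ast

parser = argparse.ArgumentParser()
# type=bool calls bool() on the raw string, and any non-empty string is truthy.
parser.add_argument("--flag_bool", type=bool, default=False)
# ast.literal_eval parses the literal text, so "False" becomes the boolean False.
parser.add_argument("--flag_eval", type=ast.literal_eval, default=False)

args = parser.parse_args(["--flag_bool", "False", "--flag_eval", "False"])
print(args.flag_bool, args.flag_eval)  # prints: True False
```

One trade-off of `ast.literal_eval` is that the value must be a valid Python literal, so `True` and `False` are accepted while lowercase `true` is rejected.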
|
||||
|
||||
class Times(object):
|
||||
def __init__(self):
|
||||
self.time = 0.
|
||||
# start time
|
||||
self.st = 0.
|
||||
# end time
|
||||
self.et = 0.
|
||||
|
||||
def start(self):
|
||||
self.st = time.time()
|
||||
|
||||
def end(self, repeats=1, accumulative=True):
|
||||
self.et = time.time()
|
||||
if accumulative:
|
||||
self.time += (self.et - self.st) / repeats
|
||||
else:
|
||||
self.time = (self.et - self.st) / repeats
|
||||
|
||||
def reset(self):
|
||||
self.time = 0.
|
||||
self.st = 0.
|
||||
self.et = 0.
|
||||
|
||||
def value(self):
|
||||
return round(self.time, 4)
|
||||
|
||||
|
||||
class Timer(Times):
|
||||
def __init__(self, with_tracker=False):
|
||||
super(Timer, self).__init__()
|
||||
self.with_tracker = with_tracker
|
||||
self.preprocess_time_s = Times()
|
||||
self.inference_time_s = Times()
|
||||
self.postprocess_time_s = Times()
|
||||
self.tracking_time_s = Times()
|
||||
self.img_num = 0
|
||||
|
||||
def info(self, average=False):
|
||||
pre_time = self.preprocess_time_s.value()
|
||||
infer_time = self.inference_time_s.value()
|
||||
post_time = self.postprocess_time_s.value()
|
||||
track_time = self.tracking_time_s.value()
|
||||
|
||||
total_time = pre_time + infer_time + post_time
|
||||
if self.with_tracker:
|
||||
total_time = total_time + track_time
|
||||
total_time = round(total_time, 4)
|
||||
print("------------------ Inference Time Info ----------------------")
|
||||
print("total_time(ms): {}, img_num: {}".format(total_time * 1000,
|
||||
self.img_num))
|
||||
preprocess_time = round(pre_time / max(1, self.img_num),
|
||||
4) if average else pre_time
|
||||
postprocess_time = round(post_time / max(1, self.img_num),
|
||||
4) if average else post_time
|
||||
inference_time = round(infer_time / max(1, self.img_num),
|
||||
4) if average else infer_time
|
||||
tracking_time = round(track_time / max(1, self.img_num),
|
||||
4) if average else track_time
|
||||
|
||||
average_latency = total_time / max(1, self.img_num)
|
||||
qps = 0
|
||||
if total_time > 0:
|
||||
qps = 1 / average_latency
|
||||
print("average latency time(ms): {:.2f}, QPS: {:.2f}".format(
|
||||
average_latency * 1000, qps))
|
||||
if self.with_tracker:
|
||||
print(
|
||||
"preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}, tracking_time(ms): {:.2f}".
|
||||
format(preprocess_time * 1000, inference_time * 1000,
|
||||
postprocess_time * 1000, tracking_time * 1000))
|
||||
else:
|
||||
print(
|
||||
"preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}".
|
||||
format(preprocess_time * 1000, inference_time * 1000,
|
||||
postprocess_time * 1000))
|
||||
|
||||
def report(self, average=False):
|
||||
dic = {}
|
||||
pre_time = self.preprocess_time_s.value()
|
||||
infer_time = self.inference_time_s.value()
|
||||
post_time = self.postprocess_time_s.value()
|
||||
track_time = self.tracking_time_s.value()
|
||||
|
||||
dic['preprocess_time_s'] = round(pre_time / max(1, self.img_num),
|
||||
4) if average else pre_time
|
||||
dic['inference_time_s'] = round(infer_time / max(1, self.img_num),
|
||||
4) if average else infer_time
|
||||
dic['postprocess_time_s'] = round(post_time / max(1, self.img_num),
|
||||
4) if average else post_time
|
||||
dic['img_num'] = self.img_num
|
||||
total_time = pre_time + infer_time + post_time
|
||||
if self.with_tracker:
|
||||
dic['tracking_time_s'] = round(track_time / max(1, self.img_num),
|
||||
4) if average else track_time
|
||||
total_time = total_time + track_time
|
||||
dic['total_time_s'] = round(total_time, 4)
|
||||
return dic
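The arithmetic shared by the printed timing summary and `report` can be sketched in isolation. `summarize_times` is a hypothetical helper, not part of the original file; it assumes per-stage totals measured in seconds and mirrors the `max(1, img_num)` guard used above.

```python
def summarize_times(pre_time, infer_time, post_time, img_num, track_time=None):
    """Hypothetical helper mirroring the latency/QPS arithmetic shown above."""
    n = max(1, img_num)  # avoid dividing by zero when no image was processed
    total_time = pre_time + infer_time + post_time
    if track_time is not None:
        total_time += track_time
    average_latency = total_time / n
    # QPS is the inverse of the per-image latency; report 0 if nothing was timed.
    qps = 1 / average_latency if total_time > 0 else 0
    return {
        'total_time_s': round(total_time, 4),
        'average_latency_ms': average_latency * 1000,
        'qps': qps,
    }


summary = summarize_times(pre_time=0.5, infer_time=1.0, post_time=0.5, img_num=10)
print("average latency time(ms): {:.2f}, QPS: {:.2f}".format(
    summary['average_latency_ms'], summary['qps']))
```

Returning a dictionary rather than printing keeps the numbers reusable, which is the same design choice `report` makes.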
|
||||
|
||||
|
||||
def get_current_memory_mb():
    """
    Obtain the CPU and GPU memory usage while the program is running.
    Note that calling this function is itself time-consuming.
    """
    import pynvml
    import psutil
    import GPUtil
    # CUDA_VISIBLE_DEVICES may list several devices (e.g. "0,1"); use the first.
    gpu_id = int(str(os.environ.get('CUDA_VISIBLE_DEVICES', 0)).split(',')[0])

    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    cpu_mem = info.uss / 1024. / 1024.
    gpu_mem = 0
    gpu_percent = 0
    gpus = GPUtil.getGPUs()
    if gpu_id is not None and len(gpus) > 0:
        gpu_percent = gpus[gpu_id].load
        pynvml.nvmlInit()
        # Query the same device whose load is reported above.
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_id)
        meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
        gpu_mem = meminfo.used / 1024. / 1024.
    return round(cpu_mem, 4), round(gpu_mem, 4), round(gpu_percent, 4)
|
||||
|
||||
|
||||
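def multiclass_nms(bboxs, num_classes, match_threshold=0.6, match_metric='iou'):
    """ Apply NMS to each class separately.
    Args:
        bboxs: shape [N, 6], [class, score, x1, y1, x2, y2]
        num_classes: number of class ids to iterate over
        match_threshold: overlap thresh for match metric.
        match_metric: 'iou' or 'ios'
    Returns:
        list of np.ndarray, one entry per class that has detections,
        each row [class, score, x1, y1, x2, y2]
    """
    final_boxes = []
    for c in range(num_classes):
        idxs = bboxs[:, 0] == c
        if np.count_nonzero(idxs) == 0:
            continue
        r = nms(bboxs[idxs, 1:], match_threshold, match_metric)
        final_boxes.append(np.concatenate([np.full((r.shape[0], 1), c), r], 1))
    return final_boxes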
|
||||
|
||||
|
||||
def nms(dets, match_threshold=0.6, match_metric='iou'):
    """ Apply NMS to avoid detecting too many overlapping bounding boxes.
    Args:
        dets: shape [N, 5], [score, x1, y1, x2, y2]
        match_metric: 'iou' or 'ios'
        match_threshold: overlap thresh for match metric.
    """
    if dets.shape[0] == 0:
        return dets[[], :]
    scores = dets[:, 0]
    x1 = dets[:, 1]
    y1 = dets[:, 2]
    x2 = dets[:, 3]
    y2 = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]

    ndets = dets.shape[0]
    suppressed = np.zeros((ndets), dtype=np.int32)

    for _i in range(ndets):
        i = order[_i]
        if suppressed[i] == 1:
            continue
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = order[_j]
            if suppressed[j] == 1:
                continue
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            if match_metric == 'iou':
                # intersection over union
                union = iarea + areas[j] - inter
                match_value = inter / union
            elif match_metric == 'ios':
                # intersection over the smaller of the two boxes
                smaller = min(iarea, areas[j])
                match_value = inter / smaller
            else:
                raise ValueError(
                    "match_metric must be 'iou' or 'ios', got {!r}".format(
                        match_metric))
            if match_value >= match_threshold:
                suppressed[j] = 1
    keep = np.where(suppressed == 0)[0]
    dets = dets[keep, :]
    return dets
|
||||
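The double loop in `nms` can also be written with NumPy array operations. The following is a minimal sketch of a vectorized greedy IoU variant, not the vendored implementation: `nms_iou` is a hypothetical name, it supports only the `'iou'` metric, and it assumes the same `[score, x1, y1, x2, y2]` row layout and `+ 1` pixel convention as the function above.

```python
import numpy as np


def nms_iou(dets, match_threshold=0.6):
    """Greedy IoU-based NMS on rows of [score, x1, y1, x2, y2] (illustrative)."""
    if dets.shape[0] == 0:
        return dets
    scores, x1, y1, x2, y2 = dets.T
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou < match_threshold]  # drop boxes that overlap too much
    return dets[keep]


boxes = np.array([
    [0.9, 10, 10, 50, 50],      # highest score
    [0.8, 12, 12, 52, 52],      # overlaps the first box heavily
    [0.7, 100, 100, 140, 140],  # far from both
])
kept = nms_iou(boxes, match_threshold=0.6)
print(kept[:, 0])
```

One difference to keep in mind: this sketch returns the surviving rows in descending score order, whereas the function above returns them in their original input order.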
|
||||
|
||||
coco_clsid2catid = {
|
||||
0: 1,
|
||||
1: 2,
|
||||
2: 3,
|
||||
3: 4,
|
||||
4: 5,
|
||||
5: 6,
|
||||
6: 7,
|
||||
7: 8,
|
||||
8: 9,
|
||||
9: 10,
|
||||
10: 11,
|
||||
11: 13,
|
||||
12: 14,
|
||||
13: 15,
|
||||
14: 16,
|
||||
15: 17,
|
||||
16: 18,
|
||||
17: 19,
|
||||
18: 20,
|
||||
19: 21,
|
||||
20: 22,
|
||||
21: 23,
|
||||
22: 24,
|
||||
23: 25,
|
||||
24: 27,
|
||||
25: 28,
|
||||
26: 31,
|
||||
27: 32,
|
||||
28: 33,
|
||||
29: 34,
|
||||
30: 35,
|
||||
31: 36,
|
||||
32: 37,
|
||||
33: 38,
|
||||
34: 39,
|
||||
35: 40,
|
||||
36: 41,
|
||||
37: 42,
|
||||
38: 43,
|
||||
39: 44,
|
||||
40: 46,
|
||||
41: 47,
|
||||
42: 48,
|
||||
43: 49,
|
||||
44: 50,
|
||||
45: 51,
|
||||
46: 52,
|
||||
47: 53,
|
||||
48: 54,
|
||||
49: 55,
|
||||
50: 56,
|
||||
51: 57,
|
||||
52: 58,
|
||||
53: 59,
|
||||
54: 60,
|
||||
55: 61,
|
||||
56: 62,
|
||||
57: 63,
|
||||
58: 64,
|
||||
59: 65,
|
||||
60: 67,
|
||||
61: 70,
|
||||
62: 72,
|
||||
63: 73,
|
||||
64: 74,
|
||||
65: 75,
|
||||
66: 76,
|
||||
67: 77,
|
||||
68: 78,
|
||||
69: 79,
|
||||
70: 80,
|
||||
71: 81,
|
||||
72: 82,
|
||||
73: 84,
|
||||
74: 85,
|
||||
75: 86,
|
||||
76: 87,
|
||||
77: 88,
|
||||
78: 89,
|
||||
79: 90
|
||||
}
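The table above maps contiguous class indices (0 to 79) to COCO category ids (1 to 90, with gaps). As a sketch, the same mapping can be generated by skipping the ten unused ids. `UNUSED_COCO_IDS`, `clsid2catid` and `catid2clsid` are names introduced here for illustration, and the list of unused ids was read off the gaps in the table above.

```python
# Category ids that do not appear as values in the table above.
UNUSED_COCO_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}

# Enumerate the remaining ids in ascending order to get class index -> category id.
clsid2catid = {
    clsid: catid
    for clsid, catid in enumerate(
        catid for catid in range(1, 91) if catid not in UNUSED_COCO_IDS)
}

# The inverse direction is useful when reading ground-truth annotations.
catid2clsid = {catid: clsid for clsid, catid in clsid2catid.items()}

print(len(clsid2catid), clsid2catid[11], clsid2catid[79], catid2clsid[90])
```

Keeping the literal table, as the file does, is easier to audit; the generated form mainly helps to check the table for typos.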
|
||||
|
||||
|
||||
def gaussian_radius(bbox_size, min_overlap):
|
||||
height, width = bbox_size
|
||||
|
||||
a1 = 1
|
||||
b1 = (height + width)
|
||||
c1 = width * height * (1 - min_overlap) / (1 + min_overlap)
|
||||
sq1 = np.sqrt(b1**2 - 4 * a1 * c1)
|
||||
radius1 = (b1 + sq1) / (2 * a1)
|
||||
|
||||
a2 = 4
|
||||
b2 = 2 * (height + width)
|
||||
c2 = (1 - min_overlap) * width * height
|
||||
sq2 = np.sqrt(b2**2 - 4 * a2 * c2)
|
||||
radius2 = (b2 + sq2) / 2
|
||||
|
||||
a3 = 4 * min_overlap
|
||||
b3 = -2 * min_overlap * (height + width)
|
||||
c3 = (min_overlap - 1) * width * height
|
||||
sq3 = np.sqrt(b3**2 - 4 * a3 * c3)
|
||||
radius3 = (b3 + sq3) / 2
|
||||
return min(radius1, radius2, radius3)
|
||||
|
||||
|
||||
def gaussian2D(shape, sigma_x=1, sigma_y=1):
|
||||
m, n = [(ss - 1.) / 2. for ss in shape]
|
||||
y, x = np.ogrid[-m:m + 1, -n:n + 1]
|
||||
|
||||
h = np.exp(-(x * x / (2 * sigma_x * sigma_x) + y * y / (2 * sigma_y *
|
||||
sigma_y)))
|
||||
h[h < np.finfo(h.dtype).eps * h.max()] = 0
|
||||
return h
|
||||
|
||||
|
||||
def draw_umich_gaussian(heatmap, center, radius, k=1):
|
||||
"""
|
||||
draw_umich_gaussian, refer to https://github.com/xingyizhou/CenterNet/blob/master/src/lib/utils/image.py#L126
|
||||
"""
|
||||
diameter = 2 * radius + 1
|
||||
gaussian = gaussian2D(
|
||||
(diameter, diameter), sigma_x=diameter / 6, sigma_y=diameter / 6)
|
||||
|
||||
x, y = int(center[0]), int(center[1])
|
||||
|
||||
height, width = heatmap.shape[0:2]
|
||||
|
||||
left, right = min(x, radius), min(width - x, radius + 1)
|
||||
top, bottom = min(y, radius), min(height - y, radius + 1)
|
||||
|
||||
masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]
|
||||
masked_gaussian = gaussian[radius - top:radius + bottom, radius - left:
|
||||
radius + right]
|
||||
if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:
|
||||
np.maximum(masked_heatmap, masked_gaussian * k, out=masked_heatmap)
|
||||
return heatmap
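To see how the border clipping in `draw_umich_gaussian` behaves, here is a small self-contained sketch. `gaussian2d` and `splat` are simplified stand-ins written for this example (a single `sigma`, no `k` scaling), not the functions from the file.

```python
import numpy as np


def gaussian2d(shape, sigma=1.0):
    """Unnormalized 2-D Gaussian kernel whose center value is 1."""
    m, n = [(s - 1.) / 2. for s in shape]
    y, x = np.ogrid[-m:m + 1, -n:n + 1]
    h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))
    h[h < np.finfo(h.dtype).eps * h.max()] = 0
    return h


def splat(heatmap, center, radius):
    """Write a Gaussian peak into heatmap in place, clipped at the borders."""
    diameter = 2 * radius + 1
    kernel = gaussian2d((diameter, diameter), sigma=diameter / 6)
    x, y = int(center[0]), int(center[1])
    height, width = heatmap.shape[:2]
    # How far the kernel may extend on each side before leaving the heatmap.
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)
    target = heatmap[y - top:y + bottom, x - left:x + right]
    source = kernel[radius - top:radius + bottom, radius - left:radius + right]
    if min(target.shape) > 0 and min(source.shape) > 0:
        np.maximum(target, source, out=target)  # keep the larger value per pixel
    return heatmap


heatmap = np.zeros((8, 8), dtype=np.float32)
splat(heatmap, center=(1, 6), radius=2)  # center is (x, y), close to a corner
print(heatmap.round(2))
```

Using `np.maximum` instead of addition means overlapping objects do not push the heatmap above 1, so each peak stays at its own center value.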
|
||||
665
third-party/paddle-inference/visualize.py
vendored
Normal file
@@ -0,0 +1,665 @@
|
||||
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from __future__ import division
|
||||
|
||||
import os
|
||||
import cv2
|
||||
import math
|
||||
import numpy as np
|
||||
import PIL
|
||||
from PIL import Image, ImageDraw, ImageFile
|
||||
ImageFile.LOAD_TRUNCATED_IMAGES = True
|
||||
|
||||
def imagedraw_textsize_c(draw, text):
|
||||
if int(PIL.__version__.split('.')[0]) < 10:
|
||||
tw, th = draw.textsize(text)
|
||||
else:
|
||||
left, top, right, bottom = draw.textbbox((0, 0), text)
|
||||
tw, th = right - left, bottom - top
|
||||
|
||||
return tw, th
|
||||
|
||||
|
||||
def visualize_box_mask(im, results, labels, threshold=0.5):
|
||||
"""
|
||||
Args:
|
||||
im (str/np.ndarray): path of image/np.ndarray read by cv2
|
||||
results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
|
||||
matrix element:[class, score, x_min, y_min, x_max, y_max]
|
||||
MaskRCNN's results include 'masks': np.ndarray:
|
||||
shape:[N, im_h, im_w]
|
||||
labels (list): labels:['class1', ..., 'classn']
|
||||
threshold (float): Threshold of score.
|
||||
Returns:
|
||||
im (PIL.Image.Image): visualized image
|
||||
"""
|
||||
if isinstance(im, str):
|
||||
im = Image.open(im).convert('RGB')
|
||||
elif isinstance(im, np.ndarray):
|
||||
im = Image.fromarray(im)
|
||||
if 'masks' in results and 'boxes' in results and len(results['boxes']) > 0:
|
||||
im = draw_mask(
|
||||
im, results['boxes'], results['masks'], labels, threshold=threshold)
|
||||
if 'boxes' in results and len(results['boxes']) > 0:
|
||||
im = draw_box(im, results['boxes'], labels, threshold=threshold)
|
||||
if 'segm' in results:
|
||||
im = draw_segm(
|
||||
im,
|
||||
results['segm'],
|
||||
results['label'],
|
||||
results['score'],
|
||||
labels,
|
||||
threshold=threshold)
|
||||
return im
|
||||
|
||||
|
||||
def get_color_map_list(num_classes):
|
||||
"""
|
||||
Args:
|
||||
num_classes (int): number of class
|
||||
Returns:
|
||||
color_map (list): RGB color list
|
||||
"""
|
||||
color_map = num_classes * [0, 0, 0]
|
||||
for i in range(0, num_classes):
|
||||
j = 0
|
||||
lab = i
|
||||
while lab:
|
||||
color_map[i * 3] |= (((lab >> 0) & 1) << (7 - j))
|
||||
color_map[i * 3 + 1] |= (((lab >> 1) & 1) << (7 - j))
|
||||
color_map[i * 3 + 2] |= (((lab >> 2) & 1) << (7 - j))
|
||||
j += 1
|
||||
lab >>= 3
|
||||
color_map = [color_map[i:i + 3] for i in range(0, len(color_map), 3)]
|
||||
return color_map
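The bit manipulation in `get_color_map_list` spreads the bits of the class id across the R, G and B channels, starting from the most significant bit of each channel. The sketch below isolates that logic for a single id; `color_for_class` is a hypothetical helper written for this example.

```python
def color_for_class(class_id):
    """Return the [R, G, B] color the bit-interleaving scheme assigns to one id."""
    color = [0, 0, 0]
    j = 0
    lab = class_id
    while lab:
        # Take three bits at a time: bit 0 -> R, bit 1 -> G, bit 2 -> B.
        color[0] |= ((lab >> 0) & 1) << (7 - j)
        color[1] |= ((lab >> 1) & 1) << (7 - j)
        color[2] |= ((lab >> 2) & 1) << (7 - j)
        j += 1
        lab >>= 3
    return color


for class_id in (0, 1, 2, 3, 4, 8):
    print(class_id, color_for_class(class_id))
```

Because class id 0 never enters the loop, it maps to black, which may be hard to see on dark images; neighboring ids get clearly distinct colors.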
|
||||
|
||||
|
||||
def draw_mask(im, np_boxes, np_masks, labels, threshold=0.5):
|
||||
"""
|
||||
Args:
|
||||
im (PIL.Image.Image): PIL image
|
||||
np_boxes (np.ndarray): shape:[N,6], N: number of box,
|
||||
matrix element:[class, score, x_min, y_min, x_max, y_max]
|
||||
np_masks (np.ndarray): shape:[N, im_h, im_w]
|
||||
labels (list): labels:['class1', ..., 'classn']
|
||||
threshold (float): threshold of mask
|
||||
Returns:
|
||||
im (PIL.Image.Image): visualized image
|
||||
"""
|
||||
color_list = get_color_map_list(len(labels))
|
||||
w_ratio = 0.4
|
||||
alpha = 0.7
|
||||
im = np.array(im).astype('float32')
|
||||
clsid2color = {}
|
||||
expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
|
||||
np_boxes = np_boxes[expect_boxes, :]
|
||||
np_masks = np_masks[expect_boxes, :, :]
|
||||
im_h, im_w = im.shape[:2]
|
||||
np_masks = np_masks[:, :im_h, :im_w]
|
||||
for i in range(len(np_masks)):
|
||||
clsid, score = int(np_boxes[i][0]), np_boxes[i][1]
|
||||
mask = np_masks[i]
|
||||
if clsid not in clsid2color:
|
||||
clsid2color[clsid] = color_list[clsid]
|
||||
color_mask = list(clsid2color[clsid])  # copy so the cached class color is not mutated below
|
||||
for c in range(3):
|
||||
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
|
||||
idx = np.nonzero(mask)
|
||||
color_mask = np.array(color_mask)
|
||||
im[idx[0], idx[1], :] *= 1.0 - alpha
|
||||
im[idx[0], idx[1], :] += alpha * color_mask
|
||||
return Image.fromarray(im.astype('uint8'))
|
||||
|
||||
|
||||
def draw_box(im, np_boxes, labels, threshold=0.5):
|
||||
"""
|
||||
Args:
|
||||
im (PIL.Image.Image): PIL image
|
||||
np_boxes (np.ndarray): shape:[N,6], N: number of box,
|
||||
matrix element:[class, score, x_min, y_min, x_max, y_max]
|
||||
labels (list): labels:['class1', ..., 'classn']
|
||||
threshold (float): threshold of box
|
||||
Returns:
|
||||
im (PIL.Image.Image): visualized image
|
||||
"""
|
||||
draw_thickness = min(im.size) // 320
|
||||
draw = ImageDraw.Draw(im)
|
||||
clsid2color = {}
|
||||
color_list = get_color_map_list(len(labels))
|
||||
expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
|
||||
np_boxes = np_boxes[expect_boxes, :]
|
||||
|
||||
vis_order = False
|
||||
if len(np_boxes) > 0 and len(np_boxes[0]) == 7:
|
||||
np_boxes = sorted(np_boxes, key=lambda x: x[6])
|
||||
vis_order = True
|
||||
|
||||
centers = []
|
||||
for dt in np_boxes:
|
||||
if len(dt) == 7:
|
||||
clsid, bbox, score, read_order = int(dt[0]), dt[2:6], dt[1], int(dt[6])
|
||||
else:
|
||||
clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
|
||||
if clsid not in clsid2color:
|
||||
clsid2color[clsid] = color_list[clsid]
|
||||
color = tuple(clsid2color[clsid])
|
||||
|
||||
if len(bbox) == 4:
|
||||
xmin, ymin, xmax, ymax = bbox
|
||||
print('class_id:{:d}, confidence:{:.4f}, left_top:[{:.2f},{:.2f}],'
|
||||
'right_bottom:[{:.2f},{:.2f}]'.format(
|
||||
int(clsid), score, xmin, ymin, xmax, ymax))
|
||||
# draw bbox
|
||||
draw.line(
|
||||
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
|
||||
(xmin, ymin)],
|
||||
width=draw_thickness,
|
||||
fill=color)
|
||||
cx, cy = int((xmin + xmax)/2), int((ymin + ymax)/2)
|
||||
centers.append((cx, cy))
|
||||
elif len(bbox) == 8:
|
||||
x1, y1, x2, y2, x3, y3, x4, y4 = bbox
|
||||
draw.line(
|
||||
[(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
|
||||
width=2,
|
||||
fill=color)
|
||||
xmin = min(x1, x2, x3, x4)
|
||||
ymin = min(y1, y2, y3, y4)
|
||||
|
||||
# draw label
|
||||
text = "{} {:.4f}".format(labels[clsid], score)
|
||||
tw, th = imagedraw_textsize_c(draw, text)
|
||||
draw.rectangle(
|
||||
[(xmin + 1, ymin - th), (xmin + tw + 1, ymin)], fill=color)
|
||||
draw.text((xmin + 1, ymin - th), text, fill=(255, 255, 255))
|
||||
|
||||
if vis_order:
|
||||
for i in range(len(centers)-1):
|
||||
draw.line([centers[i], centers[i+1]], fill=(255, 0, 0), width=2)
|
||||
|
||||
return im
|
||||
|
||||
|
||||
def draw_segm(im,
|
||||
np_segms,
|
||||
np_label,
|
||||
np_score,
|
||||
labels,
|
||||
threshold=0.5,
|
||||
alpha=0.7):
|
||||
"""
|
||||
Draw segmentation on image
|
||||
"""
|
||||
mask_color_id = 0
|
||||
w_ratio = .4
|
||||
color_list = get_color_map_list(len(labels))
|
||||
im = np.array(im).astype('float32')
|
||||
clsid2color = {}
|
||||
np_segms = np_segms.astype(np.uint8)
|
||||
for i in range(np_segms.shape[0]):
|
||||
mask, score, clsid = np_segms[i], np_score[i], np_label[i]
|
||||
if score < threshold:
|
||||
continue
|
||||
|
||||
if clsid not in clsid2color:
|
||||
clsid2color[clsid] = color_list[clsid]
|
||||
color_mask = list(clsid2color[clsid])  # copy so the cached class color is not mutated below
|
||||
for c in range(3):
|
||||
color_mask[c] = color_mask[c] * (1 - w_ratio) + w_ratio * 255
|
||||
idx = np.nonzero(mask)
|
||||
color_mask = np.array(color_mask)
|
||||
idx0 = np.minimum(idx[0], im.shape[0] - 1)
|
||||
idx1 = np.minimum(idx[1], im.shape[1] - 1)
|
||||
im[idx0, idx1, :] *= 1.0 - alpha
|
||||
im[idx0, idx1, :] += alpha * color_mask
|
||||
sum_x = np.sum(mask, axis=0)
|
||||
x = np.where(sum_x > 0.5)[0]
|
||||
sum_y = np.sum(mask, axis=1)
|
||||
y = np.where(sum_y > 0.5)[0]
|
||||
x0, x1, y0, y1 = x[0], x[-1], y[0], y[-1]
|
||||
cv2.rectangle(im, (x0, y0), (x1, y1),
|
||||
tuple(color_mask.astype('int32').tolist()), 1)
|
||||
bbox_text = '%s %.2f' % (labels[clsid], score)
|
||||
t_size = cv2.getTextSize(bbox_text, 0, 0.3, thickness=1)[0]
|
||||
cv2.rectangle(im, (x0, y0), (x0 + t_size[0], y0 - t_size[1] - 3),
|
||||
tuple(color_mask.astype('int32').tolist()), -1)
|
||||
cv2.putText(
|
||||
im,
|
||||
bbox_text, (x0, y0 - 2),
|
||||
cv2.FONT_HERSHEY_SIMPLEX,
|
||||
0.3, (0, 0, 0),
|
||||
1,
|
||||
lineType=cv2.LINE_AA)
|
||||
return Image.fromarray(im.astype('uint8'))
|
||||
|
||||
|
||||
def get_color(idx):
|
||||
idx = idx * 3
|
||||
color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
|
||||
return color
|
||||
|
||||
|
||||
def visualize_pose(imgfile,
|
||||
results,
|
||||
visual_thresh=0.6,
|
||||
save_name='pose.jpg',
|
||||
save_dir='output',
|
||||
returnimg=False,
|
||||
ids=None):
|
||||
try:
|
||||
import matplotlib.pyplot as plt
|
||||
import matplotlib
|
||||
plt.switch_backend('agg')
|
||||
except Exception as e:
|
||||
print('Matplotlib not found, please install matplotlib.'
|
||||
' For example: `pip install matplotlib`.')
|
||||
raise e
|
||||
skeletons, scores = results['keypoint']
|
||||
skeletons = np.array(skeletons)
|
||||
kpt_nums = 17
|
||||
if len(skeletons) > 0:
|
||||
kpt_nums = skeletons.shape[1]
|
||||
if kpt_nums == 17:  # plot COCO keypoints
|
||||
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6), (5, 7), (6, 8),
|
||||
(7, 9), (8, 10), (5, 11), (6, 12), (11, 13), (12, 14),
|
||||
(13, 15), (14, 16), (11, 12)]
|
||||
else:  # plot MPII keypoints
|
||||
EDGES = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 6), (3, 6), (6, 7), (7, 8),
|
||||
(8, 9), (10, 11), (11, 12), (13, 14), (14, 15), (8, 12),
|
||||
(8, 13)]
|
||||
NUM_EDGES = len(EDGES)
|
||||
|
||||
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
|
||||
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
|
||||
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
|
||||
cmap = plt.get_cmap('hsv')  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9
|
||||
plt.figure()
|
||||
|
||||
img = cv2.imread(imgfile) if isinstance(imgfile, str) else imgfile
|
||||
|
||||
color_set = results['colors'] if 'colors' in results else None
|
||||
|
||||
if 'bbox' in results and ids is None:
|
||||
bboxs = results['bbox']
|
||||
for j, rect in enumerate(bboxs):
|
||||
xmin, ymin, xmax, ymax = rect
|
||||
color = colors[0] if color_set is None else colors[color_set[j] %
|
||||
len(colors)]
|
||||
cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
|
||||
|
||||
canvas = img.copy()
|
||||
for i in range(kpt_nums):
|
||||
for j in range(len(skeletons)):
|
||||
if skeletons[j][i, 2] < visual_thresh:
|
||||
continue
|
||||
if ids is None:
|
||||
color = colors[i] if color_set is None else colors[color_set[j]
|
||||
%
|
||||
len(colors)]
|
||||
else:
|
||||
color = get_color(ids[j])
|
||||
|
||||
cv2.circle(
|
||||
canvas,
|
||||
tuple(skeletons[j][i, 0:2].astype('int32')),
|
||||
2,
|
||||
color,
|
||||
thickness=-1)
|
||||
|
||||
to_plot = cv2.addWeighted(img, 0.3, canvas, 0.7, 0)
|
||||
fig = matplotlib.pyplot.gcf()
|
||||
|
||||
stickwidth = 2
|
||||
|
||||
for i in range(NUM_EDGES):
|
||||
for j in range(len(skeletons)):
|
||||
edge = EDGES[i]
|
||||
if skeletons[j][edge[0], 2] < visual_thresh or skeletons[j][edge[
|
||||
1], 2] < visual_thresh:
|
||||
continue
|
||||
|
||||
cur_canvas = canvas.copy()
|
||||
X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
|
||||
Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
|
||||
mX = np.mean(X)
|
||||
mY = np.mean(Y)
|
||||
length = ((X[0] - X[1])**2 + (Y[0] - Y[1])**2)**0.5
|
||||
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
|
||||
polygon = cv2.ellipse2Poly((int(mY), int(mX)),
|
||||
(int(length / 2), stickwidth),
|
||||
int(angle), 0, 360, 1)
|
||||
if ids is None:
|
||||
color = colors[i] if color_set is None else colors[color_set[j]
|
||||
%
|
||||
len(colors)]
|
||||
else:
|
||||
color = get_color(ids[j])
|
||||
cv2.fillConvexPoly(cur_canvas, polygon, color)
|
||||
canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
|
||||
if returnimg:
|
||||
return canvas
|
||||
save_name = os.path.join(
|
||||
save_dir, os.path.splitext(os.path.basename(imgfile))[0] + '_vis.jpg')
|
||||
plt.imsave(save_name, canvas[:, :, ::-1])
|
||||
print("keypoint visualize image saved to: " + save_name)
|
||||
plt.close()
|
||||
|
||||
|
||||
def visualize_attr(im, results, boxes=None, is_mtmct=False):
|
||||
if isinstance(im, str):
|
||||
im = Image.open(im)
|
||||
im = np.ascontiguousarray(np.copy(im))
|
||||
im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
|
||||
else:
|
||||
im = np.ascontiguousarray(np.copy(im))
|
||||
|
||||
im_h, im_w = im.shape[:2]
|
||||
text_scale = max(0.5, im.shape[0] / 3000.)
|
||||
text_thickness = 1
|
||||
|
||||
line_inter = im.shape[0] / 40.
|
||||
for i, res in enumerate(results):
|
||||
if boxes is None:
|
||||
text_w = 3
|
||||
text_h = 1
|
||||
elif is_mtmct:
|
||||
box = boxes[i] # multi camera, bbox shape is x,y, w,h
|
||||
text_w = int(box[0]) + 3
|
||||
text_h = int(box[1])
|
||||
else:
|
||||
box = boxes[i] # single camera, bbox shape is 0, 0, x,y, w,h
|
||||
text_w = int(box[2]) + 3
|
||||
text_h = int(box[3])
|
||||
for text in res:
|
||||
text_h += int(line_inter)
|
||||
text_loc = (text_w, text_h)
|
||||
cv2.putText(
|
||||
im,
|
||||
text,
|
||||
text_loc,
|
||||
cv2.FONT_ITALIC,
|
||||
text_scale, (0, 255, 255),
|
||||
thickness=text_thickness)
|
||||
return im
|
||||
|
||||
|
||||
def visualize_action(im,
|
||||
mot_boxes,
|
||||
action_visual_collector=None,
|
||||
action_text="",
|
||||
video_action_score=None,
|
||||
video_action_text=""):
|
||||
im = cv2.imread(im) if isinstance(im, str) else im
|
||||
im_h, im_w = im.shape[:2]
|
||||
|
||||
text_scale = max(1, im.shape[1] / 400.)
|
||||
text_thickness = 2
|
||||
|
||||
if action_visual_collector:
|
||||
id_action_dict = {}
|
||||
for collector, action_type in zip(action_visual_collector, action_text):
|
||||
id_detected = collector.get_visualize_ids()
|
||||
for pid in id_detected:
|
||||
id_action_dict[pid] = id_action_dict.get(pid, [])
|
||||
id_action_dict[pid].append(action_type)
|
||||
for mot_box in mot_boxes:
|
||||
# mot_box is a format with [mot_id, class, score, xmin, ymin, w, h]
|
||||
if mot_box[0] in id_action_dict:
|
||||
text_position = (int(mot_box[3] + mot_box[5] * 0.75),
|
||||
int(mot_box[4] - 10))
|
||||
display_text = ', '.join(id_action_dict[mot_box[0]])
|
||||
cv2.putText(im, display_text, text_position,
|
||||
cv2.FONT_HERSHEY_PLAIN, text_scale, (0, 0, 255), 2)
|
||||
|
||||
if video_action_score:
|
||||
cv2.putText(
|
||||
im,
|
||||
video_action_text + ': %.2f' % video_action_score,
|
||||
(int(im_w / 2), int(15 * text_scale) + 5),
|
||||
cv2.FONT_ITALIC,
|
||||
text_scale, (0, 0, 255),
|
||||
thickness=text_thickness)
|
||||
|
||||
return im
|
||||
|
||||
|
||||
def visualize_vehicleplate(im, results, boxes=None):
    if isinstance(im, str):
        im = Image.open(im)
        im = np.ascontiguousarray(np.copy(im))
        im = cv2.cvtColor(im, cv2.COLOR_RGB2BGR)
    else:
        im = np.ascontiguousarray(np.copy(im))

    im_h, im_w = im.shape[:2]
    text_scale = max(1.0, im.shape[0] / 400.)
    text_thickness = 2

    line_inter = im.shape[0] / 40.
    for i, res in enumerate(results):
        # Read the plate text first so it is defined whether or not boxes are given.
        text = res
        if text == "":
            continue
        if boxes is None:
            text_w = 3
            text_h = 1
        else:
            box = boxes[i]
            text_w = int(box[2])
            text_h = int(box[5] + box[3])
        text_loc = (text_w, text_h)
        cv2.putText(
            im,
            "LP: " + text,
            text_loc,
            cv2.FONT_ITALIC,
            text_scale, (0, 255, 255),
            thickness=text_thickness)
    return im
|
||||
|
||||
|
||||
def draw_press_box_lanes(im, np_boxes, labels, threshold=0.5):
|
||||
"""
|
||||
Args:
|
||||
im (PIL.Image.Image): PIL image
|
||||
np_boxes (np.ndarray): shape:[N,6], N: number of box,
|
||||
matrix element:[class, score, x_min, y_min, x_max, y_max]
|
||||
labels (list): labels:['class1', ..., 'classn']
|
||||
threshold (float): threshold of box
|
||||
Returns:
|
||||
im (PIL.Image.Image): visualized image
|
||||
"""
|
||||
|
||||
if isinstance(im, str):
|
||||
im = Image.open(im).convert('RGB')
|
||||
elif isinstance(im, np.ndarray):
|
||||
im = Image.fromarray(im)
|
||||
|
||||
draw_thickness = min(im.size) // 320
|
||||
draw = ImageDraw.Draw(im)
|
||||
clsid2color = {}
|
||||
color_list = get_color_map_list(len(labels))
|
||||
|
||||
if np_boxes.shape[1] == 7:
|
||||
np_boxes = np_boxes[:, 1:]
|
||||
|
||||
expect_boxes = (np_boxes[:, 1] > threshold) & (np_boxes[:, 0] > -1)
|
||||
np_boxes = np_boxes[expect_boxes, :]
|
||||
|
||||
for dt in np_boxes:
|
||||
clsid, bbox, score = int(dt[0]), dt[2:], dt[1]
|
||||
if clsid not in clsid2color:
|
||||
clsid2color[clsid] = color_list[clsid]
|
||||
color = tuple(clsid2color[clsid])
|
||||
|
||||
if len(bbox) == 4:
|
||||
xmin, ymin, xmax, ymax = bbox
|
||||
# draw bbox
|
||||
draw.line(
|
||||
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
|
||||
(xmin, ymin)],
|
||||
width=draw_thickness,
|
||||
fill=(0, 0, 255))
|
||||
elif len(bbox) == 8:
|
||||
x1, y1, x2, y2, x3, y3, x4, y4 = bbox
|
||||
draw.line(
|
||||
[(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
|
||||
width=2,
|
||||
fill=color)
|
||||
xmin = min(x1, x2, x3, x4)
|
||||
ymin = min(y1, y2, y3, y4)
|
||||
|
||||
# draw label
|
||||
text = "{}".format(labels[clsid])
|
||||
tw, th = imagedraw_textsize_c(draw, text)
|
||||
draw.rectangle(
|
||||
[(xmin + 1, ymax - th), (xmin + tw + 1, ymax)], fill=color)
|
||||
draw.text((xmin + 1, ymax - th), text, fill=(0, 0, 255))
|
||||
return im
|
||||
|
||||
|
||||
def visualize_vehiclepress(im, results, threshold=0.5):
|
||||
results = np.array(results)
|
||||
labels = ['violation']
|
||||
im = draw_press_box_lanes(im, results, labels, threshold=threshold)
|
||||
return im
|
||||
|
||||
|
||||
def visualize_lane(im, lanes):
|
||||
if isinstance(im, str):
|
||||
im = Image.open(im).convert('RGB')
|
||||
elif isinstance(im, np.ndarray):
|
||||
im = Image.fromarray(im)
|
||||
|
||||
draw_thickness = min(im.size) // 320
|
||||
draw = ImageDraw.Draw(im)
|
||||
|
||||
if len(lanes) > 0:
|
||||
for lane in lanes:
|
||||
draw.line(
|
||||
[(lane[0], lane[1]), (lane[2], lane[3])],
|
||||
width=draw_thickness,
|
||||
fill=(0, 0, 255))
|
||||
|
||||
return im
|
||||
|
||||
|
||||
def visualize_vehicle_retrograde(im, mot_res, vehicle_retrograde_res):
|
||||
if isinstance(im, str):
|
||||
im = Image.open(im).convert('RGB')
|
||||
elif isinstance(im, np.ndarray):
|
||||
im = Image.fromarray(im)
|
||||
|
||||
draw_thickness = min(im.size) // 320
|
||||
draw = ImageDraw.Draw(im)
|
||||
|
||||
lane = vehicle_retrograde_res['fence_line']
|
||||
if lane is not None:
|
||||
draw.line(
|
||||
[(lane[0], lane[1]), (lane[2], lane[3])],
|
||||
width=draw_thickness,
|
||||
fill=(0, 0, 0))
|
||||
|
||||
mot_id = vehicle_retrograde_res['output']
|
||||
if mot_id is None or len(mot_id) == 0:
|
||||
return im
|
||||
|
||||
if mot_res is None:
|
||||
return im
|
||||
np_boxes = mot_res['boxes']
|
||||
|
||||
if np_boxes is not None:
|
||||
for dt in np_boxes:
|
||||
if dt[0] not in mot_id:
|
||||
continue
|
||||
bbox = dt[3:]
|
||||
if len(bbox) == 4:
|
||||
xmin, ymin, xmax, ymax = bbox
|
||||
# draw bbox
|
||||
draw.line(
|
||||
[(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
|
||||
(xmin, ymin)],
|
||||
width=draw_thickness,
|
||||
fill=(0, 255, 0))
|
||||
|
||||
# draw label
|
||||
text = "retrograde"
|
||||
tw, th = imagedraw_textsize_c(draw, text)
|
||||
draw.rectangle(
|
||||
[(xmax + 1, ymin - th), (xmax + tw + 1, ymin)],
|
||||
fill=(0, 255, 0))
|
||||
draw.text((xmax + 1, ymin - th), text, fill=(255, 255, 255))  # white text stays readable on the green label box
|
||||
|
||||
return im
|
||||
|
||||
|
||||
COLORS = [
|
||||
(255, 0, 0),
|
||||
(0, 255, 0),
|
||||
(0, 0, 255),
|
||||
(255, 255, 0),
|
||||
(255, 0, 255),
|
||||
(0, 255, 255),
|
||||
(128, 255, 0),
|
||||
(255, 128, 0),
|
||||
(128, 0, 255),
|
||||
(255, 0, 128),
|
||||
(0, 128, 255),
|
||||
(0, 255, 128),
|
||||
(128, 255, 255),
|
||||
(255, 128, 255),
|
||||
(255, 255, 128),
|
||||
(60, 180, 0),
|
||||
(180, 60, 0),
|
||||
(0, 60, 180),
|
||||
(0, 180, 60),
|
||||
(60, 0, 180),
|
||||
(180, 0, 60),
|
||||
(255, 0, 0),
|
||||
(0, 255, 0),
|
||||
(0, 0, 255),
|
||||
(255, 255, 0),
|
||||
(255, 0, 255),
|
||||
(0, 255, 255),
|
||||
(128, 255, 0),
|
||||
(255, 128, 0),
|
||||
(128, 0, 255),
|
||||
]
|
||||
|
||||
|
||||
def imshow_lanes(img, lanes, show=False, out_file=None, width=4):
    lanes_xys = []
    for _, lane in enumerate(lanes):
        xys = []
        for x, y in lane:
            if x <= 0 or y <= 0:
                continue
            x, y = int(x), int(y)
            xys.append((x, y))
        lanes_xys.append(xys)
    lanes_xys.sort(key=lambda xys: xys[0][0] if len(xys) > 0 else 0)

    for idx, xys in enumerate(lanes_xys):
        # Cycle through COLORS so more lanes than colors cannot raise IndexError.
        color = COLORS[idx % len(COLORS)]
        for i in range(1, len(xys)):
            cv2.line(img, xys[i - 1], xys[i], color, thickness=width)

    if show:
        cv2.imshow('view', img)
        cv2.waitKey(0)

    if out_file:
        out_dir = os.path.dirname(out_file)
        # dirname is empty for a bare file name; only create real directories.
        if out_dir and not os.path.exists(out_dir):
            os.makedirs(out_dir)
        cv2.imwrite(out_file, img)
|
||||