Compare commits

773 Commits

Author SHA1 Message Date
Jiang-Jia-Jun
6580e3331b Merge branch 'release/2.2' into fix-gpu-memory-oom 2025-09-22 21:19:19 +08:00
luukunn
6b47773bd6 [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4177)
* [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4086)

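* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length

* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length

* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length

* Rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length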

* add completion_token_ids

* add logger

* fix reasoning_max_tokens ParameterError

* add unittest

* add unittest

* add unittest

* add unittest

* add unittest

* add unit test

* fix
2025-09-22 21:12:05 +08:00
李泳桦
0358329946 [fix] initialize available_gpu_block_num with max_gpu_block_num (#4193) 2025-09-22 18:56:00 +08:00
RAM
01f6934162 [Executor] Adjust signal sending order in RL training (#3773) (#4066) (#4178)
* Adjust processing order

* fix bug

* fix update_parameters bug

* refine code
2025-09-22 14:31:36 +08:00
chen
7bdc6f41e5 fix glm all_reduce tp group (#4188) 2025-09-22 10:57:13 +08:00
ltd0924
bba279cf38 [Feature] support rdma IB transfer (#4123)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py

* mv connection_manager init

* [BugFix] fix kv cache

* fix format

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-19 12:54:49 +08:00
Sunny-bot1
4f460db556 [CP2.2] Machete support group scale & wint8 & v1 loader (#4166)
* support v1 loader for machete (#3999)

* [Optimize] Support WINT8 and group scale for Machete (#3905)

* [Optimize] Machete using group scale default (#4121)
2025-09-19 11:13:12 +08:00
JYChen
74d7b9151d fix mtp (#4153)
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
2025-09-18 10:53:07 +08:00
李泳桦
0fa28b1068 [fix] fix ep group all-reduce (#4140)
* [fix] fix ep group all-reduce

* [fix] fix clear/update lock not working when workers > 1

* [chore] add preemption triggered info log

* [fix] fix code style

* fix model_weights_signal (#4092)

* fix model_weights_signal

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-18 10:34:49 +08:00
Jiang-Jia-Jun
cffde70949 Add assertion for ENABLE_V1_KVCACHE_SCHEDULER (#4146)
2025-09-17 16:02:56 +08:00
K11OntheBoat
7f9a9b37f3 Support limit thinking lengths (#4070)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-09-17 12:40:08 +08:00
gaoziyuan
b41988f4bc fix gid (#4038)
2025-09-16 20:56:36 +08:00
李泳桦
7ccbcc5a62 [feat] support prefix cache clearing when /clear_load_weight is called (#4091)
* [feat] support clearing prefix cache (cherry-picked from release/2.1)

* [fix] fix ipc suffix, use port instead

* [fix] fix prefix caching not enabled

* [fix] fix code style

* [fix] wait for rank0 to update weight status
2025-09-16 11:11:20 +08:00
chen
fbb4e0f8d1 [CP]Glm45 air 2.2 (#4073)
* [Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928)

* support glm45_air

* [Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051)

* check

* fix v1 load for mix and wint8

* check --quantizations 'None'

* check

* support RL rollout

* check v1 loader

* check glm rollout_model, change wfp8afp8 per_token_cast_to_fp8 to native impl

* check rollout moe gate begin layer_id

* check rollout e_score_correction_bias

* delete infer_to_train_mapping={}

* code check
2025-09-15 18:52:58 +08:00
YuanRisheng
4e8ba62241 [setup optimize]Support git submodule (#4033) (#4080)
* support git submodule

* update setup

* fix ci network

* fix clone

* revert clone linux

* delete args

* fix ci

* update
2025-09-15 11:41:55 +08:00
YuBaoku
7e3148ed81 [CI] update paddlepaddle==3.2.0 in release/2.2 (#3997)
* [CI] update paddlepaddle-gpu==3.2.0 in release/2.2

* [CI] debug paddleformers==0.3.0 in release/2.2

* [CI] update paddlepaddle==3.2.0 in release/2.2
2025-09-11 22:04:40 +08:00
chenjian
4f8ff478b3 [Feature] Support mixed deployment with yiyan adapter in release22 (#3974)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test
2025-09-10 16:01:13 +08:00
guozhuangzhuang
c4098d56a0 Fixed the issue of metrics file conflicts between multiple instances … (#4010)
* Fixed the issue of metrics file conflicts between multiple instances on a single machine

* Use uuid to name the metrics shared folder

* Use uuid to name the metrics shared folder
2025-09-10 13:48:24 +08:00
ltd0924
a6b161b007 [Fix] fix multi api server log dir (#3966)
* fix scheduler bug

* fix

* Update api_server.py

* Update multi_api_server.py

* [Fix]
2025-09-10 13:48:17 +08:00
Yuanle Liu
7272afe3dc Fix down projection weight shape in fused MOE layer (#4041) 2025-09-10 12:49:03 +08:00
yangjianfengo1
dfc94371ee 【FIX】Change the name of sparse attn from moba to plas (#4006)
* Update docs

* 【docs】 update readme (#4000)

* Update docs

* update readme

* update docs

* 【FIX】Change the name of sparse attn from moba to plas (#3845)

* Update docs

* Update docs

* Update docs

* Update docs

* Rename moba to plas

* code style

* update ci

* code style

* update ci

* code style

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-10 10:04:29 +08:00
Zero Rains
35b8362804 get org_vocab_size from args (#3984)
2025-09-09 15:07:51 +08:00
zhuzixuan
d43c2f2577 [Optimize]Error messages about Model api. (#3839) (#3972)
* add v1/models interface related

* add model parameters

* default model verification

* unit test

* check model err_msg

* unit test

* type annotation

* model parameter in response

* modify document description

* modify document description

* unit test

* verification

* verification update

* model_name

* pre-commit

* update test case

* update test case

* Update tests/entrypoints/openai/test_serving_models.py

* Update tests/entrypoints/openai/test_serving_models.py

* Update tests/entrypoints/openai/test_serving_models.py

* Update tests/entrypoints/openai/test_serving_models.py

* Update fastdeploy/entrypoints/openai/serving_models.py

* Optimize error messages.

---------

Co-authored-by: yangzichao01 <yangzichao01@baidu.com>
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com>
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-09 10:58:11 +08:00
yangjianfengo1
14df2c59da Update docs (#3996) 2025-09-09 10:23:51 +08:00
ming1753
934071578a [Docs] release 2.2.0 (#3991) 2025-09-09 09:50:45 +08:00
JYChen
36a58f487c [docs] update best practice docs for release/2.2 (#3970)
* update best practice docs

* add version and v1 loader info
2025-09-08 22:17:32 +08:00
lizhenyun01
d40a1046de [Feature] support rl_tp_degree (#3934)
* [Feature] support rl_tp_degree

* add rl_tp_degree in lmhead

* add rl_tp_degree in bias

* fix split_axis=0 in bias

* fix split_axis in weight

* fix bias rl_tp_degree

* fix bias rl_tp_degree

* change attr to dict

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-08 16:20:32 +08:00
Sunny-bot1
fa2369271d update env docs for Machete (#3960) 2025-09-08 14:44:52 +08:00
Zhang Yulong
8903f937f9 update ci (#3953) 2025-09-08 14:21:25 +08:00
luukunn
1023a67765 [BugFix] fix default parser (#3932)
* add reasoning parser plugin

* fix finish reason

* fix default parser

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-08 14:12:13 +08:00
Zero Rains
d43549953c [Cherry-Pick][Bug Fix]fix the bug for real size 0 in cudagraph (#3888)
* fix the bug for real size 0 in cudagraph

* fix cache_messager

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-08 14:06:10 +08:00
Yuanle Liu
c7c1627456 Update paddleformers version to >=0.2.3 (#3936)
* Update paddleformers version to 0.2.2

* Update requirements.txt

* Update paddleformers version to >=0.2.3
2025-09-08 11:11:05 +08:00
ming1753
d6bf6de5e6 [Bug Fix] Fix mm performance degradation (#3942)
* [Bug Fix] Fix mm performance degradation

* format

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: chenjian <1435317881@qq.com>
2025-09-08 00:32:22 +08:00
chenjian
38e734e183 [Feature] support hierarchical cache in v1 (#3939) 2025-09-08 00:31:34 +08:00
bukejiyu
051e4a881c ignore (#3949) 2025-09-07 23:57:48 +08:00
chenjian
b2bb37d7c0 [Fix] when prompt token ids is numpy (#3944) 2025-09-07 23:02:03 +08:00
CSWYF3634076
c6e2a37a95 [BugFix] qwen2.5vl enable_thinking=true bug fix (#3920) 2025-09-07 21:06:36 +08:00
Jiang-Jia-Jun
a6146d237e Merge branch 'release/2.2' into fix-gpu-memory-oom 2025-09-07 12:10:51 +08:00
chenjian
8d77c1cb51 [Optimize] optimize prefix cache in release22 (#3889)
* optimize prefix cache in release22

* optimize prefix cache in release22

* fix worker

* fix

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-06 09:52:01 +08:00
chenjian
41cd3e24c9 [Feature] Enable prefix caching as default (#3816)
* [Feature] Enable prefix caching as default

* [Feature] Enable prefix caching as default

* Set prefix caching as default

* skip dynamic load

* fix kill bug

* fix kill bug

* fix kill bug

* fix ci

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-06 09:51:34 +08:00
Zhang Yulong
11b18e5ef0 add cache queue port (#3904) (#3926)
* add cache queue port

* add cache queue port

* add cache queue port
2025-09-06 00:00:12 +08:00
freeliuzc
e2c764fd5a update hybrid-mtp-with-ngram (#3924) 2025-09-05 23:06:57 +08:00
lizhenyun01
2d975e16b0 [BugFix] fix TaskQueue dp_id in multi node (#3919) 2025-09-05 22:29:26 +08:00
chenjian
8915c8411d Revert "[Feature] Setting number of apiserver workers automatically (#3794)" (#3918)
This reverts commit d1d063e4af.
2025-09-05 21:06:50 +08:00
yinwei
77c1bd0813 [XPU]Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3900)
* fix bug

* fix bug

* update

* update

* update
2025-09-05 19:17:25 +08:00
Yuanle Liu
473cde779f paddleformers==0.2.1 (#3925) 2025-09-05 19:06:15 +08:00
chen
335d1c8e8f 【CP】Compatible with EB 0.3B torch model arch (#3914)
* fix

* check
2025-09-05 19:05:07 +08:00
ltd0924
173e4df982 [Fix] mv connection_manager init (#3902)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py

* mv connection_manager init

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-05 17:42:36 +08:00
lizhenyun01
199f88ce1e support tpep weight load (#3882) 2025-09-05 13:56:29 +08:00
ltd0924
55ebe855c0 [Feature] support controller port in multi api server (#3895)
* fix scheduler bug

* fix

* Update api_server.py

* Update multi_api_server.py
2025-09-05 13:38:58 +08:00
zhouchong
deb7ad205f fix qwen_vl_processor miss image_patch_id (#3894)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-05 11:32:34 +08:00
Jiang-Jia-Jun
f4db5d8b59 Merge branch 'release/2.2' into fix-gpu-memory-oom 2025-09-05 11:29:11 +08:00
Yuanle Liu
e9f72df918 paddleformers==0.1.4 (#3908) 2025-09-05 11:25:57 +08:00
Jiang-Jia-Jun
0f8dc9f754 Remove unused import in engine_client.py 2025-09-04 21:36:29 +08:00
chenjian
8567ada09e [Fix] disable scheduler v1 in guided decoding (#3877)
* disable scheduler v1 in guided decoding

* disable scheduler v1 in guided decoding
2025-09-04 20:54:55 +08:00
YuBaoku
afcde19277 [CI] update paddleformers==0.2 in release/2.2 (#3828)
* [DEBUG] Adapt validation for paddleformers==0.2 in release/2.2

* [CI] update paddleformers==0.2 in release/2.2
2025-09-04 20:12:37 +08:00
lizhenyun01
d40d3a5a4f fix DP&&TP (#3872)
2025-09-04 14:38:26 +08:00
luukunn
b8d0f1c081 [bug] fix finish reason (#3858)
* add reasoning parser plugin

* fix finish reason

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-04 14:36:03 +08:00
ltd0924
8550e19008 [bugfix] scheduler (#3871)
* fix scheduler bug

* fix

* Update api_server.py
2025-09-04 11:34:12 +08:00
chenjian
a0c03510c0 [Bug fix] Fix prompt token ids dtype in v1 (#3861) 2025-09-04 11:02:37 +08:00
chenjian
fb1e0d6a87 [Feature] Set scheduler v1 as default (#3812)
* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default
2025-09-04 11:02:10 +08:00
gaoziyuan
fbf0e9d2aa fix mem boom in ep (#3852) 2025-09-04 10:38:34 +08:00
SunLei
8c0e7d6fe9 Support for async processor added. (#3870)
* Support for async processor added.

* remove yappi code
2025-09-04 10:35:08 +08:00
yangjianfengo1
b56b015d85 fix port (#3865)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-04 10:02:08 +08:00
ming1753
1432e336d7 [Bug Fix] Fix bug of multimodal inputs only text (#3850)
2025-09-03 19:48:10 +08:00
yangjianfengo1
9213a58a06 [Fix bug] Fix the nblock of w4afp8 at 256 and add a mask parameter to the fa3 append attn (#3771) (#3835)
* fix w4afp8

* Add centralized configuration

* codestyle

* fix fa3 append attn
2025-09-03 19:36:45 +08:00
plusNew001
87ef0f5d30 [XPU] Update XPU stable xvllm and xtdk version for 2.2 & Change CI Case (#3855)
* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.

* Update dependency versions for stable release

* Update CI script to use stable dependencies
2025-09-03 19:33:06 +08:00
plusNew001
abcd2148c0 [XPU]Update XPU CI Case (#3844)
* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.
2025-09-03 15:29:47 +08:00
gaoziyuan
05b6591c23 【BugFix】add moe noaux_tc tactics in triton backend (#3821)
* add moe noaux_tc tactics in triton backend

* fix

* add dp config
2025-09-03 13:28:44 +08:00
plusNew001
42402c80e9 Update installation method for paddlepaddle-xpu (#3834) 2025-09-03 11:28:27 +08:00
luukunn
1968c65849 add reasoning parser plugin (#3820) 2025-09-03 11:17:13 +08:00
ltd0924
37cb37b7f2 [BugFix] fix scheduler (#3818)
* fix scheduler bug

* fix
2025-09-03 11:16:49 +08:00
bukejiyu
f975f7de2f [v1loader]Reduce EB300B model loading time (#3700) (#3810)
* speed up eb45

* update
2025-09-03 10:14:31 +08:00
Yuanle Liu
174510180a [BugFix] fix error of import paddle.base.core.Config (#3761) (#3804)
* Defer import of Config

* support chunked_prefill

* support chunked_prefill
2025-09-03 10:14:03 +08:00
ltd0924
5cda326ba2 Update qwen_vl_processor.py (#3806)
2025-09-02 21:56:24 +08:00
RAM
a6c8f17431 [Executor] Fix bug of import paddle with RLHF (#3781) (#3817) 2025-09-02 21:42:59 +08:00
ltd0924
cd09384a14 [BugFix] fix max streaming tokens invalid (#3799)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-09-02 21:03:13 +08:00
ltd0924
0f42771a84 [Feature] support model weight update in ep (#3802)
* Update config.py

* Update ep.py

* Update fused_moe_backend_base.py

* Update dynamic_weight_manager.py

* Update worker_process.py

* fix ci
2025-09-02 20:52:47 +08:00
Jiang-Jia-Jun
d1d063e4af [Feature] Setting number of apiserver workers automatically (#3794)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-09-02 17:19:07 +08:00
kevin
a86b35ab49 Fix chunked prefill (#3778)
* update enable chunked_prefill

* update code

* update code

* update code
2025-09-02 13:41:55 +08:00
YUNSHEN XIE
0cdbc950b5 fix ce compile task upload error (#3788) 2025-09-02 11:52:50 +08:00
YUNSHEN XIE
2b0a745d57 fix ce build job (#3777) 2025-09-02 10:53:26 +08:00
Jiang-Jia-Jun
1953c7c759 Update FASTDEPLOY_VERSION to 2.2.0 2025-08-31 21:31:12 +08:00
chenjian
465065cd19 [Bug fix] Fix prefix cache in V1 (#3715)
* [Bug fix] Fix prefix cache in V1

* fix code style
2025-08-31 21:29:33 +08:00
lizhenyun01
bed09ae8f8 fix mask_offset in append_attn (#3745)
* fix mask_offset in append_attn

* fix test
2025-08-31 15:03:16 +08:00
kevin
753772ace8 default enable chunked prefill (#3731)
* add error traceback info

* update error msg

* update code

* default enable chunked prefill

* update code

* update code

* add envs

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-31 13:15:13 +08:00
李泳桦
98e03fb4ea [feat] add metrics for yiyan adapter (#3219) (#3614)
* [feat] add metrics for yiyan adapter

* [fix] fix metrics num_requests_waiting and num_requests_running

* [fix] fix metrics gpu_cache_usage_perc

* [refactor] change where requests_number increases

* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly

* [chore] delete useless code
2025-08-30 23:20:58 +08:00
Sunny-bot1
fe5d09f9ee [FIX]Fix Machete compile via ENABLE_MACHETE (#3727)
* add ENABLE_MACHETE

* fix

* revert

* update

* pre_commit

* fix

* fix

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: aquagull <hongyuh@qq.com>
2025-08-30 17:50:17 +08:00
SunLei
b9af95cf1c [Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. (#3674)
* [Feature] add AsyncTokenizerClient

* add decode_image

* Add response_processors with remote decode support.

* [Feature] add tokenizer_base_url startup argument

* Revert comment removal and restore original content.

* [Feature] Non-streaming requests now support remote image decoding.

* Fix parameter type issue in decode_image call.

* Keep completion_token_ids when return_token_ids = False.

* add copyright
2025-08-30 17:06:26 +08:00
luukunn
9a7c231f2c [Feature]support chat_template.jinja (#3721)
* add support chat_template.jinja

* add support chat_template.jinja
2025-08-30 17:05:34 +08:00
lizexu123
b21e085f3e [Code Simplification] delete print (#3729) 2025-08-30 16:19:07 +08:00
chen
7568b20098 check (#3720) 2025-08-30 16:04:20 +08:00
lizexu123
455205f991 [Features] support hugging face qwen3 moe (#3649)
* split ut

* qwen3-30B-A3B

* fix

* add test

* add test_torch_model.py

* fix test_torch_model.py

* delete print

* fix moe

* delete init.py

* fix

* fix

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-08-30 15:26:05 +08:00
Zero Rains
f206474cc7 fix the bug when num_key_value_heads < tensor_parallel_size (#3717) 2025-08-30 12:40:00 +08:00
chenjian
c4b1f6b0a5 [Optimize] Increase zmq buffer size to prevent the apiserver from consuming too slowly (#3723) 2025-08-30 10:45:26 +08:00
YUNSHEN XIE
a18afcfdd9 Optimize coverage jobs (#3683)
2025-08-30 00:12:40 +08:00
chen
cd252ec673 [Feature]support load eb 0.3B and 21B torch model (#3660)
2025-08-29 20:00:48 +08:00
yangjianfengo1
3754a9906d [Feature] block sparse attention (#3668)
* support sparse attn

* fix bug

* code style

* fix moba attn get kv shape

* fix A100 compilation

* codestyle

* code style

* code style

* code style

* fix conflict

* add unit tests

* code style

* increase eblite loading time

* fix bug

* for ci

* for ci

* for ci

* for ci

* support mlp block size 128

* add unit tests for small operators

* fix mlp unit test

* add the environment variables to config

* fix rollout config

* fix GPU memory

* add test server

* add test server

* fix mlp: use full attn for the last layer
2025-08-29 19:46:30 +08:00
zhouchong
ccd52b5596 [Model]support qwen2_5_vl (#3557)
* adapt qwen_2_5_vl model

* adapt qwen_2_5_vl VIT model

* adapt qwen2_5_vl images_embeds

* adapt qwen2_5_vl 3D rope

* adapt qwen2_5_vl 3D rope v2

* adapt qwen2_5_vl processor

* adapt qwen2_5_vl bypass resampler_model

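* adapt qwen2_5_vl bypass part of the ernie logic

* adapt qwen2_5_vl bypass part of the ernie logic v2

* adapt qwen2_5_vl weight loading and naming changes

* adapt qwen2_5_vl make think_end_id optional

* adapt qwen2_5_vl distinguish extract_vision_features across multiple models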

* fix:adapt qwen2_5_vl model

* adapt qwen2_5_vl norm

* adapt qwen2_5_vl processor update

* adapt qwen2_5_vl image and video success

* adapt qwen2_5_vl partial code cleanup

* adapt qwen2_5_vl support multiple GPUs

* adapt qwen2_5_vl on latest develop

* adapt qwen2_5_vl RL

* adapt qwen2_5_vl code cleanup

* support noex rope3d

* adapt qwen2_5_vl add init.py

* adapt qwen2_5_vl add init.py v2

* adapt qwen2_5_vl remove space

* adapt qwen2_5_vl remove space v2

* adapt qwen2_5_vl pre-commit

* adapt qwen2_5_vl update

* adapt qwen2_5_vl pre-commit v2

* adapt qwen2_5_vl modify comments

* adapt qwen2_5_vl fix indentation

* adapt qwen2_5_vl fix indentation v2

---------

Co-authored-by: wangyafeng <wangyafeng@baidu.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com>
2025-08-29 18:28:39 +08:00
YuBaoku
65425bf858 [CI] update paddle version to nightly (#3698) 2025-08-29 18:16:13 +08:00
Yuan Xiaolan
c71ee0831c add w4afp8 offline script (#3636) 2025-08-29 17:56:05 +08:00
zyfncg
f677c032c0 [CudaGraph] [SOT] Support splitting static graph into piecewise graph with cuda_graph (#3478)
* support splitting static graph into piecewise graph with cuda_graph

* Update fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix merge conflict

* fix bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-29 16:28:01 +08:00
lzy
48d760539b fix deepcopy(tp_group) in spec (#3648) 2025-08-29 16:08:21 +08:00
Ryan
45f81b34f0 add dtype int32 (#3692)
2025-08-29 14:56:35 +08:00
xiaoxiaohehe001
1bf4fc7f36 support w4afp8 eplb (#3680) 2025-08-29 14:43:06 +08:00
Yuanle Liu
68f87240da fix key error in mm (#3702) 2025-08-29 14:35:12 +08:00
李泳桦
88297240e7 [feat] completion api supports passing input token ids in either prompt or prompt_token_ids (#3311)
* [feat] completion api supports passing input token ids in either `prompt` or `prompt_token_ids`

* [fix] update comment

* [fix] fix type error

* [test] add a unittest file for serving api test

* [test] try to fix ci error

* [chore] rename test function names

* [test] try to fix ci error

* [test] try to fix ci error

* [test] add tests for qwen
2025-08-29 14:19:42 +08:00
周周周
17b414c2df MoE Default use triton's blockwise fp8 in TP Case (#3678) 2025-08-29 11:07:30 +08:00
co63oc
b6edd15d55 fix scaled_gemm_f8_i4_f16_weight_quantize input (#3685) 2025-08-29 11:04:04 +08:00
Yuanle Liu
2fb2c0f46a fix MultimodalRegistry (#3699) 2025-08-29 11:01:30 +08:00
Echo-Nie
43d5bd62b4 【Hackathon 9th No.70】supplementary unit test for CPUPlatform and CUDAPlatform (#3580)
* supplementary unit tests for the CUDAPlatform and CPUPlatform modules

* update the "is_cuda" to "is_cuda_and_available"

* fix pre-commit

---------

Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-08-29 10:34:05 +08:00
lifulll
72094d4d82 enable dcu ci (#3402) 2025-08-29 10:23:08 +08:00
kevin
73d60fe64d update ci envs for structured output (#3687)
* add error traceback info

* update error msg

* update code

* update ci envs for structured output

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-29 10:21:36 +08:00
bukejiyu
0b51b9c35b fix qwen3 235B tp 8 (#3697)
2025-08-28 23:46:25 +08:00
Yuanle Liu
4957908275 add input_processor plugin (#3657)
* add input_processor plugin

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-08-28 22:53:57 +08:00
ming1753
02b3644903 [Bug Fix] VL Support w4a8/w4afp8 (#3686)
2025-08-28 21:38:35 +08:00
YuanRisheng
808b548761 support tmp (#3675) 2025-08-28 19:42:32 +08:00
Divano
368bbd9dc6 Update _base_test.yml (#3690)
Add a CI case for testing concurrency parameters
2025-08-28 19:15:19 +08:00
gaoziyuan
fc635acc47 [BugFix]fix dp&ep&tp and multi node infer (#3629)
* rm log

* fix bug

* fix bug

* fix dp&ep&tp and multi node infer

* fix

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-08-28 19:09:10 +08:00
Divano
17731a8acd add concurrency cases (#3689) 2025-08-28 18:30:19 +08:00
Liumengyuan
2a73a6df03 fix_fp8_deepgemm_moe_tp_bug (#3658) 2025-08-28 17:19:02 +08:00
Liumengyuan
e93d4cfcdd Add with_output version AppendAttention (#3302)
* get use_output from fd_config

* add clear TODO description

* add mask_offset para to align with develop

* fix bug

* fix use_output logic

* fix sot bug
2025-08-28 17:10:18 +08:00
ltd0924
94ded434bd [BugFix] ep mixed offline exit (#3661)
* Update expert_service.py

* Update expert_service.py
2025-08-28 17:09:07 +08:00
ltd0924
e5015eea05 [BugFix] fix logger (#3666) 2025-08-28 17:08:00 +08:00
bukejiyu
73cf6096da fix (#3676)
* fix

* update
2025-08-28 17:06:32 +08:00
ltd0924
98c217b428 Update config.py (#3669) 2025-08-28 15:30:51 +08:00
co63oc
d4fc893fe3 fix typos (#3633)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-28 14:42:24 +08:00
co63oc
c294fc8139 Fix target_version (#3159)
* Fix

* fix

* fix
2025-08-28 14:17:54 +08:00
Mattheliu
108d989d9d [Docs] add fastdeploy_unit_test_guide.md (#3484)
* docs:add fastdeploy_unit_test_guide.md

* docs:fix fastdeploy_unit_test_guide.md

* docs: add FastDeploy unit test spec (EN) and update usage nav

* fix codestyle
2025-08-28 14:12:25 +08:00
plusNew001
b791bea0c5 Update run_ci_xpu.sh to lock xvllm version (#3671)
Lock version due to xvllm update causing service errors.
2025-08-28 12:30:50 +08:00
Yuan Xiaolan
d37331fc71 fix w4afp8_gemm_scale_permute import error on A100 (#3611) 2025-08-28 11:42:23 +08:00
YuanRisheng
ad9b95e6dd fix rl bugs (#3654) 2025-08-28 11:09:34 +08:00
yangjianfengo1
e81046fdad 【New Feature】support w4afp8 in centralized deployment (#3644)
* support tp w4afp8

* code style
2025-08-28 10:53:24 +08:00
周周周
76513f6416 Support 45t fp8 8 GPU (#3659) 2025-08-28 10:52:53 +08:00
Echo-Nie
7afcd4b776 【Hackathon 9th No.77】supplementary unit test for get_filtered_metrics (#3578)
* supplementary unit tests for the fastdeploy/metrics/metrics/get_filtered_metrics module

* fix pre-commit

---------

Co-authored-by: Tao Luo <luotao02@baidu.com>
2025-08-28 10:39:02 +08:00
ltd0924
3d92fb09f7 [BugFix] fix parameter is 0 (#3592)
* Update engine_client.py

* fix

* Update common_engine.py
2025-08-28 09:52:36 +08:00
Sunny-bot1
479c8b85d3 [Optimize]support machete weight only gemm (#3561)
* support machete weight only gemm

* add generate

* update

* fix

* change file location

* add sm_version limit

* fix

* fix

* fix ci

* fix coverage

* fix xpu
2025-08-28 09:49:58 +08:00
Zero Rains
e37e86b3b8 [V1 Loader]support param create and load for wint2 and xpu backend (#3581)
* support wint2 backend

* [V1 Loader]support param create and load for wint2 and xpu backend

* update weight shape name

* update

* update

* update baseline.txt

* update model name

* update baseline.txt

* fix codestyle

* remove debug code
2025-08-28 09:49:36 +08:00
lizexu123
b28a0343a6 fix ENABLE_V1_KVCACHE_SCHEDULER (#3625)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-27 21:21:29 +08:00
ltd0924
2974016103 [BugFix] fix ce bugs (#3641)
* [BugFix] fix tp8 client refuse

* fix engine port bug

* Update utils.py
2025-08-27 20:38:15 +08:00
Yuanle Liu
836345a4dd delete ernie4_5_vl_tokenizer (#3631) 2025-08-27 20:36:02 +08:00
Liumengyuan
11803e0907 fix undefined cuPointerGetAttribute symbol error (#3628) 2025-08-27 20:24:59 +08:00
Jiang-Jia-Jun
c694fa2879 Revert "[Feature] block sparse attention (#3209)" (#3647)
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
李泳桦
b2afdf4fc6 [fix] qwen output inconsistency when top_p=0 (#3634)
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
2025-08-27 17:16:23 +08:00
lzy
1265f6c192 deepgemm doesn't support tp+ep (for ci) (#3638)
* deepgemm doesn't support tp+ep (for ci)

* deepgemm doesn't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
plusNew001
f0140be1e1 Change paddlepaddle-xpu installation command (#3646)
Updated the installation command for paddlepaddle-xpu to use a specific wheel file.
2025-08-27 16:17:19 +08:00
JYChen
e645db348b [docs] Update best practice doc (#3539)
* fix some docs error

* [docs] x1 best-practice

* update docs

* fix docs
2025-08-27 15:45:30 +08:00
xjkmfa
afb9f327ef 【CI case】for echo finish_reason text_after_process and raw_prediction check (#3630)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

* echo&finish_reason&text_after_process&raw_prediction check

* echo&finish_reason&text_after_process&raw_prediction check

* echo&finish_reason&text_after_process&raw_prediction check

* echo&finish_reason&text_after_process&raw_prediction check

* echo&finish_reason&text_after_process&raw_prediction check

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-27 15:21:16 +08:00
chen
5ad8721506 check (#3639) 2025-08-27 14:32:13 +08:00
plusNew001
f8b70bf60c update xpu ci (#3632)
* Update Docker image version in CI workflow

* Modify paddlepaddle-xpu installation and add dependencies

Updated installation source for paddlepaddle-xpu and added dependency download step.

* Fix no_proxy environment variable in CI workflow
2025-08-27 14:25:56 +08:00
chen
ce9c0917c5 [Precision] Support lm_head layer running in float32 (#3597)
* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc
2025-08-27 11:34:53 +08:00
xiaoxiaohehe001
ad319a87cc support fa3 rope3d (#3622) 2025-08-27 11:31:29 +08:00
YUNSHEN XIE
85afa72763 fix publish task (#3635)
* fix publish task

* disable ut
2025-08-27 11:14:53 +08:00
yangjianfengo1
646a0c2fd8 [Feature] block sparse attention (#3209)
* support sparse attn

* fix bug

* code style

* fix moba attn get kv shape

* fix A100 compilation

* codestyle

* code style

* code style

* code style

* fix conflict

* add unit tests

* code style

* increase eblite loading time

* fix bug

* for ci

* for ci

* for ci

* for ci

* support mlp block size 128

* add unit tests for small operators

* fix mlp unit test

* add the environment variables to config

* fix rollout config
2025-08-26 07:16:04 -07:00
RAM
f0a362af18 [CUDAGraph]Switch the scope so that output buffer of CUDAGraph can automatically release (#3612)
* fix typo

* fix typo

* add print dot files

* fix bug

* Switch the scope so that output buffer of cudagraph can automatically release

* Revert "add print dot files"

This reverts commit dc21809eb5.
2025-08-26 21:28:19 +08:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
Yuanle Liu
cbce94a00e rename ernie_xxx to ernie4_5_xxx (#3621)
* rename ernie_xxx to ernie4_5_xxx

* ci fix
2025-08-26 19:29:27 +08:00
YuanRisheng
642480f5f6 [CI] Standard unittest (#3606)
* standard unittest

* fix bugs

* fix script
2025-08-26 19:03:11 +08:00
SunLei
2f28f40d90 fix: replace list * n initialization with list comprehension to avoid shared references (#3618) 2025-08-26 17:53:31 +08:00
bukejiyu
3200a80de3 [v1 loader]support fp8 (#3593)
* support fp8

* update ci
2025-08-26 02:42:46 -07:00
RAM
00898603c8 [CUDAGraph]Add debug func (#3616)
* add print dot files

* refine code
2025-08-26 16:43:48 +08:00
xiaoxiaohehe001
9afa236e39 [NewFeatures] support eplb (#3547)
* [NewFeatures] support eplb

* fix eplb
2025-08-26 16:19:30 +08:00
Yuanle Liu
56e2d7e668 adaptive rms_norm's dtype (#3617)
* adaptive rms_norm's dtype

* adaptive rms_norm's dtype

* add approve coverage

---------

Co-authored-by: liuyuanle <liuyuanle@baidu.com>
2025-08-26 15:29:15 +08:00
lzy
d339df2e90 Supports DP+TP+EP hybrid parallel deployment strategy (#3489)
* Support DP+TP+EP hybrid parallel deployment strategy

* Support DP+TP+EP hybrid parallel deployment strategy

* fix conflict

* add moe_tp_ep function split_allgather_out

* del tp_group in moe_cutlass_backend

* for ci

* fix parallel_config for ci

* del log
2025-08-26 00:04:01 -07:00
freeliuzc
52eda7fdb3 [Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610) 2025-08-26 14:29:22 +08:00
AIbin
0a0d2959b9 qkv_a_proj horizontal fusion (#3591)
Support DSK qkv_a_proj horizontal fusion under V0 Loader
2025-08-26 14:25:57 +08:00
YuBaoku
75db0d1ae2 [CI] reopen sot test (#3613)
* [CI] change check_service time to 360s

* [CI] disable sot test temporarily

* [CI] reopen sot test
2025-08-26 14:23:38 +08:00
xiaoxiaohehe001
70c75798a7 [NewFeatures] support noex rope3d (#3542)
* [NewFeatures] support noex rope3d

* [NewFeatures] support noex rope3d encoder
2025-08-26 11:44:57 +08:00
tianlef
0bc7d076fc [CE]add x1 w4a8c8 benchmark config (#3607)
* [CE]add x1 w4a8c8 benchmark config

* [CE]add x1 w4a8c8 benchmark config

* [CE]add x1 w4a8c8 benchmark config
Ryan
a5b4866ff1 [CudaGraph][SOT] Add unit tests for splitting the static graph into piecewise graphs that support cuda_graph (#3590)
* add unittest

* change sot_warmup_sizes

* wtf; add missed commit
2025-08-26 11:25:04 +08:00
Sunny-bot1
c68c3c4b8b [Feature] bad words support v1 scheduler and specify token ids (#3608)
* support bad_words_token_ids

* docs

* fix test

* fix

* bad words support kvcache v1 and token ids

* fix
2025-08-25 20:14:51 -07:00
lizexu123
c43a4bec00 [Features] support hugging face qwen3 dense and qwen2 model (#3574)
* support qwen2 and qwen3 hugging face

* fix moe

* default_v1 loader

* hugging_face_format deprecated

* modify hugging_face_format to model_format

* model_format auto

* fix environment

* fix bug

* fix qwen3-0.6 bug

* model_format is str

* fix
2025-08-26 10:54:53 +08:00
ltd0924
66c5addce4 [Bugfix] fix api server control signal bugs (#3531)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-08-25 21:13:04 +08:00
RAM
2fa173e327 [Executor] CUDAGraph support RL training (#3265)
* add clear graph opt backend

* cuda graph support rl

* add branch

* 1. fix dynamic_weight_manager bug 2. add clear api for CausalLM

* open test case

* fix typo

* update mkdocs.yaml

* [Docs]Update mkdocs.yml

* update test case

* use unittest in graph test case
2025-08-25 20:59:30 +08:00
Kane2011
2ae7ab28d2 [MetaxGPU] adapt to the latest fastdeploy on metax gpu (#3492) 2025-08-25 17:44:20 +08:00
YuBaoku
c13c904971 [CI] temporarily disable sot test due to occasional timeout issue (#3586)
* [CI] change check_service time to 360s

* [CI] disable sot test temporarily
2025-08-25 14:34:27 +08:00
chen
9cab3f47ff [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing

* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs

* delete some code

* code check

* code check and add doc

* fix tokenizer.decoder(-1), return 'Invalid Token'

* add ci for temp_scaled and top_p logprobs

* check test

* check seq len time shape

* logprob clip inf

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-08-25 14:11:49 +08:00
YUNSHEN XIE
2410adb041 Add coverage skip (#3553)
* add coverage skip

* update

* fix
2025-08-25 14:08:24 +08:00
Yuan Xiaolan
9205c88da1 support w4afp8 EP inference (#3044)
2025-08-25 11:27:45 +08:00
YUNSHEN XIE
46664985fc Modify the existing coverage collection method (#3573)
fix cov report
2025-08-25 10:35:35 +08:00
YuBaoku
7821534ff5 [CI] add sot test (#3579)
* [CI] add sot test

* [CI] add sot test
2025-08-25 10:14:50 +08:00
lengxia
137e539456 [Feature][XPU] add custom kernels for mtp (#3537) 2025-08-25 10:14:17 +08:00
bukejiyu
bdbac0aa3d support qwen2 weight only (#3571)
2025-08-24 11:14:34 +08:00
bukejiyu
77514e3e1e [V1 Loader] support weight_only (#3413)
* support wint4/wint8

* delete smoe case

* update ci

* print log
2025-08-23 13:13:41 +08:00
Jiang-Jia-Jun
93e1b63200 Revert "[UnitTest][Copilot] Improve unit test coverage for entrypoints module…" (#3564)
This reverts commit 36325e9ea7.
2025-08-23 10:44:23 +08:00
YuanRisheng
e481b7a779 fix sot (#3556) 2025-08-23 08:37:06 +08:00
Zero Rains
79f0dbbb55 [V1 Loader] Support qwen2(bf16) (#3502)
* support qwen2(bf16)

* merge bias_loader and weight_loader
2025-08-23 01:08:23 +08:00
YUNSHEN XIE
cb166053ba fix test name (#3493)
* fix test name

* update

* update

* fix

* fix

* update

* update

* update

* update

* update

* fix

* update
2025-08-22 23:43:47 +08:00
Copilot
36325e9ea7 [UnitTest][Copilot] Improve unit test coverage for entrypoints modules (#3546)
* Initial plan

* Add comprehensive unit tests for entrypoints utilities

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Complete entrypoints test coverage improvement with tool parser tests

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Apply pre-commit formatting to test files - fix trailing whitespace and long lines

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-22 19:20:51 +08:00
zhink
df7c31012b Modified to support custom all reduce by default (#3538) 2025-08-22 16:59:05 +08:00
lddfym
27666ee586 [Feature] Add Qwen25-VL Processor (#3501)
* add qwen-2.5-vl processor

* add qwen25-vl processor

* add qwen25-vl processor

* add qwen25-vl processor

* add qwen25-vl processor position_ids

* add qwen25-vl processor

* add qwen25-vl processor

* position_ids

* add test for qwen25-vl

* organize comments

* formatted

* qwen_vl_processor

* add qwen_vl_processor unittest

* update model path

* update model path

* update qwen_vl_processor unittest

* add unittest and bug fix

* add unittest and bug fix

* Update fastdeploy/input/qwen_mm_processor/image_processor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/input/qwen_vl_processor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-22 16:49:42 +08:00
YuanRisheng
5b66462f0e Fix fdconfig bugs (#3528)
* fix config

* fix parallel

* fix ips

* fix rl

* open code
2025-08-22 16:17:15 +08:00
plusNew001
7ae41e9daf [CI] fix xpu ci bug (#3535) 2025-08-22 15:08:39 +08:00
freeliuzc
76759108c9 [Feature][SpeculativeDecoding]Support tree-attention (#3514)
* support tree-attention

* fix merge bug

* fix unit-test api

* fix merge bug
2025-08-22 13:36:41 +08:00
YuBaoku
cc88671507 [CI] add container naming and cleanup logic in workflows (#3526) 2025-08-22 11:42:57 +08:00
YUNSHEN XIE
2630260616 disable stable test (#3529) 2025-08-22 11:38:18 +08:00
YuanRisheng
85fbf5455a [V1 Loader]Ernie VL support loader v1 (#3494)
* ernie vl support new loader

* add unittest

* fix test
2025-08-22 11:16:57 +08:00
Zhang Yulong
3cc182236a update ci (#3519)
2025-08-21 20:05:50 +08:00
YuanRisheng
c389a4013c Unify server-side and model-side Config(Part-5) (#3497)
* move config

* fix xpu

* fix

* fix vl

* fix vl

* fix unitest

* fix args

* add unitest

* fix test
2025-08-21 19:00:21 +08:00
yangjianfengo1
e5aa7087db [Bug Fix] Fix slow w4a8 compilation (#3510)
* fix w4a8 compilation

* code style

* fix tma copy
2025-08-21 18:50:14 +08:00
Zhang Yulong
a5692e8b7d Add PD CI case (#3490)
* Create test_ernie_03b_pd.py

* Update test_ernie_03b_pd.py
2025-08-21 18:48:34 +08:00
李泳桦
8bea4b1e25 [fix] fix output tokens count in streaming completion api (#3507) 2025-08-21 18:19:13 +08:00
李泳桦
e4f0b755b4 [fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3228)
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error

* [fix] code syntax

* [test] add test case for this bug

* [test] add test case for empty message list

* [test] fix test case for empty message list
2025-08-21 17:30:51 +08:00
luukunn
371fb3f853 [Feature] add tool parser (#3483)
* add tool parser

* add x1 enable_thinking

* restart ci

* fix vl reasoning parser

* modify call style

* modify call style

* add offline enablethinking

* fix completion

* fix

* fix unit test

* fix unit test

* fix unit test

* fix vl reasoning parser

* fix vl reasoning parser
2025-08-21 17:25:44 +08:00
Yzc216
466cbb5a99 [Feature] Models api (#3073)
* add v1/models interface related

* add model parameters

* default model verification

* unit test

* check model err_msg

* unit test

* type annotation

* model parameter in response

* modify document description

* modify document description

* unit test

* verification

* verification update

* model_name

* pre-commit

* update test case

* update test case

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/entrypoints/openai/serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-21 17:02:56 +08:00
Zhang Yulong
b7eee3aec1 Update CI (#3474)
* update CI cases

* update CI cases

* update CI cases

* update CI cases

* Merge upstream/develop and resolve directory rename conflict

* Merge upstream/develop and resolve directory rename conflict

* Merge upstream/develop and resolve directory rename conflict

* update deploy

* update deploy

* update deploy

* update deploy

* update deploy
2025-08-21 16:49:20 +08:00
qw86972190
c83381d650 revert pr (#3481)
Co-authored-by: iosmers <yinwei_hust@163.com>
2025-08-21 14:19:50 +08:00
ltd0924
51f68ae593 [Feature] add dealer manager to reuse the connection (#3471)
* [BugFix] fix control signal release failed

* [BugFix] fix control signal release failed

* update

* update

* update

* [Feature] add dealer manager to reuse the connection

* fix

* fix

* fix

* fix

* fix

* fix

* Create test_dealer_connection_manager.py

* Delete test/entrypoints/openai directory

* Update test_dealer_connection_manager.py

* Update test_dealer_connection_manager.py
2025-08-21 13:11:13 +08:00
YUNSHEN XIE
985b1265c3 CE compile job (triggered on merge) (#3491)
* add ce compile job

* fix

* update
2025-08-21 11:33:26 +08:00
memoryCoderC
31f639f10b [Feature] add prompt_tokens and completion_tokens (#3504)
2025-08-21 10:23:27 +08:00
Zero Rains
30b3f2dc07 [BugFix][V1 Loader] fix the bug in create weight for block_wise_fp8 (#3486)
2025-08-20 05:52:54 -07:00
Ryan
bcdfc1d6b9 Add custom op declaration for all_reduce (#3473)
* add custom op declaration

* roll back try except
2025-08-20 20:29:58 +08:00
Zhang Yulong
33ff0bfe38 Update disaggregated.md (#3495)
Fix documentation errors
2025-08-20 19:39:18 +08:00
YUNSHEN XIE
e197894977 add e2e cases (#3476)
* add e2e cases

* fix
2025-08-20 18:50:14 +08:00
Zhang Yulong
9ff2dfb162 Create eb45-8k-fp8-tp1-dp8_ep.yaml (#3485)
YAML for hybrid-architecture EP parallelism
2025-08-20 14:33:54 +08:00
YuBaoku
33d369586b [CI] remove useless case (#3482) 2025-08-20 14:20:30 +08:00
xiaolei373
5d131485d8 add error log to file (#3431)
* feat(log):add_request_and_response_log

* feat[log]:add error log to file
2025-08-20 09:52:34 +08:00
YUNSHEN XIE
3a6058e445 Add stable ci (#3460)
* add stable ci

* fix

* update

* fix

* rename tests dir;fix stable ci bug

* add timeout limit

* update
2025-08-20 08:57:17 +08:00
kevin
67298cf4c0 add error traceback info (#3419)
* add error traceback info

* update error msg

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-19 19:32:04 +08:00
yangjianfengo1
b047681c5d [New Feature] Support Fp8 group Gemm 2:4 sparsity (#3463)
* support 2:4 sparsity

* code style

* add macro-definition check for stmatrix

* code style
2025-08-19 02:54:47 -07:00
ltd0924
d587fb257f [CI] add test generation demo (#3270)
* Create test_generation.py

* update

* update

* format

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update test_generation.py

* Update setup.py

* Delete test/plugins/test_model_runner_register.py

---------

Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-19 17:12:40 +08:00
Zero Rains
fef447e350 [V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend (#3447)
* support deepgemm backend

* support marlin backend

* remove print

* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
chen
6735626014 fix request_output sampling_params (#3154) (#3464) 2025-08-19 13:52:50 +08:00
ltd0924
bca8905b40 [BugFix] fix control signal release failed (#3390)
* [BugFix] fix control signal release failed

* [BugFix] fix control signal release failed

* update

* update

* update
2025-08-19 13:51:38 +08:00
Zero Rains
8b12c80f90 [FixBug] compute early stopping with real batch size (#3418)
* [FixBug] compute early stopping with real batch size

* update

* fix test_sampler
2025-08-18 22:09:21 -07:00
luukunn
3a7a20d191 [Feature] Pass through the chat_template_kwargs to the data processing module (#3421)
* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test
2025-08-19 10:50:01 +08:00
lizexu123
a053ab889b [BugFix] fix num_running_requests in cuda_graph (#3457)
* fix cuda_graph

* add note

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-08-19 10:47:22 +08:00
AIbin
beec24fd89 【Inference Optimize】DeepSeek-v3 model inference performance optimization (#3455)
* DSK_OPT_01

* update FA3
2025-08-19 10:42:42 +08:00
zhuzixuan
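c95b3395e9 [BugFix] Support echo in the completion API (#3245)
* wenxin-tools-511: fix v1/completion being unable to echo.

* support echo for multiple prompts

* support streaming echo for multiple prompts

* add unit tests for echo support in the completion API

* pre-commit

* remove redundant test files

* fix the unit test method for echo support in the completion API

* add unit test files

* add unit tests

* unittest

* add unit tests

* fix unit tests

* remove unnecessary assert.

* resubmit

* update test method

* ut

* unit test to verify whether the approach is correct

* unit test to verify whether the approach is correct

* unit test to verify whether the approach is correct 3

* optimize unit test code and narrow the unit test scope in a targeted way.

* optimize unit test code 2 and narrow the unit test scope in a targeted way.

* optimize unit test code 3 and narrow the unit test scope in a targeted way.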

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* add unit tests for tokenid

* update

* fix index error

* fix index error
2025-08-19 10:41:51 +08:00
lizexu123
32b39620bc [Code Simplification] remove cum_offsets (#3410)
2025-08-18 20:21:25 +08:00
YUNSHEN XIE
2cf96ddd68 add publish workflow (#3063)
* add publish job

* update

* update
2025-08-18 16:42:36 +08:00
luukunn
9c129813f9 [Feature] add custom chat template (#3251)
* add custom chat_template

* add custom chat_template

* add unittest

* fix

* add docs

* fix comment

* add offline chat

* fix unit test

* fix unit test

* fix

* fix pre commit

* fix unit test

* add unit test

* add unit test

* add unit test

* fix pre_commit

* fix enable_thinking

* fix pre commit

* fix pre commit

* fix unit test

* add requirements
2025-08-18 16:34:08 +08:00
Jundong Liu
70ee910cd5 [Executor] Change cudagraph hashkey from batch size to num_tokens (#3454) 2025-08-18 16:16:48 +08:00
Jundong Liu
ea4a3b479c [Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404)
* fix buffer allocation being too small; add a tool to print forwardmetadata

* fix mistake

* Make CPU tensor in CPUPlace

* Add test about forward_meta_str and Add unitest_requirement

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-08-18 16:14:09 +08:00
chen
5585cf7aa5 fix mtp_rej_topp input (#3450) 2025-08-18 16:12:42 +08:00
Divano
246cd7b3a5 Perf (#3453)
* add repetition early stop cases

* add repetition early stop cases

* add stress tool
2025-08-18 15:37:46 +08:00
gaoziyuan
6fdd83da10 fix some bug (#3434) 2025-08-18 14:39:13 +08:00
freeliuzc
a12d0bc549 [Feature][MTP]update multi-draft-token strategy (#3369)
* update multi-draft-token strategy

* fix format

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-18 13:59:56 +08:00
Zhang Yulong
3ee6053e5d Add ci case (#3355)
* add ci cases

* debug

debug H20 baseline

* Update run_pre_ce.sh

* Update test_EB_Lite_serving.py

* Update test_EB_VL_Lite_serving.py

* Update test_EB_Lite_serving_mtp.py

* Update test_Qwen3-MoE_serving.py

* Update test_Qwen2-7B-Instruct_serving.py

* Update run_pre_ce.sh
2025-08-18 11:35:56 +08:00
chen
e88f5552db fix cpu __ini__.py (#3448)
2025-08-17 12:38:54 +08:00
RAM
33c0197ebe [Docs] Update mkdocs.yml (#3444)
* Update docs of graph opt backend

* update best_practices

* update mkdocs.yaml

* [Docs]Update mkdocs.yml
2025-08-15 21:57:40 +08:00
RAM
154308102e [Docs] Update docs of graph opt backend (#3442)
* Update docs of graph opt backend

* update best_practices
2025-08-15 21:30:32 +08:00
yongqiangma
5703d7aa0f update installation readme (#3429) 2025-08-15 19:09:41 +08:00
yangjianfengo1
615930bc05 Update README (#3426)
* update README

* code style

* code style
2025-08-15 18:46:28 +08:00
JYChen
6f11171478 fix some docs error (#3439) 2025-08-15 18:45:27 +08:00
yinwei
354575b6d1 [Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 (#3428)
* XPU Update 2.1 Release Documentation

* code style check

* Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95
2025-08-15 18:34:37 +08:00
YUNSHEN XIE
cc8ee50f27 add accuracy check ci (#3389)
* add accuracy ci

* fix

* fix

* update

* rename ci jobs
2025-08-15 15:17:43 +08:00
GoldPancake
4bd6a9fa7d [Bugs] Fix DeepGEMM pre-compile tools. (#3351)
Fix some miss cache problems.
Add README.md.
2025-08-15 14:37:49 +08:00
ming1753
d4e3a20300 [Docs] Release 2.1 docs and fix some description (#3424) 2025-08-15 14:27:19 +08:00
yinwei
fbb6dcb9e4 [Docs]XPU Update 2.1 Release Documentation (#3423)
* XPU Update 2.1 Release Documentation

* code style check
2025-08-15 14:07:47 +08:00
JYChen
562e01c979 update docs (#3420) 2025-08-15 13:00:08 +08:00
Jiang-Jia-Jun
cca96ab1e4 Update Dockerfile.gpu 2025-08-15 12:29:20 +08:00
Jiang-Jia-Jun
7132fa9ec2 Update dockerfile 2025-08-15 12:28:08 +08:00
Sunny-bot1
6c1f3ff897 topk_gating_softmax support bias (#3405) 2025-08-15 11:57:45 +08:00
ltd0924
5a84324798 [Doc] Add multinode deployment documents (#3417)
* Create multi-node_deployment.md

* Create multi-node_deployment.md

* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
chen
f0f00a6025 [OPs] Universal optimization and Fix early_stop cuda 700 (#3375)
* delete nonzero

* delete setup_ops_base.py

* check if

* check gcp infer_seed.cpu()

* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd [V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B

* fix weight

* support parallel load

* support parallel load

* fix slice

* support moe

* delete code

* perfect code

* perfect code
2025-08-14 20:25:28 +08:00
xjkmfa
ab60292f89 【CI】 evil case (#3359)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

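* boundary checks and adversarial tests

* boundary checks and adversarial tests

* boundary checks and adversarial tests

* boundary checks and adversarial tests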

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-14 20:00:47 +08:00
freeliuzc
cacc52bf21 modify readme (#3409) 2025-08-14 19:47:36 +08:00
Sunny-bot1
79d8ae4c38 [UT Fix] Fix bad_words test (#3385)
* fix bad_words test

* add streaming

* fix

* fix
2025-08-14 03:55:02 -07:00
lzy
1e06b9fa6d make append_attn supports mask_offset (#3138)
* make append_attn supports mask_offset

* add unittest
2025-08-14 03:40:55 -07:00
memoryCoderC
6031f9a5f5 [BugFix] fix ErnieProcessor not set raw_prediction (#3400) 2025-08-14 18:07:49 +08:00
YUNSHEN XIE
f72db9386c Add requirements for running unit tests (#3350)
* Add requirements for running unit tests

* update
2025-08-14 17:37:18 +08:00
lizexu123
7b596d0877 [BugFix] fix real_bsz in ep (#3366)
* Your commit message here

* fix ep

* delete cuda_graph
2025-08-14 17:31:19 +08:00
gaoziyuan
0ea8712018 fix op tests (#3398) 2025-08-14 16:45:25 +08:00
Sunny-bot1
2e7831185f [Optimize]Add norm_weights feature for topk_gating_softmax (#3372)
2025-08-14 15:05:23 +08:00
Jiang-Jia-Jun
666ab65a51 [Polish Code] Remove useless notes 2025-08-14 14:04:52 +08:00
Jiang-Jia-Jun
dd583fb16a [BugFix] Fix default log level of paddleformers (#3376)
* [BugFix] Fix default log level of paddleformers

* [BugFix] Fix default log level of paddleformers

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-08-14 11:36:24 +08:00
xiaolei373
d4f610e4cd feat(log):add_request_and_response_log (#3373)
2025-08-13 23:27:41 +08:00
ming1753
396dba0d62 [Bug Fix] Fix V1 video bug (#3388)
2025-08-13 23:04:07 +08:00
YUNSHEN XIE
1ace375fc3 Optimize CI execution workflow (#3371)
* Optimize CI execution workflow

* fix
2025-08-13 18:47:31 +08:00
Zero Rains
be94bdd0b0 [Loader V1] modify layername for DeepSeekV3 (#3336)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-13 15:47:06 +08:00
memoryCoderC
f702a675a1 fix TestOpenAIServingCompletion fail (#3368) 2025-08-13 15:45:07 +08:00
EnflameGCU
d1a92e3e17 [GCU] Enable gcu CI (#3190)
* [GCU] Update to the latest version

* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
yzwu
ce9180241e [Iluvatar GPU] Modify the names of some variables (#3273) 2025-08-13 11:38:02 +08:00
Kane2011
b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key word comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
Ryan
ed6bff215a fix custom op order rms_norm_eps (#3348) 2025-08-13 10:12:49 +08:00
Sunny-bot1
8224b21525 Refactor moe_topk_select op to use apply_norm_weight as a template parameter (#3345)
* Refactor moe_topk_select op to use apply_norm_weight as a template parameter

* update test
2025-08-13 08:44:16 +08:00
luukunn
eda83ca672 add Tool Parser (#3272)
* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* rename tool parser

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-13 01:06:55 +08:00
memoryCoderC
2d1a4cacdf Completion add raw_prediction/text_after_process (#3356) 2025-08-12 23:06:45 +08:00
zhink
2c0d853067 add test for CustomAllreduce (#3313)
2025-08-12 20:44:47 +08:00
YUNSHEN XIE
8791ad4e61 Pre ce modified (#3335)
* update

* update

* fix

* fix

* update

* update

* update

* fix

* update
2025-08-12 20:25:03 +08:00
memoryCoderC
c575611a5b [BugFix] v1/completions add finish_reason (#3246)
* [BugFix] v1/completions add finish_reason

* update TestOpenAIServingCompletion for merge

---------

Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-12 19:40:26 +08:00
Jiang-Jia-Jun
90bfa0be9c Update envs.py 2025-08-12 16:24:47 +08:00
Jiang-Jia-Jun
5620bd12de Update envs.py 2025-08-12 16:24:33 +08:00
YUNSHEN XIE
7d0d5a543a Use latest PaddlePaddle package (#3347)
* Use latest PaddlePaddle package

* fix
2025-08-12 16:23:41 +08:00
gaoziyuan
ccc7f1beb3 fix mapping (#3320) 2025-08-12 16:15:59 +08:00
RichardWooSJTU
283da92bfa fix ep lm head (#3244)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-12 15:38:28 +08:00
ming1753
f5164215be [Bug Fix] fix vl V1 schedule bug (#3323)
* [Bug Fix] fix vl V1 schedule bug

* fix format
2025-08-12 11:31:39 +08:00
yangjianfengo1
b808c49585 [Doc] Add Chinese/English language switch (#3318)
* add Chinese/English language switch

* add Chinese/English language switch

* update readme
2025-08-12 11:20:45 +08:00
chenjian
b21272d9ff [Bug fix] fix block num setting in scheduler v1 for develop (#3303)
* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
Jiang-Jia-Jun
183e3863e8 Remove useless code (#3337) 2025-08-12 10:32:31 +08:00
Sunny-bot1
19fda4e912 fix docs (#3332)
2025-08-11 21:03:49 +08:00
JYChen
973ddad91e fix unittest (#3328) 2025-08-11 20:58:24 +08:00
Divano
f27e879785 Update _base_test.yml (#3331) 2025-08-11 20:57:20 +08:00
Sunny-bot1
789dc67ff7 [Docs]fix sampling docs (#3113)
* fix sampling docs

* fix sampling docs

* update
2025-08-11 20:42:27 +08:00
Divano
8bf96217b4 Update test_evil_cases.py 2025-08-11 20:27:02 +08:00
YUNSHEN XIE
770b0aa3c5 fix ci pypi index error (#3326) 2025-08-11 20:21:08 +08:00
kevin
9627619235 fix uvicorn multi worker error (#3300)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-11 19:39:41 +08:00
Zero Rains
b23af29d0b Launch expert_service before kv_cache initialization in worker_process (#3045)
* launch expert_service before kv_cache initialization

* add two signals to make sure model loading and expert_service launching have finished

* fix the EP bug

* fix ep

* update launching way

* fix ep

* update

* rollback ep

* pre-commit all files

---------

Co-authored-by: RAM <gstian5555@outlook.com>
Co-authored-by: Divano <dddivano@outlook.com>
2025-08-11 19:38:46 +08:00
Zhang Yulong
c27a3dc43b Update deploy.py (#3310)
* Update deploy.py

Update the deployment tool

* Update deploy.py
2025-08-11 19:11:57 +08:00
Jiang-Jia-Jun
c56c99837a Revert "[BugFix] num_seqs (#3291)" (#3316)
This reverts commit e0aeac58e1.
2025-08-11 16:16:51 +08:00
Yuanle Liu
9571c458f0 enhance eos_tokens (#3274)
* enhance eos_tokens

* update

* update
2025-08-11 14:47:52 +08:00
Divano
21caa63794 update base test (#3304)
* update base test

Start the service one additional time to test repetition stop

* Update _base_test.yml
2025-08-11 14:15:45 +08:00
Zero Rains
42af0b4b64 [V1 Loader] Support DeepSeekV3(bf16) (#3294)
* Support new loader for DeepSeekV3(bf16)

* update paddle version

* remove useless attr
2025-08-11 13:39:28 +08:00
lizexu123
e0aeac58e1 [BugFix] num_seqs (#3291)
* fix num_seqs

* merge develop
2025-08-11 13:38:55 +08:00
chenjian
b88537a456 fix bug for scheduler v0 (#3308) 2025-08-11 13:07:04 +08:00
xjkmfa
71018fb62e 【CI case】include total_tokens in the last packet of completion interface stream output (#3279)
* Add ci case for min token and max token

* 【CI case】include total_tokens in the last packet of completion interface stream output

---------

Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-11 10:59:47 +08:00
Divano
0b77d396ad Acc (#3301)
* add repetition early stop cases

* add repetition early stop cases

* add accuracy cases
2025-08-11 10:22:06 +08:00
Divano
79868be220 Update _base_test.yml (#3299)
add more cases
2025-08-11 10:03:27 +08:00
chen
46c8491201 merge logprob into batch_output (#3266) 2025-08-11 10:03:00 +08:00
Divano
566badb83c Update _base_test.yml (#3298) 2025-08-11 09:40:14 +08:00
Divano
eaae4a580d Split cases (#3297)
* add repetition early stop cases

* add repetition early stop cases

* split repetition_early_stop from the base test
2025-08-11 09:38:35 +08:00
chenjian
c011cb8b16 [Bug Fix] Fix scheduler bug in develop (#3292)
* Fix scheduler bug in develop

* Fix scheduler bug in develop

* Fix scheduler bug in develop
2025-08-10 13:55:38 +08:00
Jundong Liu
1e4968e810 [Executor] Fixed the issue of CUDA graph execution failure caused by different branches during decoding (#3223)
* Completely resolve the decode block-splitting issue

* update C8 and C4 kernel

* fix problem

* fix with pre-commit

* retain branch for mtp
2025-08-09 07:37:19 +08:00
ltd0924
31d4fcb425 [BugFix] fix too many open files problem (#3256)
* Update cache_messager.py

* fix too many open files problem

* fix too many open files problem

* fix too many open files problem

* fix ci bugs

* Update api_server.py

* add parameter

* format

* format

* format

* format

* Update parameters.md

* Update parameters.md

* Update serving_completion.py

* Update serving_chat.py

* Update envs.py

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-08 20:10:11 +08:00
YUNSHEN XIE
22255a65aa add base test ci (#3225) 2025-08-08 19:08:55 +08:00
gaoziyuan
a799d14df1 [Bugfix] Fix model accuracy in some ops (#3231)
* fix noaux_tc op

* fix

* update

* fix qk norm

* fix linear for prequant loader

* test

* fix

* fix

* rm some print

* fix noaux_tc op

* test

* Fix the confused enable_early_stop when only set early_stop_config (#3214)

* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method

* Add ci case for min token and max token (#3229)

Co-authored-by: xujing43 <xujing43@baidu.com>

* add some evil cases (#3240)

* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases

* add evil cases

* qwen3_moe (#3084)

* [Feature] support seed parameter (#3161)

* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to default

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix

* 【Fix Bug】 Fix bug in fa3 support for centralized mode (#3235)

* fix fa3 centralized-mode bug

* add qknorm parameter

* fix qk norm

* fix

* update

* fix linear for prequant loader

* fix

* fix

* rm some print

* fix

* fix moe init weight&scale

* fix moe init weight&scale

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com>
Co-authored-by: xujing43 <xujing43@baidu.com>
Co-authored-by: Divano <dddivano@outlook.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com>
Co-authored-by: qingqing01 <dangqingqing@baidu.com>
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70 Move create_parameters to __init__ in FuseMOE for CutlassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
plusNew001
d0e9a70380 [CI] add CI logprobs case (#3189)
* [ci] add CI case

* [ci] add CI case

* [ci] add CI case

* [ci] add CI case

---------

Co-authored-by: ZhangYulongg <1272816783@qq.com>
2025-08-08 15:47:55 +08:00
freeliuzc
71267840f7 【Fix】fix mtp bug (#3139) 2025-08-08 13:30:12 +08:00
bukejiyu
b76b17fc1b qwen3 0.3B fix (#3255)
2025-08-08 11:35:40 +08:00
Yuanle Liu
fac2f64837 delete parallel_state.py (#3250) 2025-08-08 11:03:29 +08:00
yzwu
fbdd6b0663 [Iluvatar GPU] Optimize attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86 [feat]add fast_weights_iterator (#3258)
* add fast_weights_iterator

* update

* update
2025-08-07 22:36:46 +08:00
chenjian
5f0b30f6d0 support logprob in scheduler v1 (#3249)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-07 20:14:01 +08:00
Yzc216
6037dd5d9c [fix] multi source download (#3259)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path

* add requirements

* error optimization

* add fallback for connection failure

* add fallback for connection failure

* add fallback for connection failure

* unit test

* unit test

* unit test

* test

* test

* revise fallback

* Trigger CI
2025-08-07 19:30:39 +08:00
JYChen
9423c577fe [stop_seq] fix out-bound value for stop sequence (#3216)
* fix out-bound value for stop sequence

* catch error if there are out-of-bounds values

* check in offline mode

* add ut tests
2025-08-07 15:40:21 +08:00
Divano
5885285e57 Ce add benchmark test (#3262)
* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases

* add evil cases

* add benchmark gsm8k
2025-08-07 15:28:30 +08:00
YuBaoku
55ac449c31 [CI] remove useless case (#3261) 2025-08-07 15:09:40 +08:00
RAM
820798aec5 [Executor]Update graph test case and delete test_attention (#3257)
* 1.update graph test case 2.delete test_attention

* code style

* delete print
2025-08-07 14:05:15 +08:00
YuanRisheng
0074b423a9 fix ci bug (#3239)
2025-08-07 11:32:39 +08:00
hong19860320
93a1731891 [Doc] Update deps and fix dead links (#3252) 2025-08-07 11:04:31 +08:00
李泳桦
09cc4e2802 [fix] fix completion stream api output_tokens not in usage (#3247) 2025-08-07 10:36:00 +08:00
Yzc216
d9e3f88f9e [Feature] multi source download (#3125)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path

* add requirements

* error optimization

* add fallback for connection failure

* add fallback for connection failure

* add fallback for connection failure

* unit test

* unit test

* unit test

* test

* test
2025-08-07 00:40:27 +08:00
bukejiyu
9408e667a5 [bugfix]fix blockwisefp8 and all_reduce (#3243)
* fix

* update

* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e 【Fix Bug】 Fix bug in fa3 support for centralized mode (#3235)
* fix fa3 centralized-mode bug

* add qknorm parameter
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to default

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf qwen3_moe (#3084) 2025-08-06 14:45:27 +08:00
Divano
91dc87f1c5 add some evil cases (#3240)
* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases

* add evil cases
2025-08-06 14:23:55 +08:00
xjkmfa
256a82b0b3 Add ci case for min token and max token (#3229)
Co-authored-by: xujing43 <xujing43@baidu.com>
2025-08-06 14:10:57 +08:00
Zero Rains
36dc73470d Fix the confused enable_early_stop when only set early_stop_config (#3214)
* fix the confused early_stop_config when only set early_stop_config

* pre-commit

* write a general method
2025-08-06 11:42:27 +08:00
YuanRisheng
a6e8b780f8 fix approve (#3224) 2025-08-06 10:36:01 +08:00
yangjianfengo1
89397516a8 [New Feature] Support W4Afp8 MoE GroupGemm (#3171)
* init

* add multi-threaded compilation

* fix bug

* fix bug

* code style

* add fp16

* replace print with assert

* fix stmatrix

* reduce unit test shapes

* reduce unit test shapes
2025-08-06 10:34:05 +08:00
sg263
841e831575 [Trace]add trace when fd start (#3174)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

* fd start add trace

* fix pre-commit

* fix pre-commit

* change FD_JOB_ID

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: shige <shige@baidu.com>
2025-08-05 21:18:27 +08:00
YUNSHEN XIE
e0bbd3b6ca fix approve ci (#3212)
2025-08-05 17:21:26 +08:00
Yuan Xiaolan
7ce00e597c support qk norm (#3145) 2025-08-05 16:46:14 +08:00
RAM
4a10e29804 fix mla attention backend (#3176) 2025-08-05 16:43:15 +08:00
Yuan Xiaolan
af543b7f0f revise get_moe_scores (#3164) 2025-08-05 16:43:07 +08:00
Divano
e24929efa3 Ce add bad cases (#3215)
* add repetition early stop cases

* add repetition early stop cases

* add bad cases

* add bad cases
2025-08-05 16:37:28 +08:00
lizexu123
b01cfd6007 [BugFix] support real batch_size (#3109)
* support real bsz

* fix

* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py

* add event_loop_ep

* fix

* Add comments

* fix

* support mtp real_batch_size

* fix

* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer

* fix

* fix VL real_seq_lens_this_time

* fix

* fix mtp

* fix

* fix mtp

* fix xpu

* fix
2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun
55939f7942 Update engine.py 2025-08-05 16:10:36 +08:00
chen
04fc7eb931 fix test_air_top_p_sampling name (#3211) 2025-08-05 15:47:50 +08:00
Divano
9f1936ae28 Ce add repetition early stop cases (#3213)
* add repetition early stop cases

* add repetition early stop cases
2025-08-05 15:47:28 +08:00
RichardWooSJTU
1e9a8e8cef fix lm head bias (#3185)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c [EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes

* code refine

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
ming1753
14ed75f7d3 [Test] scaled_gemm_f8_i4_f16 skip test while sm != 89 (#3210) 2025-08-05 15:25:28 +08:00
yangjianfengo1
40f7f3e0d8 [New Feature] fa3 supports flash mask (#3184)
* support flash mask

* modify test_flash_mask

* modify test.sh
2025-08-05 12:20:48 +08:00
YUNSHEN XIE
b8f3c73aac fix coverage report (#3198)
* fix coverage report

* fix
2025-08-05 11:24:55 +08:00
Divano
fb7a0689cc add more cases (#3207) 2025-08-05 11:17:36 +08:00
RAM
c593e1a39c [Bug Fix]Fix bug of append attention test case (#3202)
2025-08-05 11:04:45 +08:00
RichardWooSJTU
e39159f3bd Add switch to apply fine-grained per token quant fp8 (#3192)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-04 19:54:03 -07:00
Divano
88596c0c63 Add more base chat cases (#3203)
* add test base class

* fix codestyle

* fix codestyle

* add base chat
2025-08-05 10:24:12 +08:00
lizhenyun01
fe540f6caa [plugin] Custom model_runner/model support (#3186)
* support custom model&&model_runner

* fix merge

* add test && update doc

* fix codestyle

* fix unittest

* load model in rl
2025-08-04 18:52:39 -07:00
Sunny-bot1
72ef5a9c93 [FIX]fix bad_words when sending requests consecutively (#3197)
* fix bad_words

* fix log

* fix log
2025-08-04 05:59:41 -07:00
Yuan Xiaolan
1f8289e106 fix expertwise_scale (#3181) 2025-08-04 20:06:15 +08:00
YuBaoku
3eb9a5df60 [CI] add test_compare_top_logprobs (#3191) 2025-08-04 19:49:24 +08:00
SunLei
68bc1d12c0 [Bugfix] Fix uninitialized decoded_token and add corresponding unit test. (#3195) 2025-08-04 19:23:58 +08:00
Longzhi Wang
01d7586661 [Bug fix] Fix cudagraph when use ep. (#3130)
* fix cudagraph when use ep

* fix typo

* reduce full length to accommodate large bsz such as 128/256
2025-08-04 18:06:18 +08:00
周周周
2bd8a50649 remove useless code (#3166) 2025-08-04 18:03:08 +08:00
gaoziyuan
0443587a57 【Feature】support qwen3 name_mapping (#3179)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci

* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
Zero Rains
17f51f0c92 [unittest] fix the bug in test_sampler (#3157) 2025-08-04 01:23:25 -07:00
YuanRisheng
79bbacc152 Fix approve shell scripts (#3108)
* fix approve

* fix
2025-08-04 15:51:33 +08:00
Divano
3bfb2eca92 Update test_base_chat.py (#3183) 2025-08-04 15:09:53 +08:00
ltd0924
c9e6ce1518 Update cache_messager.py (#3172) 2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5 【Feature】add fd plugins && rm model_classes (#3123)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905 fix load_pre_sharded_checkpoint (#3152)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-04 10:44:20 +08:00
Divano
66d3bb89ad Update __init__.py (#3163)
Improve compatibility of the test base class
2025-08-04 09:40:09 +08:00
AIbin
22fe695f1c 【Inference Optimize】Support automatic generation of marlin kernel (#3149)
* Support automatic generation of marlin kernel
2025-08-01 22:43:18 +08:00
ApplEOFDiscord
b71cbb466d [Feature] remove dependency on enable_mm and refine multimodal's code (#3014)
* remove dependency on enable_mm

* fix codestyle check error

* fix codestyle check error

* update docs

* resolve conflicts on model config

* fix unit test error

* fix code style check error

---------

Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
plusNew001
243394044d [XPU]Update XPU dockerfiles (#3144)
* [CI] add xpu ci case

* [CI]Update run_ci_xpu.sh

* [XPU]Update Dockerfile.xpu

* Update Dockerfile.xpu
2025-08-01 19:41:59 +08:00
Zhang Yulong
0eb32bb9c8 add cases (#3155) 2025-08-01 18:38:57 +08:00
yangjianfengo1
64d7a3194d Support fa3 in centralized mode (#3112) 2025-08-01 18:03:36 +08:00
YUNSHEN XIE
bdb83e007d fix ci (#3141) 2025-08-01 17:42:26 +08:00
Divano
50db0d7ba9 add case (#3150)
* add test base class

* fix codestyle

* fix codestyle

* add base chat
2025-08-01 17:30:58 +08:00
Ryan
94264bbf60 [Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937)
* rm sth useless

* refactor model forward

* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765 Fix out-of-memory issue during single-XPU deployment (#3133) 2025-08-01 17:12:03 +08:00
JYChen
c34088b0fd fix stop seq unittest (#3126) 2025-08-01 16:50:05 +08:00
ming1753
fc5f43c6bc [Docs] Optimal Deployment (#2768) 2025-08-01 11:56:27 +08:00
chen
a2f5cc54f8 moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121) 2025-08-01 10:46:20 +08:00
Divano
1d93565082 [CE] Add base test class for web server testing (#3120)
* add test base class

* fix codestyle

* fix codestyle
2025-07-31 23:28:50 +08:00
YUNSHEN XIE
e1011e92d9 disable test_cuda_graph.py (#3124) 2025-07-31 22:03:48 +08:00
plusNew001
8c63237cfa [CI] add xpu ci case (#3111)
* [CI] add xpu ci case

* [CI]Update run_ci_xpu.sh
2025-07-31 22:03:34 +08:00
YUNSHEN XIE
ff6a109b4d Describe PR diff coverage using JSON file (#3114)
* Refactored ci pipeline

* update

* Describe PR diff coverage using JSON file

* remove pip cache setting from Approve

* fix

* update
2025-07-31 21:59:20 +08:00
SunLei
dade19d7a4 [Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints

* Temporarily comment out text_offset due to incorrect logic

* Clean up temporary debug prints

* [Feature] support logprobs in offline mode via SamplingParams

* fix: serialize Logprob as dict before zmq send to fix msgpack error

* refactor: remove redundant methods to simplify codebase

* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization

* refactor: centralize param validation in engine_client to reduce duplication

* revert: rollback changes in offline_demo.py

* revert: rollback changes in offline_demo.py

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c [BUG] Fix bug for pd in fd (#3034)
* Fix bug for pd in fd

* Fix bug for pd in fd

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Zhang Yulong
1a543bca29 Fix test_EB_Lite_serving.py (#3119)
* Fix test_EB_Lite_serving.py

* fix test_EB_Lite_serving.py
2025-07-31 20:15:25 +08:00
Yuan Xiaolan
5f56d289a7 fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30 [Doc] add chat_template_kwargs and update params docs (#3103)
* add chat_template_kwargs and update params docs

* add chat_template_kwargs and update params docs

* update enable_thinking

* pre-commit

* update test case

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
kevin
22cab724e8 [Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache

* update code

* update code

* fix code bug

* add timeout time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
chenjian
32307283f1 Fix bug for offline inference in scheduler v1 (#3117) 2025-07-31 17:54:24 +08:00
YUNSHEN XIE
583eae2fd1 fix ci (#3106)
* fix ci

* disable test_non_streaming_chat_with_min_tokens
2025-07-31 17:25:08 +08:00
JYChen
1ef38b1563 [doc] best practice for eb45 text models (#3002)
* [doc] best practice for eb45 text models

* fix docs
2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun
4498058722 Update README.md 2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun
66304cf921 Update sampling.md 2025-07-31 15:02:57 +08:00
yinwei
5b9aec1f10 xpu release 2.0.3 (#3105) 2025-07-31 14:26:07 +08:00
YUNSHEN XIE
66c3835a46 add approve ci (#3093)
* add approve ci

* fix

* fix
2025-07-31 10:10:10 +08:00
RAM
d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flash attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
Jiang-Jia-Jun
998968f1e8 [Doc] Update parameters of serving 2025-07-30 22:35:01 +08:00
chenjian
fe0e3f508b [BUG FIX] Fix bug when preempted request rescheduled (#3080)
* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2 [Feature] Support include_stop_str_in_output in completion api (#3096)
* [Feature] Support include_stop_str_in_output in completion api

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac [BugFix]Fix ep size (#3092)
* fix ep

* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19 [Feature] support ep in mixed mode (#3001)
* [LLM] support ep

* Update worker_process.py

* Update expert_service.py

* Update worker_process.py

* format files
2025-07-30 20:43:39 +08:00
JYChen
bd29b2aaca add stop_seqs doc (#3090) 2025-07-30 20:36:18 +08:00
Jiang-Jia-Jun
6ead7a3a49 Update setup.py 2025-07-30 20:21:41 +08:00
YUNSHEN XIE
e4ba9a0dde debug use (#3095) 2025-07-30 20:18:36 +08:00
Zhida Hu
3f8a41e68c [*] fix the memory leak when modifying qp to rts fails (#3051)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94 [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client

* [fix] delete ci test case for enable_thinking

* [fix] add reasoning_parser when server starts

* [fix] fix ci consistency test error with reasoning parser

* [doc] update docs related to metadata

* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
bukejiyu
db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
AIbin
28fff1b035 Revert "Add unittest for moe_ffn_wint2. (#3037)" (#3085)
This reverts commit 327e1943fa.
2025-07-30 19:04:07 +08:00
YuanRisheng
acc5c0aa85 add ci for custom op approve (#3079) 2025-07-30 16:50:20 +08:00
zhink
d89b6dd43f adapter qwen3 moe attr for init (#3066)
adapter qwen3 moe attr for init
2025-07-30 16:49:28 +08:00
bukejiyu
8e203666d9 w4a8 offline (#3074)
* w4a8 offline

* update

* update

* update
2025-07-30 16:33:30 +08:00
ming1753
5acde4eb43 [Feature] Multimodal Scheduler V1 (#3019)
* [Feature] Support multimodal scheduler v1

* remove debug log

* fix bug

* fix format

* modify code

* fix bug

* fix bug

* fix bug

* modify code
2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun
ffa0f4d99b [Fix] Fix version function (#3076)
* [Fix] Fix version function

* Fix commit

* Fix commit

* fix code sync

* Update coverage_run.sh

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 16:05:24 +08:00
ltd0924
ecf2fd5b9a [BugFix] vl encoder tokens dtype problem (#3069) 2025-07-30 15:20:53 +08:00
YuanRisheng
eeadbf332a delete unused unittest (#3065) 2025-07-30 15:11:58 +08:00
Yiqun Liu
327e1943fa Add unittest for moe_ffn_wint2. (#3037)
Change-Id: Ifd452527eaf87ea96c3fa4fa9aeb17729b33c2de
2025-07-30 15:03:09 +08:00
Yuan Xiaolan
35935da9e5 support W4A8 EPLB (#3075) 2025-07-30 14:34:12 +08:00
Yzc216
159767717d [Feature] multi source download (#3072)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path
2025-07-30 14:10:13 +08:00
Zero Rains
4dc130c5a9 [Doc] add repetition early stopping doc (#3078)
* add repetition early stop doc

* add the early_stop.md
2025-07-29 22:01:57 -07:00
YuanRisheng
99a70fc722 unify parallel config (#3070) 2025-07-30 11:41:23 +08:00
lddfym
5ca684c762 update doc: load_balance.md (#3008)
* update doc of load_balance

* update doc: load_balance.md
2025-07-30 10:27:56 +08:00
Sunny-bot1
74aa31d15b [Feature] support bad_words (#3055)
* support bad_words

* support online infer bad_words

* update

* add CI test

* update

* update

* update

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
Sunny-bot1
9c962343f2 [Docs] add sampling docs (#2973)
* add sampling docs

* add minp sampling docs

* update sample docs

* update

* update

* add bad words desc

* update
2025-07-30 02:24:16 +08:00
zhuzixuan
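ad7bb52a28 Fix the error raised when max_tokens=1 is passed (#3068)
* Fix the error raised when max_tokens=1 is passed

* Fix the error raised when max_tokens=1 is passed

* Fix the error raised when max_tokens=1 is passed

* Fix the error raised when max_tokens=1 is passed

* Fix the error raised when max_tokens=1 is passed

* Fix the error raised when max_tokens=1 is passed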
2025-07-29 23:49:28 +08:00
Ryan
73cfe1fd37 [SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware

* add_sot_warmup4new_hardware

* fix conflict

* rm Optional
2025-07-29 22:45:20 +08:00
Zero Rains
b2f9a42d87 [Feature] Support repetition early stop (#3024)
* support repetition early stop and support user to set the parameter

* remove log

* fix codestyle

* add the early_stop_config to rollout_config

* update config and EarlyStopper class

* fix the bug for triton

* modify the stop method

* update description

* modify the usage for stop_flags

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
Yuan Xiaolan
3214fb5393 support model loading for w4a8 offline quant (#3064)
Support loading offline-quantized weights for W4A8 EP
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2 fix argument error in ep when pd (#3060) 2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00
Longzhi Wang
907d561523 fix ep when paddle version mismatch (#3056) 2025-07-29 15:06:49 +08:00
JYChen
dafe02a7b9 [stop sequence] support stop sequence (#3025)
* stop seqs in multi-ends

* unittest for gpu stop op

* kernel tid==0
2025-07-29 14:17:37 +08:00
YuanRisheng
1a815b7a2a Fix Speculative Config bug (#3049)
* fix speculative bug

* fix rl
2025-07-29 10:50:48 +08:00
yinwei
f2a528f9ae [XPU] Support kvblock centralized management (#3017) 2025-07-29 10:40:55 +08:00
Jiang-Jia-Jun
286802a070 Update ernie-4.5.md 2025-07-29 10:10:09 +08:00
Yuan Xiaolan
7d87aaace8 optimize w4a8 decoding (#3050) 2025-07-28 22:20:13 +08:00
lizhenyun01
e80ea8a71b remove Synchronize in hadamard 2025-07-28 19:22:46 +08:00
Yuan Xiaolan
b1d787a272 [fix] w4a8 model loading and hadamard config (#3013) 2025-07-28 18:17:59 +08:00
YUNSHEN XIE
c8bf8b3913 add logprob ci test (#3022)
* add logprob ci test
2025-07-28 17:30:58 +08:00
K11OntheBoat
83048bbe55 [Feature] Deepseekv3 supports cudagraph (#3041)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-28 17:12:54 +08:00
AIbin
ec52d39e68 【Inference Optimize】Update wint2 weight n-dim reorder (#3042) 2025-07-28 16:31:56 +08:00
YuanRisheng
bddf403576 Unify server-side and model-side Config (Part2) (#3035)
* merge speculative and graph opt config

* add attr
2025-07-28 15:31:48 +08:00
yinwei
776fb03250 add error info (#3040) 2025-07-28 15:10:28 +08:00
YUNSHEN XIE
60311956e4 fix(ci): correct diff coverage data download URL (#3036) 2025-07-28 14:44:02 +08:00
lizhenyun01
238766e403 fix c4 prompt_cache 2025-07-28 14:31:37 +08:00
chen
01485cd28b MTP rejection_topp add topk input (#3031) 2025-07-28 13:58:45 +08:00
begin2023
dd877f38b1 [Perf] Remove unnecessary operations in non-cuda_graph (#3010)
* [Perf] Remove unnecessary operations in non-cuda_graph

* fix code logic

* use suggestion comment

* reduce function call

* reduce function call

* reduce function call

* reduce function call
2025-07-27 20:38:29 -07:00
Longzhi Wang
247010d298 fix argument error (#3030) 2025-07-28 11:03:29 +08:00
YuanRisheng
6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
Yiqun Liu
8f426c1690 Optimize the performance of moe_expert_ffn_wint2 (#2990)
* Change wint2 to ColumnMajor.

Change-Id: I6b44d02946a685f8fe24d9f2c7be258b51e16da2

* Unify default_wint2x_mma.

Change-Id: I9e77b0e8e6cecab01fedc0b24b536ee0a1a89ff7

* Change wint2 to ColumnMajorTileInterleave.

Change-Id: I593cbe36f991c0c5044989d65f0014087587c624

* Enable async copy for B.

Change-Id: Ia3ac37ad162a8cf3ccce4f268e81bd06c8ac3c46

* Add wint2x Dequantizer

* Remove TileDequanterB related codes.

Change-Id: Id8e65703b72a8984d367f584ff41b7726017fbb8

* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.

Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca

* Implement Wint2ParamsAccessor to load extra quant params from global memory.

Change-Id: Ic3750cd9b767df8893501820880c3342a4b47233

* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.

Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca

* Use async copy for local_scale.

Change-Id: Ib882ba41c3d2354bda4d25b40e2408ad3b2f7893

* Check and correct the load and dequantize of weights.

Change-Id: Ie8dca505b39987144964fe6407d465b3b5953790

* Change for performance tuning.

Change-Id: I1da026fb1d1533a9d70350c7ba23c27e896cfc29

* Optimize the global memory access size of local_scale reading.

Change-Id: I4cbe3a2ef5951723d415c2d3252ce912394beaf5

* Specialize mma_tensor_op for wint2 to enable fine-grained pipeline.

Change-Id: Icbb4d48f90a41136f42d6ffff42d68de32f408da

* Minor fix.

Change-Id: I14d4ac9d267ee05442a3b47f00c26bee13d79e6f

* optimizing dequant performance with LOP3

* optimizing dequant performance with LOP3

* Avoid redundant dequantization of local_scale and use bf16 as computing type.

Change-Id: I63239ebc8f8e4a92d6281af59840ba50600b4334

* Add Multiplier and remove some logs.

Change-Id: Ifa199d81e6aeb472d2247c63f85ef30213684bcd

* optimizing dequant performance with LOP3

* Use __byte_perm to implement int8 to float32 conversion for performance improvement.

* Use lop3 to optimize the dequantize of local_scale.

Change-Id: I6189759970cb5b8dcbef769724784b8a7533b63c

* Minor fix and remove some logs.

Change-Id: I6279ba9926d5041093b1c6aea200acf2e4c49d46

* Fix stages for test.

Change-Id: I6f7b7cac612ef2c678e9d49f5ffa60eb53d3ae29

* Fix stages for test and add clock64 to profile.

Change-Id: Iffaf7324beaa910ce9ee56f47ae289de98f1a267

* Use __byte_perm to replace shift-and-or operations for faster integer merging.

* Split the uint2b convert.

Change-Id: I78da672ce8968e21f685285140ba546a161521b4

* Optimize convert of unscale.

Change-Id: I6795da1cdf5e8ab38ddaa9836240921b5312913a

* Minor optimization.

Change-Id: I1800aec34c3f4621abb02658208108f54da44d88

* Optimize mma pipeline and refine codes.

Change-Id: Id3075cf7b88f2813a11ccd1d3b49c62c978f36b8

* Add missing support.

Change-Id: Id65b7bc2c25fbb1a5b232c6bc9fb8c9093f691a8

* Accelerate FP16 dequantization performance

* Support tile shape as Xx64x64.

Change-Id: Ib8fd37e1ba1d06f7d11f2956e7f1367b0a92bcac

* Remove debugging codes and minor optimization.

Change-Id: I6b79bd56a6e8dd823efc169967ecd3cc9a43baf4

* Fix offset bug.

Change-Id: Id7aeb91e99d6f51836f2aff22187b4f79607395e

* Fix typo.

Change-Id: I19dde93fc1c1f7e19605905c90dc46298e203952

* Restore some codes and remove some debugging logs.

Change-Id: I8d44daf82ad1c6f8174134d195e7b3fe9a3afdfb

---------

Co-authored-by: baoqiwen <baoqiwen@baidu.com>
2025-07-28 10:32:43 +08:00
YUNSHEN XIE
fb410b5f4c Add unit test run and coverage report generation (#3011)
* Add unit test run and coverage report generation

* fix

* fix: upload coverage report failure

* fix

* update

* fix

* fix

* update
2025-07-27 22:48:34 +08:00
YUNSHEN XIE
1d29dd80f7 modified dockerfile (#3026)
2025-07-25 21:10:23 +08:00
李泳桦
69996a40da [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Longzhi Wang
0700c90caa [Feat] support mixed ep (#2969)
* Support mixed ep

* fix comment

* fix comment

* update mixep

* fix conflict

* fix typo

* update

* fix typo

* fix code style

* fix conflict
2025-07-25 15:29:30 +08:00
chen
332154f504 [feature] Support FA2 (#3009) 2025-07-25 14:09:00 +08:00
YuBaoku
4b02b96467 [CI] fix codestyle_check (#3015) 2025-07-25 14:02:34 +08:00
EnflameGCU
8c167e130c [GCU] Update post_process (#3012) 2025-07-25 11:03:03 +08:00
EnflameGCU
7634ffb709 [GCU] Add CI (#3006) 2025-07-25 10:59:29 +08:00
Jiang-Jia-Jun
6ce3a8a497 Update index.md 2025-07-25 10:32:47 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856 [Model] Provide clearer error for missing KV cache quantization scales (#3007) 2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e [GCU] Update to develop (#2988) 2025-07-24 19:30:52 +08:00
Yzc216
980126b83a [Feature] multi source download (#3005)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
Zhang Yulong
5151bc92c8 Update benchmark tools (#3004)
* update benchmark tools

* update benchmark tools
2025-07-24 15:19:23 +08:00
ltd0924
f935d6f862 [BugFix] fix multinode deployment (#2977) 2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954 [Feature] multi-source download (#2986)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit
2025-07-24 14:26:37 +08:00
YUNSHEN XIE
87a2f4191d add ci reuse action (#2968)
* add ci reuse action

* fix code formatting

* update
2025-07-24 14:24:10 +08:00
xiaoxiaohehe001
2c0ff068e2 [Fix] fix mm ep empty run (#2999) 2025-07-24 14:15:55 +08:00
xiegegege
e3a843f2c5 [benchmark] add quantization for benchmark yaml (#2995) 2025-07-24 13:26:34 +08:00
lizhenyun01
6235ef3881 fix chunk_prefill 2025-07-24 12:00:52 +08:00
lizhenyun01
29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
lizexu123
832d25334a [Code Simplification] fix init_distributed_environment() (#2982) 2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8 update (#2978)
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39 polish code for prefill restrictions (#2991) 2025-07-23 05:10:14 -07:00
chen
172e69fe17 FA3 fix bug (#2987) 2025-07-23 19:07:43 +08:00
zhink
1272c7ce98 Fix performance degradation bug of custom_all_reduce (#2981) 2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4 [BugFix] Add prefill restrictions for chunked_prefill+VL (#2983) 2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c fix mtp bug in pd-split mode (#2970)
2025-07-23 15:31:16 +08:00
YuBaoku
6c4cfd9359 [CI] add codestyle_check action (#2972)
* [CI] add codestyle_check action

* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123
9b22b8d2c3 delete max-len (#2959) 2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun
5b59a97030 Update README.md 2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun
475dc6d84e Update README.md 2025-07-23 13:47:31 +08:00
chen
ad202272ed 【Infer】Improve the performance of block_wise_fp8 in triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
Ryan
95b5af24db [SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
* add sot warmup

* fix code style

* change batch_size list

* add param to config

* rm free_list settings && set sot_warmup_sizes

* finish debug with dynamic dims by type annotations

* add profile_run guard

* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1
7c5e34e72d [FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0

* fix
2025-07-22 05:53:37 -07:00
gaoziyuan
dbe6225b33 fix rl config local rank (#2957) 2025-07-22 04:39:54 -07:00
GoldPancake
9b84d51e25 [MTP Fix] Fix code and register cpp operators (#2965) 2025-07-22 19:36:24 +08:00
K11OntheBoat
93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e [Feature][MTP]Support multi-step MTP (#2952) 2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60 [Fix]fix empty prompt_token_ids,update the parser's triggering condit… (#2891) 2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching; some bugs with profiling remain

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* remove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
K11OntheBoat
e991777757 [Feature] DeepseekV3 use pd_build_static_op (#2948)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 15:03:41 +08:00
李泳桦
2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
gaoziyuan
0eedbdaee0 fix import error (#2944) 2025-07-22 14:06:01 +08:00
K11OntheBoat
8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00
李泳桦
8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg
2845bde964 [Executor] Avoid OOM when starting the service with Chunked Prefill + CudaGraph enabled (#2936)
* [Executor] Avoid OOM when starting the service with Chunked Prefill + CudaGraph enabled

* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu
2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572 [Feature] support min_p_sampling (#2872)
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
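For readers unfamiliar with the min_p option introduced by 67990e0572, the following is a minimal sketch of min-p filtering as it is commonly described: tokens whose probability falls below `min_p` times the largest probability are dropped, and the rest are renormalized. The helper name `min_p_filter` and the example values are hypothetical; this is not FastDeploy's kernel, and the commit log does not say whether the real implementation matches it step for step.

```python
from typing import List


def min_p_filter(probs: List[float], min_p: float) -> List[float]:
    """Drop tokens below min_p * max(probs), then renormalize.

    Hypothetical helper; assumes probs is a valid distribution (max > 0).
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Illustrative distribution: with min_p=0.2 the threshold is 0.2 * 0.5 = 0.1,
# so only the last token (0.05) is removed before renormalizing.
filtered = min_p_filter([0.5, 0.3, 0.15, 0.05], min_p=0.2)
```

With `min_p=0` the threshold is zero and every token survives, so the filter becomes a no-op.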
gaoziyuan
95a214ae43 support trainer_degree in name_mapping (#2935) 2025-07-20 23:12:55 -07:00
YuanRisheng
bce2c6cd7c rename test dir (#2934) 2025-07-21 14:05:45 +08:00
ltd0924
cc4cec0a74 Update engine_client.py (#2931) 2025-07-21 11:42:16 +08:00
liddk1121
17c5d3a241 [Iluvatar GPU] Add CI scripts (#2876) 2025-07-21 09:44:42 +08:00
周周周
8c5407d9e4 remove cum_offsets from ForwardMeta (#2925)
2025-07-19 23:57:27 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ZhangYulongg
b8676d71a8 update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
43976138de update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
e546e6b1b0 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
9c8292fb19 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
a5e95013b5 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
93481a5478 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
eb77b1be6d update ci cases 2025-07-18 21:44:07 +08:00
ming1753
5328daa333 [Bug Fix] fix ep config bug (#2920) 2025-07-18 19:12:56 +08:00
xiaoxiaohehe001
a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun
fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan
6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
周周周
d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE
e81137e581 fix ci workflow (#2896) 2025-07-18 16:01:00 +08:00
RAM
cd52dc0f65 [Executor] Fix set capture sizes bug (#2902) 2025-07-18 15:12:19 +08:00
周周周
1339e56282 [XPU] Remove padding_offsets from get_padding_offset.cu (#2911) 2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3 [BugFix]Fix sample rejection (#2908)
* fix config

* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59 [Trace]fix opentelemetry can not work in uvicorn (#2906)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM
bbe2c5c968 Update GraphOptimizationBackend docs (#2898) 2025-07-17 21:38:18 +08:00
ltd0924
4b14dca1d6 [LLM] delete fixed slots (#2893)
2025-07-17 19:19:54 +08:00
yulangz
c8c280c4d3 [XPU][Doc] fix typo (#2892) 2025-07-17 19:13:54 +08:00
周周周
ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a [Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
* support chunk_prefill both normal and speculative_decoding(mtp)

* optimize pd-disaggregation config

* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9 [Bug Fix] fix bug of prompt penalty (#2888) 2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f [Fix] remove misleading variables (#2841)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE
1cfba5ba3e enable CI workflow for pull requests targeting release/* branches (#2887) 2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun
31cab9f87b Update test_openai.py 2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c Update test_openai.py 2025-07-17 16:07:07 +08:00
ltd0924
b630031414 [LLM] fix several bugs (#2878) 2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b [MM_PROCESS] add _extract_labels (#2879) 2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b Fix rollout_model init (#2881) 2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4 [Feature] support prompt repetition_penalty (#2806)
2025-07-17 12:05:52 +08:00
yulangz
7dfd2ea052 [XPU][doc] Update minimal fastdeploy required (#2863)
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake
42d4001400 [Features] Add speculative metrics (#2857) 2025-07-17 11:08:55 +08:00
sg263
52aca233e8 [Trace] fix annotation when add opentelemetry (#2869)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix opentelemetry-bootstrap

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b [LLM] Update Multinode Deployment (#2830)
* [LLM] fix multinode bugs

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] fix ci bugs

* Update fastdeploy/engine/args_utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [LLM] update random port

* [LLM] update random port

* [LLM] fix ci bugs

* fix ci bugs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00
Yuanle Liu
63d6e7ce06 fix and refine vl (#2866)
* refine vl config

* delete attn_sep

* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f [Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870)
2025-07-16 20:10:57 +08:00
sg263
42b80182e0 [Trace] add opentelemetry (#2852)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848 rl update (#2861) 2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f Read FLAGS_max_partition_size from an environment variable (#2854) 2025-07-16 14:14:21 +08:00
xiaoxiaohehe001
0d0340392f [Fix] Fix mm ep weight init. (#2855)
* fix_45t_mm

* Update load_weight_utils.py

* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng
0253381fb9 fix config (#2858) 2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe [Fix] fix expert_parallel bug in decoder stage (#2848) 2025-07-16 11:08:18 +08:00
yulangz
17314ee126 [XPU] Update doc and add scripts for downloading dependencies (#2845)
* [XPU] update xvllm download

* update supported models

* fix xpu model runner in huge memory with small model

* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332 [BugFix] Fix Configs (#2849)
* fix config

* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support setting the graph optimization config from a YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
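The batch padding named in 0fad10b35a can be pictured as rounding a runtime batch size up to the nearest size for which a graph was captured. A minimal sketch under that assumption; `pick_capture_size` and the example capture list are hypothetical and are not taken from the FastDeploy code:

```python
from bisect import bisect_left
from typing import Optional, Sequence


def pick_capture_size(batch_size: int, capture_sizes: Sequence[int]) -> Optional[int]:
    """Return the smallest captured size >= batch_size, or None if none fits.

    Hypothetical helper; capture_sizes must be sorted in ascending order.
    """
    idx = bisect_left(capture_sizes, batch_size)
    if idx == len(capture_sizes):
        # Larger than every captured graph: the caller would have to run
        # without a captured graph.
        return None
    return capture_sizes[idx]


sizes = [1, 2, 4, 8, 16]              # illustrative capture list
padded = pick_capture_size(5, sizes)  # a batch of 5 is padded up to 8
```

Keeping the list sorted lets the lookup use binary search, which matters little for short lists but keeps the helper simple when the capture sizes are user-defined.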
Yuanle Liu
61b3997b85 refactor rl get_name_mappings_to_training (#2847)
* refactor rl get_name_mappings_to_training

* fix tp>1

* change variable name(ffn1->up_gate_proj/ffn2->down_proj)

* change variable name(linear_weight->weight/linear_bias->bias)

* add rl names mapping for vl

* fix ernie 0.3B error

* fix develop code

* fix
2025-07-15 07:31:42 -07:00
Zero Rains
e7bcbbab52 Merge vl execution path into normal execution path (#2829)
* merge vl model into gpu_model runner

Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6

* fix chinese

Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a

* fix the parse parameter

Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe

* fix the bug in online_inference

Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559

* polish code

Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
zhenwenDang
5fc659b900 [Docs] add enable_logprob parameter description (#2850)
* add enable_logprob parameter description

* add enable_logprob parameter description

* add enable_logprob parameter description

* add enable_logprob parameter description

* add enable_logprob parameter description

* add enable_logprob parameter description

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-15 19:47:45 +08:00
ophilia-lee
33db137d0b Add YAML of default vLLM request parameters 2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334 Adapt to vLLM having no arrival_time; adapt to vLLM requiring the model field; add case number `no` to RequestFuncInput/RequestFuncOutput/SampleRequest 2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun
1b712bba82 Update setup.py 2025-07-15 14:57:23 +08:00
AIbin
fd91da7b41 【Inference Optimize】Support wint2 triton kernel about triton_utils_v2 (#2842)
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5 [vl] Use top_k from config.json (#2831)
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d [MTP] optimize mtp infer speed (#2840)
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458 Simplify the Config code (#2770)
* simplify the code

* fix vl

* delete config

* fix

* perfect code

* fix ci

* fix xpu

* fix xpu

* fix server

* resolve conflict

* fix mtp

* resolve conflict

* fix xpu

* fix xpu

* fix vl

* fix log

* fix qwen moe

* fix qwen moe

* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
2e81792d64 [fix] fix 'force-reinstall all-depe-packages in build' (#2837) 2025-07-14 16:50:54 +08:00
AIbin
b7858c22d9 【Update Docs】update supported_models doc (#2836)
* update supported_models doc
2025-07-14 16:01:34 +08:00
GoldPancake
09bbac6de0 Add DeepGEMM pre-compile tools (#2819)
This tool compiles all possible kernels in advance based on the model's config.json, so that incoming requests do not hit an uncompiled kernel and trigger JIT compilation.
2025-07-14 14:56:41 +08:00
freeliuzc
7f64d408a9 [MTP] support expert-parallel in mtp (#2835) 2025-07-14 14:28:50 +08:00
lddfym
ece88596ed fix spelling error (#2827) 2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e [vl]remove duplicated load logic (#2744)
2025-07-13 07:36:26 +08:00
xiegegege
16940822a7 add result save for ci (#2824)
LGTM
2025-07-12 23:34:46 +08:00
zhenwenDang
d48c03413f Feature/logprob bug fix (#2817)
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens

* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
gaoziyuan
e9e8443ea8 fix num_blocks_local when small size model in TP2 running mode (#2792) 2025-07-12 12:50:48 +08:00
gaoziyuan
749b2e9c89 support qwen3moe name_mapping (#2820) 2025-07-12 12:05:54 +08:00
Sunny-bot1
f6ad26fc08 fix topp default value (#2814)
2025-07-11 17:10:21 +08:00
zhink
c08561c13a [Feature] support tensor-parallel-size>num_key_value_heads for qwen3 (#2799) 2025-07-11 15:09:43 +08:00
chen
2c3607407f check (#2811) 2025-07-11 13:54:52 +08:00
lddfym
b5e4288704 Global scheduler supports configuring hot updates (#2807)
* Check if the controller port is available

* Global scheduler supports configuring hot updates

* add interface: /controller/scheduler

* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
yulangz
abbbd0cddc [XPU] Update docker file (#2809) 2025-07-11 13:26:38 +08:00
yinwei
e98937cbba delete useless file (#2772)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc [Fix]fix top_k_top_p sampling (#2801)
* fix topk-topp

* update

* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb [BugFix] fix RMSNorm rms_norm_esp (#2797)
* fix rms

* add vl

* fix

* add vl

* fix

* fix
2025-07-10 20:02:24 +08:00
LiqinruiG
ce5adec877 [Doc] modify offline-inference docs (#2800)
* modify offline-inference docs

* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
Zeyu Chen
36571fd2d9 Update README.md
2025-07-10 17:01:08 +08:00
yulangz
830de5a925 [XPU] Supports TP4 deployment on 4,5,6,7 (#2794)
* Support running on cards 4,5,6,7 selected via XPU_VISIBLE_DEVICES
* Update the multi-card instructions in the XPU docs
2025-07-10 16:48:08 +08:00
chen
d33105baeb [Feature] Online Chat API Support Return logprobs (#2777)
* online chat support logprobs

* check xpu

* check vl_gpu_model_runner and xpu_model_runner

* get_worker() check platform
2025-07-10 16:33:40 +08:00
K11OntheBoat
24f934f1f9 [BugFix] Fix low prediction accuracy of deepseekv3 (#2798) 2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef Rename top_p_sampling to top_k_top_p_sampling (#2791) 2025-07-10 00:09:25 -07:00
Sunny-bot1
e45050cae3 [Feature] support top_k_top_p sampling (#2753)
* support top_k_top_p sampling

* fix

* add api param

* add api para

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2025-07-09 20:58:58 -07:00
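As background for the top_k_top_p sampling added in e45050cae3, here is a minimal sketch of one common way to combine the two filters: keep the `top_k` most likely tokens, then keep the shortest prefix of them whose cumulative probability reaches `top_p`, and renormalize. The helper name and the order of operations are assumptions for illustration; implementations differ on details such as whether probabilities are renormalized between the two steps, and the commit log does not describe FastDeploy's choice.

```python
from typing import List


def top_k_top_p_filter(probs: List[float], top_k: int, top_p: float) -> List[float]:
    """Apply top-k, then top-p (nucleus) filtering, then renormalize.

    Hypothetical helper; top_k <= 0 is treated here as "no top-k limit".
    """
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the shortest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    total = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out


filtered = top_k_top_p_filter([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7)
```

In this example top-k keeps the first three tokens and top-p then stops after the second, so only two tokens remain with non-zero probability.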
Ryan
b0f525955c [SOT] Remove breakgraph in post processing && fix datatype (#2780) 2025-07-10 11:26:00 +08:00
Yuanle Liu
2ea267f624 assert prompt len > 0 (#2773) 2025-07-10 11:14:52 +08:00
0x3878f
1d8af7ab73 Add env variable for dy2st (#2779) 2025-07-10 11:06:06 +08:00
LiqinruiG
54affdc44b [Doc] modify offline_inference docs (#2787)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs

* modify offline_inference docs

* modify offline_inference docs
2025-07-10 01:06:14 +08:00
Jiang-Jia-Jun
a4fdb3970b [BugFix] Fix vocab size error for ernie model (#2785)
* [BugFix] Fix vocab size error for ernie model

* [BugFix] Fix vocab size error for ernie model

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657 [BugFix Revert] Fix vocab size error for ernie model 2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779 [BugFix] Fix vocab size error for ernie model 2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2783) 2025-07-09 22:03:57 +08:00
LiqinruiG
4ccd1696ab [Doc] modify offline inference docs (#2747)
* modify reasoning_output docs

* modify offline inference docs

* modify offline inference docs
2025-07-09 20:53:26 +08:00
chen
888780ffde [Feature] block_wise_fp8 support triton_moe_backend (#2767) 2025-07-09 19:22:47 +08:00
RAM
e3768c5a83 [Executor] Fix bug of logger.debug (#2778) 2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994 dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8 Clear dead code And supplementary notes (#2757)
* 1.supplementary notes 2.delete dead code

* fix bug of forward meta

* Global modification of forward meta

* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd [Feature] support custom all-reduce (#2758)
* [Feature] support custom all-reduce

* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047 [XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B (#2765)
* fix no quant xpu moe

* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
celsowm
771e71a24d Feat/blackwell sm100 support (#2670)
* Add initial support for NVIDIA Blackwell (SM100) architecture

This change introduces initial support for the NVIDIA Blackwell GPU
architecture, specifically targeting SM100 (Compute Capability 10.x)
with '100a' architecture-specific features (e.g., for CUTLASS).

Key changes:
- Updated custom_ops/setup_ops.py to generate appropriate gencode
  flags (arch=compute_100a,code=sm_100a) when '100' is specified
  in FD_BUILDING_ARCS. Requires CUDA 12.9+.
- Updated custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h:
    - Added CutlassTileConfigSM100 enum (with placeholder tile shapes).
    - Added BLACKWELL to CandidateConfigTypeParam.
    - Updated CutlassGemmConfig struct with is_sm100 flag,
      tile_config_sm100, and new constructor for SM100.
    - Modified toString() and fromString() for SM100 support.
- Updated custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu:
    - Added get_candidate_tiles_sm100() (with placeholder tiles).
    - Added placeholder mcast support functions for SM100.
    - Updated get_candidate_configs() to include SM100 paths using
      the BLACKWELL flag and new SM100 config types.
- Updated build.sh with comments to guide users on specifying '100'
  for Blackwell in FD_BUILDING_ARCS.

Further work:
- Optimal CUTLASS tile configurations for SM100 need to be researched
  and updated in cutlass_heuristic.cu.
- Kernel auto-generation scripts in custom_ops/utils/ may need
  SM100-specific versions if Blackwell's hardware features for FP8/TMA
  differ significantly from SM90.
- Compatibility of third-party libraries (CUTLASS v3.8.0, DeepGEMM)
  with Blackwell should be fully verified.

* Feat: Implement detailed Blackwell (SM100) CUTLASS heuristics

This change integrates specific, expert-provided CUTLASS heuristic
configurations for the NVIDIA Blackwell (SM100) GPU architecture,
replacing previous placeholders. This includes:

- Updated `custom_ops/gpu_ops/cutlass_extensions/gemm_configs.h`:
    - Populated `CutlassTileConfigSM100` enum with specific tile shapes
      (e.g., CtaShape64x64x128B, CtaShape128x128x128B) suitable for SM100.
    - Added `FP4_ONLY` to `CandidateConfigTypeParam` for new FP4 paths.

- Updated `custom_ops/gpu_ops/cutlass_kernels/cutlass_heuristic.cu`:
    - Implemented `get_candidate_tiles_sm100` with detailed logic for
      selecting tile configurations based on GROUPED_GEMM and FP4_ONLY flags,
      using the new SM100 tile enums.
    - Implemented `supports_mcast_along_m_sm100` and
      `supports_mcast_along_n_sm100` with specific tile checks for Blackwell.
    - Updated the `sm == 100` (Blackwell) block in `get_candidate_configs`
      to use these new helper functions and accurately populate candidate
      kernel configurations for various cluster shapes.

- `custom_ops/setup_ops.py` remains configured to compile for
  `arch=compute_100a,code=sm_100a` with CUDA 12.9+ for these features.

This aligns the codebase with heuristic configurations similar to those
in upstream TensorRT-LLM / CUTLASS for Blackwell, enabling more
performant kernel selection on this new architecture.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 15:29:42 +08:00
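The gencode handling described in 771e71a24d (emit `arch=compute_100a,code=sm_100a` when '100' is requested) can be sketched as follows. The function `gencode_flags` is hypothetical and is not the code in custom_ops/setup_ops.py; the treatment of architectures other than '100' is an assumption made only to keep the example complete.

```python
from typing import List


def gencode_flags(archs: List[str]) -> List[str]:
    """Build nvcc -gencode arguments for the requested architectures.

    Hypothetical helper: only '100' gets the architecture-specific 'a'
    suffix here, as the commit message describes; other values are passed
    through unchanged, which is an assumption.
    """
    flags: List[str] = []
    for arch in archs:
        suffix = "a" if arch == "100" else ""
        flags += ["-gencode", f"arch=compute_{arch}{suffix},code=sm_{arch}{suffix}"]
    return flags


# Example: building for Ampere (80) and Blackwell (100).
flags = gencode_flags(["80", "100"])
```

Per the commit message, the '100a' target requires CUDA 12.9 or newer; the sketch does not check the toolkit version.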
yulangz
0350831c2b fix xpu offline demo garbled output (#2763) 2025-07-09 14:51:20 +08:00
RichardWooSJTU
fee544e808 fix ep prefill (#2762) 2025-07-09 14:03:05 +08:00
Ryan
c4718fd693 Enable SOT D2St in Multimodal Model (#2735) 2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38 [Feature] Add speculative decoding simulation benchmark. (#2751)
* Add speculative decoding simulation benchmark

* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482 【Feature】add fd commit/branch info when start server (#2752)
* add_commit_config

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 11:52:22 +08:00
EnflameGCU
f4f1d8de44 Support for non-CUDA builds (#2750)
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 11:48:40 +08:00
RichardWooSJTU
6610aa29d0 Revert "[Bug fix] fix attention rank init (#2743)" (#2761)
This reverts commit e8bbe7244b.
2025-07-09 10:38:12 +08:00
Ryan
f72c4de539 [SOT] Make custom_op dy&st unified (#2733)
* make_custom_op dy&st unified

* add instance judgement
2025-07-08 19:21:44 +08:00
xiegetest
f6ffbc3cbd add precision check for ci (#2732)
* add precision check for ci

* add precision check for ci

* add precision check for ci

* add precision check for ci

---------

Co-authored-by: xiegegege <xiege01@baidu.com>
2025-07-08 18:43:53 +08:00
RichardWooSJTU
e8bbe7244b [Bug fix] fix attention rank init (#2743)
* fix attention rank init

* fix attention rank init
2025-07-08 17:19:49 +08:00
Longzhi Wang
57b086dc6b [Bug fix] Add the missing pod_ip param to the launch_cache_manager function. (#2742)
* [Bug fix] fix the missing position args in expert_service.py

* update
2025-07-08 14:52:13 +08:00
lizexu123
525be243e7 [Bug fix] Fixed the garbled text issues in Qwen3-8B (#2737)
* fix qwen3.py

* update

* update lm_head tie_word_embeddings

* update tie_word_embeddings

* fix

* fix tie_word_embedding not in config.json

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-07 23:15:27 -07:00
EnflameGCU
d0f4d6ba3a [GCU] Support gcu platform (#2702)
baseline: e7fa57ebae

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-08 13:00:52 +08:00
gaoziyuan
26d5d737dd 【Feature】support qwen2 some func (#2740)
* add rl qwen model support

* fix

* fix
2025-07-08 12:03:04 +08:00
Ryan
fefbd65cf8 [SOT] Remove BreakGraph with paddle.maximum (#2731)
* rm if with clip

* clip -> maximum

* int64 -> int32
2025-07-08 11:44:25 +08:00
ming1753
1eb8ea7328 [Bug fix] fix compile bug when sm < 89 (#2738) 2025-07-08 11:24:52 +08:00
ming1753
ef6649a577 [Optimize] Optimize tensorwise fp8 performance (#2729)
* [Optimize] Optimize tensorwise fp8 performance
2025-07-07 20:06:28 +08:00
liddk1121
1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
YUNSHEN XIE
2579e8fea8 support FastDeploy version setting (#2725)
2025-07-07 14:50:11 +08:00
Yuanle Liu
91528f1af9 remove redundant install whl of fastdeploy (#2726)
* remove redundant install

* remove redundant install
2025-07-06 23:49:37 -07:00
lddfym
4e293e50fa Check if the controller port is available (#2724) 2025-07-07 13:24:55 +08:00
chen
66b321d9ec Update eb45-0.3B cuda memory (#2686) 2025-07-07 11:31:15 +08:00
ltd0924
68b4755587 [LLM] support multi node deploy (#2708)
* [LLM] support multi node deploy

* Update engine.py

* fix bugs

* fix

* [LLM] support multi node deploy

* [LLM] support multi node deploy

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-06 10:33:51 +08:00
LQX
04a8e1ef2b Modify XPU CI, test=model (#2721) 2025-07-06 10:19:04 +08:00
Ting
a6e9161045 fix bug. (#2718)
2025-07-05 08:19:19 +08:00
Ting
90ef28d982 spec token map lazy. (#2715)
2025-07-05 00:14:54 +08:00
YuBaoku
b37585e693 [BugFix] fix paddle_git_commit_id error (#2714)
* set git identity to avoid merge failure in CI

* add ci cases

* [CI] Add validation for MTP and CUDAGraph

* [BugFix] fix paddle_git_commit_id error
2025-07-04 22:16:37 +08:00
lizexu123
9cb08e71e8 add support QWQ enable_thinking (#2706)
* add support QWQ enable_thinking

* add stream=True

* fix stream=true

* fix qwen

---------

Co-authored-by: lizexu <lizexu@baidu.com>
2025-07-04 20:55:23 +08:00
YuBaoku
dacc46f04c [CI] Add validation for MTP and CUDAGraph (#2710)
* set git identity to avoid merge failure in CI

* add ci cases

* [CI] Add validation for MTP and CUDAGraph
2025-07-04 18:13:54 +08:00
Jiang-Jia-Jun
09ded7715f Update mkdocs.yml 2025-07-04 17:55:52 +08:00
LQX
11cfdf5d89 Add XPU CI, test=model (#2701)
* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model

* Add XPU CI, test=model
2025-07-04 16:16:06 +08:00
GoldPancake
e7fa57ebae Extract eh_proj Layer from ParallelLMHead for MTP to Avoid Weight Transposition Issue (#2707)
* fix mtp eh_proj layer

* fix mtp update_cfg function

* fix stringdoc

* simplify class name
2025-07-04 14:15:04 +08:00
gaoziyuan
a5ae88ded9 [feature]add fd whl version info (#2698) 2025-07-04 14:12:42 +08:00
ltd0924
87e638498c [RL] update reschedule finish reason (#2709) 2025-07-04 13:47:36 +08:00
freeliuzc
667547be59 support chunk_prefill in MTP (#2705) 2025-07-04 11:55:48 +08:00
LiqinruiG
b38823bc66 modify reasoning_output docs (#2696) 2025-07-04 11:30:02 +08:00
Divano
050d9658a5 Update requirements.txt 2025-07-04 09:53:03 +08:00
Divano
be5cabaf80 add quick benchmark (#2703)
Test scripts do not need to go through CI
2025-07-04 09:32:36 +08:00
Yuanle Liu
240bdac2a4 [feat] support fa3 backend for pd disaggregated (#2695)
* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* support fa3 backend run in pd disaggregated

* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
ltd0924
00863c43fd [Bug] fix logger format (#2689)
2025-07-03 19:58:03 +08:00
kevin
3d3bccdf79 [doc] update docs (#2690) 2025-07-03 19:33:19 +08:00
Jiang-Jia-Jun
9fd74f75bd Update dynamic_weight_manager.py 2025-07-03 15:55:22 +08:00
Jiang-Jia-Jun
05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun
d222248d00 Update README.md 2025-07-03 15:28:28 +08:00
Jiang-Jia-Jun
e5b94d4117 Update README.md 2025-07-03 15:28:05 +08:00
Jiang-Jia-Jun
87e2e58a22 Update gh-pages.yml 2025-07-03 15:26:21 +08:00
Jiang-Jia-Jun
de20e5a992 Update Dockerfile.xpu
2025-07-03 10:14:50 +08:00
Jiang-Jia-Jun
2f9c0618f0 Update Dockerfile.gpu 2025-07-03 10:14:39 +08:00
Yuanle Liu
9a14ab6572 add --force-reinstall --no-cache-dir when pip install fastdeploy*.whl (#2682)
2025-07-02 05:32:20 -07:00
Divano
d1cb3ed571 Update gh-pages.yml (#2680) 2025-07-02 17:36:18 +08:00
handiz
b8a8a19689 add wint2 performance (#2673) 2025-07-02 17:10:01 +08:00
Jiang-Jia-Jun
97ac82834f Update nvidia_gpu.md 2025-07-02 16:54:14 +08:00
Jiang-Jia-Jun
685265a97d Update nvidia_gpu.md 2025-07-02 15:43:35 +08:00
Jiang-Jia-Jun
fc4d643634 Update nvidia_gpu.md 2025-07-02 15:39:48 +08:00
YuBaoku
bb880c8d7c Update CI test cases (#2671)
* set git identity to avoid merge failure in CI

* add ci cases
2025-07-02 15:08:39 +08:00
liddk1121
865e856a94 update iluvatar gpu fastdeploy whl (#2675) 2025-07-02 14:47:21 +08:00
Jiang-Jia-Jun
9f4a65d817 Update README.md
2025-07-02 10:04:58 +08:00
YuBaoku
e3aac0c5b8 set git identity to avoid merge failure in CI (#2665)
2025-07-01 19:06:46 +08:00
AIbin
a197dcd729 【Inference Optimize】Support ERNIE-4_5-300B-A47B-2BITS-Paddle model TP2/TP4 Inference (#2666)
* Support TP2&TP4 Wint

* Support TP2&TP4 Wint2 Inference
2025-07-01 18:29:11 +08:00
freeliuzc
2b7f74d427 fix docs (#2669)
Co-authored-by: liuzichang01 <liuzichang01@baidu.com>
2025-07-01 18:02:44 +08:00
Jiang-Jia-Jun
164b83ab0b [Doc] Update nvidia gpu installation description 2025-07-01 15:22:19 +08:00
Jiang-Jia-Jun
01d5d66d95 [Doc] Update nvidia gpu installation description 2025-07-01 15:20:40 +08:00
Jiang-Jia-Jun
8f1dddcf35 [Doc] Update nvidia gpu installation description 2025-07-01 15:20:21 +08:00
hong19860320
8e335db645 Update kunlunxin_xpu.md (#2662) 2025-07-01 15:10:45 +08:00
AIbin
1bb296c5ad update quantization doc (#2659) 2025-07-01 15:05:02 +08:00
hong19860320
92428a5ae4 Update kunlunxin_xpu.md (#2657)
2025-07-01 12:28:49 +08:00
RichardWooSJTU
85090ed799 remove unuseful scripts (#2652) 2025-07-01 10:18:25 +08:00
ltd0924
50aa4080c0 [Serving] fix offline inference sampling parameters overwrite (#2654) 2025-07-01 10:17:46 +08:00
YUNSHEN XIE
d5af78945b Add ci (#2650)
* add ci ut and workflow

* Automatically cancel any previous CI runs for the ci.yml workflow, keeping only the latest one active
2025-06-30 20:20:49 +08:00
hong19860320
6bead64f48 Update kunlunxin_xpu.md 2025-06-30 15:59:22 +08:00
hong19860320
6b95b42986 Update kunlunxin_xpu.md 2025-06-30 15:49:32 +08:00
hong19860320
b0d3a630ba Merge branch 'develop' of https://github.com/hong19860320/FastDeploy into hongming/fix_xpu_doc 2025-06-30 15:42:29 +08:00
hong19860320
ef72873695 Update kunlunxin_xpu.md 2025-06-30 15:27:48 +08:00
qingqing01
4a5db82fb2 Merge pull request #2644 from kevincheng2/develop
[docs] update docs
2025-06-30 14:55:54 +08:00
kevin
4f7b42ce3e update docs 2025-06-30 14:45:41 +08:00
qingqing01
df1e22b595 Merge pull request #2642 from MARD1NO/remove_redundant_sync
use shfl_xor_sync to reduce redundant shfl broadcast
2025-06-30 14:33:12 +08:00
MARD1NO
ac5f860536 use shfl_xor_sync to reduce redundant shfl broadcast 2025-06-30 13:12:21 +08:00
qingqing01
90a5b18742 Update disaggregated.md
2025-06-30 11:57:12 +08:00
qingqing01
7c43500060 Update disaggregated.md 2025-06-30 11:56:33 +08:00
Jiang-Jia-Jun
ea29b01a68 Update quick_start.md 2025-06-30 11:52:05 +08:00
Jiang-Jia-Jun
51f1306de8 Merge pull request #2641 from yongqiangma/doc
fix format
2025-06-30 11:42:52 +08:00
yongqiangma
f9431106d8 Merge branch 'develop' into doc 2025-06-30 11:42:43 +08:00
Jiang-Jia-Jun
f4ce0393f3 Merge pull request #2640 from chang-wenbin/fix_wint2_doc
【Update Doc】Update Wint2 Doc
2025-06-30 11:40:41 +08:00
mayongqiang
0d39e23ab9 fix format 2025-06-30 11:39:59 +08:00
changwenbin
634d3c3642 update wint2 doc 2025-06-30 11:36:15 +08:00
Jiang-Jia-Jun
cb54462303 Update README.md 2025-06-30 11:16:00 +08:00
Jiang-Jia-Jun
c9b358c502 Merge pull request #2639 from ZhangYulongg/patch-1
Update README.md
2025-06-30 11:10:17 +08:00
Divano
733cc47b00 Merge pull request #2638 from PaddlePaddle/DDDivano-FixWorkflow
Update gh-pages.yml
2025-06-30 10:55:52 +08:00
Zhang Yulong
264ddfdf8a Update README.md 2025-06-30 10:28:15 +08:00
Divano
8fb6b5f731 Update gh-pages.yml 2025-06-30 10:16:31 +08:00
Jiang-Jia-Jun
870a4554c7 Merge pull request #2637 from Jiang-Jia-Jun/develop
Fix workflow
2025-06-30 10:02:41 +08:00
Jiang-Jia-Jun
d2b8fbe0bf Fix workflow 2025-06-30 02:01:59 +00:00
Jiang-Jia-Jun
98fc80e694 Merge pull request #2636 from Jiang-Jia-Jun/develop
Fix workflow
2025-06-30 09:28:47 +08:00
Jiang-Jia-Jun
7c00834e3d Fix workflow 2025-06-30 01:27:36 +00:00
Jiang-Jia-Jun
b755047927 Merge pull request #2635 from Jiang-Jia-Jun/develop
Add workflows
2025-06-30 09:14:50 +08:00
Jiang-Jia-Jun
fdf0c6349e Add workflows 2025-06-30 01:14:07 +00:00
Jiang-Jia-Jun
50c5bc1e9d Update nvidia_gpu.md 2025-06-30 08:59:41 +08:00
Jiang-Jia-Jun
187a5ae592 Update quick_start.md 2025-06-30 08:57:25 +08:00
Jiang-Jia-Jun
866946de0d Update quick_start.md 2025-06-30 08:57:02 +08:00
Jiang-Jia-Jun
72c768168c Update ernie-4.5-vl.md 2025-06-30 08:56:27 +08:00
Jiang-Jia-Jun
f0b7e99f05 Update ernie-4.5.md 2025-06-30 08:56:08 +08:00
Jiang-Jia-Jun
b40633cbbd Update quick_start.md 2025-06-30 08:55:29 +08:00
Jiang-Jia-Jun
f14f361c23 Add README.md for quick start 2025-06-30 08:55:05 +08:00
Jiang-Jia-Jun
47299dbc54 Update supported models 2025-06-30 08:50:44 +08:00
Jiang-Jia-Jun
6cb1a75663 Update supported models 2025-06-30 08:50:21 +08:00
Jiang-Jia-Jun
08e59c71ee Update supported models 2025-06-30 08:41:37 +08:00
Jiang-Jia-Jun
b4d82b8eb0 Fix installation doc link 2025-06-30 08:34:56 +08:00
Jiang-Jia-Jun
8a340b6458 Fix installation doc link 2025-06-30 08:34:01 +08:00
qingqing01
db1c88946d Fix links in README.md 2025-06-30 08:31:04 +08:00
Jiang-Jia-Jun
a1fa84e418 Fix get started document 2025-06-30 08:18:40 +08:00
Jiang-Jia-Jun
046c002b58 Fix advanced usage link 2025-06-30 08:17:45 +08:00
Jiang-Jia-Jun
39ed715b5e Update supported models 2025-06-30 08:16:03 +08:00
Jiang-Jia-Jun
16b0b51a5d Fix dead link 2025-06-30 08:06:16 +08:00
Jiang-Jia-Jun
53ddc6806e Merge pull request #2633 from Jiang-Jia-Jun/develop
FastDeploy 2.0: Large Language Model Deployment
2025-06-30 07:57:31 +08:00
Jiang-Jia-Jun
aba655c228 Update mkdocs navigation 2025-06-29 23:38:53 +00:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
Jiang-Jia-Jun
d151496038 Add requirement 2025-06-29 19:11:44 +00:00
Jiang-Jia-Jun
8dba1d90a1 Rename filepath 2025-06-30 01:32:01 +08:00
Jiang-Jia-Jun
9322f379e7 Merge pull request #2632 from PaddlePaddle/Jiang-Jia-Jun-patch-1
Create README.md
2025-06-30 01:29:59 +08:00
Jiang-Jia-Jun
135489dd30 Create README.md 2025-06-30 01:29:45 +08:00
YUNSHEN XIE
d68cd5faec Merge pull request #2624 from XieYunshen/develop
Add scripts and GitHub Actions workflows for CI
2025-06-16 11:38:21 +08:00
XieYunshen
0c490b72cc Automatically cancel any previous CI runs for the ci.yml workflow, keeping only the latest one active 2025-06-16 11:05:10 +08:00
YUNSHEN XIE
5106c83e0d Merge branch 'develop' into develop 2025-06-16 02:23:05 +08:00
XieYunshen
0825146538 add ci ut and workflow 2025-06-16 02:18:00 +08:00
Jiang-Jia-Jun
3d44c7a3e9 Update serving parameters description 2025-06-16 00:04:48 +08:00
Jiang-Jia-Jun
8e6faa999c Polish README.md 2025-06-16 00:04:48 +08:00
jiangjiajun
ffb096a60b Remove unavailable doc link 2025-06-16 00:04:48 +08:00
jiangjiajun
00453996b8 Provide prebuilt docker image 2025-06-16 00:04:48 +08:00
jiangjiajun
bef75f5a61 Fix install guide 2025-06-16 00:04:48 +08:00
jiangjiajun
15e11b8b0b Update docker image address 2025-06-16 00:04:48 +08:00
jiangjiajun
5be18dea00 Update README.md 2025-06-16 00:04:48 +08:00
jiangjiajun
7389161af1 [LLM] Add output module and polish docs 2025-06-16 00:04:48 +08:00
jiangjiajun
8cfd95fb0b [LLM] Add output module and polish docs 2025-06-16 00:04:48 +08:00
jiangjiajun
a54a28f0a3 [LLM] Add output module and polish docs 2025-06-16 00:04:48 +08:00
jiangjiajun
149c79699d [LLM] First commit the llm deployment code 2025-06-16 00:04:48 +08:00
Jules
8513414112 fix Windows text encoding issue causing infinite loop 2025-06-16 00:04:48 +08:00
Zheng-Bicheng
b8a54dbf57 Update cosine_similarity.cc 2025-06-16 00:04:47 +08:00
Jiang-Jia-Jun
f57422e3c1 Update serving parameters description 2025-06-10 19:27:38 +08:00
Jiang-Jia-Jun
0dcfc6de75 Polish README.md 2025-06-10 17:17:34 +08:00
jiangjiajun
041919b343 Remove unavailable doc link 2025-06-10 11:02:37 +08:00
jiangjiajun
b03fb36873 Provide prebuilt docker image 2025-06-10 10:28:54 +08:00
jiangjiajun
26dd92297b Fix install guide 2025-06-10 02:11:13 +08:00
Jiang-Jia-Jun
00f08365a0 Merge pull request #2622 from PaddlePaddle/2.0.0-llm
[LLM] Upgrade FastDeploy to 2.0 version
2025-06-10 02:02:25 +08:00
jiangjiajun
1ebc4f9492 Update docker image address 2025-06-10 01:59:43 +08:00
jiangjiajun
f7cd5560fe Update README.md 2025-06-09 20:39:08 +08:00
jiangjiajun
0a42545723 [LLM] Add output module and polish docs 2025-06-09 20:30:41 +08:00
jiangjiajun
0d2651e594 [LLM] Add output module and polish docs 2025-06-09 20:29:17 +08:00
jiangjiajun
fb18f3092d [LLM] Add output module and polish docs 2025-06-09 20:26:53 +08:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00
12636 changed files with 287391 additions and 1293527 deletions

View File

@@ -1,180 +1,29 @@
# This file is used by clang-format to autoformat paddle source code
#
# The clang-format is part of llvm toolchain.
# It need to install llvm and clang to format source code style.
#
# The basic usage is,
# clang-format -i -style=file PATH/TO/SOURCE/CODE
#
# The -style=file implicit use ".clang-format" file located in one of
# parent directory.
# The -i means inplace change.
#
# The document of clang-format is
# http://clang.llvm.org/docs/ClangFormat.html
# http://clang.llvm.org/docs/ClangFormatStyleOptions.html
---
Language: Cpp
# BasedOnStyle: LLVM
AccessModifierOffset: -1
AlignAfterOpenBracket: Align
AlignArrayOfStructures: None
AlignConsecutiveMacros: None
AlignConsecutiveAssignments: None
AlignConsecutiveBitFields: None
AlignConsecutiveDeclarations: None
AlignEscapedNewlines: Right
AlignOperands: Align
AlignTrailingComments: true
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortEnumsOnASingleLine: true
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: All
AllowShortLambdasOnASingleLine: All
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: MultiLine
AttributeMacros:
- __capability
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterCaseLabel: false
AfterClass: false
AfterControlStatement: Never
AfterEnum: false
AfterFunction: false
AfterNamespace: false
AfterObjCDeclaration: false
AfterStruct: false
AfterUnion: false
AfterExternBlock: false
BeforeCatch: false
BeforeElse: false
BeforeLambdaBody: false
BeforeWhile: false
IndentBraces: false
SplitEmptyFunction: true
SplitEmptyRecord: true
SplitEmptyNamespace: true
BreakBeforeBinaryOperators: None
BreakBeforeConceptDeclarations: true
BreakBeforeBraces: Attach
BreakBeforeInheritanceComma: false
BreakInheritanceList: BeforeColon
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeColon
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
ColumnLimit: 80
# CommentPragmas: '^ IWYU pragma:'
# CommentPragmas: '^[^ ]'
CommentPragmas: '^\\.+'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 4
BasedOnStyle: Google
IndentWidth: 4
TabWidth: 2
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DeriveLineEnding: true
DerivePointerAlignment: false
DisableFormat: false
EmptyLineAfterAccessModifier: Never
EmptyLineBeforeAccessModifier: LogicalBlock
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:
- foreach
- Q_FOREACH
- BOOST_FOREACH
IfMacros:
- KJ_IF_MAYBE
AccessModifierOffset: -1 # The private/protected/public has no indent in class
Standard: Cpp11
AllowAllParametersOfDeclarationOnNextLine: true
BinPackParameters: false
BinPackArguments: false
IncludeBlocks: Preserve
IncludeCategories:
- Regex: '^"(llvm|llvm-c|clang|clang-c)/'
Priority: 2
SortPriority: 0
CaseSensitive: false
- Regex: '^(<|"(gtest|gmock|isl|json)/)'
Priority: 3
SortPriority: 0
CaseSensitive: false
- Regex: '.*'
Priority: 1
SortPriority: 0
CaseSensitive: false
IncludeIsMainRegex: '(Test)?$'
IncludeIsMainSourceRegex: ''
IndentAccessModifiers: false
IndentCaseLabels: false
IndentCaseBlocks: false
IndentGotoLabels: true
IndentPPDirectives: None
IndentExternBlock: AfterExternBlock
IndentRequires: false
IndentWidth: 2
IndentWrappedFunctionNames: false
InsertTrailingCommas: None
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: true
LambdaBodyIndentation: Signature
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto
ObjCBlockIndentWidth: 2
ObjCBreakBeforeNestedBlockParam: true
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: true
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PenaltyIndentedWhitespace: 0
PointerAlignment: Left
PPIndentWidth: -1
ReferenceAlignment: Pointer
ReflowComments: false
ShortNamespaceLines: 1
SortIncludes: CaseSensitive
SortJavaStaticImport: Before
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceAroundPointerQualifiers: Default
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 2
SpacesInAngles: Never
SpacesInConditionalStatement: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInLineCommentPrefix:
Minimum: 1
Maximum: -1
SpacesInParentheses: false
SpacesInSquareBrackets: false
SpaceBeforeSquareBrackets: false
BitFieldColonSpacing: Both
Standard: Latest
StatementAttributeLikeMacros:
- Q_EMIT
StatementMacros:
- Q_UNUSED
- QT_REQUIRE_VERSION
TabWidth: 8
UseCRLF: false
UseTab: Never
WhitespaceSensitiveMacros:
- STRINGIZE
- PP_STRINGIZE
- BOOST_PP_STRINGIZE
- NS_SWIFT_NAME
- CF_SWIFT_NAME
IncludeIsMainSourceRegex: (\.cu)$
...

View File

@@ -1,15 +0,0 @@
#!/bin/bash
set -e
readonly VERSION="3.8"
version=$(clang-format -version)
if ! [[ $version == *"$VERSION"* ]]; then
echo "clang-format version check failed."
echo "a version contains '$VERSION' is needed, but get '$version'"
echo "you can install the right version, and make an soft-link to '$PATH' env"
exit -1
fi
clang-format -style=google $@
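
The removed hook above gates formatting on the installed clang-format version by checking whether the reported version string contains the required one. A minimal sketch of such a substring check follows; the function name `version_matches` and the sample version strings are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: version_matches is a hypothetical helper, not part of the repository.

# Succeeds (exit status 0) when the reported version string contains the
# required version as a substring, and fails (exit status 1) otherwise.
version_matches() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative version strings; a real hook would use $(clang-format -version).
if version_matches "clang-format version 3.8.0" "3.8"; then
    echo "version accepted"
fi

if ! version_matches "clang-format version 13.0.0" "3.8"; then
    echo "version rejected"
fi
```

A substring test is simple but loose: a string such as "13.8" would also satisfy a requirement of "3.8", so a stricter hook might compare parsed major and minor numbers instead.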

View File

@@ -1,60 +0,0 @@
#!/bin/bash
#TOTAL_ERRORS=0
#echo "HAHAHAHAHHA"
#exit 5
#
#files=$(
#
#if [[ ! $TRAVIS_BRANCH ]]; then
# # install cpplint on local machine.
# if [[ ! $(which cpplint) ]]; then
# pip install cpplint
# fi
# # diff files on local machine.
# files=$(git diff --cached --name-status | awk '$1 != "D" {print $2}')
#else
# # diff files between PR and latest commit on Travis CI.
# branch_ref=$(git rev-parse "$TRAVIS_BRANCH")
# head_ref=$(git rev-parse HEAD)
# files=$(git diff --name-status $branch_ref $head_ref | awk '$1 != "D" {print $2}')
#fi
## The trick to remove deleted files: https://stackoverflow.com/a/2413151
#for file in $files; do
# echo $file
# if [[ $file =~ ^(patches/.*) ]]; then
# continue;
# else
# cpplint --filter=-readability/fn_size $file;
# TOTAL_ERRORS=$(expr $TOTAL_ERRORS + $?);
# fi
#done
#
#exit $TOTAL_ERRORS
if git rev-parse --verify HEAD >/dev/null 2>&1
then
against=HEAD
else
# Initial commit: diff against an empty tree object
against=4b825dc642cb6eb9a060e54bf8d69288fbee4904
fi
# Redirect output to stderr.
exec 1>&2
cpplint=cpplint
sum=0
filters='-build/include_order,-build/namespaces,-legal/copyright,-runtime/references,-build/include_what_you_use'
# for cpp
for file in $(git diff-index --name-status $against -- | grep -E '\.[ch](pp)?$' | awk '{print $2}'); do
$cpplint --filter=$filters $file
sum=$(expr ${sum} + $?)
done
if [ ${sum} -eq 0 ]; then
exit 0
else
exit 1
fi

7
.flake8 Normal file
View File

@@ -0,0 +1,7 @@
[flake8]
ignore = E203, E402, E501, E731, E741, W503, W605, E722, E231, W604, E702, E226, E221, E713, E271
max-line-length = 119
# E402: module level import not at top of file
per-file-ignores =
__init__.py:F401,F403,E402

View File

@@ -1,18 +0,0 @@
---
name: English
about: Report issue in English
title: ''
labels: ''
assignees: ''
---
## Environment
FastDeploy version: e.g 0.8.0 or the latest code in develop branch
OS Platform: e.g. Linux x64 / Windows x64 / Mac OSX 12.1(arm or intel)
Hardware: e.g. Nvidia GPU 3080Ti CUDA 11.2 CUDNN 8.3
Program Language: e.g. Python 3.8
## Problem description
Please attach the log file if a problem occurred.

View File

@@ -1,10 +0,0 @@
---
name: Other
about: Other issues, e.g feature/model request
title: ''
labels: ''
assignees: ''
---

View File

@@ -1,35 +0,0 @@
---
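name: Report an issue
about: Report problems encountered during use
title: ''
labels: ''
assignees: ''
---
*********************************************
Tip: according to informal community statistics, asking questions with this template speeds up replies and problem resolution
*********************************************
## Environment
- [FastDeploy version]: state the specific version, e.g. fastdeploy-linux-gpu-0.8.0
- [Build command]: if you built FastDeploy yourself, describe your build method, parameters, and commands
- [System platform]: Linux x64(Ubuntu 18.04) / Windows x64(Windows10) / Mac OSX arm(12.0) / Mac OSX intel(12.0)
- [Hardware]: state the specific hardware model, e.g. Nvidia GPU 3080TI CUDA 11.2 CUDNN 8.3
- [Programming language]: C++ / Python (3.7, 3.8, etc.)
## Problem log and the steps that led to the problem
- Attaching a detailed problem log helps locate and analyze the issue quickly
- [Model fails to run]
- - First run the deployment examples under `examples`, including the models provided there, to confirm whether they execute correctly
- - If the code under `examples` runs, but your own model or your own code does not
- - - Provide the code + model + error log that reproduce the problem so engineers can locate it quickly
- [Model accuracy issue]
- - First run the deployment examples under `examples`, including the models provided there, to confirm whether they execute correctly
- - If the code under `examples` runs, but your own model or your own code does not
- - - Provide the code + model + error log that reproduce the problem so engineers can locate it quickly
- [Performance issue] Describe clearly how the comparison was made
- - Note: for performance tests, run N iterations and average the time of the last 80% (right after model startup, speed is slower because of resource allocation)
- - FastDeploy's Predict time includes data pre- and post-processing in addition to the model itself
- - - Provide the code + model + error log that reproduce the problem so engineers can locate it quickly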

View File

@@ -1,10 +0,0 @@
<!-- Demo: https://github.com/PaddlePaddle/Paddle/pull/24810 -->
### PR types(PR类型)
<!-- One of PR types [ Model | Backend | Serving | Quantization | Doc | Bug Fix | Other] -->
### Description
<!-- Describe what this PR does -->

50
.github/workflows/Codestyle-Check.yml vendored Normal file
View File

@@ -0,0 +1,50 @@
name: Codestyle-Check
on:
pull_request:
branches:
- develop
- 'release/*'
jobs:
pre-commit:
name: Pre Commit
if: ${{ github.repository_owner == 'PaddlePaddle' }}
runs-on: ubuntu-latest
env:
PR_ID: ${{ github.event.pull_request.number }}
BRANCH: ${{ github.event.pull_request.base.ref }}
steps:
- name: Cleanup
run: |
rm -rf * .[^.]*
- name: Checkout base repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.base.ref }}
fetch-depth: 1000
- name: Merge PR to test branch
run: |
git fetch origin pull/${PR_ID}/merge
git checkout -b test FETCH_HEAD
- name: Setup python3.10
uses: actions/setup-python@v5
with:
python-version: '3.10'
cache: 'pip'
- name: Install dependencies
run: |
pip install pre-commit==4.2.0 cpplint==1.6.0 clang-format==13.0.0
- name: Check pre-commit
env:
SKIP_CLANG_TIDY_CHECK: "ON"
run: |
set +e
bash -x tools/codestyle/pre_commit.sh;EXCODE=$?
exit $EXCODE

186
.github/workflows/_accuracy_test.yml vendored Normal file
View File

@@ -0,0 +1,186 @@
name: Accuracy Test
description: "Run Accuracy Tests"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
jobs:
accuracy_tests:
runs-on: [self-hosted, GPU-h20-1Cards]
timeout-minutes: 60
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: Run FastDeploy Base Tests
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fastdeploy_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
run: |
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42098 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
if [ ! -d "${MODEL_CACHE_DIR}" ]; then
echo "Error: MODEL_CACHE_DIR '${MODEL_CACHE_DIR}' does not exist."
exit 1
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT $FD_CACHE_QUEUE_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --ipc=host --pid=host --net=host \
--name ${runner_name} \
-v $(pwd):/workspace \
-w /workspace \
-e fastdeploy_wheel_url=${fastdeploy_wheel_url} \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-v "${MODEL_CACHE_DIR}:/MODELDATA" \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-e TZ="Asia/Shanghai" \
--gpus '"device='"${DEVICES}"'"' ${docker_image} /bin/bash -xc '
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install ${fastdeploy_wheel_url}
python -m pip install pytest
wget https://paddle-qa.bj.bcebos.com/zhengtianyu/tools/llm-deploy-linux-amd64
chmod +x ./llm-deploy-linux-amd64
./llm-deploy-linux-amd64 -python python3.10 \
-model_name ERNIE-4.5-0.3B-Paddle \
-model_path /MODELDATA \
--skip install
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
pushd tests/ce/deploy
python3.10 deploy.py > dd.log 2>&1 &
sleep 3
curl -X POST http://0.0.0.0:${FLASK_PORT}/start \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\"}"
curl -X POST http://localhost:${FLASK_PORT}/wait_for_infer?timeout=90
popd
pushd tests/ce/accuracy_cases
export URL=http://localhost:${FD_API_PORT}/v1/chat/completions
export TEMPLATE=TOKEN_LOGPROB
export MODEL_SIZE=0.3B
TEST_EXIT_CODE=0
python gsm8k.py || TEST_EXIT_CODE=1
popd
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> /workspace/FastDeploy/exit_code.env
'
if [ -f ./FastDeploy/exit_code.env ]; then
source ./FastDeploy/exit_code.env
cat ./FastDeploy/exit_code.env >> $GITHUB_ENV
fi
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}"
exit ${TEST_EXIT_CODE}
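
The exit-code hand-off used at the end of this step can be sketched in isolation. This is a minimal illustration, not the workflow itself: a child `bash -c` stands in for the container, `false` stands in for a failing test command, and the temporary directory is a hypothetical replacement for `/workspace/FastDeploy`. Initializing `TEST_EXIT_CODE=0` on the host side is an addition of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for the mounted workspace.
workdir=$(mktemp -d)

# "Container" side: record a failure instead of aborting, then persist the result.
bash -c '
TEST_EXIT_CODE=0
false || TEST_EXIT_CODE=1   # "false" stands in for a failing test command
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> "$1/exit_code.env"
' _ "${workdir}"

# Host side: read the persisted value back, as the step does with exit_code.env.
TEST_EXIT_CODE=0            # assumption: default when the file is missing
if [ -f "${workdir}/exit_code.env" ]; then
  source "${workdir}/exit_code.env"
fi
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}"
rm -rf "${workdir}"
```

Because each test command is followed by `|| TEST_EXIT_CODE=1`, later tests still run after an earlier failure, and the job reports the failure only once everything has finished.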

.github/workflows/_base_test.yml

@@ -0,0 +1,229 @@
name: Base Test
description: "Run Base Tests"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Model Cache Dir Use"
required: false
type: string
default: ""
jobs:
base_tests:
runs-on: [self-hosted, GPU-h20-1Cards]
timeout-minutes: 60
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: Run FastDeploy Base Tests
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fastdeploy_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
run: |
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42098 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
if [ ! -d "${MODEL_CACHE_DIR}" ]; then
echo "Error: MODEL_CACHE_DIR '${MODEL_CACHE_DIR}' does not exist."
exit 1
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT $FD_CACHE_QUEUE_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --ipc=host --pid=host --net=host \
--name ${runner_name} \
-v $(pwd):/workspace \
-w /workspace \
-e fastdeploy_wheel_url=${fastdeploy_wheel_url} \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-v "${MODEL_CACHE_DIR}:/MODELDATA" \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-e TZ="Asia/Shanghai" \
--gpus '"device='"${DEVICES}"'"' ${docker_image} /bin/bash -xc '
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install ${fastdeploy_wheel_url}
python -m pip install pytest
wget https://paddle-qa.bj.bcebos.com/zhengtianyu/tools/llm-deploy-linux-amd64
chmod +x ./llm-deploy-linux-amd64
./llm-deploy-linux-amd64 -python python3.10 \
-model_name ERNIE-4.5-0.3B-Paddle \
-model_path /MODELDATA \
--skip install
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
pushd tests/ce/deploy
python3.10 deploy.py > dd.log 2>&1 &
sleep 3
curl -X POST http://0.0.0.0:${FLASK_PORT}/start \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\"}"
check_service() {
local timeout=${1:-90}
local url="http://localhost:${FLASK_PORT}/wait_for_infer?timeout=${timeout}"
local resp
resp=$(curl -s -X POST "$url")
if echo "$resp" | grep -q "服务启动超时"; then
exit 8
fi
}
check_service 90
popd
pushd tests/ce/server
export URL=http://localhost:${FD_API_PORT}/v1/chat/completions
export TEMPLATE=TOKEN_LOGPROB
TEST_EXIT_CODE=0
python -m pytest -sv test_base_chat.py test_compare_top_logprobs.py test_logprobs.py test_params_boundary.py test_seed_usage.py test_stream.py test_evil_cases.py test_completions.py test_return_token_ids.py || TEST_EXIT_CODE=1
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\", \"--early-stop-config\": \"{\\\"enable_early_stop\\\":true, \\\"window_size\\\":6, \\\"threshold\\\":0.93}\"}"
check_service 90
python -m pytest -sv test_repetition_early_stop.py || TEST_EXIT_CODE=1
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{ \"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\", \"--max-concurrency\": 5, \"--max-waiting-time\": 1 }"
check_service 90
python -m pytest -sv test_max_concurrency.py || TEST_EXIT_CODE=1
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{ \"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\", \"--max-concurrency\": 5000, \"--max-waiting-time\": 1 }"
check_service 90
python -m pytest -sv test_max_waiting_time.py || TEST_EXIT_CODE=1
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ernie-4_5-21b-a3b-bf16-paddle\", \"--config\": \"21b_mtp.yaml\", \"--enable-logprob\": \"False\"}"
check_service 180
export TEMPLATE=TOKEN_NORMAL
python -m pytest -sv test_seed_usage.py -k "not test_seed_stream" || TEST_EXIT_CODE=1
curl -X POST http://0.0.0.0:${FLASK_PORT}/switch \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ernie-4_5-21b-a3b-bf16-paddle\", \"--config\": \"21b_sot.yaml\", \"--enable-logprob\": \"False\"}"
check_service 360
export TEMPLATE=TOKEN_NORMAL
python -m pytest -sv test_seed_usage.py -k "not test_seed_stream" || TEST_EXIT_CODE=1
popd
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> /workspace/FastDeploy/exit_code.env
'
if [ -f ./FastDeploy/exit_code.env ]; then
source ./FastDeploy/exit_code.env
cat ./FastDeploy/exit_code.env >> $GITHUB_ENV
fi
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}"
exit ${TEST_EXIT_CODE}
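
The port layout computed at the top of the "Run FastDeploy Base Tests" step can be reproduced outside CI. The sketch below is an illustration under stated assumptions: `derive_ports` is a hypothetical helper name, and `gpu-runner-23` is a made-up runner name whose last dash-separated field lists one digit per GPU card.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the device list and per-runner ports
# from a runner name, following the arithmetic used in the workflow step.
derive_ports() {
  local runner_name="$1"
  local card_id devices device_port
  # Last dash-separated field, e.g. "23" for cards 2 and 3.
  card_id=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
  # One digit per line, then joined with commas: "2,3".
  devices=$(echo "${card_id}" | fold -w1 | paste -sd, -)
  # The first card id offsets every port by 100 per card.
  device_port=$(echo "${devices}" | cut -d',' -f1)
  echo "DEVICES=${devices}"
  echo "FLASK_PORT=$((42068 + device_port * 100))"
  echo "FD_API_PORT=$((42088 + device_port * 100))"
  echo "FD_ENGINE_QUEUE_PORT=$((42058 + device_port * 100))"
  echo "FD_METRICS_PORT=$((42078 + device_port * 100))"
  echo "FD_CACHE_QUEUE_PORT=$((42098 + device_port * 100))"
}

derive_ports "gpu-runner-23"
```

Runners whose first card id differs get port ranges 100 apart, so jobs running side by side on one host should not collide on these five ports.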

.github/workflows/_build_linux.yml

@@ -0,0 +1,200 @@
name: FastDeploy Linux GPU Build Task
description: "FastDeploy packages build and upload"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
COMPILE_ARCH:
description: "Build GPU Archs"
required: true
type: string
default: "80,90"
WITH_NIGHTLY_BUILD:
description: "Enable nightly build mode (e.g. add date suffix to version)"
required: false
type: string
default: "OFF"
FD_VERSION:
description: "FastDeploy Package Version"
required: false
type: string
default: ""
PADDLEVERSION:
description: "Paddle Version Build Use"
required: false
type: string
default: ""
PADDLE_WHL_URL:
description: "Paddle Wheel Package URL"
required: false
type: string
default: ""
UPLOAD:
description: "Upload Package"
required: false
type: string
default: "ON"
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
outputs:
wheel_path:
description: "Output path of the generated wheel"
value: ${{ jobs.fd-build.outputs.wheel_path }}
jobs:
fd-build:
runs-on: [self-hosted, GPU-Build]
timeout-minutes: 240
outputs:
wheel_path: ${{ steps.set_output.outputs.wheel_path }}
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
IS_PR: ${{ github.event_name == 'pull_request' }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: FastDeploy Build
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
compile_arch: ${{ inputs.COMPILE_ARCH }}
fd_version: ${{ inputs.FD_VERSION }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
BRANCH_REF: ${{ github.ref_name }}
PADDLEVERSION: ${{ inputs.PADDLEVERSION }}
PADDLE_WHL_URL: ${{ inputs.PADDLE_WHL_URL }}
WITH_NIGHTLY_BUILD: ${{ inputs.WITH_NIGHTLY_BUILD }}
run: |
set -x
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
gpu_id=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
PARENT_DIR=$(dirname "$WORKSPACE")
echo "PARENT_DIR:$PARENT_DIR"
docker run --rm --net=host \
--cap-add=SYS_PTRACE --privileged --shm-size=64G \
-v $(pwd):/workspace -w /workspace \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/.ccache:/root/.ccache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-e TZ="Asia/Shanghai" \
-e "COMPILE_ARCH=${compile_arch}" \
-e "FD_VERSION=${fd_version}" \
-e "WITH_NIGHTLY_BUILD=${WITH_NIGHTLY_BUILD}" \
-e "PADDLEVERSION=${PADDLEVERSION}" \
-e "PADDLE_WHL_URL=${PADDLE_WHL_URL}" \
-e "BRANCH_REF=${BRANCH_REF}" \
--gpus "\"device=${gpu_id}\"" ${docker_image} /bin/bash -c '
if [[ -n "${FD_VERSION}" ]]; then
export FASTDEPLOY_VERSION=${FD_VERSION}
echo "Custom FastDeploy version: ${FASTDEPLOY_VERSION}"
fi
git config --global --add safe.directory /workspace/FastDeploy
chown -R $(whoami) /workspace/FastDeploy
cd FastDeploy
if [[ "${WITH_NIGHTLY_BUILD}" == "ON" ]];then
GIT_COMMIT_TIME=$(git --no-pager show -s --format=%ci HEAD)
DATE_ONLY=$(echo $GIT_COMMIT_TIME | sed "s/ .*//;s/-//g")
echo "Git Commit Time: $GIT_COMMIT_TIME"
echo "Date Only: $DATE_ONLY"
export FASTDEPLOY_VERSION="${FASTDEPLOY_VERSION}.dev${DATE_ONLY}"
fi
# Use a different PaddlePaddle package depending on the branch or tag
if [[ "${PADDLE_WHL_URL}" != "" ]];then
python -m pip install ${PADDLE_WHL_URL}
elif [[ "${PADDLEVERSION}" != "" ]];then
python -m pip install paddlepaddle-gpu==${PADDLEVERSION} -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
else
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
fi
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install wheel
# Build with RDMA support
export ENABLE_FD_RDMA=1
bash build.sh 1 python false [${COMPILE_ARCH}]
ls ./dist/*.whl
'
- name: Package Upload
id: set_output
env:
compile_arch: ${{ inputs.COMPILE_ARCH }}
run: |
set -x
if [[ "${{ github.event_name }}" == "pull_request" ]];then
commit_id=${{ github.event.pull_request.head.sha }}
pr_num=${{ github.event.pull_request.number }}
target_path=paddle-github-action/PR/FastDeploy/${pr_num}/${commit_id}/SM${compile_arch//,/_}
elif [[ "${{ github.ref_type }}" == "tag" ]]; then
commit_id=${{ github.sha }}
tag_name=${{ github.ref_name }}
target_path=paddle-github-action/TAG/FastDeploy/${tag_name}/${commit_id}/SM${compile_arch//,/_}
else
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-github-action/BRANCH/FastDeploy/${branch_name}/${commit_id}/SM${compile_arch//,/_}
fi
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python --version
python -m pip install bce-python-sdk==0.9.29
cd FastDeploy/dist/
matches=($(ls fastdeploy*.whl))
if [ ${#matches[@]} -ne 1 ]; then
echo "Error: Found ${#matches[@]} matching files, expected exactly 1"
exit 1
fi
fd_wheel_name=${matches[0]}
echo "Found: $fd_wheel_name"
tree -L 3
python ${push_file} fastdeploy*.whl ${target_path}
target_path_stripped="${target_path#paddle-github-action/}"
WHEEL_PATH=https://paddle-github-action.bj.bcebos.com/${target_path_stripped}/${fd_wheel_name}
echo "wheel_path=${WHEEL_PATH}" >> $GITHUB_OUTPUT
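
The string handling in the build and upload steps (nightly version suffix, `SM` arch suffix, and public URL) can be checked on its own. This is a sketch with hypothetical function names; the version `2.2.0`, the commit time, and the wheel file name `fastdeploy_gpu.whl` are illustrative values, not taken from the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append ".devYYYYMMDD" taken from a commit timestamp
# in `git show --format=%ci` form ("2025-09-22 21:19:19 +0800").
nightly_version() {
  local base_version="$1" commit_time="$2"
  local date_only
  date_only=$(echo "${commit_time}" | sed "s/ .*//;s/-//g")
  echo "${base_version}.dev${date_only}"
}

# Hypothetical helper: build the object-storage target path.
# Commas in the arch list become underscores ("80,90" -> "SM80_90").
upload_target() {
  local kind="$1" name="$2" commit_id="$3" compile_arch="$4"
  echo "paddle-github-action/${kind}/FastDeploy/${name}/${commit_id}/SM${compile_arch//,/_}"
}

# Hypothetical helper: drop the bucket prefix and form the download URL.
public_url() {
  local target_path="$1" file_name="$2"
  echo "https://paddle-github-action.bj.bcebos.com/${target_path#paddle-github-action/}/${file_name}"
}

nightly_version "2.2.0" "2025-09-22 21:19:19 +0800"
target=$(upload_target BRANCH release/2.2 6580e3331b "80,90")
public_url "${target}" "fastdeploy_gpu.whl"
```

The `${var//,/_}` and `${var#prefix}` expansions are bash features, which matches the `bash` shell these steps run under.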

.github/workflows/_clone_linux.yml

@@ -0,0 +1,78 @@
name: FastDeploy Code Clone
description: "FastDeploy clone and upload"
on:
workflow_call:
inputs:
bos_dir:
type: string
required: false
default: 'FastDeploy'
outputs:
repo_archive_url:
description: "Compressed source code archive."
value: ${{ jobs.code-clone.outputs.repo_archive_url }}
jobs:
code-clone:
runs-on:
group: HK-Clone
outputs:
repo_archive_url: ${{ steps.set_output.outputs.repo_archive_url }}
steps:
- name: Clone FastDeploy
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request'
&& github.event.pull_request.base.ref
|| github.ref_name }}
submodules: 'recursive'
fetch-depth: 1000
- name: Merge PR (if needed)
if: ${{ github.event_name == 'pull_request' }}
run: |
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
echo "Fetching and merging PR..."
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr/${{ github.event.pull_request.number }}
git merge --no-ff pr/${{ github.event.pull_request.number }}
echo "PR Branch log "
git log --oneline -n 5 pr/${{ github.event.pull_request.number }}
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Code Info Show and Upload
id: set_output
env:
AK: paddle
SK: paddle
run: |
git config --unset http.https://github.com/.extraheader
git submodule foreach --recursive sh -c "git config --local --unset-all 'http.https://github.com/.extraheader'"
git submodule foreach --recursive sh -c "git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'"
echo "Current HEAD Log:"
git log --oneline -n 5
ls
cd ..
tar -zcf FastDeploy.tar.gz FastDeploy
if [[ "${{ github.event_name }}" == "pull_request" ]];then
commit_id=${{ github.event.pull_request.head.sha }}
pr_num=${{ github.event.pull_request.number }}
target_path=paddle-github-action/PR/FastDeploy/${pr_num}/${commit_id}
elif [[ "${{ github.ref_type }}" == "tag" ]]; then
commit_id=${{ github.sha }}
tag_name=${{ github.ref_name }}
target_path=paddle-github-action/TAG/FastDeploy/${tag_name}/${commit_id}
else
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-github-action/BRANCH/FastDeploy/${branch_name}/${commit_id}
fi
wget -O bos_tools.py -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} FastDeploy.tar.gz ${target_path}
target_path_stripped="${target_path#paddle-github-action/}"
REPO_ARCHIVE_URL=https://paddle-github-action.bj.bcebos.com/${target_path_stripped}/FastDeploy.tar.gz
echo "repo_archive_url=${REPO_ARCHIVE_URL}" >> $GITHUB_OUTPUT


@@ -0,0 +1,184 @@
name: Run FastDeploy LogProb Tests
description: "Run FastDeploy LogProb Tests"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
PADDLETEST_ARCHIVE_URL:
description: "URL of the compressed PaddleTest code archive."
required: true
type: string
default: "https://xly-devops.bj.bcebos.com/PaddleTest/PaddleTest.tar.gz"
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Model Cache Dir Use"
required: false
type: string
default: ""
jobs:
run_tests_logprob:
runs-on: [self-hosted, GPU-h20-1Cards]
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
paddletest_archive_url: ${{ inputs.PADDLETEST_ARCHIVE_URL }}
run: |
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
-e "BASE_BRANCH=${BASE_BRANCH}" \
${docker_image} /bin/bash -c '
rm -rf /workspace/*
'
wget -q ${paddletest_archive_url}
tar -xf PaddleTest.tar.gz
rm -rf PaddleTest.tar.gz
cd PaddleTest
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: logprob test
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fastdeploy_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
run: |
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42098 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
if [ ! -d "${MODEL_CACHE_DIR}" ]; then
echo "Error: MODEL_CACHE_DIR '${MODEL_CACHE_DIR}' does not exist."
exit 1
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT $FD_CACHE_QUEUE_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --ipc=host --pid=host --net=host \
--name ${runner_name} \
-v $(pwd):/workspace \
-w /workspace \
-e fastdeploy_wheel_url=${fastdeploy_wheel_url} \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-v "${MODEL_CACHE_DIR}:/MODELDATA" \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-e TZ="Asia/Shanghai" \
--gpus '"device='"${DEVICES}"'"' ${docker_image} /bin/bash -xc '
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install ${fastdeploy_wheel_url}
wget https://paddle-qa.bj.bcebos.com/zhengtianyu/tools/llm-deploy-linux-amd64
chmod +x ./llm-deploy-linux-amd64
./llm-deploy-linux-amd64 -python python3.10 \
-model_name ERNIE-4.5-0.3B-Paddle \
-model_path /MODELDATA \
--skip install
cd PaddleTest/framework/ServeTest
python3.10 deploy.py > dd.log 2>&1 &
sleep 3
curl -X POST http://0.0.0.0:${FLASK_PORT}/start \
-H "Content-Type: application/json" \
-d "{\"--model\": \"/MODELDATA/ERNIE-4.5-0.3B-Paddle\"}"
curl -X POST http://localhost:${FLASK_PORT}/wait_for_infer?timeout=90
curl -s -o /dev/null -w "%{http_code}" -m 2 "http://0.0.0.0:${FD_API_PORT}/health"
curl -X POST "http://0.0.0.0:${FD_API_PORT}/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"messages\": [{\"role\": \"user\", \"content\": \"1+1=?\"}], \"logprobs\": true}"
set +e
rm -rf ./baseline_output
cp -r baseline/ERNIE-4.5-0.3B-Paddle ./baseline_output
LOGPROB_EXIT_CODE=0
python3.10 lanucher.py --request_template TOKEN_LOGPROB --url http://localhost:${FD_API_PORT}/v1/chat/completions --case ./cases/demo.yaml --concurrency 1 --name demo --exe logprob || LOGPROB_EXIT_CODE=$?
echo "LOGPROB_EXIT_CODE=${LOGPROB_EXIT_CODE}" > /workspace/exit_code.env
curl -X POST http://localhost:${FLASK_PORT}/stop
sleep 10s
cat *result.log
exit 0
'
if [ $? -ne 0 ];then
exit 1
fi
if [ -f exit_code.env ]; then
cat exit_code.env >> $GITHUB_ENV
fi
- name: logprob test result
if: ${{ env.LOGPROB_EXIT_CODE != 0 }}
shell: bash
run: |
echo "logprob test failed with exit code ${{ env.LOGPROB_EXIT_CODE }}"
exit 8

.github/workflows/_pre_ce_test.yml

@@ -0,0 +1,148 @@
name: Pre-CE-Test
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:fastdeploy-ciuse-cuda126"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Model Cache Dir Use"
required: false
type: string
default: ""
jobs:
run_ce_cases:
runs-on: [self-hosted, PRE_CE_RUN_2Card]
timeout-minutes: 60
steps:
- name: Print current runner name
run: |
echo "Current runner name: ${{ runner.name }}"
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: Run CI unittest
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
run: |
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42098 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT $FD_CACHE_QUEUE_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --net=host \
--name ${runner_name} \
-v $(pwd):/workspace \
-w /workspace \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-v "${MODEL_CACHE_DIR}:/ModelData:ro" \
-e "MODEL_PATH=/ModelData" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-e "fd_wheel_url=${fd_wheel_url}" \
--gpus "\"device=${DEVICES}\"" ${docker_image} /bin/bash -c '
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install ${fd_wheel_url}
bash scripts/run_pre_ce.sh
'

.github/workflows/_stable_test.yml

@@ -0,0 +1,170 @@
name: Stable Test
description: "Run Stable Tests"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache Dir Use"
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Model Cache Dir Use"
required: false
type: string
default: ""
jobs:
stable_tests:
runs-on: [self-hosted, GPU-h1z1-2Cards]
timeout-minutes: 60
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: Run FastDeploy Stable Tests
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fastdeploy_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
run: |
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42038 + DEVICE_PORT * 100))
FD_INFERENCE_MSG_QUEUE_ID=$(( 42048 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_INFERENCE_MSG_QUEUE_ID=${FD_INFERENCE_MSG_QUEUE_ID}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
if [ ! -d "${MODEL_CACHE_DIR}" ]; then
echo "Error: MODEL_CACHE_DIR '${MODEL_CACHE_DIR}' does not exist."
exit 1
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --ipc=host --pid=host --net=host \
--name ${runner_name} \
-v $(pwd):/workspace \
-w /workspace \
-e fastdeploy_wheel_url=${fastdeploy_wheel_url} \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-e "FD_INFERENCE_MSG_QUEUE_ID=${FD_INFERENCE_MSG_QUEUE_ID}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-v "${MODEL_CACHE_DIR}:/MODELDATA" \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-e TZ="Asia/Shanghai" \
--gpus '"device='"${DEVICES}"'"' ${docker_image} /bin/bash -xc '
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install ${fastdeploy_wheel_url}
python -m pip install pytest
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
TEST_EXIT_CODE=0
pushd tests/ce/stable_cases
bash launch_model.sh /MODELDATA
bash run.sh || TEST_EXIT_CODE=1
popd
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> /workspace/FastDeploy/exit_code.env
'
if [ -f ./FastDeploy/exit_code.env ]; then
source ./FastDeploy/exit_code.env
cat ./FastDeploy/exit_code.env >> $GITHUB_ENV
fi
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}"
exit ${TEST_EXIT_CODE}
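Note on the port scheme used in the step above: the GPU list comes from the trailing digits of the runner name, and every service port is offset by 100 times the first card index, so that runners sharing a host do not collide. A minimal sketch of that derivation, assuming a runner name that ends in `-<digits>` with one digit per card. The function names `derive_devices` and `derive_port` and the runner name `gpu-runner-23` are hypothetical and do not appear in the workflow.

```shell
# Sketch only: mirrors the awk/fold/paste pipeline from the workflow step.
# Assumption: the runner name ends in "-<digits>", one digit per GPU card.
derive_devices() {
  # $1 = runner name (hypothetical example: gpu-runner-23)
  dd_card_id=$(echo "$1" | awk -F'-' '{print $NF}')
  # Split the digits one per line, then join them with commas.
  echo "$dd_card_id" | fold -w1 | paste -sd, -
}

derive_port() {
  # $1 = base port, $2 = comma-separated device list
  dp_first=${2%%,*}            # keep only the first card index
  echo $(( $1 + dp_first * 100 ))
}

devices=$(derive_devices "gpu-runner-23")
echo "DEVICES=${devices}"
echo "FD_API_PORT=$(derive_port 42088 "$devices")"
```

With the hypothetical runner name above this prints `DEVICES=2,3` and `FD_API_PORT=42288`. The scheme only stays collision-free while the base ports are less than 100 apart and the first card index is a single digit.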


@@ -0,0 +1,322 @@
name: Coverage Check
description: "Run FastDeploy Unit Tests and Coverage"
on:
workflow_call:
inputs:
DOCKER_IMAGE:
description: "Build Images"
required: true
type: string
default: "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:cuda126-py310"
FASTDEPLOY_ARCHIVE_URL:
description: "URL of the compressed FastDeploy code archive."
required: true
type: string
FASTDEPLOY_WHEEL_URL:
description: "URL of the FastDeploy Wheel."
required: true
type: string
CACHE_DIR:
description: "Cache directory to use."
required: false
type: string
default: ""
MODEL_CACHE_DIR:
description: "Model cache directory to use."
required: false
type: string
default: ""
secrets:
github-token:
required: true
jobs:
check_cov_skip:
uses: ./.github/workflows/check-bypass.yml
secrets:
github-token: ${{ secrets.github-token }}
with:
workflow-name: coverage
run_tests_with_coverage:
runs-on: [self-hosted, GPU-h1z1-2Cards]
timeout-minutes: 60
needs: check_cov_skip
if: needs.check_cov_skip.outputs.can-skip != 'true'
outputs:
diff_cov_file_url: ${{ steps.cov_upload.outputs.diff_cov_file_url }}
unittest_failed_url: ${{ steps.cov_upload.outputs.unittest_failed_url }}
diff_cov_result_json_url: ${{ steps.cov_upload.outputs.diff_cov_result_json_url }}
steps:
- name: Code Prepare
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
run: |
set -x
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}*
fi
'
wget -q ${fd_archive_url}
tar -xf FastDeploy.tar.gz
rm -rf FastDeploy.tar.gz
cd FastDeploy
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git log -n 3 --oneline
- name: Run FastDeploy Unit Tests and Coverage
shell: bash
env:
docker_image: ${{ inputs.DOCKER_IMAGE }}
fd_wheel_url: ${{ inputs.FASTDEPLOY_WHEEL_URL }}
CACHE_DIR: ${{ inputs.CACHE_DIR }}
BASE_REF: ${{ github.event.pull_request.base.ref }}
MODEL_CACHE_DIR: ${{ inputs.MODEL_CACHE_DIR }}
IS_PR: ${{ github.event_name == 'pull_request' }}
run: |
if [[ "$IS_PR" == "true" ]]; then
echo "Running on PR"
else
echo "Not a PR"
fi
runner_name="${{ runner.name }}"
CARD_ID=$(echo "${runner_name}" | awk -F'-' '{print $NF}')
DEVICES=$(echo "$CARD_ID" | fold -w1 | paste -sd,)
DEVICE_PORT=$(echo "$DEVICES" | cut -d',' -f1)
FLASK_PORT=$((42068 + DEVICE_PORT * 100))
FD_API_PORT=$((42088 + DEVICE_PORT * 100))
FD_ENGINE_QUEUE_PORT=$((42058 + DEVICE_PORT * 100))
FD_METRICS_PORT=$((42078 + DEVICE_PORT * 100))
FD_CACHE_QUEUE_PORT=$((42098 + DEVICE_PORT * 100))
echo "Test ENV Parameter:"
echo "========================================================="
echo "FLASK_PORT=${FLASK_PORT}"
echo "FD_API_PORT=${FD_API_PORT}"
echo "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}"
echo "FD_METRICS_PORT=${FD_METRICS_PORT}"
echo "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}"
echo "DEVICES=${DEVICES}"
echo "========================================================="
CACHE_DIR="${CACHE_DIR:-$(dirname "$(dirname "${{ github.workspace }}")")}"
echo "CACHE_DIR is set to ${CACHE_DIR}"
if [ ! -f "${CACHE_DIR}/gitconfig" ]; then
touch "${CACHE_DIR}/gitconfig"
fi
PORTS=($FLASK_PORT $FD_API_PORT $FD_ENGINE_QUEUE_PORT $FD_METRICS_PORT $FD_CACHE_QUEUE_PORT)
LOG_FILE="./port_cleanup_$(date +%Y%m%d_%H%M%S).log"
echo "==== LOG_FILE is ${LOG_FILE} ===="
echo "==== PORT CLEAN BEFORE TASK RUN ====" | tee -a $LOG_FILE
for port in "${PORTS[@]}"; do
PIDS=$(lsof -t -i :$port || true)
if [ -n "$PIDS" ]; then
echo "Port $port is occupied by PID(s): $PIDS" | tee -a $LOG_FILE
echo "$PIDS" | xargs -r kill -9
echo "Port $port cleared" | tee -a $LOG_FILE
else
echo "Port $port is free" | tee -a $LOG_FILE
fi
done
echo "==== PORT CLEAN COMPLETE ====" | tee -a $LOG_FILE
echo "========================================================="
echo "Ensuring no stale container named ${runner_name} ..."
if [ "$(docker ps -a -q -f name=${runner_name})" ]; then
echo "Removing stale container: ${runner_name}"
docker rm -f ${runner_name} || true
fi
docker run --rm --net=host \
--name ${runner_name} \
--cap-add=SYS_PTRACE --shm-size=64G \
-v $(pwd):/workspace -w /workspace \
-v "${CACHE_DIR}/gitconfig:/etc/gitconfig:ro" \
-v "${CACHE_DIR}/.cache:/root/.cache" \
-v "${CACHE_DIR}/ConfigDir:/root/.config" \
-v "${MODEL_CACHE_DIR}:/ModelData:ro" \
-e "MODEL_PATH=/ModelData" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
-e "FLASK_PORT=${FLASK_PORT}" \
-e "FD_CACHE_QUEUE_PORT=${FD_CACHE_QUEUE_PORT}" \
-e TZ="Asia/Shanghai" \
-e "fd_wheel_url=${fd_wheel_url}" \
-e "BASE_REF=${BASE_REF}" \
-e "IS_PR=${IS_PR}" \
--gpus "\"device=${DEVICES}\"" ${docker_image} /bin/bash -c '
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
git diff origin/${BASE_REF}..HEAD --unified=0 > diff.txt
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
pip config set global.extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
python -m pip install coverage
python -m pip install diff-cover
python -m pip install pytest-cov
python -m pip install jsonschema aistudio_sdk==0.3.5
python -m pip install ${fd_wheel_url}
rm -rf fastdeploy
# coverage subprocess use
python -m pip install ${fd_wheel_url} --no-deps --target=/workspace/FastDeploy
export PYTHONPATH=/workspace/FastDeploy/
if [ -d "tests/plugins" ]; then
cd tests/plugins
python setup.py install
cd ../..
else
echo "Warning: tests/plugins directory not found, skipping setup.py install"
fi
export COVERAGE_FILE=/workspace/FastDeploy/coveragedata/.coverage
export COVERAGE_RCFILE=/workspace/FastDeploy/scripts/.coveragerc
TEST_EXIT_CODE=0
bash scripts/coverage_run.sh || TEST_EXIT_CODE=8
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE}" >> exit_code.env
coverage combine coveragedata/ || echo "No data to combine"
coverage report
coverage xml -o python_coverage_all.xml
COVERAGE_EXIT_CODE=0
if [[ "$IS_PR" == "true" ]]; then
diff-cover python_coverage_all.xml --diff-file=diff.txt --fail-under=80 --json-report diff_coverage.json || COVERAGE_EXIT_CODE=9
python scripts/generate_diff_coverage_xml.py diff.txt python_coverage_all.xml
else
echo "Not a PR, skipping diff-cover"
fi
echo "COVERAGE_EXIT_CODE=${COVERAGE_EXIT_CODE}" >> exit_code.env
'
if [ -f FastDeploy/exit_code.env ]; then
cat FastDeploy/exit_code.env >> $GITHUB_ENV
fi
- name: Upload unit test result and diff coverage to BOS
id: cov_upload
shell: bash
run: |
cd FastDeploy
commit_id=${{ github.event.pull_request.head.sha }}
pr_num=${{ github.event.pull_request.number }}
target_path=paddle-github-action/PR/FastDeploy/${pr_num}/${commit_id}/SM${compile_arch//,/_}
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py -O bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
diff_cov_file="diff_coverage.xml"
if [ -f ${diff_cov_file} ];then
python ${push_file} ${diff_cov_file} ${target_path}/CoverageData
target_path_stripped="${target_path#paddle-github-action/}"
DIFF_COV_FILE_URL=https://paddle-github-action.bj.bcebos.com/${target_path_stripped}/CoverageData/${diff_cov_file}
echo "diff_cov_file_url=${DIFF_COV_FILE_URL}" >> $GITHUB_OUTPUT
echo "diff_cov_file_url=${DIFF_COV_FILE_URL}" >> $GITHUB_ENV
fi
diff_cov_result_json="diff_coverage.json"
if [ -f ${diff_cov_result_json} ];then
python ${push_file} ${diff_cov_result_json} ${target_path}/CoverageData
target_path_stripped="${target_path#paddle-github-action/}"
DIFF_COV_JSON_URL=https://paddle-github-action.bj.bcebos.com/${target_path_stripped}/CoverageData/${diff_cov_result_json}
echo "diff_cov_result_json_url=${DIFF_COV_JSON_URL}" >> $GITHUB_OUTPUT
echo "diff_cov_result_json_url=${DIFF_COV_JSON_URL}" >> $GITHUB_ENV
fi
unittest_result="failed_tests.log"
if [ -s ${unittest_result} ];then
python ${push_file} ${unittest_result} ${target_path}/UnitTestResult
target_path_stripped="${target_path#paddle-github-action/}"
UNIT_TEST_RESULT_URL=https://paddle-github-action.bj.bcebos.com/${target_path_stripped}/UnitTestResult/${unittest_result}
echo "unittest_failed_url=${UNIT_TEST_RESULT_URL}" >> $GITHUB_OUTPUT
echo "unittest_failed_url=${UNIT_TEST_RESULT_URL}" >> $GITHUB_ENV
fi
- name: Check Unit Test Success
shell: bash
run: |
cd FastDeploy
if [ "$TEST_EXIT_CODE" -eq 8 ]; then
filename=$(basename "$unittest_failed_url")
if [ -z "${unittest_failed_url}" ]; then
echo "No URL provided for the failed unit test log."
else
rm -rf "${filename}"
wget -O ${filename} ${unittest_failed_url} || echo "Download unittest file failed, but continuing..."
fi
echo "Unit tests failed (exit code 8)"
if [ -f "${filename}" ];then
echo "Failed test cases:"
cat "${filename}"
fi
exit "$TEST_EXIT_CODE"
fi
echo "All tests passed"
- name: Verify Code Coverage Threshold (80%)
if: ${{ github.event_name == 'pull_request' }}
shell: bash
run: |
cd FastDeploy
if [ "$COVERAGE_EXIT_CODE" -eq 9 ]; then
echo "Diff coverage check failed (exit code 9)"
filename=$(basename "$diff_cov_result_json_url")
if [ -z "${diff_cov_result_json_url}" ]; then
echo "No diff cov result file URL provided."
else
rm -rf "${filename}"
wget -O ${filename} ${diff_cov_result_json_url} || echo "Download cov json file failed, but continuing..."
fi
if [ -f "${filename}" ];then
echo "Diff coverage result:"
if command -v jq >/dev/null 2>&1; then
jq . "${filename}"
else
cat "${filename}"
fi
fi
exit "$COVERAGE_EXIT_CODE"
fi
echo "coverage passed"
exit 0
diff_coverage_report:
needs: run_tests_with_coverage
if: always()
runs-on: ubuntu-latest
env:
fd_archive_url: ${{ inputs.FASTDEPLOY_ARCHIVE_URL }}
steps:
- name: coverage diff file download
shell: bash
env:
diff_cov_file_url: ${{ needs.run_tests_with_coverage.outputs.diff_cov_file_url }}
run: |
wget ${fd_archive_url}
tar -xf FastDeploy.tar.gz
cd FastDeploy
if [ -z "${diff_cov_file_url}" ]; then
echo "No diff coverage file URL provided."
exit 0
fi
wget "${diff_cov_file_url}" -O ./diff_coverage.xml || echo "Download cov file failed, but continuing..."
- name: Upload diff coverage report
if: ${{ needs.run_tests_with_coverage.outputs.diff_cov_file_url != null && needs.run_tests_with_coverage.outputs.diff_cov_file_url != '' }}
uses: codecov/codecov-action@v5
with:
files: ./FastDeploy/diff_coverage.xml
name: python diff coverage
verbose: true
disable_search: true
commit_parent: false
flags: diff
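Note on the exit-code handoff used by this workflow: the container cannot fail the job step directly without losing the later upload steps, so it appends `NAME=code` lines to `exit_code.env`, and the host copies that file into `$GITHUB_ENV` for the later check steps. A minimal sketch of the pattern, assuming a temporary file in place of `exit_code.env`. The helper name `run_and_record` is hypothetical, and `false` and `true` stand in for the test and coverage commands.

```shell
# Sketch only: record a fixed failure code per stage instead of aborting.
run_and_record() {
  # $1 = env file, $2 = variable name, $3 = code to record on failure,
  # remaining arguments = the command to run
  rr_file=$1
  rr_name=$2
  rr_fail=$3
  shift 3
  rr_code=0
  "$@" || rr_code=$rr_fail
  echo "${rr_name}=${rr_code}" >> "$rr_file"
}

env_file=$(mktemp)
run_and_record "$env_file" TEST_EXIT_CODE 8 false      # stand-in for a failing test run
run_and_record "$env_file" COVERAGE_EXIT_CODE 9 true   # stand-in for a passing coverage check

# Later stage: load the recorded codes and decide what to report.
. "$env_file"
echo "TEST_EXIT_CODE=${TEST_EXIT_CODE} COVERAGE_EXIT_CODE=${COVERAGE_EXIT_CODE}"
rm -f "$env_file"
```

Using distinct codes (8 for tests, 9 for coverage in the workflow) lets each later step test only the value it is responsible for.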

.github/workflows/approve.yml

@@ -0,0 +1,42 @@
name: Approval
on:
pull_request:
branches:
- develop
- 'release/*'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
jobs:
Approval:
name: Approval
if: ${{ github.repository_owner == 'PaddlePaddle' }}
runs-on: ubuntu-latest
env:
PR_ID: ${{ github.event.pull_request.number }}
BRANCH: ${{ github.event.pull_request.base.ref }}
steps:
- name: Checkout base repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.base.ref }}
fetch-depth: 1000
- name: Merge PR to test branch
run: |
git fetch origin pull/${PR_ID}/merge
git checkout -b test FETCH_HEAD
git log -n 3 --oneline
git remote add upstream https://github.com/PaddlePaddle/FastDeploy.git
git fetch upstream $BRANCH
- name: Setup python3.10
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Run approval check script
run: |
bash scripts/check_approval.sh


@@ -1,31 +0,0 @@
name: Build
on: [pull_request]
jobs:
macOS-latest-py:
runs-on: macos-latest
steps:
- name: Clone
uses: actions/checkout@v1
- name: Get CMake
uses: lukka/get-cmake@latest
- name: Get Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Build FastDeploy
working-directory: ./python
run: |
export ENABLE_ORT_BACKEND=ON
export ENABLE_PADDLE_BACKEND=OFF
export ENABLE_OPENVINO_BACKEND=OFF
export ENABLE_VISION=ON
export ENABLE_TEXT=ON
python -m pip install wheel
python setup.py build
python setup.py bdist_wheel
ls -l

.github/workflows/ce_job.yml

@@ -0,0 +1,248 @@
name: CE Compile Job
on:
workflow_dispatch:
push:
branches:
- develop
- 'release/*'
permissions: read-all
concurrency:
group: ${{ github.ref }}-${{ github.sha }}
cancel-in-progress: true
jobs:
ce_job_pre_check:
runs-on: ubuntu-latest
env:
COMPILE_BRANCH: ${{ vars.COMPILE_BRANCH }}
CE_COMPILE_SELECTION: ${{ vars.CE_COMPILE_SELECTION }}
COMPILE_USE_PADDLE_WHL_URL_MAPPINGS: ${{ vars.COMPILE_USE_PADDLE_WHL_URL_MAPPINGS }}
outputs:
branch_match: ${{ steps.set_output.outputs.branch_match }}
compile_use_paddle_whl_url: ${{ steps.set_output.outputs.compile_use_paddle_whl_url }}
sm8689_match: ${{ steps.set_output.outputs.sm8689_match }}
sm8090_match: ${{ steps.set_output.outputs.sm8090_match }}
steps:
- name: Set Version
id: set_output
env:
COMPILE_BRANCH: ${{ env.COMPILE_BRANCH }}
CE_COMPILE_SELECTION: ${{ env.CE_COMPILE_SELECTION }}
COMPILE_USE_PADDLE_WHL_URL_MAPPINGS: ${{ env.COMPILE_USE_PADDLE_WHL_URL_MAPPINGS }}
GITHUB_REF_NAME: ${{ github.ref_name }}
run: |
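# Select the branches that trigger the compile job (done)
# Select which compile tasks to run for a given branch: 8090 or 8689
# Specify the Paddle package used to compile a given branch; defaults to the latest nightly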
IFS=',' read -ra BRANCHES <<< "$COMPILE_BRANCH"
MATCH=false
for b in "${BRANCHES[@]}"; do
if [[ "$b" == "${GITHUB_REF_NAME}" ]]; then
MATCH=true
break
fi
done
echo "branch_match=$MATCH" >> $GITHUB_OUTPUT
# Use the mapping in CE_COMPILE_SELECTION to decide whether the branch compiles sm8090 or sm8689
for pair in $(echo "$CE_COMPILE_SELECTION" | tr ';' ' '); do
branch=$(echo "$pair" | cut -d',' -f1)
compile_task_list=$(echo "$pair" | cut -d',' -f2)
if [[ "$branch" == "$GITHUB_REF_NAME" ]]; then
# Check whether the task list contains sm8090 or sm8689
if [[ "$compile_task_list" == *"sm8090"* ]]; then
echo "sm8090_match=true" >> $GITHUB_OUTPUT
fi
if [[ "$compile_task_list" == *"sm8689"* ]]; then
echo "sm8689_match=true" >> $GITHUB_OUTPUT
fi
break
fi
done
# Use the mapping in COMPILE_USE_PADDLE_WHL_URL_MAPPINGS to decide whether to install a specified Paddle version or install directly from a URL
for pair in $(echo $COMPILE_USE_PADDLE_WHL_URL_MAPPINGS | tr ';' ' '); do
branch=$(echo "$pair" | cut -d',' -f1)
paddle_whl_url=$(echo "$pair" | cut -d',' -f2)
if [[ "$branch" == "$GITHUB_REF_NAME" ]]; then
FOUND_PADDLE_URL="$paddle_whl_url"
echo "compile_use_paddle_whl_url=${FOUND_PADDLE_URL}" >> $GITHUB_OUTPUT
break
fi
done
print_ce_job_pre_check_outputs:
runs-on: ubuntu-latest
needs: ce_job_pre_check
steps:
- name: Print outputs as JSON
run: |
echo '${{ toJSON(needs.ce_job_pre_check.outputs) }}'
clone:
environment: CodeSync
name: FD-Clone-Linux
runs-on: ubuntu-latest
needs: ce_job_pre_check
if: ${{ needs.ce_job_pre_check.outputs.branch_match == 'true' }}
outputs:
repo_archive_url: ${{ steps.set_output.outputs.repo_archive_url }}
steps:
- name: Clone FastDeploy
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request'
&& github.event.pull_request.base.ref
|| github.ref_name }}
submodules: 'recursive'
fetch-depth: 1000
- name: Python Setup
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Code Info Show and Upload
id: set_output
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
run: |
git config --unset http.https://github.com/.extraheader
git submodule foreach --recursive sh -c "git config --local --unset-all 'http.https://github.com/.extraheader'"
git submodule foreach --recursive sh -c "git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'"
echo "Current HEAD Log:"
git log --oneline -n 5
ls
cd ..
tar -zcf FastDeploy.tar.gz FastDeploy
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-qa/BRANCH/FastDeploy/${branch_name}/${commit_id}
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} FastDeploy.tar.gz ${target_path}
target_path_stripped="${target_path#paddle-qa/}"
REPO_ARCHIVE_URL=https://paddle-qa.bj.bcebos.com/${target_path_stripped}/FastDeploy.tar.gz
echo "repo_archive_url=${REPO_ARCHIVE_URL}" >> $GITHUB_OUTPUT
resultshow:
name: Show Code Archive Output
needs: clone
runs-on: ubuntu-latest
steps:
- name: Print wheel path
run: |
echo "The code archive is located at: ${{ needs.clone.outputs.repo_archive_url }}"
build_sm8090:
name: BUILD_SM8090
needs: [clone, ce_job_pre_check]
if: ${{ needs.ce_job_pre_check.outputs.sm8090_match == 'true' }}
uses: ./.github/workflows/_build_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
COMPILE_ARCH: "80,90"
WITH_NIGHTLY_BUILD: OFF
FD_VERSION: 0.0.0
PADDLE_WHL_URL: ${{ needs.ce_job_pre_check.outputs.compile_use_paddle_whl_url }}
build_sm8689:
name: BUILD_SM8689
needs: [clone, ce_job_pre_check]
if: ${{ needs.ce_job_pre_check.outputs.sm8689_match == 'true' }}
uses: ./.github/workflows/_build_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
COMPILE_ARCH: "86,89"
WITH_NIGHTLY_BUILD: OFF
FD_VERSION: 0.0.0
PADDLE_WHL_URL: ${{ needs.ce_job_pre_check.outputs.compile_use_paddle_whl_url }}
ce_upload_sm8090:
environment: CodeSync
name: CE_UPLOAD
needs: build_sm8090
runs-on: ubuntu-latest
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
COMPILE_ARCH: "80,90"
steps:
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Wheel Info Show and Upload
run: |
echo "The wheel is located at: ${{ needs.build_sm8090.outputs.wheel_path }}"
wget -q --no-check-certificate ${{ needs.build_sm8090.outputs.wheel_path }}
filename=$(basename ${{ needs.build_sm8090.outputs.wheel_path }})
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-qa/paddle-pipeline/FastDeploy_ActionCE/SM${COMPILE_ARCH//,/_}/${branch_name}/${commit_id}
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} ${filename} ${target_path}
target_path_stripped="${target_path#paddle-qa/}"
WHEEL_PATH=https://paddle-qa.bj.bcebos.com/${target_path_stripped}/${filename}
echo "commit wheel url is ${WHEEL_PATH}"
target_path_latest=paddle-qa/paddle-pipeline/FastDeploy_ActionCE/SM${COMPILE_ARCH//,/_}/${branch_name}/latest
python ${push_file} ${filename} ${target_path_latest}
target_path_stripped_latest="${target_path_latest#paddle-qa/}"
WHEEL_PATH_LATEST=https://paddle-qa.bj.bcebos.com/${target_path_stripped_latest}/${filename}
echo "latest wheel url is ${WHEEL_PATH_LATEST}"
ce_upload_sm8689:
environment: CodeSync
name: CE_UPLOAD
needs: build_sm8689
runs-on: ubuntu-latest
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8689.outputs.wheel_path }}
COMPILE_ARCH: "86,89"
steps:
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Wheel Info Show and Upload
run: |
echo "The wheel is located at: ${{ needs.build_sm8689.outputs.wheel_path }}"
wget -q --no-check-certificate ${{ needs.build_sm8689.outputs.wheel_path }}
filename=$(basename ${{ needs.build_sm8689.outputs.wheel_path }})
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-qa/paddle-pipeline/FastDeploy_ActionCE/SM${COMPILE_ARCH//,/_}/${branch_name}/${commit_id}
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} ${filename} ${target_path}
target_path_stripped="${target_path#paddle-qa/}"
WHEEL_PATH=https://paddle-qa.bj.bcebos.com/${target_path_stripped}/${filename}
echo "commit wheel url is ${WHEEL_PATH}"
target_path_latest=paddle-qa/paddle-pipeline/FastDeploy_ActionCE/SM${COMPILE_ARCH//,/_}/${branch_name}/latest
python ${push_file} ${filename} ${target_path_latest}
target_path_stripped_latest="${target_path_latest#paddle-qa/}"
WHEEL_PATH_LATEST=https://paddle-qa.bj.bcebos.com/${target_path_stripped_latest}/${filename}
echo "latest wheel url is ${WHEEL_PATH_LATEST}"
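Note on the mapping variables read by `ce_job_pre_check`: each is a `;`-separated list of `branch,value` pairs, and the value is taken with `cut -d',' -f2`, so a task list cannot itself contain a comma. A minimal sketch of the lookup, assuming a mapping value of the form shown below. The function name `tasks_for_branch` and the sample mapping are hypothetical, since the real repository variables are not visible in this diff.

```shell
# Sketch only. Assumed mapping shape: "<branch>,<tasks>;<branch>,<tasks>"
tasks_for_branch() {
  # $1 = mapping string, $2 = branch name; prints the tasks for that branch
  for tb_pair in $(echo "$1" | tr ';' ' '); do
    tb_branch=$(echo "$tb_pair" | cut -d',' -f1)
    tb_tasks=$(echo "$tb_pair" | cut -d',' -f2)
    if [ "$tb_branch" = "$2" ]; then
      echo "$tb_tasks"
      return 0
    fi
  done
  return 1   # branch not listed in the mapping
}

mapping="develop,sm8090_sm8689;release/2.2,sm8090"   # hypothetical value
tasks=$(tasks_for_branch "$mapping" "develop")
# Substring checks, as in the workflow: one branch may enable both builds.
case "$tasks" in *sm8090*) echo "sm8090_match=true" ;; esac
case "$tasks" in *sm8689*) echo "sm8689_match=true" ;; esac
```

For a branch that is not listed, neither output is written, so the `== 'true'` conditions on the build jobs evaluate to false and both builds are skipped.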

.github/workflows/check-bypass.yml

@@ -0,0 +1,51 @@
on:
workflow_call:
inputs:
workflow-name:
required: true
type: string
secrets:
github-token:
required: true
outputs:
can-skip:
description: "Whether the workflow can be skipped."
value: ${{ jobs.check-bypass.outputs.can-skip }}
jobs:
check-bypass:
name: Check bypass
runs-on: ubuntu-latest
permissions:
contents: read
env:
CI_TEAM_MEMBERS: '["yuanlehome","YuanRisheng","Jiang-Jia-Jun","DDDivano","XieYunshen"]'
outputs:
can-skip: ${{ steps.check-bypass.outputs.can-skip }}
steps:
- name: Cleanup
run: |
rm -rf * .[^.]*
- id: check-bypass
name: Check Bypass
uses: PFCCLab/ci-bypass@v1
with:
github-token: ${{ secrets.github-token }}
non-pull-request-event-strategy: 'never-skipped'
type: 'composite'
composite-rule: |
{
"any": [
{
"type": "labeled",
"label": ["skip-ci: ${{ inputs.workflow-name }}", "skip-ci: all"],
"username": ${{ env.CI_TEAM_MEMBERS }}
},
{
"type": "commented",
"comment-pattern": [".*/skip-ci ${{ inputs.workflow-name }}.*", ".*/skip-ci all.*"],
"username": ${{ env.CI_TEAM_MEMBERS }}
}
]
}

.github/workflows/ci_gcu.yml

@@ -0,0 +1,98 @@
name: CI_GCU
on:
pull_request:
branches:
- develop
- 'release/*'
workflow_dispatch:
concurrency:
group: ${{ github.event.pull_request.number }}-gcu-ci
cancel-in-progress: true
jobs:
CI_GCU:
runs-on:
group: GCU
steps:
- name: Print current runner name
run: |
echo "Current runner name: ${{ runner.name }}"
- name: Code Checkout
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.5.102-ubuntu20-x86_64-gcc84
run: |
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace \
-v ${{ github.workspace }}/../../..:${{ github.workspace }}/../../.. \
-w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
-e "BASE_BRANCH=${BASE_BRANCH}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}
fi
'
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
source ${{ github.workspace }}/../../../proxy
git clone ${REPO} ${REPO_NAME} -b ${BASE_BRANCH}
cd FastDeploy
if [ "${{ github.event_name }}" = "pull_request" ]; then
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr/${{ github.event.pull_request.number }}
git merge pr/${{ github.event.pull_request.number }}
git log -n 3 --oneline
else
git checkout ${{ github.sha }}
git log -n 3 --oneline
fi
echo "Copy models..."
sudo mkdir -p ci_models && sudo cp -r /work/deps/ERNIE-4.5-21B-A3B-Paddle ci_models
echo "Copy models done."
- name: Run CI unittest
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.5.102-ubuntu20-x86_64-gcc84
run: |
runner_name="${{ runner.name }}"
last_char="${runner_name: -1}"
if [[ "$last_char" =~ [0-3] ]]; then
gcu_id="$last_char"
else
gcu_id="0"
fi
FD_API_PORT=$((9180 + gcu_id * 100))
FD_ENGINE_QUEUE_PORT=$((9150 + gcu_id * 100))
FD_METRICS_PORT=$((9170 + gcu_id * 100))
PARENT_DIR=$(dirname "$WORKSPACE")
echo "PARENT_DIR:$PARENT_DIR"
echo "Install drivers..."
cd /work/deps
sudo bash TopsRider_i3x_*_deb_amd64.run --driver --no-auto-load -y
cd -
echo "Create docker..."
docker run --rm --network=host --ipc=host --privileged \
-v $(pwd):/workspace \
-v /home:/home \
-v /work:/work \
-w /workspace \
-e "MODEL_PATH=./ci_models" \
-e "http_proxy=$(git config --global --get http.proxy)" \
-e "https_proxy=$(git config --global --get https.proxy)" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
${docker_image} /bin/bash -c "
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
bash scripts/run_ci_gcu.sh
"
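Note on the device-id fallback in the unit test step above: the last character of the runner name selects the device when it is a digit from 0 to 3, and anything else falls back to device 0 before the ports are offset. A minimal sketch of that rule, assuming runner names such as `gcu-runner-2`. The function name `device_id_from_runner` and the runner names are hypothetical.

```shell
# Sketch only: last character of the runner name, restricted to 0-3.
device_id_from_runner() {
  # $1 = runner name
  di_last=$(printf '%s' "$1" | tail -c 1)
  case "$di_last" in
    [0-3]) echo "$di_last" ;;
    *)     echo 0 ;;          # unknown suffix: fall back to device 0
  esac
}

id=$(device_id_from_runner "gcu-runner-2")
echo "FD_API_PORT=$(( 9180 + id * 100 ))"
echo "FD_ENGINE_QUEUE_PORT=$(( 9150 + id * 100 ))"
echo "FD_METRICS_PORT=$(( 9170 + id * 100 ))"
```

Because every unrecognized suffix maps to device 0, two runners on the same host whose names do not end in 0 to 3 would share the same ports.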

.github/workflows/ci_iluvatar.yml

@@ -0,0 +1,85 @@
name: CI_ILUVATAR
on:
pull_request:
branches: [ develop ]
workflow_dispatch:
concurrency:
group: ${{ github.event.pull_request.number }}-iluvatar-ci
cancel-in-progress: true
jobs:
CI_ILUVATAR:
runs-on:
group: IXUCA
steps:
- name: Print current runner name
run: |
echo "Current runner name: ${{ runner.name }}"
# actions/checkout cannot be used on this runner because the system version is lower than 2.23.
# - name: Checkout code
# uses: actions/checkout@v4
- name: Code Checkout
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
run: |
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}
fi
'
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git clone ${REPO} ${REPO_NAME}
cd FastDeploy
if [ "${{ github.event_name }}" = "pull_request" ]; then
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr/${{ github.event.pull_request.number }}
git merge pr/${{ github.event.pull_request.number }}
git log -n 3 --oneline
else
git checkout ${{ github.sha }}
git log -n 3 --oneline
fi
- name: Run CI unittest
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
run: |
runner_name="${{ runner.name }}"
last_char="${runner_name: -1}"
if [[ "$last_char" =~ [0-3] ]]; then
gpu_id="$last_char"
else
gpu_id="0"
fi
FD_API_PORT=$((9180 + gpu_id * 100))
FD_ENGINE_QUEUE_PORT=$((9150 + gpu_id * 100))
FD_METRICS_PORT=$((9170 + gpu_id * 100))
PARENT_DIR=$(dirname "$WORKSPACE")
echo "PARENT_DIR:$PARENT_DIR"
docker run --rm --net=host --pid=host --cap-add=ALL --privileged --shm-size=64G \
-v /usr/src:/usr/src -v /lib/modules:/lib/modules -v /dev:/dev \
-v $(pwd):/workspace -w /workspace \
-v "/data1/fastdeploy:/data1/fastdeploy" \
-e "MODEL_PATH=/ssd3/model" \
-e "http_proxy=$(git config --global --get http.proxy)" \
-e "https_proxy=$(git config --global --get https.proxy)" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
${docker_image} /bin/bash -c "
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
bash scripts/run_ci_iluvatar.sh
"

.github/workflows/ci_xpu.yml

@@ -0,0 +1,88 @@
name: CI_XPU
on:
pull_request:
branches:
- develop
- 'release/*'
workflow_dispatch:
concurrency:
group: ${{ github.event.pull_request.number }}-xpu-ci
cancel-in-progress: true
jobs:
CI_XPU:
runs-on: [self-hosted, XPU-P800-8Card]
steps:
- name: Print current runner name
run: |
echo "Current runner name: ${{ runner.name }}"
# actions/checkout cannot be used on this runner because the system version is lower than 2.23.
# - name: Checkout code
# uses: actions/checkout@v4
- name: Code Checkout
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.1.0
run: |
REPO="https://github.com/${{ github.repository }}.git"
FULL_REPO="${{ github.repository }}"
REPO_NAME="${FULL_REPO##*/}"
BASE_BRANCH="${{ github.base_ref }}"
# Clean the repository directory before starting
docker run --rm --net=host -v $(pwd):/workspace -w /workspace \
-e "REPO_NAME=${REPO_NAME}" \
-e "BASE_BRANCH=${BASE_BRANCH}" \
${docker_image} /bin/bash -c '
if [ -d ${REPO_NAME} ]; then
echo "Directory ${REPO_NAME} exists, removing it..."
rm -rf ${REPO_NAME}
fi
'
git config --global user.name "FastDeployCI"
git config --global user.email "fastdeploy_ci@example.com"
git clone ${REPO} ${REPO_NAME} -b ${BASE_BRANCH}
cd FastDeploy
if [ "${{ github.event_name }}" = "pull_request" ]; then
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr/${{ github.event.pull_request.number }}
git merge pr/${{ github.event.pull_request.number }}
git log -n 3 --oneline
else
git checkout ${{ github.sha }}
git log -n 3 --oneline
fi
- name: Run CI unittest
env:
docker_image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.1.0
run: |
runner_name="${{ runner.name }}"
last_char="${runner_name: -1}"
if [[ "$last_char" =~ [0-3] ]]; then
gpu_id="$last_char"
else
gpu_id="0"
fi
FD_API_PORT=$((9180 + gpu_id * 100))
FD_ENGINE_QUEUE_PORT=$((9150 + gpu_id * 100))
FD_METRICS_PORT=$((9170 + gpu_id * 100))
PARENT_DIR=$(dirname "$WORKSPACE")
echo "PARENT_DIR:$PARENT_DIR"
docker run --rm --net=host --cap-add=SYS_PTRACE --privileged --shm-size=64G \
-v $(pwd):/workspace -w /workspace \
-v "/ssd3:/ssd3" \
-e "MODEL_PATH=/ssd3/model" \
-e "http_proxy=$(git config --global --get http.proxy)" \
-e "https_proxy=$(git config --global --get https.proxy)" \
-e "no_proxy=bcebos.com,mirrors.tuna.tsinghua.edu.cn,127.0.0.1,localhost" \
-e "FD_API_PORT=${FD_API_PORT}" \
-e "FD_ENGINE_QUEUE_PORT=${FD_ENGINE_QUEUE_PORT}" \
-e "FD_METRICS_PORT=${FD_METRICS_PORT}" \
${docker_image} /bin/bash -c "
git config --global --add safe.directory /workspace/FastDeploy
cd FastDeploy
bash scripts/run_ci_xpu.sh
"
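
The "Run CI unittest" step above derives a device index from the last character of the runner name and offsets each service port by 100 per index. A minimal sketch of that arithmetic follows, written in POSIX shell so it runs outside the workflow. The helper names `gpu_id_from_runner` and `port_for` and the runner name `xpu-runner-2` are invented for illustration and are not part of the workflow; the `case` pattern stands in for the workflow's `[[ ... =~ [0-3] ]]` test.

```shell
# Hypothetical helper: a trailing digit 0-3 in the runner name selects the
# device index; any other name (including an empty one) falls back to 0.
gpu_id_from_runner() {
  # "${1#"${1%?}"}" keeps only the last character of $1
  # (a POSIX spelling of bash's ${name: -1}).
  case "${1#"${1%?}"}" in
    [0-3]) echo "${1#"${1%?}"}" ;;
    *) echo "0" ;;
  esac
}

# Hypothetical helper: offset a base port ($1) by 100 per device index ($2).
port_for() {
  echo $(( $1 + $2 * 100 ))
}

# "xpu-runner-2" is an invented runner name used only for illustration.
gpu_id="$(gpu_id_from_runner "xpu-runner-2")"
echo "FD_API_PORT=$(port_for 9180 "$gpu_id")"
echo "FD_ENGINE_QUEUE_PORT=$(port_for 9150 "$gpu_id")"
echo "FD_METRICS_PORT=$(port_for 9170 "$gpu_id")"
```

With base ports 9150, 9170 and 9180 and a stride of 100, runners with indices 0 to 3 get disjoint port sets, which is presumably why the step can share `--net=host` across runners on one machine.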

24
.github/workflows/gh-pages.yml vendored Normal file

@@ -0,0 +1,24 @@
name: Deploy GitHub Pages
on:
push:
branches: [ develop ]
permissions:
contents: write
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: 3.x
- run: pip install mkdocs-material mkdocs-get-deps mkdocs-material-extensions mkdocs-multilang mkdocs-static-i18n
- name: Deploy to GitHub Pages
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git
mkdocs gh-deploy --force --remote-name origin

97
.github/workflows/pr_build_and_test.yml vendored Normal file

@@ -0,0 +1,97 @@
name: PR Build and Test
on:
pull_request:
types: [opened, synchronize]
branches: [develop, release/**]
permissions: read-all
concurrency:
group: ${{ github.event.pull_request.number }}-${{ github.workflow }}
cancel-in-progress: true
jobs:
clone:
name: FD-Clone-Linux
uses: ./.github/workflows/_clone_linux.yml
build:
name: FD-Build-Linux
needs: clone
uses: ./.github/workflows/_build_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
COMPILE_ARCH: "89,90"
WITH_NIGHTLY_BUILD: "OFF"
FD_VERSION: "0.0.0"
resultshow:
name: Use Build Output
needs: build
runs-on: ubuntu-latest
steps:
- name: Print wheel path
run: |
echo "The built wheel is located at: ${{ needs.build.outputs.wheel_path }}"
unittest_coverage:
name: Run FastDeploy Unit Tests and Coverage
needs: [clone,build]
uses: ./.github/workflows/_unit_test_coverage.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
logprob_test:
name: Run FastDeploy LogProb Tests
needs: [build]
uses: ./.github/workflows/_logprob_test_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
PADDLETEST_ARCHIVE_URL: "https://xly-devops.bj.bcebos.com/PaddleTest/PaddleTest.tar.gz"
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
pre_ce_test:
name: Extracted partial CE model tasks to run in CI.
needs: [clone,build]
uses: ./.github/workflows/_pre_ce_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
base_test:
name: Run Base Tests
needs: [clone,build]
uses: ./.github/workflows/_base_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
accuracy_test:
name: Run Accuracy Tests
needs: [clone,build]
uses: ./.github/workflows/_accuracy_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
stable_test:
name: Run Stable Tests
needs: [clone,build]
uses: ./.github/workflows/_stable_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"

321
.github/workflows/publish_job.yml vendored Normal file

@@ -0,0 +1,321 @@
name: Publish Job
on:
workflow_dispatch:
schedule:
- cron: '0 18 * * *' # 2:00 AM China Standard Time (UTC+8)
push:
# branches:
# - develop
tags:
- '*'
permissions: read-all
concurrency:
group: ${{ github.ref }}-${{ github.sha }}
cancel-in-progress: true
jobs:
publish_pre_check:
runs-on: ubuntu-latest
if: |
github.event.repository.fork == false &&
(
(github.event_name == 'schedule' && github.ref_name == 'develop') ||
(github.event_name == 'push' && github.ref_type == 'tag') ||
((github.event_name == 'workflow_dispatch') &&
(github.ref_name == 'develop' || github.ref_type == 'tag'))
)
env:
TAG_VERSION_MAPPINGS: ${{ vars.TAG_VERSION_MAPPINGS }}
FD_VERSION_DEV: ${{ vars.FD_VERSION_DEV }}
COMPILE_USE_PADDLE_WHL_URL_MAPPINGS: ${{ vars.COMPILE_USE_PADDLE_WHL_URL_MAPPINGS }}
outputs:
compile_use_paddle_version: ${{ steps.set_output.outputs.compile_use_paddle_version }}
compile_continue: ${{ steps.set_output.outputs.compile_continue }}
fd_version: ${{ steps.set_output.outputs.fd_version }}
with_nightly_build: ${{ steps.set_output.outputs.with_nightly_build }}
compile_use_paddle_whl_url: ${{ steps.set_output.outputs.compile_use_paddle_whl_url }}
steps:
- name: Get tag version
if: github.ref_type == 'tag'
run: |
TAG_NAME="${GITHUB_REF##*/}" # Extract the tag name, e.g. v2.1.0
TAG_VERSION="${TAG_NAME#v}" # Strip the leading "v"
echo "FD_VERSION=$TAG_VERSION" >> $GITHUB_ENV
- name: Check FD version to Paddle version mapping
if: github.ref_type == 'tag'
env:
TARGET_FD: ${{ env.FD_VERSION }}
run: |
FOUND_PADDLE=""
# Iterate over the mappings
for pair in $(echo $TAG_VERSION_MAPPINGS | tr ';' ' '); do
fd=$(echo "$pair" | cut -d',' -f1)
paddle=$(echo "$pair" | cut -d',' -f2)
if [[ "$fd" == "$TARGET_FD" ]]; then
FOUND_PADDLE="$paddle"
break
fi
done
if [[ -z "$FOUND_PADDLE" ]]; then
echo "No Paddle version found for FD $TARGET_FD"
else
echo "FD $TARGET_FD maps to Paddle $FOUND_PADDLE"
echo "PADDLE_VERSION=$FOUND_PADDLE" >> $GITHUB_ENV
fi
- name: Set Version
id: set_output
env:
PADDLE_VERSION: ${{ env.PADDLE_VERSION }}
FD_VERSION: ${{ env.FD_VERSION }}
run: |
if [[ "${{ github.ref_type }}" == "tag" ]]; then
if [[ -z "$PADDLE_VERSION" ]]; then
compile_continue=false
else
compile_use_paddle_version=$PADDLE_VERSION
compile_continue=true
fi
fd_version=$FD_VERSION
fi
if [[ "${{ github.ref_name }}" == "develop" ]];then
compile_continue=true
compile_use_paddle_version=""
fd_version=${FD_VERSION_DEV}
with_nightly_build=ON
fi
# Todo
# Use the mappings in COMPILE_USE_PADDLE_WHL_URL_MAPPINGS to decide whether to install a specific Paddle version or to install directly from a wheel URL
for pair in $(echo $COMPILE_USE_PADDLE_WHL_URL_MAPPINGS | tr ';' ' '); do
branch=$(echo "$pair" | cut -d',' -f1)
paddle_whl_url=$(echo "$pair" | cut -d',' -f2)
if [[ "$branch" == "${{ github.ref_name }}" ]]; then
FOUND_PADDLE_URL="$paddle_whl_url"
echo "compile_use_paddle_whl_url=${FOUND_PADDLE_URL}" >> $GITHUB_OUTPUT
compile_continue=true
break
fi
done
echo "compile_continue=${compile_continue}" >> $GITHUB_OUTPUT
echo "compile_use_paddle_version=${compile_use_paddle_version}" >> $GITHUB_OUTPUT
echo "fd_version=${fd_version}" >> $GITHUB_OUTPUT
echo "with_nightly_build=${with_nightly_build:-OFF}" >> $GITHUB_OUTPUT
print_publish_pre_check_outputs:
runs-on: ubuntu-latest
needs: publish_pre_check
steps:
- name: Print outputs as JSON
run: |
echo '${{ toJSON(needs.publish_pre_check.outputs) }}'
clone:
environment: CodeSync
name: FD-Clone-Linux
runs-on: ubuntu-latest
needs: publish_pre_check
if: ${{ needs.publish_pre_check.outputs.compile_continue == 'true' }}
outputs:
repo_archive_url: ${{ steps.set_output.outputs.repo_archive_url }}
steps:
- name: Clone FastDeploy
uses: actions/checkout@v4
with:
ref: ${{ github.ref_name }}
submodules: 'recursive'
fetch-depth: 1000
- name: Python Setup
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Code Info Show and Upload
id: set_output
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
run: |
git config --unset http.https://github.com/.extraheader
git submodule foreach --recursive sh -c "git config --local --unset-all 'http.https://github.com/.extraheader'"
git submodule foreach --recursive sh -c "git config remote.origin.fetch '+refs/heads/*:refs/remotes/origin/*'"
echo "Current HEAD Log:"
git log --oneline -n 5
ls
cd ..
tar -zcf FastDeploy.tar.gz FastDeploy
if [[ "${{ github.ref_type }}" == "tag" ]]; then
commit_id=${{ github.sha }}
tag_name=${{ github.ref_name }}
target_path=paddle-qa/TAG/FastDeploy/${tag_name}/${commit_id}
else
commit_id=${{ github.sha }}
branch_name=${{ github.ref_name }}
target_path=paddle-qa/BRANCH/FastDeploy/${branch_name}/${commit_id}
fi
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} FastDeploy.tar.gz ${target_path}
target_path_stripped="${target_path#paddle-qa/}"
REPO_ARCHIVE_URL=https://paddle-qa.bj.bcebos.com/${target_path_stripped}/FastDeploy.tar.gz
echo "repo_archive_url=${REPO_ARCHIVE_URL}" >> $GITHUB_OUTPUT
resultshow:
name: Show Code Archive Output
needs: clone
runs-on: ubuntu-latest
steps:
- name: Print code archive URL
run: |
echo "The code archive is located at: ${{ needs.clone.outputs.repo_archive_url }}"
build_sm8090:
name: BUILD_SM8090
needs: [clone, publish_pre_check]
uses: ./.github/workflows/_build_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
COMPILE_ARCH: "80,90"
WITH_NIGHTLY_BUILD: ${{ needs.publish_pre_check.outputs.with_nightly_build }}
FD_VERSION: ${{ needs.publish_pre_check.outputs.fd_version }}
PADDLEVERSION: ${{ needs.publish_pre_check.outputs.compile_use_paddle_version }}
PADDLE_WHL_URL: ${{ needs.publish_pre_check.outputs.compile_use_paddle_whl_url }}
build_sm8689:
name: BUILD_SM8689
needs: [clone, publish_pre_check]
uses: ./.github/workflows/_build_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
COMPILE_ARCH: "86,89"
WITH_NIGHTLY_BUILD: ${{ needs.publish_pre_check.outputs.with_nightly_build }}
FD_VERSION: ${{ needs.publish_pre_check.outputs.fd_version }}
PADDLEVERSION: ${{ needs.publish_pre_check.outputs.compile_use_paddle_version }}
PADDLE_WHL_URL: ${{ needs.publish_pre_check.outputs.compile_use_paddle_whl_url }}
paddle_pypi_upload_sm8090:
environment: PaddleSourceUpload
name: PADDLE_PYPI_UPLOAD_8090
needs: build_sm8090
runs-on: ubuntu-latest
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
COMPILE_ARCH: "80,90"
steps:
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Wheel Info Show and Upload
if: github.ref_name == 'develop' || github.ref_type == 'tag'
run: |
echo "The wheel is located at: ${FASTDEPLOY_WHEEL_URL}"
wget -q --no-check-certificate ${FASTDEPLOY_WHEEL_URL}
filename=$(basename ${FASTDEPLOY_WHEEL_URL})
if [[ "${{ github.ref_name }}" == "develop" ]];then
target_path=paddle-whl/nightly/fastdeploy-gpu-${COMPILE_ARCH//,/_}/fastdeploy-gpu
elif [[ "${{ github.ref_type }}" == "tag" ]]; then
target_path=paddle-whl/stable/fastdeploy-gpu-${COMPILE_ARCH//,/_}/fastdeploy-gpu
else
echo "Not develop or tag, do nothing"
fi
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} ${filename} ${target_path}
paddle_pypi_upload_sm8689:
environment: PaddleSourceUpload
name: PADDLE_PYPI_UPLOAD_8689
needs: build_sm8689
runs-on: ubuntu-latest
env:
AK: ${{ secrets.BOS_AK }}
SK: ${{ secrets.BOS_SK }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8689.outputs.wheel_path }}
COMPILE_ARCH: "86,89"
steps:
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Wheel Info Show and Upload
if: github.ref_name == 'develop' || github.ref_type == 'tag'
run: |
echo "The wheel is located at: ${FASTDEPLOY_WHEEL_URL}"
wget -q --no-check-certificate ${FASTDEPLOY_WHEEL_URL}
filename=$(basename ${FASTDEPLOY_WHEEL_URL})
if [[ "${{ github.ref_name }}" == "develop" ]];then
target_path=paddle-whl/nightly/fastdeploy-gpu-${COMPILE_ARCH//,/_}/fastdeploy-gpu
elif [[ "${{ github.ref_type }}" == "tag" ]]; then
target_path=paddle-whl/stable/fastdeploy-gpu-${COMPILE_ARCH//,/_}/fastdeploy-gpu
else
echo "Not develop or tag, do nothing"
fi
wget -q --no-proxy --no-check-certificate https://paddle-qa.bj.bcebos.com/CodeSync/develop/PaddlePaddle/PaddleTest/tools/bos_tools.py
push_file=$(realpath bos_tools.py)
python -m pip install bce-python-sdk==0.9.29
ls
python ${push_file} ${filename} ${target_path}
unittest_coverage:
name: Run FastDeploy Unit Tests and Coverage
needs: [clone,build_sm8090]
uses: ./.github/workflows/_unit_test_coverage.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
secrets:
github-token: ${{ secrets.GITHUB_TOKEN }}
logprob_test:
name: Run FastDeploy LogProb Tests
needs: [build_sm8090]
uses: ./.github/workflows/_logprob_test_linux.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
PADDLETEST_ARCHIVE_URL: "https://xly-devops.bj.bcebos.com/PaddleTest/PaddleTest.tar.gz"
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
pre_ce_test:
name: Extracted partial CE model tasks to run in CI.
needs: [clone,build_sm8090]
uses: ./.github/workflows/_pre_ce_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
base_test:
name: Run Base Tests
needs: [clone,build_sm8090]
uses: ./.github/workflows/_base_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
accuracy_test:
name: Run Accuracy Tests
needs: [clone,build_sm8090]
uses: ./.github/workflows/_accuracy_test.yml
with:
DOCKER_IMAGE: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleqa:fastdeploy-ciuse-cuda126-dailyupdate
FASTDEPLOY_ARCHIVE_URL: ${{ needs.clone.outputs.repo_archive_url }}
FASTDEPLOY_WHEEL_URL: ${{ needs.build_sm8090.outputs.wheel_path }}
MODEL_CACHE_DIR: "/ssd2/actions-runner/ModelData"
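
The `publish_pre_check` job in the workflow above resolves a tag to a FastDeploy version, looks that version up in a `fd,paddle;fd,paddle` mapping string, and the upload jobs later build a target path from the comma-separated architecture list. The steps can be sketched as the POSIX shell functions below. The function names and the sample mapping value `2.1.0,3.1.0;2.2.0,3.2.0` are assumptions made up for illustration; the real mapping lives in the repository variable `TAG_VERSION_MAPPINGS`, whose contents are not shown in this diff. The `tr ',' '_'` call stands in for the workflow's bash-only `${COMPILE_ARCH//,/_}` substitution.

```shell
# Hypothetical helper: reduce a ref such as refs/tags/v2.1.0 to "2.1.0".
tag_to_version() {
  tag_name="${1##*/}"      # drop everything up to the last "/"
  echo "${tag_name#v}"     # drop a leading "v" if present
}

# Hypothetical helper: find FD version $2 in mapping string $1.
# Prints the mapped Paddle version and returns 0, or returns 1 if absent.
lookup_paddle_version() {
  for pair in $(echo "$1" | tr ';' ' '); do
    fd="$(echo "$pair" | cut -d',' -f1)"
    paddle="$(echo "$pair" | cut -d',' -f2)"
    if [ "$fd" = "$2" ]; then
      echo "$paddle"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: build the upload path for channel $1
# ("nightly" or "stable") and arch list $2 (e.g. "80,90").
wheel_target_path() {
  echo "paddle-whl/$1/fastdeploy-gpu-$(echo "$2" | tr ',' '_')/fastdeploy-gpu"
}

# Invented sample values, for illustration only.
mappings="2.1.0,3.1.0;2.2.0,3.2.0"
version="$(tag_to_version "refs/tags/v2.1.0")"
if paddle_version="$(lookup_paddle_version "$mappings" "$version")"; then
  echo "FD $version maps to Paddle $paddle_version"
else
  echo "No Paddle version found for FD $version"
fi
wheel_target_path stable "86,89"
```

A failed lookup corresponds to the workflow setting `compile_continue=false` for a tag, which in turn skips the `clone` job and everything that depends on it.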

243
.gitignore vendored

@@ -1,69 +1,178 @@
build
cmake-build-debug
cmake-build-release
.vscode
FastDeploy.cmake
build-debug.sh
*dist
fastdeploy.egg-info
fastdeploy_python.egg-info
fastdeploy_gpu_python.egg-info
.setuptools-cmake-build
fastdeploy/version.py
fastdeploy/core/config.h
python/fastdeploy/c_lib_wrap.py
python/fastdeploy/LICENSE*
python/build_cpu.sh
python/fastdeploy/ThirdPartyNotices*
*.so*
fpython/astdeploy/libs/third_libs
fastdeploy/core/config.h
fastdeploy/pybind/main.cc
python/fastdeploy/libs/lib*
python/fastdeploy/libs/third_libs
__pycache__
build_fd_android.sh
python/scripts/process_libraries.py
.vs
.idea
.DS_Store
miniprogram_npm
node_modules
.DS_Store
dist
etc
lib
dist-ssr
coverage
*.local
yalc.*
.yalc
examples/vision/collect_quantize_cc.sh
examples/vision/tests_quantize
fastdeploy/LICENSE
fastdeploy/ThirdPartyNotices.txt
FastDeployCSharp.cmake
python/fastdeploy/code_version.py
*.pdmodel
*.pdiparams
*.pdiparams.info
log.txt
serving/build
serving/build.encrypt
serving/build.encrypt.auth
output
res
tmp
# Virtualenv
/.venv/
/venv/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
/bin/
/build/
/develop-eggs/
dist/
/eggs/
/lib/
/lib64/
/output/
/parts/
/sdist/
/var/
*.egg-info/
.installed.cfg
*.egg
.eggs
# AUTHORS and ChangeLog will be generated while packaging
/AUTHORS
/ChangeLog
# BCloud / BuildSubmitter
/build_submitter.*
/logger_client_log
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
.tox/
.coverage
.cache
.pytest_cache
nosetests.xml
coverage.xml
# Translations
*.mo
*.pot
*.doctree
# Sphinx documentation
/docs/_build/
.env
log
nohup.out
llm/server/__pycache__
llm/server/data/__pycache__
llm/server/engine/__pycache__
llm/server/http_server/__pycache__
llm/server/log/
llm/client/build/
llm/client/dist/
llm/client/fastdeploy_client.egg-info/
llm/client/fastdeploy_client/tests/log/
*.pyc
.vscode
.idea
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pycharm
.DS_Store
.idea/
FETCH_HEAD
#log
log/
checkpoints/
checkpoints_origin/
result/
result_lora/
# npu kernel cache
kernel_meta*
# building custom ops cache and auto-generated codes
*.o
fastdeploy_ops.py
version.txt
EGG-INFO/
# fp8 generated codes
autogen/
fp8_fp8_gemm_scale_bias_act.cu
fp8_fp8_dual_gemm_scale_bias_act.cu
visitor_fp8_gemm_fused.cu
# third party
custom_ops/third_party
fastdeploy/model_executor/ops/base
fastdeploy/model_executor/ops/gpu/deep_gemm
gemm_profiles.json
nohup.out
#fp8_deep_gemm
custom_ops/gpu_ops/fp8_deep_gemm/deep_gemm/include/cutlass
custom_ops/gpu_ops/fp8_deep_gemm/deep_gemm/include/cute
#marlin_kernel
custom_ops/gpu_ops/moe/moe_wna16_marlin_utils/kernel_*.cu
#machete_kernel
custom_ops/gpu_ops/machete/generated
# buff
custom_ops/tmp*
build
.ccls-cache
third_party
custom_ops/gpu_ops/w4afp8_gemm/w4afp8_gemm_*.cu
custom_ops/gpu_ops/w4afp8_gemm/w4afp8_gemm_template.h
custom_ops/gpu_ops/wfp8afp8_sparse_gemm/wfp8Afp8_sparse_gemm_*.cu
custom_ops/gpu_ops/wfp8afp8_sparse_gemm/wfp8Afp8_sparse_gemm_template.h

9
.gitmodules vendored Normal file

@@ -0,0 +1,9 @@
[submodule "custom_ops/third_party/DeepGEMM"]
path = custom_ops/third_party/DeepGEMM
url = https://github.com/deepseek-ai/DeepGEMM.git
[submodule "custom_ops/third_party/cutlass"]
path = custom_ops/third_party/cutlass
url = https://github.com/NVIDIA/cutlass.git
[submodule "custom_ops/third_party/nlohmann_json"]
path = custom_ops/third_party/nlohmann_json
url = https://github.com/nlohmann/json.git


@@ -1,6 +1,48 @@
default_install_hook_types:
- pre-commit
- commit-msg
default_stages:
- pre-commit # Run locally
- commit-msg
# - manual # Run in CI
repos:
- repo: https://github.com/psf/black.git
rev: 25.1.0
hooks:
- id: black
files: \.(py|pyi)$
additional_dependencies: [toml]
# Automatic import sorting
- repo: https://github.com/PyCQA/isort
rev: 5.11.5
hooks:
- id: isort
- repo: https://github.com/PyCQA/flake8
rev: 7.0.0
hooks:
- id: flake8
# Linting
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.11.7
hooks:
- id: ruff
args: [--output-format, github, --fix, --line-length=120, --config, pyproject.toml]
# # Spell checking
# - repo: https://github.com/codespell-project/codespell
# rev: v2.4.1
# hooks:
# - id: codespell
# additional_dependencies: ['tomli']
# args: ['--toml', 'pyproject.toml']
# markdown
- repo: https://github.com/jackdewinter/pymarkdown
rev: v0.9.29
hooks:
- id: pymarkdown
args: ["-d", "MD029,MD031", fix]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: ed714747d7acbc5790b171702bb012af3b0fe145
rev: v5.0.0
hooks:
- id: check-merge-conflict
- id: check-symlinks
@@ -9,30 +51,3 @@ repos:
- id: detect-private-key
- id: check-symlinks
- id: check-added-large-files
- repo: local
hooks:
- id: copyright_checker
name: copyright_checker
entry: python ./.copyright.hook
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
exclude: (?!.*third_party)^.*$
- repo: local
hooks:
- id: clang-format-with-version-check
name: clang-format
description: Format files with ClangFormat.
entry: bash .clang_format.hook -i
language: system
files: \.(c|cc|cxx|cpp|cu|hxx|proto)$
- repo: local
hooks:
- id: cpplint-cpp-source
name: cpplint
description: Check C++ code style using cpplint.py.
entry: bash .cpplint_pre_commit.hook
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx)$


@@ -1,772 +0,0 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
PROJECT(fastdeploy C CXX)
CMAKE_MINIMUM_REQUIRED(VERSION 3.10)
option(CSRCS_DIR_NAME "Name of source code directory")
option(LIBRARY_NAME "Name of build library name")
option(PY_LIBRARY_NAME "Name of build python library name")
if(NOT CSRCS_DIR_NAME)
set(CSRCS_DIR_NAME ".")
endif()
if(NOT LIBRARY_NAME)
set(LIBRARY_NAME "fastdeploy")
endif()
if(NOT PY_LIBRARY_NAME)
set(PY_LIBRARY_NAME "fastdeploy_main")
endif()
include(ExternalProject)
set(THIRD_PARTY_PATH ${CMAKE_CURRENT_BINARY_DIR}/third_libs)
add_subdirectory(${CSRCS_DIR_NAME}/fastdeploy)
include(${PROJECT_SOURCE_DIR}/cmake/utils.cmake)
# Set C++11 as standard for the whole project
if(NOT MSVC)
if(NOT DEFINED CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD 11)
endif()
set(CMAKE_CXX_FLAGS "-Wno-format -g0 -O3")
if(NEED_ABI0)
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
else()
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
endif()
endif(NOT MSVC)
include(${PROJECT_SOURCE_DIR}/cmake/build_tools.cmake)
if(UNIX AND (NOT APPLE) AND (NOT ANDROID) AND (NOT WITH_TIMVX))
download_patchelf()
set(PATCHELF_EXE ${THIRD_PARTY_PATH}/patchelf/bin/patchelf)
endif()
############################# Basic Options for FastDeploy ################################
option(WITH_GPU "Whether WITH_GPU=ON, will enable onnxruntime-gpu/paddle-inference-gpu/poros-gpu" OFF)
option(WITH_IPU "Whether WITH_IPU=ON, will enable paddle-inference-ipu" OFF)
option(WITH_OPENCL "Whether WITH_OPENCL=ON, will enable paddle-lite-gpu" OFF)
option(ENABLE_ORT_BACKEND "Whether to enable onnxruntime backend." OFF)
option(ENABLE_TRT_BACKEND "Whether to enable tensorrt backend." OFF)
option(ENABLE_PADDLE_BACKEND "Whether to enable paddle backend." OFF)
option(ENABLE_POROS_BACKEND "Whether to enable poros backend." OFF)
option(ENABLE_OPENVINO_BACKEND "Whether to enable openvino backend." OFF)
option(ENABLE_RKNPU2_BACKEND "Whether to enable RKNPU2 backend." OFF)
option(ENABLE_SOPHGO_BACKEND "Whether to enable SOPHON backend." OFF)
option(ENABLE_TVM_BACKEND "Whether to enable TVM backend." OFF)
option(ENABLE_LITE_BACKEND "Whether to enable paddle lite backend." OFF)
option(ENABLE_HORIZON_BACKEND "Whether to enable HORIZON backend." OFF)
option(ENABLE_VISION "Whether to enable vision models usage." OFF)
option(ENABLE_TEXT "Whether to enable text models usage." OFF)
option(ENABLE_FLYCV "Whether to enable flycv to boost image preprocess." OFF)
option(ENABLE_CVCUDA "Whether to enable NVIDIA CV-CUDA to boost image preprocess." OFF)
option(ENABLE_ENCRYPTION "Whether to enable ENCRYPTION." OFF)
option(ENABLE_BENCHMARK "Whether to enable Benchmark mode." OFF)
option(WITH_ASCEND "Whether to compile for Huawei Ascend deploy." OFF)
option(WITH_DIRECTML "Whether to compile for onnxruntime DirectML deploy." OFF)
option(WITH_TIMVX "Whether to compile for TIMVX deploy." OFF)
option(WITH_KUNLUNXIN "Whether to compile for KunlunXin XPU deploy." OFF)
option(WITH_TESTING "Whether to compile with unittest." OFF)
option(WITH_CAPI "Whether to compile with c api." OFF)
option(WITH_CSHARPAPI "Whether to compile with c# api" OFF)
option(BUILD_EXAMPLES "Whether to build fastdeploy with vision examples" OFF)
option(BUILD_PADDLE2ONNX "Whether to build paddle2onnx from sources" OFF)
######################### Paths to user's custom libraries directory #####################
set(CUDA_DIRECTORY "" CACHE PATH "If build tensorrt backend, need to define path of cuda library.")
set(TRT_DIRECTORY "" CACHE PATH "If build tensorrt backend, need to define path of tensorrt library.")
set(ORT_DIRECTORY "" CACHE PATH "User can specify the installed onnxruntime directory.")
set(OPENCV_DIRECTORY "" CACHE PATH "User can specify the installed opencv directory.")
set(OPENVINO_DIRECTORY "" CACHE PATH "User can specify the installed openvino directory.")
# Whether to build fastdeploy on device Nvidia Jetson
# Only support CPU Inference & GPU(TensorRT) Inference Now
option(BUILD_ON_JETSON "Whether to build fastdeploy on Nvidia Jetson" OFF)
if(BUILD_ON_JETSON)
set(WITH_GPU ON)
set(ENABLE_TRT_BACKEND ON)
set(ENABLE_ORT_BACKEND ON)
endif()
# config GIT_URL with github mirrors to speed up dependent repos clone
option(GIT_URL "Git URL to clone dependent repos" ${GIT_URL})
if(NOT GIT_URL)
set(GIT_URL "https://github.com")
endif()
# check build options
include(${PROJECT_SOURCE_DIR}/cmake/check.cmake)
if(WIN32 AND ENABLE_VISION)
add_definitions(-DYAML_CPP_DLL)
set(YAML_BUILD_SHARED_LIBS ON)
set(YAML_CPP_INSTALL ON)
set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
endif()
if(NOT CUDA_DIRECTORY)
set(CUDA_DIRECTORY "/usr/local/cuda")
endif()
option(BUILD_FASTDEPLOY_PYTHON "if build python lib for fastdeploy." OFF)
set(HEAD_DIR "${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}")
include_directories(${HEAD_DIR})
include_directories(${CMAKE_CURRENT_BINARY_DIR})
if (WITH_TIMVX)
include(${PROJECT_SOURCE_DIR}/cmake/timvx.cmake)
endif()
if (WITH_ASCEND)
include(${PROJECT_SOURCE_DIR}/cmake/ascend.cmake)
endif()
if (WITH_KUNLUNXIN)
include(${PROJECT_SOURCE_DIR}/cmake/kunlunxin.cmake)
endif()
if(WITH_IPU)
if(NOT ENABLE_PADDLE_BACKEND)
message("Will force to set ENABLE_PADDLE_BACKEND when build with GraphCore IPU.")
set(ENABLE_PADDLE_BACKEND ON)
endif()
add_definitions(-DWITH_IPU)
endif()
if(ANDROID)
include(${PROJECT_SOURCE_DIR}/cmake/android.cmake)
check_android_options_policy()
set_android_cxx_complie_flags()
endif()
# Check for macOS architecture
get_osx_architecture()
##################################### Building: FastDeploy C++ SDK #######################################
add_definitions(-DFASTDEPLOY_LIB)
# set CMAKE_BUILD_TYPE to Release
add_definitions(-DCMAKE_BUILD_TYPE=Release)
# configure files before glob sources.
configure_file(${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/core/config.h.in ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/core/config.h)
configure_file(${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/pybind/main.cc.in ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/pybind/main.cc)
file(GLOB_RECURSE ALL_DEPLOY_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/*.cc)
file(GLOB_RECURSE DEPLOY_ORT_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/ort/*.cc)
file(GLOB_RECURSE DEPLOY_PADDLE_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/paddle/*.cc)
file(GLOB_RECURSE DEPLOY_POROS_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/poros/*.cc)
file(GLOB_RECURSE DEPLOY_TRT_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/tensorrt/*.cc ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/tensorrt/*.cpp)
file(GLOB_RECURSE DEPLOY_OPENVINO_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/openvino/*.cc)
file(GLOB_RECURSE DEPLOY_RKNPU2_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/rknpu2/*.cc)
file(GLOB_RECURSE DEPLOY_HORIZON_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/horizon/*.cc)
file(GLOB_RECURSE DEPLOY_SOPHGO_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/sophgo/*.cc)
file(GLOB_RECURSE DEPLOY_TVM_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/tvm/*.cc)
file(GLOB_RECURSE DEPLOY_LITE_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/lite/*.cc)
file(GLOB_RECURSE DEPLOY_ENCRYPTION_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/encryption/*.cc)
file(GLOB_RECURSE DEPLOY_PIPELINE_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/pipeline/*.cc)
file(GLOB_RECURSE DEPLOY_VISION_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/vision/*.cc)
file(GLOB_RECURSE DEPLOY_TEXT_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/text/*.cc)
file(GLOB_RECURSE DEPLOY_PYBIND_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/pybind/*.cc ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/*_pybind.cc)
file(GLOB_RECURSE DEPLOY_PADDLE_CUSTOM_OP_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/paddle/ops/*.cc)
if(WITH_GPU)
file(GLOB_RECURSE DEPLOY_CUDA_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/*.cu)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_CUDA_SRCS})
file(GLOB_RECURSE DEPLOY_PADDLE_CUSTOM_OP_CUDA_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/paddle/ops/*.cu)
list(REMOVE_ITEM ALL_DEPLOY_SRCS ${DEPLOY_PADDLE_CUSTOM_OP_CUDA_SRCS})
file(GLOB_RECURSE DEPLOY_VISION_CUDA_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/vision/*.cu)
list(APPEND DEPLOY_VISION_SRCS ${DEPLOY_VISION_CUDA_SRCS})
file(GLOB_RECURSE DEPLOY_TEXT_CUDA_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/text/*.cu)
list(APPEND DEPLOY_TEXT_SRCS ${DEPLOY_TEXT_CUDA_SRCS})
endif()
list(REMOVE_ITEM DEPLOY_PADDLE_SRCS ${DEPLOY_PADDLE_CUSTOM_OP_SRCS})
list(REMOVE_ITEM ALL_DEPLOY_SRCS ${DEPLOY_ORT_SRCS} ${DEPLOY_PADDLE_SRCS}
${DEPLOY_POROS_SRCS} ${DEPLOY_TRT_SRCS}
${DEPLOY_OPENVINO_SRCS} ${DEPLOY_LITE_SRCS}
${DEPLOY_VISION_SRCS} ${DEPLOY_TEXT_SRCS}
${DEPLOY_PIPELINE_SRCS} ${DEPLOY_RKNPU2_SRCS}
${DEPLOY_SOPHGO_SRCS} ${DEPLOY_ENCRYPTION_SRCS}
${DEPLOY_HORIZON_SRCS} ${DEPLOY_TVM_SRCS}
${DEPLOY_PADDLE_CUSTOM_OP_SRCS})
set(DEPEND_LIBS "")
file(READ "${PROJECT_SOURCE_DIR}/VERSION_NUMBER" FASTDEPLOY_VERSION)
string(STRIP "${FASTDEPLOY_VERSION}" FASTDEPLOY_VERSION)
# Add eigen lib
include_directories(${PROJECT_SOURCE_DIR}/third_party/eigen)
if(WIN32)
add_definitions(-DEIGEN_STRONG_INLINE=inline)
endif()
if(ANDROID)
# Set tensor function/openmp compile policy after
# ALL_DEPLOY_SRCS/DEPEND_LIBS defined
set_android_tensor_funcs_compile_policy()
set_android_openmp_compile_policy()
endif()
# SW (Sunway) does not support thread_local semantics
if(WITH_SW)
add_definitions(-DEIGEN_AVOID_THREAD_LOCAL)
endif()
if(ENABLE_ORT_BACKEND)
set(ENABLE_PADDLE2ONNX ON)
add_definitions(-DENABLE_ORT_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_ORT_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/onnxruntime.cmake)
list(APPEND DEPEND_LIBS external_onnxruntime)
endif()
if(ENABLE_LITE_BACKEND)
add_definitions(-DENABLE_LITE_BACKEND)
include(${PROJECT_SOURCE_DIR}/cmake/paddlelite.cmake)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_LITE_SRCS})
list(APPEND DEPEND_LIBS external_paddle_lite)
endif()
if(ENABLE_PADDLE_BACKEND)
set(ENABLE_PADDLE2ONNX ON)
add_definitions(-DENABLE_PADDLE_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_PADDLE_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/paddle_inference.cmake)
list(APPEND DEPEND_LIBS external_paddle_inference)
if(external_dnnl_FOUND)
list(APPEND DEPEND_LIBS external_dnnl external_omp)
endif()
if(external_ort_FOUND)
list(APPEND DEPEND_LIBS external_p2o external_ort)
endif()
if(PADDLEINFERENCE_API_CUSTOM_OP)
set_paddle_custom_ops_compatible_policy()
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_PADDLE_CUSTOM_OP_SRCS})
if(WITH_GPU)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_PADDLE_CUSTOM_OP_CUDA_SRCS})
endif()
endif()
endif()
if(ENABLE_OPENVINO_BACKEND)
set(ENABLE_PADDLE2ONNX ON)
add_definitions(-DENABLE_OPENVINO_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_OPENVINO_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/openvino.cmake)
endif()
if(ENABLE_RKNPU2_BACKEND)
add_definitions(-DENABLE_RKNPU2_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_RKNPU2_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/rknpu2.cmake)
list(APPEND DEPEND_LIBS ${RKNN_RT_LIB})
endif()
if(ENABLE_HORIZON_BACKEND)
add_definitions(-DENABLE_HORIZON_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_HORIZON_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/horizon.cmake)
list(APPEND DEPEND_LIBS ${BPU_libs})
endif()
if(ENABLE_TVM_BACKEND)
set(CMAKE_CXX_STANDARD 17)
add_definitions(-DENABLE_TVM_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_TVM_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/tvm.cmake)
list(APPEND DEPEND_LIBS ${TVM_RUNTIME_LIB})
endif()
if(ENABLE_SOPHGO_BACKEND)
add_definitions(-DENABLE_SOPHGO_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_SOPHGO_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/sophgo.cmake)
list(APPEND DEPEND_LIBS ${SOPHGO_RT_LIB})
endif()
if(ENABLE_POROS_BACKEND)
set(CMAKE_CXX_STANDARD 14)
add_definitions(-DENABLE_POROS_BACKEND)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_POROS_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/poros.cmake)
list(APPEND DEPEND_LIBS external_poros)
set(PYTHON_MINIMUM_VERSION 3.6)
set(PYTORCH_MINIMUM_VERSION 1.9)
set(TENSORRT_MINIMUM_VERSION 8.0)
# find python3
find_package(Python3 ${PYTHON_MINIMUM_VERSION} REQUIRED COMPONENTS Interpreter Development)
message(STATUS "Found Python: ${Python3_VERSION_MAJOR}.${Python3_VERSION_MINOR}.${Python3_VERSION_PATCH}")
if (NOT Python3_SITELIB)
message(FATAL_ERROR "site-packages not found. ")
else ()
message(STATUS "site-packages: ${Python3_SITELIB}")
endif ()
include_directories(${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/poros/common)
# find trt
if(NOT WITH_GPU)
message(FATAL_ERROR "While -DENABLE_POROS_BACKEND=ON, must set -DWITH_GPU=ON, but now it's OFF")
endif()
if(NOT TRT_DIRECTORY)
message(FATAL_ERROR "While -DENABLE_POROS_BACKEND=ON, must define -DTRT_DIRECTORY, e.g -DTRT_DIRECTORY=/Downloads/TensorRT-8.4")
endif()
include_directories(${TRT_DIRECTORY}/include)
find_library(TRT_INFER_LIB nvinfer ${TRT_DIRECTORY}/lib)
find_library(TRT_ONNX_LIB nvonnxparser ${TRT_DIRECTORY}/lib)
find_library(TRT_PLUGIN_LIB nvinfer_plugin ${TRT_DIRECTORY}/lib)
list(APPEND DEPEND_LIBS ${TRT_INFER_LIB} ${TRT_ONNX_LIB} ${TRT_PLUGIN_LIB})
if(NOT EXISTS "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt")
file(MAKE_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt")
endif()
if(EXISTS "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib")
file(REMOVE_RECURSE "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib")
endif()
find_package(Python COMPONENTS Interpreter Development REQUIRED)
message(STATUS "Copying ${TRT_DIRECTORY}/lib to ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib ...")
execute_process(COMMAND ${Python_EXECUTABLE} ${PROJECT_SOURCE_DIR}/scripts/copy_directory.py ${TRT_DIRECTORY}/lib ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib)
endif()
if(WITH_GPU)
add_definitions(-DWITH_GPU)
include_directories(${CUDA_DIRECTORY}/include)
if(WIN32)
find_library(CUDA_LIB cudart ${CUDA_DIRECTORY}/lib/x64)
find_library(NVJPEG_LIB nvjpeg ${CUDA_DIRECTORY}/lib/x64)
add_definitions(-DENABLE_NVJPEG)
else()
find_library(CUDA_LIB cudart ${CUDA_DIRECTORY}/lib64)
if(NOT BUILD_ON_JETSON)
find_library(NVJPEG_LIB nvjpeg ${CUDA_DIRECTORY}/lib64)
add_definitions(-DENABLE_NVJPEG)
endif()
endif()
list(APPEND DEPEND_LIBS ${CUDA_LIB} ${NVJPEG_LIB})
# build CUDA source files in fastdeploy, CUDA source files include CUDA preprocessing, TRT plugins, etc.
enable_language(CUDA)
message(STATUS "CUDA compiler: ${CMAKE_CUDA_COMPILER}, version: "
"${CMAKE_CUDA_COMPILER_ID} ${CMAKE_CUDA_COMPILER_VERSION}")
include(${PROJECT_SOURCE_DIR}/cmake/cuda.cmake)
endif()
if(WITH_OPENCL)
add_definitions(-DWITH_OPENCL)
endif()
if(ENABLE_TRT_BACKEND)
set(ENABLE_PADDLE2ONNX ON)
if(APPLE OR ANDROID OR IOS)
message(FATAL_ERROR "Cannot enable tensorrt backend in mac/ios/android os, please set -DENABLE_TRT_BACKEND=OFF.")
endif()
if(NOT WITH_GPU)
message(FATAL_ERROR "While -DENABLE_TRT_BACKEND=ON, must set -DWITH_GPU=ON, but now it's OFF")
endif()
if(NOT BUILD_ON_JETSON)
if(NOT TRT_DIRECTORY)
set(TRT_INC_DIR /usr/include/x86_64-linux-gnu/)
set(TRT_LIB_DIR /usr/lib/x86_64-linux-gnu/)
endif()
endif()
if(BUILD_ON_JETSON)
set(TRT_INC_DIR /usr/include/aarch64-linux-gnu/)
set(TRT_LIB_DIR /usr/lib/aarch64-linux-gnu/)
else()
set(TRT_INC_DIR /usr/include/x86_64-linux-gnu/)
set(TRT_LIB_DIR /usr/lib/x86_64-linux-gnu/)
if(TRT_DIRECTORY)
set(TRT_INC_DIR ${TRT_DIRECTORY}/include)
set(TRT_LIB_DIR ${TRT_DIRECTORY}/lib)
endif()
endif()
add_definitions(-DENABLE_TRT_BACKEND)
include_directories(${TRT_INC_DIR})
include_directories(${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/tensorrt/common)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_TRT_SRCS})
find_library(TRT_INFER_LIB nvinfer ${TRT_LIB_DIR} NO_DEFAULT_PATH)
find_library(TRT_ONNX_LIB nvonnxparser ${TRT_LIB_DIR} NO_DEFAULT_PATH)
find_library(TRT_PLUGIN_LIB nvinfer_plugin ${TRT_LIB_DIR} NO_DEFAULT_PATH)
list(APPEND DEPEND_LIBS ${TRT_INFER_LIB} ${TRT_ONNX_LIB} ${TRT_PLUGIN_LIB})
if(NOT BUILD_ON_JETSON AND TRT_DIRECTORY)
if(NOT EXISTS "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt")
file(MAKE_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt")
endif()
if(EXISTS "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib")
file(REMOVE_RECURSE "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib")
endif()
if (NOT Python_EXECUTABLE)
find_package(Python COMPONENTS Interpreter Development REQUIRED)
endif()
message(STATUS "Copying ${TRT_DIRECTORY}/lib to ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib ...")
execute_process(COMMAND ${Python_EXECUTABLE} ${PROJECT_SOURCE_DIR}/scripts/copy_directory.py ${TRT_DIRECTORY}/lib ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib)
file(GLOB_RECURSE TRT_STATIC_LIBS ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib/*.a)
if(TRT_STATIC_LIBS)
file(REMOVE ${TRT_STATIC_LIBS})
endif()
if(UNIX AND (NOT APPLE) AND (NOT ANDROID))
execute_process(COMMAND sh -c "ls *.so*" WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib
COMMAND sh -c "xargs ${PATCHELF_EXE} --force-rpath --set-rpath '$ORIGIN'" WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/tensorrt/lib
RESULT_VARIABLE result
OUTPUT_VARIABLE curr_out
ERROR_VARIABLE curr_out)
if(NOT result EQUAL "0")
message(FATAL_ERROR "Failed to patchelf tensorrt libraries.")
endif()
message(STATUS "result:${result} out:${curr_out}")
endif()
endif()
endif()
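# Configuration sketch (an illustration, not part of the original file): the
# TensorRT checks above suggest a configure step along these lines. The paths
# are placeholders; only the option names are taken from this file.
#   cmake .. -DWITH_GPU=ON -DENABLE_TRT_BACKEND=ON \
#            -DCUDA_DIRECTORY=/path/to/cuda -DTRT_DIRECTORY=/path/to/TensorRT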
if(ENABLE_VISION)
add_definitions(-DENABLE_VISION)
add_subdirectory(${PROJECT_SOURCE_DIR}/third_party/yaml-cpp)
list(APPEND DEPEND_LIBS yaml-cpp)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_VISION_SRCS})
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_PIPELINE_SRCS})
include_directories(${PROJECT_SOURCE_DIR}/third_party/yaml-cpp/include)
include(${PROJECT_SOURCE_DIR}/cmake/opencv.cmake)
if(ENABLE_FLYCV)
add_definitions(-DENABLE_FLYCV)
include(${PROJECT_SOURCE_DIR}/cmake/flycv.cmake)
list(APPEND DEPEND_LIBS ${FLYCV_LIBRARIES})
endif()
if(ENABLE_CVCUDA)
include(${PROJECT_SOURCE_DIR}/cmake/cvcuda.cmake)
add_definitions(-DENABLE_CVCUDA)
list(APPEND DEPEND_LIBS nvcv_types cvcuda)
endif()
endif()
if(ENABLE_TEXT)
add_definitions(-DENABLE_TEXT)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_TEXT_SRCS})
include(${PROJECT_SOURCE_DIR}/cmake/fast_tokenizer.cmake)
endif()
if(ENABLE_ENCRYPTION)
add_definitions(-DENABLE_ENCRYPTION)
list(APPEND ALL_DEPLOY_SRCS ${DEPLOY_ENCRYPTION_SRCS})
# include(${PROJECT_SOURCE_DIR}/cmake/gflags.cmake)
include(${PROJECT_SOURCE_DIR}/cmake/openssl.cmake)
list(APPEND DEPEND_LIBS ${OPENSSL_LIBRARIES})
endif()
if(ENABLE_PADDLE2ONNX)
add_definitions(-DENABLE_PADDLE2ONNX)
if(BUILD_PADDLE2ONNX)
download_protobuf()
include(${PROJECT_SOURCE_DIR}/cmake/build_paddle2onnx.cmake)
list(APPEND ALL_DEPLOY_SRCS ${PADDLE2ONNX_ALL_SRCS})
list(APPEND DEPEND_LIBS p2o_paddle_proto onnx)
else()
include(${PROJECT_SOURCE_DIR}/cmake/paddle2onnx.cmake)
list(APPEND DEPEND_LIBS external_paddle2onnx)
endif()
endif(ENABLE_PADDLE2ONNX)
if(WITH_CAPI)
include(${PROJECT_SOURCE_DIR}/c_api/CMakeLists.txt)
if(MSVC)
add_definitions(-DFD_CAPI)
endif()
endif()
if(WITH_CSHARPAPI)
if(MSVC)
add_subdirectory(${PROJECT_SOURCE_DIR}/csharp)
endif()
endif()
configure_file(${PROJECT_SOURCE_DIR}/FastDeploy.cmake.in ${PROJECT_SOURCE_DIR}/FastDeploy.cmake @ONLY)
configure_file(${PROJECT_SOURCE_DIR}/FastDeployCSharp.cmake.in ${PROJECT_SOURCE_DIR}/FastDeployCSharp.cmake @ONLY)
configure_file(${PROJECT_SOURCE_DIR}/python/fastdeploy/c_lib_wrap.py.in ${PROJECT_SOURCE_DIR}/python/fastdeploy/c_lib_wrap.py)
configure_file(${PROJECT_SOURCE_DIR}/python/scripts/process_libraries.py.in ${PROJECT_SOURCE_DIR}/python/scripts/process_libraries.py)
list(REMOVE_ITEM ALL_DEPLOY_SRCS ${DEPLOY_PYBIND_SRCS})
add_library(${LIBRARY_NAME} SHARED ${ALL_DEPLOY_SRCS})
redefine_file_macro(${LIBRARY_NAME})
file(READ "${PROJECT_SOURCE_DIR}/VERSION_NUMBER" FASTDEPLOY_VERSION)
string(STRIP "${FASTDEPLOY_VERSION}" FASTDEPLOY_VERSION)
if (APPLE)
set_target_properties(${LIBRARY_NAME} PROPERTIES COMPILE_FLAGS "-fvisibility=hidden")
elseif(ANDROID)
set_android_library_cxx_link_flags()
elseif(MSVC)
else()
if(WITH_GPU)
set_target_properties(${LIBRARY_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_target_properties(${LIBRARY_NAME} PROPERTIES INTERFACE_COMPILE_OPTIONS
"$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CXX>>:-fvisibility=hidden>$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CUDA>>:-Xcompiler=-fvisibility=hidden>")
else()
set_target_properties(${LIBRARY_NAME} PROPERTIES COMPILE_FLAGS "-fvisibility=hidden")
endif()
set_target_properties(${LIBRARY_NAME} PROPERTIES LINK_FLAGS "-Wl,--exclude-libs,ALL")
set_target_properties(${LIBRARY_NAME} PROPERTIES LINK_FLAGS_RELEASE -s)
endif()
set_target_properties(${LIBRARY_NAME} PROPERTIES VERSION ${FASTDEPLOY_VERSION})
if(MSVC)
# disable warnings for dll export
target_compile_options(${LIBRARY_NAME} PRIVATE "$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CXX>>:/wd4251>$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CUDA>>:-Xcompiler=/wd4251>")
file(GLOB FD_FILES_REQUIRE_BIGOBJ ${CSRCS_DIR_NAME}/fastdeploy/function/reduce.cc)
set_source_files_properties(${FD_FILES_REQUIRE_BIGOBJ} PROPERTIES COMPILE_FLAGS "/bigobj")
endif()
target_link_libraries(${LIBRARY_NAME} ${DEPEND_LIBS})
if(ENABLE_PADDLE_BACKEND)
set_paddle_encrypt_auth_compatible_policy(${LIBRARY_NAME})
endif()
if(ANDROID)
set_android_extra_libraries_target()
endif()
##################################### Examples ####################################
if(WIN32)
if(ENABLE_VISION)
if("${CMAKE_GENERATOR}" STREQUAL "Ninja")
add_custom_target(copy_yaml_library ALL COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_BINARY_DIR}/third_party/yaml-cpp ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/yaml-cpp/lib DEPENDS ${LIBRARY_NAME})
else()
add_custom_target(copy_yaml_library ALL COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_BINARY_DIR}/third_party/yaml-cpp/Release ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/yaml-cpp/lib DEPENDS ${LIBRARY_NAME})
add_custom_target(copy_yaml_include ALL COMMAND ${CMAKE_COMMAND} -E copy_directory ${PROJECT_SOURCE_DIR}/third_party/yaml-cpp/include ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/yaml-cpp/include DEPENDS ${LIBRARY_NAME})
endif()
endif()
endif()
# add examples after prepare include paths for third-parties
if(BUILD_EXAMPLES AND EXISTS ${PROJECT_SOURCE_DIR}/examples)
add_definitions(-DBUILD_EXAMPLES)
if(NOT EXECUTABLE_OUTPUT_PATH STREQUAL ${CMAKE_CURRENT_BINARY_DIR}/bin)
set(EXECUTABLE_OUTPUT_PATH ${CMAKE_CURRENT_BINARY_DIR}/bin)
endif()
include(${PROJECT_SOURCE_DIR}/cmake/gflags.cmake)
add_subdirectory(examples)
endif()
if (WITH_TESTING AND EXISTS ${PROJECT_SOURCE_DIR}/tests)
add_definitions(-DWITH_TESTING)
include(${PROJECT_SOURCE_DIR}/cmake/gtest.cmake)
if(NOT BUILD_EXAMPLES)
include(${PROJECT_SOURCE_DIR}/cmake/gflags.cmake)
endif()
include(${PROJECT_SOURCE_DIR}/cmake/glog.cmake)
add_subdirectory(tests)
endif()
include(${PROJECT_SOURCE_DIR}/cmake/summary.cmake)
fastdeploy_summary()
################################ Installation: FastDeploy C++ SDK ###############################
if(WIN32)
install(
TARGETS ${LIBRARY_NAME}
LIBRARY DESTINATION lib
ARCHIVE DESTINATION lib
RUNTIME DESTINATION lib
)
elseif(ANDROID)
set_android_libraries_installation()
else()
install(
TARGETS ${LIBRARY_NAME}
LIBRARY DESTINATION lib)
endif()
install(
DIRECTORY ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy
DESTINATION ${CMAKE_INSTALL_PREFIX}/include
FILES_MATCHING
PATTERN "*.h"
PATTERN "${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/runtime/backends/*/*.h"
)
if(NOT EXISTS "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/")
file(MAKE_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/third_libs/install/")
endif()
if(NOT ANDROID)
install(
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install
DESTINATION ${CMAKE_INSTALL_PREFIX}/third_libs
)
else()
set_android_third_libs_installation()
endif()
install(
FILES
${PROJECT_SOURCE_DIR}/LICENSE
${PROJECT_SOURCE_DIR}/ThirdPartyNotices.txt
${PROJECT_SOURCE_DIR}/VERSION_NUMBER
${PROJECT_SOURCE_DIR}/FastDeploy.cmake
${PROJECT_SOURCE_DIR}/FastDeployCSharp.cmake
${PROJECT_SOURCE_DIR}/cmake/FastDeployConfig.cmake
${PROJECT_SOURCE_DIR}/cmake/utils.cmake
${PROJECT_SOURCE_DIR}/cmake/summary.cmake
${PROJECT_SOURCE_DIR}/cmake/openmp.cmake
DESTINATION ${CMAKE_INSTALL_PREFIX}
)
install(
FILES ${PROJECT_SOURCE_DIR}/cmake/gflags.cmake
DESTINATION ${CMAKE_INSTALL_PREFIX}/utils
)
if(NOT WIN32)
install(
FILES ${PROJECT_SOURCE_DIR}/scripts/fastdeploy_init.sh
DESTINATION ${CMAKE_INSTALL_PREFIX}
)
else()
install(
FILES ${PROJECT_SOURCE_DIR}/scripts/fastdeploy_init.bat
DESTINATION ${CMAKE_INSTALL_PREFIX}
)
endif()
if(WITH_ASCEND)
install(
FILES ${PROJECT_SOURCE_DIR}/scripts/ascend_init.sh
DESTINATION ${CMAKE_INSTALL_PREFIX}
)
endif()
if(WITH_CAPI)
install(
DIRECTORY ${PROJECT_SOURCE_DIR}/c_api/fastdeploy_capi
DESTINATION ${CMAKE_INSTALL_PREFIX}/include
FILES_MATCHING
PATTERN "*.h"
PATTERN "*/types_internal.h" EXCLUDE
)
endif()
include(${PROJECT_SOURCE_DIR}/cmake/config_cpack.cmake)
if(WIN32 AND BUILD_EXAMPLES)
get_windows_path(_tmp_install_dir ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install)
get_windows_path(_publish_exe_dir ${EXECUTABLE_OUTPUT_PATH}/Release)
list(GET CMAKE_CONFIGURATION_TYPES 0 _CONFIG_TYPE)
if((${CMAKE_BUILD_TYPE} MATCHES "Release") OR (${_CONFIG_TYPE} MATCHES "Release"))
install(TARGETS ${LIBRARY_NAME} RUNTIME DESTINATION ${EXECUTABLE_OUTPUT_PATH}/Release)
add_custom_target(
copy_fd_third_dlls_examples ALL COMMAND
cmd /C ${PROJECT_SOURCE_DIR}/scripts/fastdeploy_init.bat install ${_tmp_install_dir} ${_publish_exe_dir} noconfirm)
add_dependencies(copy_fd_third_dlls_examples ${LIBRARY_NAME} copy_yaml_library)
endif()
endif()
############################### Building: FastDeploy Python Wheel #############################
if(BUILD_FASTDEPLOY_PYTHON)
add_definitions(-DBUILD_FASTDEPLOY_PYTHON)
if("${PY_EXT_SUFFIX}" STREQUAL "")
if(MSVC)
set(PY_EXT_SUFFIX ".pyd")
else()
set(PY_EXT_SUFFIX ".so")
endif()
endif()
# find_package Python has replaced PythonInterp and PythonLibs since cmake 3.12
# Use the following command in the future; now this is only compatible with the latest pybind11
# find_package(Python ${PY_VERSION} COMPONENTS Interpreter Development REQUIRED)
find_package(PythonInterp ${PY_VERSION} REQUIRED)
find_package(PythonLibs ${PY_VERSION})
if(CMAKE_SYSTEM_NAME STREQUAL "AIX")
set(CMAKE_NO_SYSTEM_FROM_IMPORTED 1)
endif()
if(NOT ENABLE_VISION)
file(GLOB_RECURSE VISION_PYBIND_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/vision/*_pybind.cc)
file(GLOB_RECURSE PIPELINE_PYBIND_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/pipeline/*_pybind.cc)
list(REMOVE_ITEM DEPLOY_PYBIND_SRCS ${VISION_PYBIND_SRCS} ${PIPELINE_PYBIND_SRCS})
endif()
if(NOT ENABLE_ENCRYPTION)
file(GLOB_RECURSE ENCRYPTION_PYBIND_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/encryption/*_pybind.cc)
list(REMOVE_ITEM DEPLOY_PYBIND_SRCS ${ENCRYPTION_PYBIND_SRCS})
endif()
if (NOT ENABLE_TEXT)
file(GLOB_RECURSE TEXT_PYBIND_SRCS ${PROJECT_SOURCE_DIR}/${CSRCS_DIR_NAME}/fastdeploy/text/*_pybind.cc)
list(REMOVE_ITEM DEPLOY_PYBIND_SRCS ${TEXT_PYBIND_SRCS})
endif()
add_library(${PY_LIBRARY_NAME} MODULE ${DEPLOY_PYBIND_SRCS})
redefine_file_macro(${PY_LIBRARY_NAME})
set_target_properties(${PY_LIBRARY_NAME} PROPERTIES PREFIX "")
set_target_properties(${PY_LIBRARY_NAME}
PROPERTIES COMPILE_FLAGS "-fvisibility=hidden")
set_target_properties(${PY_LIBRARY_NAME} PROPERTIES SUFFIX ${PY_EXT_SUFFIX})
set_target_properties(${PY_LIBRARY_NAME}
PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})
target_include_directories(${PY_LIBRARY_NAME} PRIVATE
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<INSTALL_INTERFACE:include>
${PYTHON_INCLUDE_DIR})
target_include_directories(${PY_LIBRARY_NAME} PUBLIC ${PROJECT_SOURCE_DIR}/third_party/pybind11/include)
target_include_directories(${PY_LIBRARY_NAME} PUBLIC ${PROJECT_SOURCE_DIR}/third_party/dlpack/include)
if(APPLE)
set_target_properties(${PY_LIBRARY_NAME}
PROPERTIES LINK_FLAGS "-undefined dynamic_lookup")
endif()
target_link_libraries(${PY_LIBRARY_NAME} PUBLIC ${LIBRARY_NAME})
if(MSVC)
target_link_libraries(${PY_LIBRARY_NAME} PRIVATE ${PYTHON_LIBRARIES})
target_compile_options(${PY_LIBRARY_NAME}
PRIVATE /MP
/wd4244 # 'argument': conversion from 'google::
# protobuf::uint64' to 'int', possible
# loss of data
/wd4267 # Conversion from 'size_t' to 'int',
# possible loss of data
/wd4996 # The second parameter is ignored.
${EXTRA_FLAGS})
target_compile_options(${PY_LIBRARY_NAME} PRIVATE $<$<NOT:$<CONFIG:Debug>>:/MT> $<$<CONFIG:Debug>:/MTd>)
endif()
file(REMOVE_RECURSE ${PROJECT_SOURCE_DIR}/fastdeploy/libs)
file(MAKE_DIRECTORY ${PROJECT_SOURCE_DIR}/fastdeploy/libs)
if(WIN32)
add_custom_target(copy_fd_libraries ALL COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_BINARY_DIR}/Release ${PROJECT_SOURCE_DIR}/python/fastdeploy/libs/ DEPENDS ${PY_LIBRARY_NAME})
elseif(APPLE)
add_custom_target(copy_fd_libraries ALL COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/*.so** ${CMAKE_CURRENT_BINARY_DIR}/*.dylib** ${PROJECT_SOURCE_DIR}/python/fastdeploy/libs/ DEPENDS ${PY_LIBRARY_NAME})
else()
add_custom_target(copy_fd_libraries ALL COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/*.so* ${PROJECT_SOURCE_DIR}/python/fastdeploy/libs/ DEPENDS ${PY_LIBRARY_NAME})
endif()
add_custom_target(copy_third_libraries ALL COMMAND ${CMAKE_COMMAND} -E copy_directory ${CMAKE_CURRENT_BINARY_DIR}/third_libs/install ${PROJECT_SOURCE_DIR}/python/fastdeploy/libs/third_libs DEPENDS ${PY_LIBRARY_NAME})
endif(BUILD_FASTDEPLOY_PYTHON)
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS "5.4.0")
string(STRIP "${CMAKE_CXX_COMPILER_VERSION}" CMAKE_CXX_COMPILER_VERSION)
message(FATAL_ERROR "[ERROR] FastDeploy requires g++ version >= 5.4.0, but your g++ version is ${CMAKE_CXX_COMPILER_VERSION}, which may cause build failures. Use -DCMAKE_CXX_COMPILER to set the path of your compiler.")
endif()
endif()

@@ -1,133 +0,0 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
fastdeploy@baidu.com.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

@@ -1,497 +0,0 @@
CMAKE_MINIMUM_REQUIRED(VERSION 3.8)
# FastDeploy basic infos
set(FASTDEPLOY_VERSION @FASTDEPLOY_VERSION@)
set(LIBRARY_NAME @LIBRARY_NAME@)
# Whether compiled with _GLIBCXX_USE_CXX11_ABI=0
set(NEED_ABI0 @NEED_ABI0@)
# Hardware and Language API
set(WITH_GPU @WITH_GPU@)
set(WITH_IPU @WITH_IPU@)
set(WITH_OPENCL @WITH_OPENCL@)
set(WITH_ASCEND @WITH_ASCEND@)
set(WITH_DIRECTML @WITH_DIRECTML@)
set(WITH_TIMVX @WITH_TIMVX@)
set(WITH_KUNLUNXIN @WITH_KUNLUNXIN@)
set(WITH_CAPI @WITH_CAPI@)
set(WITH_CSHARPAPI @WITH_CSHARPAPI@)
set(WITH_TESTING @WITH_TESTING@)
set(BUILD_ON_JETSON @BUILD_ON_JETSON@)
set(RKNN2_TARGET_SOC "@RKNN2_TARGET_SOC@")
# Inference backends and FastDeploy modules
set(ENABLE_ORT_BACKEND @ENABLE_ORT_BACKEND@)
set(ENABLE_RKNPU2_BACKEND @ENABLE_RKNPU2_BACKEND@)
set(ENABLE_TVM_BACKEND @ENABLE_TVM_BACKEND@)
set(ENABLE_HORIZON_BACKEND @ENABLE_HORIZON_BACKEND@)
set(ENABLE_SOPHGO_BACKEND @ENABLE_SOPHGO_BACKEND@)
set(ENABLE_LITE_BACKEND @ENABLE_LITE_BACKEND@)
set(ENABLE_PADDLE_BACKEND @ENABLE_PADDLE_BACKEND@)
set(ENABLE_OPENVINO_BACKEND @ENABLE_OPENVINO_BACKEND@)
set(ENABLE_POROS_BACKEND @ENABLE_POROS_BACKEND@)
set(ENABLE_TRT_BACKEND @ENABLE_TRT_BACKEND@)
set(ENABLE_PADDLE2ONNX @ENABLE_PADDLE2ONNX@)
set(BUILD_PADDLE2ONNX @BUILD_PADDLE2ONNX@)
set(ENABLE_VISION @ENABLE_VISION@)
set(ENABLE_FLYCV @ENABLE_FLYCV@)
set(ENABLE_CVCUDA @ENABLE_CVCUDA@)
set(ENABLE_TEXT @ENABLE_TEXT@)
set(ENABLE_ENCRYPTION @ENABLE_ENCRYPTION@)
set(ENABLE_BENCHMARK @ENABLE_BENCHMARK@)
# Version infos and custom settings for third libs
set(PADDLEINFERENCE_VERSION @PADDLEINFERENCE_VERSION@)
set(POROS_VERSION @POROS_VERSION@)
set(OPENVINO_VERSION @OPENVINO_VERSION@)
set(OPENCV_FILENAME @OPENCV_FILENAME@)
set(OPENVINO_FILENAME @OPENVINO_FILENAME@)
set(PADDLELITE_FILENAME @PADDLELITE_FILENAME@)
set(OPENCV_DIRECTORY "@OPENCV_DIRECTORY@")
set(ORT_DIRECTORY "@ORT_DIRECTORY@")
set(OPENVINO_DIRECTORY "@OPENVINO_DIRECTORY@")
# Android: specific option for Android OS
set(WITH_ANDROID_STATIC_LIB @WITH_ANDROID_STATIC_LIB@)
set(WITH_ANDROID_LITE_STATIC @WITH_ANDROID_LITE_STATIC@)
set(WITH_ANDROID_OPENCV_STATIC @WITH_ANDROID_OPENCV_STATIC@)
set(WITH_ANDROID_FLYCV_STATIC @WITH_ANDROID_FLYCV_STATIC@)
set(WITH_ANDROID_OPENMP @WITH_ANDROID_OPENMP@)
set(WITH_ANDROID_JAVA @WITH_ANDROID_JAVA@)
set(WITH_ANDROID_TENSOR_FUNCS @WITH_ANDROID_TENSOR_FUNCS@)
# encryption and auth
set(PADDLEINFERENCE_WITH_ENCRYPT @PADDLEINFERENCE_WITH_ENCRYPT@)
set(PADDLEINFERENCE_WITH_AUTH @PADDLEINFERENCE_WITH_AUTH@)
set(FASTDEPLOY_LIBS "")
set(FASTDEPLOY_INCS "")
list(APPEND FASTDEPLOY_INCS ${CMAKE_CURRENT_LIST_DIR}/include)
# Note(zhoushunjie): include some useful utility functions
include(${CMAKE_CURRENT_LIST_DIR}/utils.cmake)
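# Usage sketch (an illustration, not part of the original template): a downstream
# project would typically consume the generated FastDeploy.cmake roughly as shown.
# FASTDEPLOY_INSTALL_DIR, demo, and demo.cc are hypothetical names chosen for this
# example; FASTDEPLOY_INCS and FASTDEPLOY_LIBS are the variables this file fills in.
#   include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
#   include_directories(${FASTDEPLOY_INCS})
#   add_executable(demo demo.cc)
#   target_link_libraries(demo ${FASTDEPLOY_LIBS})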
# Set C++11 as standard for the whole project
if(NOT MSVC)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_FLAGS "-Wno-format")
if(NEED_ABI0)
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0)
else()
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=1)
endif()
endif(NOT MSVC)
# Set FastDeploy static lib definitions
if(WITH_ANDROID_LITE_STATIC)
add_definitions(-DWITH_LITE_STATIC)
add_definitions(-DWITH_ANDROID_LITE_STATIC)
endif()
if(WITH_ANDROID_STATIC_LIB)
add_definitions(-DWITH_STATIC_LIB)
add_definitions(-DWITH_ANDROID_STATIC_LIB)
# add_definitions(-DWITH_STATIC_WARNING)
endif()
# Still need omp while using FastDeploy static lib.
# This is due to the use of openmp for Paddle Lite's
# static library.
if(ANDROID AND WITH_ANDROID_STATIC_LIB AND WITH_ANDROID_LITE_STATIC)
include(${CMAKE_CURRENT_LIST_DIR}/openmp.cmake)
endif()
if(ANDROID)
add_library(fastdeploy STATIC IMPORTED GLOBAL)
if(WITH_ANDROID_STATIC_LIB)
set_property(TARGET fastdeploy PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/lib/${ANDROID_ABI}/lib${LIBRARY_NAME}_static.a)
else()
set_property(TARGET fastdeploy PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/lib/${ANDROID_ABI}/lib${LIBRARY_NAME}.so)
endif()
list(APPEND FASTDEPLOY_LIBS fastdeploy)
if(WITH_ANDROID_OPENMP AND (NOT WITH_ANDROID_LITE_STATIC))
add_library(fastdeploy_omp STATIC IMPORTED GLOBAL)
set_property(TARGET fastdeploy_omp PROPERTY IMPORTED_LOCATION ${CMAKE_CURRENT_LIST_DIR}/lib/${ANDROID_ABI}/libomp.so)
list(APPEND FASTDEPLOY_LIBS fastdeploy_omp)
endif()
else()
find_library(FDLIB ${LIBRARY_NAME} ${CMAKE_CURRENT_LIST_DIR}/lib NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${FDLIB})
endif()
if(ENABLE_ORT_BACKEND)
if (ORT_DIRECTORY)
set(ORT_LIB_PATH ${ORT_DIRECTORY}/lib)
else()
set(ORT_LIB_PATH ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/onnxruntime/lib)
endif()
message(STATUS "The path of ONNXRuntime is ${ORT_LIB_PATH}.")
find_library(ORT_LIB onnxruntime ${ORT_LIB_PATH} NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${ORT_LIB})
endif()
if(ENABLE_TVM_BACKEND)
if(APPLE)
set(TVM_RUNTIME_LIB ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tvm/lib/libtvm_runtime.dylib)
else()
set(TVM_RUNTIME_LIB ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tvm/lib/libtvm_runtime.so)
endif()
list(APPEND FASTDEPLOY_LIBS ${TVM_RUNTIME_LIB})
endif()
if(ENABLE_PADDLE_BACKEND)
find_library(PADDLE_LIB paddle_inference ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle_inference/paddle/lib NO_DEFAULT_PATH)
if(WIN32)
set(DNNL_LIB "${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle_inference/third_party/install/mkldnn/lib/mkldnn.lib")
set(IOMP_LIB "${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle_inference/third_party/install/mklml/lib/libiomp5md.lib")
elseif(APPLE)
message(STATUS "No third-party libs (mkldnn and omp) need to be linked into paddle_inference on macOS.")
else()
set(DNNL_LIB "${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle_inference/third_party/install/mkldnn/lib/libmkldnn.so.0")
set(IOMP_LIB "${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle_inference/third_party/install/mklml/lib/libiomp5.so")
set(FDMODEL_LIB "${PADDLEINFERENCE_INSTALL_DIR}/third_party/install/fdmodel/lib/libfastdeploy_wenxin.so")
set(FDMODEL_MODEL_LIB "${PADDLEINFERENCE_INSTALL_DIR}/third_party/install/fdmodel/lib/libfastdeploy_model.so.2.0.0")
set(FDMODEL_AUTH_LIB "${PADDLEINFERENCE_INSTALL_DIR}/third_party/install/fdmodel/lib/libfastdeploy_auth.so")
if((EXISTS ${FDMODEL_LIB}) AND (EXISTS ${FDMODEL_MODEL_LIB}))
set(PADDLEINFERENCE_WITH_ENCRYPT ON CACHE BOOL "" FORCE)
list(APPEND FASTDEPLOY_LIBS ${FDMODEL_LIB} ${FDMODEL_MODEL_LIB})
endif()
if((EXISTS ${FDMODEL_LIB}) AND (EXISTS ${FDMODEL_AUTH_LIB}))
set(PADDLEINFERENCE_WITH_AUTH ON CACHE BOOL "" FORCE)
list(APPEND FASTDEPLOY_LIBS ${FDMODEL_AUTH_LIB})
endif()
if(PADDLEINFERENCE_WITH_ENCRYPT OR PADDLEINFERENCE_WITH_AUTH)
if(WITH_KUNLUNXIN)
list(APPEND FASTDEPLOY_LIBS -lssl -lcrypto)
endif()
endif()
endif()
list(APPEND FASTDEPLOY_LIBS ${PADDLE_LIB})
if(EXISTS "${DNNL_LIB}")
list(APPEND FASTDEPLOY_LIBS ${DNNL_LIB} ${IOMP_LIB})
endif()
endif()
if(ENABLE_OPENVINO_BACKEND)
if (OPENVINO_DIRECTORY)
set(OPENVINO_DIR ${OPENVINO_DIRECTORY})
else()
set(OPENVINO_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/${OPENVINO_FILENAME}/runtime)
endif()
get_openvino_libs(${OPENVINO_DIR})
message(STATUS "OPENVINO_LIBS = ${OPENVINO_LIBS}")
list(APPEND FASTDEPLOY_LIBS ${OPENVINO_LIBS})
endif()
if(ENABLE_RKNPU2_BACKEND)
if(RKNN2_TARGET_SOC STREQUAL "RK356X")
set(RKNPU2_LIB ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/rknpu2_runtime/lib/librknnrt.so)
elseif (RKNN2_TARGET_SOC STREQUAL "RK3588")
set(RKNPU2_LIB ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/rknpu2_runtime/lib/librknnrt.so)
else ()
message(FATAL_ERROR "RKNN2_TARGET_SOC is not set, ref value: RK356X or RK3588")
endif()
message(STATUS "The path of RKNPU2 is ${RKNPU2_LIB}.")
list(APPEND FASTDEPLOY_LIBS ${RKNPU2_LIB})
endif()
if(ENABLE_HORIZON_BACKEND)
set(DNN_PATH ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/dnn)
set(APPSDK_PATH ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/appsdk/appuser/)
set(DNN_LIB_PATH ${DNN_PATH}/lib)
set(APPSDK_LIB_PATH ${APPSDK_PATH}/lib/hbbpu)
set(BPU_libs dnn cnn_intf hbrt_bernoulli_aarch64)
link_directories(${DNN_LIB_PATH}
${APPSDK_PATH}/lib/hbbpu
${APPSDK_PATH}/lib)
list(APPEND FASTDEPLOY_LIBS ${BPU_libs})
endif()
if(ENABLE_LITE_BACKEND)
set(LITE_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/${PADDLELITE_FILENAME})
if(ANDROID)
if(WITH_ANDROID_LITE_STATIC)
if(WITH_ANDROID_STATIC_LIB)
list(APPEND FASTDEPLOY_INCS ${LITE_DIR}/include)
endif()
else()
add_library(paddle_full_api_shared STATIC IMPORTED GLOBAL)
set_property(TARGET paddle_full_api_shared PROPERTY IMPORTED_LOCATION ${LITE_DIR}/lib/${ANDROID_ABI}/libpaddle_full_api_shared.so)
list(APPEND FASTDEPLOY_LIBS paddle_full_api_shared)
endif()
else()
# Linux/Mac/Win/...
find_library(LITE_LIB paddle_full_api_shared ${LITE_DIR}/lib NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${LITE_LIB})
endif()
endif()
if(ENABLE_POROS_BACKEND)
find_library(POROS_LIB poros ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/poros/lib NO_DEFAULT_PATH)
find_library(TORCH_LIB torch ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/torch/lib NO_DEFAULT_PATH)
set(TORCH_INCLUDE "${CMAKE_CURRENT_LIST_DIR}/third_libs/install/torch/include")
list(APPEND FASTDEPLOY_LIBS ${POROS_LIB} ${TORCH_LIB})
list(APPEND FASTDEPLOY_INCS ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/poros/include ${TORCH_INCLUDE})
endif()
if(WITH_GPU)
if(NOT CUDA_DIRECTORY)
set(CUDA_DIRECTORY "/usr/local/cuda")
endif()
if(WIN32)
find_library(CUDA_LIB cudart ${CUDA_DIRECTORY}/lib/x64)
find_library(NVJPEG_LIB nvjpeg ${CUDA_DIRECTORY}/lib/x64)
else()
find_library(CUDA_LIB cudart ${CUDA_DIRECTORY}/lib64)
if(NOT BUILD_ON_JETSON)
find_library(NVJPEG_LIB nvjpeg ${CUDA_DIRECTORY}/lib64)
endif()
endif()
if(NOT CUDA_LIB)
message(FATAL_ERROR "[FastDeploy] Cannot find library cudart in ${CUDA_DIRECTORY}, Please define CUDA_DIRECTORY, e.g -DCUDA_DIRECTORY=/path/to/cuda")
endif()
list(APPEND FASTDEPLOY_LIBS ${CUDA_LIB} ${NVJPEG_LIB})
list(APPEND FASTDEPLOY_INCS ${CUDA_DIRECTORY}/include)
if(ENABLE_TRT_BACKEND)
if(BUILD_ON_JETSON)
find_library(TRT_INFER_LIB nvinfer /usr/lib/aarch64-linux-gnu/)
find_library(TRT_ONNX_LIB nvonnxparser /usr/lib/aarch64-linux-gnu/)
find_library(TRT_PLUGIN_LIB nvinfer_plugin /usr/lib/aarch64-linux-gnu/)
else()
if(EXISTS ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tensorrt/)
find_library(TRT_INFER_LIB nvinfer ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tensorrt/lib NO_DEFAULT_PATH)
find_library(TRT_ONNX_LIB nvonnxparser ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tensorrt/lib NO_DEFAULT_PATH)
find_library(TRT_PLUGIN_LIB nvinfer_plugin ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/tensorrt/lib NO_DEFAULT_PATH)
else()
find_library(TRT_INFER_LIB nvinfer /usr/lib/x86_64-linux-gnu/)
find_library(TRT_ONNX_LIB nvonnxparser /usr/lib/x86_64-linux-gnu/)
find_library(TRT_PLUGIN_LIB nvinfer_plugin /usr/lib/x86_64-linux-gnu/)
endif()
endif()
list(APPEND FASTDEPLOY_LIBS ${TRT_INFER_LIB} ${TRT_ONNX_LIB} ${TRT_PLUGIN_LIB})
endif()
endif()
if(ENABLE_VISION)
if(OPENCV_DIRECTORY)
set(OpenCV_DIR ${OPENCV_DIRECTORY})
else()
if(ANDROID)
set(OpenCV_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/${OPENCV_FILENAME}/sdk/native/jni)
set(OpenCV_NATIVE_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/${OPENCV_FILENAME}/sdk/native)
else()
set(OpenCV_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/${OPENCV_FILENAME})
if(WIN32)
set(OpenCV_DIR ${OpenCV_DIR}/build)
endif()
endif()
endif()
message(STATUS "The path of OpenCV is ${OpenCV_DIR}.")
if(ANDROID)
if(WITH_ANDROID_OPENCV_STATIC)
if(WITH_ANDROID_STATIC_LIB)
# Only need the headers of opencv while using FastDeploy static lib.
list(APPEND FASTDEPLOY_INCS ${OpenCV_DIR}/include)
else()
find_package(OpenCV REQUIRED PATHS ${OpenCV_DIR})
list(APPEND FASTDEPLOY_INCS ${OpenCV_INCLUDE_DIRS})
# For now, we still need to link OpenCV static libs.
# Users may use some of opencv's apis, but they may
# not have been compiled into fastdeploy.
# list(APPEND FASTDEPLOY_LIBS ${OpenCV_LIBS})
list(APPEND FASTDEPLOY_LIBS opencv_core opencv_video opencv_highgui opencv_imgproc opencv_imgcodecs)
endif()
else()
set(OpenCV_INCLUDE_DIRS ${OpenCV_DIR}/include)
get_filename_component(OpenCV_NATIVE_DIR ${OpenCV_DIR} DIRECTORY)
set(OpenCV_LIBS_DIR ${OpenCV_NATIVE_DIR}/libs)
if(ANDROID_TOOLCHAIN MATCHES "clang") # use opencv 4.x
add_library(opencv_java4 STATIC IMPORTED GLOBAL)
set_property(TARGET opencv_java4 PROPERTY IMPORTED_LOCATION ${OpenCV_LIBS_DIR}/${ANDROID_ABI}/libopencv_java4.so)
list(APPEND FASTDEPLOY_LIBS opencv_java4)
elseif(ANDROID_TOOLCHAIN MATCHES "gcc") # use opencv 3.x
add_library(opencv_java3 STATIC IMPORTED GLOBAL)
set_property(TARGET opencv_java3 PROPERTY IMPORTED_LOCATION ${OpenCV_LIBS_DIR}/${ANDROID_ABI}/opencv_java3.so)
list(APPEND FASTDEPLOY_LIBS opencv_java3)
else()
message(FATAL_ERROR "Only support clang/gcc toolchain, but found ${ANDROID_TOOLCHAIN}.")
endif()
list(APPEND FASTDEPLOY_INCS ${OpenCV_INCLUDE_DIRS})
message(STATUS "FASTDEPLOY_INCS: ${FASTDEPLOY_INCS}")
endif()
# Win/Linux/Mac
else()
find_package(OpenCV REQUIRED PATHS ${OpenCV_DIR} NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_INCS ${OpenCV_INCLUDE_DIRS})
list(APPEND FASTDEPLOY_LIBS ${OpenCV_LIBS})
endif()
if(ENABLE_FLYCV)
include_directories(${CMAKE_CURRENT_LIST_DIR}/third_libs/install/flycv/include)
set(FLYCV_LIB_DIR ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/flycv/lib)
if(ANDROID)
if(NOT WITH_ANDROID_FLYCV_STATIC)
add_library(flycv_shared STATIC IMPORTED GLOBAL)
set_property(TARGET flycv_shared PROPERTY IMPORTED_LOCATION ${FLYCV_LIB_DIR}/${ANDROID_ABI}/libflycv_shared.so)
list(APPEND FASTDEPLOY_LIBS flycv_shared)
else()
# This code may be needed later. Therefore, I choose to
# comment it rather than delete it. (TODO:qiuyanjun)
# add_library(flycv_static STATIC IMPORTED GLOBAL)
# add_library(flycv_png16 STATIC IMPORTED GLOBAL)
# add_library(flycv_turbojpeg STATIC IMPORTED GLOBAL)
# add_library(flycv_z STATIC IMPORTED GLOBAL)
# set_property(TARGET flycv_static PROPERTY IMPORTED_LOCATION ${FLYCV_LIB_DIR}/${ANDROID_ABI}/libflycv_static.a)
# set_property(TARGET flycv_png16 PROPERTY IMPORTED_LOCATION ${FLYCV_LIB_DIR}/${ANDROID_ABI}/libpng16.a)
# set_property(TARGET flycv_turbojpeg PROPERTY IMPORTED_LOCATION ${FLYCV_LIB_DIR}/${ANDROID_ABI}/libturbojpeg.a)
# set_property(TARGET flycv_z PROPERTY IMPORTED_LOCATION ${FLYCV_LIB_DIR}/${ANDROID_ABI}/libz.a)
# list(APPEND FASTDEPLOY_LIBS flycv_static)
# list(APPEND FASTDEPLOY_LIBS flycv_png16)
# list(APPEND FASTDEPLOY_LIBS flycv_turbojpeg)
# list(APPEND FASTDEPLOY_LIBS flycv_z)
endif()
else()
find_library(FLYCV_LIB flycv_shared ${FLYCV_LIB_DIR} NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${FLYCV_LIB})
endif()
endif()
if(ENABLE_CVCUDA)
find_library(CVCUDA_LIB cvcuda ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/cvcuda/lib NO_DEFAULT_PATH)
find_library(NVCV_TYPES_LIB nvcv_types ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/cvcuda/lib NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${CVCUDA_LIB} ${NVCV_TYPES_LIB})
list(APPEND FASTDEPLOY_INCS ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/cvcuda/include)
add_definitions(-DENABLE_CVCUDA)
endif()
endif()
if (ENABLE_TEXT)
if(ANDROID)
if(NOT ANDROID_TOOLCHAIN MATCHES "clang")
message(FATAL_ERROR "Currently, only support clang toolchain while cross compiling FastDeploy for Android with FastTokenizer, but found ${ANDROID_TOOLCHAIN}.")
endif()
add_library(core_tokenizers STATIC IMPORTED GLOBAL)
set_property(TARGET core_tokenizers PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/third_libs/install/fast_tokenizer/lib/${ANDROID_ABI}/libcore_tokenizers.so)
list(APPEND FASTDEPLOY_LIBS core_tokenizers)
else()
# Add dependency libs later: Linux/Mac/Win/...
find_library(FAST_TOKENIZER_LIB core_tokenizers ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/fast_tokenizer/lib NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${FAST_TOKENIZER_LIB})
endif()
list(APPEND FASTDEPLOY_INCS ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/fast_tokenizer/include)
list(APPEND FASTDEPLOY_INCS ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/fast_tokenizer/third_party/include)
endif()
if(ENABLE_PADDLE2ONNX)
if(ANDROID)
message(FATAL_ERROR "Not support fastdeploy-paddle2onnx APIs with Android now!")
endif()
if(NOT BUILD_PADDLE2ONNX)
find_library(PADDLE2ONNX_LIB paddle2onnx ${CMAKE_CURRENT_LIST_DIR}/third_libs/install/paddle2onnx/lib NO_DEFAULT_PATH)
list(APPEND FASTDEPLOY_LIBS ${PADDLE2ONNX_LIB})
endif()
endif()
if(WITH_KUNLUNXIN)
list(APPEND FASTDEPLOY_LIBS -lpthread -lrt -ldl)
endif()
# log lib for Android
if(ANDROID)
find_library(log-lib log)
list(APPEND FASTDEPLOY_LIBS ${log-lib})
endif()
# Update CXX LINKER's FLAGS, reference: https://zhuanlan.zhihu.com/p/595527528
if(ANDROID AND (WITH_ANDROID_OPENCV_STATIC OR WITH_ANDROID_LITE_STATIC))
set(COMMON_LINK_FLAGS_REL "-Wl,-s,--gc-sections,-exclude-libs,ALL")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${COMMON_LINK_FLAGS_REL} -Wl,-allow-multiple-definition" CACHE INTERNAL "" FORCE)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${COMMON_LINK_FLAGS_REL} -Wl,-allow-multiple-definition" CACHE INTERNAL "" FORCE)
endif()
remove_duplicate_libraries(FASTDEPLOY_LIBS)
include(${CMAKE_CURRENT_LIST_DIR}/summary.cmake)
fastdeploy_summary()
message(STATUS " DEPENDENCY_LIBS : ${FASTDEPLOY_LIBS}")
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS "5.4.0")
string(STRIP "${CMAKE_CXX_COMPILER_VERSION}" CMAKE_CXX_COMPILER_VERSION)
message(FATAL_ERROR "[ERROR] FastDeploy require g++ version >= 5.4.0, but now your g++ version is ${CMAKE_CXX_COMPILER_VERSION}, this may cause failure! Use -DCMAKE_CXX_COMPILER to define path of your compiler.")
endif()
endif()
function(install_fastdeploy_libraries DESTINATION_DIR)
# No dynamic libs need to install while using
# FastDeploy static lib.
if(ANDROID AND WITH_ANDROID_STATIC_LIB)
return()
endif()
set(DYN_LIB_SUFFIX "*.so*")
if(WIN32)
set(DYN_LIB_SUFFIX "*.dll")
elseif(APPLE)
set(DYN_LIB_SUFFIX "*.dylib*")
endif()
if(FastDeploy_DIR)
set(DYN_SEARCH_DIR ${FastDeploy_DIR})
elseif(FASTDEPLOY_INSTALL_DIR)
set(DYN_SEARCH_DIR ${FASTDEPLOY_INSTALL_DIR})
else()
message(FATAL_ERROR "Please set FastDeploy_DIR/FASTDEPLOY_INSTALL_DIR before call install_fastdeploy_libraries.")
endif()
file(GLOB_RECURSE ALL_NEED_DYN_LIBS ${DYN_SEARCH_DIR}/lib/${DYN_LIB_SUFFIX})
file(GLOB_RECURSE ALL_DEPS_DYN_LIBS ${DYN_SEARCH_DIR}/third_libs/${DYN_LIB_SUFFIX})
if(ENABLE_VISION)
# OpenCV
if(ANDROID)
file(GLOB_RECURSE ALL_OPENCV_DYN_LIBS ${OpenCV_NATIVE_DIR}/libs/${DYN_LIB_SUFFIX})
else()
file(GLOB_RECURSE ALL_OPENCV_DYN_LIBS ${OpenCV_DIR}/${DYN_LIB_SUFFIX})
endif()
list(REMOVE_ITEM ALL_DEPS_DYN_LIBS ${ALL_OPENCV_DYN_LIBS})
if(WIN32)
file(GLOB OPENCV_DYN_LIBS ${OpenCV_DIR}/x64/vc15/bin/${DYN_LIB_SUFFIX})
file(INSTALL ${OPENCV_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
elseif(ANDROID AND (NOT WITH_ANDROID_OPENCV_STATIC))
file(GLOB OPENCV_DYN_LIBS ${OpenCV_NATIVE_DIR}/libs/${ANDROID_ABI}/${DYN_LIB_SUFFIX})
file(INSTALL ${OPENCV_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
else() # linux/mac
file(GLOB OPENCV_DYN_LIBS ${OpenCV_DIR}/lib/${DYN_LIB_SUFFIX})
file(INSTALL ${OPENCV_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
endif()
# FlyCV
if(ENABLE_FLYCV)
file(GLOB_RECURSE ALL_FLYCV_DYN_LIBS ${FLYCV_LIB_DIR}/${DYN_LIB_SUFFIX})
list(REMOVE_ITEM ALL_DEPS_DYN_LIBS ${ALL_FLYCV_DYN_LIBS})
if(ANDROID AND (NOT WITH_ANDROID_FLYCV_STATIC))
file(INSTALL ${ALL_FLYCV_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
endif()
endif()
endif()
if(ENABLE_OPENVINO_BACKEND)
# need plugins.xml for openvino backend
set(OPENVINO_RUNTIME_BIN_DIR ${OPENVINO_DIR}/bin)
file(GLOB OPENVINO_PLUGIN_XML ${OPENVINO_RUNTIME_BIN_DIR}/*.xml)
file(INSTALL ${OPENVINO_PLUGIN_XML} DESTINATION ${DESTINATION_DIR})
endif()
# Install other libraries
file(INSTALL ${ALL_NEED_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
file(INSTALL ${ALL_DEPS_DYN_LIBS} DESTINATION ${DESTINATION_DIR})
endfunction()
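
A consumer project would typically include this file and then call the function defined above. The following is a minimal sketch, not taken from the diff: it assumes the file is shipped in the SDK root as `FastDeploy.cmake`, and the `infer_demo` target and `infer.cc` source are hypothetical names used only for illustration.

```cmake
cmake_minimum_required(VERSION 3.10)
project(infer_demo CXX)

# Assumption: FASTDEPLOY_INSTALL_DIR points at an unpacked FastDeploy SDK
# and the SDK root contains this file under the name FastDeploy.cmake.
set(FASTDEPLOY_INSTALL_DIR "" CACHE PATH "Path to the FastDeploy SDK")
include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# FASTDEPLOY_INCS and FASTDEPLOY_LIBS are the lists populated above.
include_directories(${FASTDEPLOY_INCS})
add_executable(infer_demo infer.cc)  # infer.cc is a hypothetical source file
target_link_libraries(infer_demo ${FASTDEPLOY_LIBS})

# Copies the shared libraries next to the build output. The function uses
# file(INSTALL ...), so the copy happens at configure time, not at build time.
install_fastdeploy_libraries(${CMAKE_CURRENT_BINARY_DIR})
```

Because the function aborts with `FATAL_ERROR` when neither `FastDeploy_DIR` nor `FASTDEPLOY_INSTALL_DIR` is set, one of them has to be passed on the command line, for example `-DFASTDEPLOY_INSTALL_DIR=/path/to/sdk`.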


@@ -1,13 +0,0 @@
list(APPEND FASTDEPLOY_DOTNET_REFERENCES
"Microsoft.CSharp"
"System"
"System.Core"
"System.Data"
"System.Deployment"
"System.Drawing"
"System.Net.Http"
"System.Xml"
"System.Reflection"
"${CMAKE_CURRENT_LIST_DIR}/csharp_lib/fastdeploy_csharp.dll")
set(FASTDEPLOY_PACKAGE_REFERENCES "OpenCvSharp4_4.7.0.20230115;OpenCvSharp4.runtime.win_4.7.0.20230115")


@@ -1,3 +1,5 @@
Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
@@ -186,7 +188,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.


@@ -1 +0,0 @@
README_EN.md

README.md Normal file

@@ -0,0 +1,91 @@
English | [简体中文](README_CN.md)
<p align="center">
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://github.com/user-attachments/assets/42b0039f-39e3-4279-afda-6d1865dfbffb" width="500"></a>
</p>
<p align="center">
<a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>
</p>
<p align="center">
<a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a></br>
<a href="https://paddlepaddle.github.io/FastDeploy/get_started/installation/nvidia_gpu/"><b> Installation </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/get_started/quick_start"><b> Quick Start </b></a>
|
<a href="https://paddlepaddle.github.io/FastDeploy/supported_models/"><b> Supported Models </b></a>
</p>
--------------------------------------------------------------------------------
# FastDeploy: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
## News
**[2025-09] 🔥 FastDeploy v2.2 is newly released!** It is now compatible with models from the HuggingFace ecosystem, further optimizes performance, and adds support for [baidu/ERNIE-4.5-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)!
**[2025-08] 🔥 Released FastDeploy v2.1:** This release introduces a brand-new KV Cache scheduling strategy and extends PD separation and CUDA Graph support to more models. It also enhances hardware support for platforms such as Kunlun and Hygon, and includes comprehensive optimizations that improve the performance of both the service and the inference engine.
**[2025-07] The FastDeploy 2.0 Inference Deployment Challenge is now live!** Complete the inference deployment task for the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merch and generous prizes! 🎁 You're welcome to try it out and share your feedback! 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)
**[2025-06] 🔥 Released FastDeploy v2.0:** Supports inference and deployment for ERNIE 4.5. We also open-sourced an industrial-grade PD disaggregation solution with context caching and dynamic role switching, which uses resources more effectively and further improves inference performance for MoE models.
## About
**FastDeploy** is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers **production-ready, out-of-the-box deployment solutions** with core acceleration technologies:
- 🚀 **Load-Balanced PD Disaggregation**: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
- 🔄 **Unified KV Cache Transmission**: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
- 🤝 **OpenAI API Server and vLLM Compatible**: One-command deployment with [vLLM](https://github.com/vllm-project/vllm/) interface compatibility.
- 🧮 **Comprehensive Quantization Format Support**: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
- **Advanced Acceleration Techniques**: Speculative decoding, Multi-Token Prediction (MTP), and Chunked Prefill.
- 🖥️ **Multi-Hardware Support**: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, etc.
## Requirements
- OS: Linux
- Python: 3.10 ~ 3.12
## Installation
FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, **Hygon DCUs** and other hardware. For detailed installation instructions:
- [NVIDIA GPU](./docs/get_started/installation/nvidia_gpu.md)
- [Kunlunxin XPU](./docs/get_started/installation/kunlunxin_xpu.md)
- [Iluvatar GPU](./docs/get_started/installation/iluvatar_gpu.md)
- [Enflame GCU](./docs/get_started/installation/Enflame_gcu.md)
- [Hygon DCU](./docs/get_started/installation/hygon_dcu.md)
- [MetaX GPU](./docs/get_started/installation/metax_gpu.md)
**Note:** We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates!
## Get Started
Learn how to use FastDeploy through our documentation:
- [10-Minute Quick Deployment](./docs/get_started/quick_start.md)
- [ERNIE-4.5 Large Language Model Deployment](./docs/get_started/ernie-4.5.md)
- [ERNIE-4.5-VL Multimodal Model Deployment](./docs/get_started/ernie-4.5-vl.md)
- [Offline Inference Development](./docs/offline_inference.md)
- [Online Service Deployment](./docs/online_serving/README.md)
- [Best Practices](./docs/best_practices/README.md)
## Supported Models
Learn how to download models, enable the torch format, and more:
- [Full Supported Models List](./docs/supported_models.md)
## Advanced Usage
- [Quantization](./docs/quantization/README.md)
- [PD Disaggregation Deployment](./docs/features/disaggregated.md)
- [Speculative Decoding](./docs/features/speculative_decoding.md)
- [Prefix Caching](./docs/features/prefix_caching.md)
- [Chunked Prefill](./docs/features/chunked_prefill.md)
## Acknowledgement
FastDeploy is licensed under the [Apache-2.0 open-source license](./LICENSE). During development, portions of [vLLM](https://github.com/vllm-project/vllm) code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

README_CN.md Executable file → Normal file

@@ -1,416 +1,89 @@
[English](README_EN.md) | 简体中文 | [हिन्दी](./docs/docs_i18n/README_हिन्दी.md) | [日本語](./docs/docs_i18n/README_日本語.md) | [한국인](./docs/docs_i18n/README_한국인.md) | [Pу́сский язы́к](./docs/docs_i18n/README_Pу́сский_язы́к.md)
![FastDeploy](https://user-images.githubusercontent.com/31974251/185771818-5d4423cd-c94c-4a49-9894-bc7a8d1c29d0.png)
</p>
[English](README.md) | 简体中文
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/FastDeploy?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://github.com/user-attachments/assets/42b0039f-39e3-4279-afda-6d1865dfbffb" width="500"></a>
</p>
<p align="center">
<a href=""><img src="https://img.shields.io/badge/python-3.10-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>
</p>
<p align="center">
<a href="/docs/cn/build_and_install"><b> Installation </b></a>
<a href="https://trendshift.io/repositories/4046" target="_blank"><img src="https://trendshift.io/api/badge/repositories/4046" alt="PaddlePaddle%2FFastDeploy | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a></br>
<a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/installation/nvidia_gpu/"><b> Installation Guide </b></a>
|
<a href="docs/README_CN.md"><b> Documentation </b></a>
<a href="https://paddlepaddle.github.io/FastDeploy/zh/get_started/quick_start"><b> Quick Start </b></a>
|
<a href="README_CN.md#fastdeploy-quick-start-python"><b> Get Started </b></a>
|
<a href="https://baidu-paddle.github.io/fastdeploy-api/"><b> API Docs </b></a>
|
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><b> Release Notes </b></a>
<a href="https://paddlepaddle.github.io/FastDeploy/zh/supported_models/"><b> Supported Models </b></a>
</p>
<div align="center">
--------------------------------------------------------------------------------
# FastDeploy: Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
[<img src='https://user-images.githubusercontent.com/54695910/200465949-da478e1b-21ce-43b8-9f3f-287460e786bd.png' height="80px" width="110px">](examples/vision/classification)
[<img src='https://user-images.githubusercontent.com/54695910/188054680-2f8d1952-c120-4b67-88fc-7d2d7d2378b4.gif' height="80px" width="110px">](examples/vision/detection)
[<img src='https://user-images.githubusercontent.com/54695910/188054711-6119f0e7-d741-43b1-b273-9493d103d49f.gif' height="80px" width="110px">](examples/vision/segmentation/paddleseg)
[<img src='https://user-images.githubusercontent.com/54695910/188054718-6395321c-8937-4fa0-881c-5b20deb92aaa.gif' height="80px" width="110px">](examples/vision/segmentation/paddleseg)
[<img src='https://user-images.githubusercontent.com/54695910/188058231-a5fe1ce1-0a38-460f-9582-e0b881514908.gif' height="80px" width="110px">](examples/vision/matting)
[<img src='https://user-images.githubusercontent.com/54695910/188054691-e4cb1a70-09fe-4691-bc62-5552d50bd853.gif' height="80px" width="110px">](examples/vision/matting)
[<img src='https://user-images.githubusercontent.com/54695910/188054669-a85996ba-f7f3-4646-ae1f-3b7e3e353e7d.gif' height="80px" width="110px">](examples/vision/ocr)<br>
[<img src='https://user-images.githubusercontent.com/54695910/188059460-9845e717-c30a-4252-bd80-b7f6d4cf30cb.png' height="80px" width="110px">](examples/vision/facealign)
[<img src='https://user-images.githubusercontent.com/54695910/188054671-394db8dd-537c-42b1-9d90-468d7ad1530e.gif' height="80px" width="110px">](examples/vision/keypointdetection)
[<img src='https://user-images.githubusercontent.com/48054808/173034825-623e4f78-22a5-4f14-9b83-dc47aa868478.gif' height="80px" width="110px">](https://user-images.githubusercontent.com/54695910/200162475-f5d85d70-18fb-4930-8e7e-9ca065c1d618.gif)
[<img src='https://user-images.githubusercontent.com/54695910/200162475-f5d85d70-18fb-4930-8e7e-9ca065c1d618.gif' height="80px" width="110px">](examples/text)
[<img src='https://user-images.githubusercontent.com/54695910/212314909-77624bdd-1d12-4431-9cca-7a944ec705d3.png' height="80px" width="110px">](https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav)
</div>
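## News
**[2025-09] 🔥 FastDeploy v2.2 is newly released:** It is compatible with models from the HuggingFace ecosystem, further optimizes performance, and adds support for [baidu/ERNIE-4.5-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking)!
**[2025-08] FastDeploy v2.1 released:** A brand-new KV Cache scheduling strategy, PD disaggregation and CUDA Graph support for more models, enhanced support for more hardware such as Kunlun and Hygon, and comprehensive performance optimizations for the service and the inference engine.
**FastDeploy** is an **all-scenario**, **easy-to-use and flexible**, **highly efficient** AI inference deployment tool that supports **cloud, edge, and device** deployment. It provides an 📦**out-of-the-box** deployment experience for 🔥160+ **Text**, **Vision**, **Speech**, and **cross-modal** models, with 🔚**end-to-end** inference performance optimization. It covers dozens of task scenarios, including [object detection](./examples/vision/detection), [OCR](./examples/vision/ocr), [face detection](./examples/vision/facedet), [portrait matting](./examples/vision/matting), [multi-object tracking](./examples/vision/tracking/pptracking), [NLP](./examples/text), [Stable Diffusion text-to-image generation](./examples/multimodal/stable_diffusion), and [TTS](./examples/audio/pp-tts), meeting developers' industrial deployment needs across **multiple scenarios, hardware, and platforms**.
**[2025-07] The "FastDeploy 2.0 Inference Deployment Hands-on Test" event is now live!** Complete tasks such as inference deployment of the ERNIE 4.5 series open-source models to win official FastDeploy 2.0 merchandise, such as bone-china mugs, and generous prizes! 🎁 You are welcome to try it out and share your feedback. 📌[Sign up here](https://www.wjx.top/vm/meSsp3L.aspx#) 📌[Event details](https://github.com/PaddlePaddle/FastDeploy/discussions/2728)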
<div align="center">
## About
<img src="https://user-images.githubusercontent.com/54695910/213087733-7f2ea97b-baa4-4b0d-9b71-202ff6032a30.png" >
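**FastDeploy** is an inference and deployment toolkit for large language models (LLMs) and vision-language models (VLMs) based on PaddlePaddle. It provides **production-ready, out-of-the-box deployment solutions**, with core technical features including: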
</div>
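- 🚀 **Load-Balanced PD Disaggregation**: An industrial-grade solution that supports context caching and dynamic instance role switching, optimizing resource utilization while meeting SLOs and sustaining throughput.
- 🔄 **Unified KV Cache Transmission**: A lightweight, high-performance transport library with intelligent NVLink/RDMA selection.
- 🤝 **OpenAI API Server and vLLM Compatible**: One-command deployment, compatible with the [vLLM](https://github.com/vllm-project/vllm/) interface.
- 🧮 **Comprehensive Quantization Format Support**: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
- **Advanced Acceleration Techniques**: Speculative decoding, Multi-Token Prediction (MTP), and chunked prefill.
- 🖥️ **Multi-Hardware Support**: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, etc.
## Requirements
## 🌠 Recent Updates
- OS: Linux
- Python: 3.10 ~ 3.12
- FastDeploy series [**livestream course replays**](https://aistudio.baidu.com/aistudio/education/group/info/27800)
## Installation
- Serving deployment now supports visual deployment through VisualDL. After starting the VDL service in the FastDeploy container, you can modify model configurations, start and manage model services, view performance data, and send requests from the VDL interface. See the related documents for details:
- [Serving visual deployment](https://github.com/PaddlePaddle/FastDeploy/blob/develop/serving/docs/zh_CN/vdl_management.md)
- [Serving visual requests](https://github.com/PaddlePaddle/FastDeploy/blob/develop/serving/docs/zh_CN/client.md#%E4%BD%BF%E7%94%A8fastdeploy-client%E8%BF%9B%E8%A1%8C%E5%8F%AF%E8%A7%86%E5%8C%96%E8%AF%B7%E6%B1%82)
FastDeploy supports inference deployment on **NVIDIA GPUs**, **Kunlunxin XPUs**, **Iluvatar GPUs**, **Enflame GCUs**, **Hygon DCUs**, and other hardware. Detailed installation instructions:
- [NVIDIA GPU](./docs/zh/get_started/installation/nvidia_gpu.md)
- [Kunlunxin XPU](./docs/zh/get_started/installation/kunlunxin_xpu.md)
- [Iluvatar CoreX](./docs/zh/get_started/installation/iluvatar_gpu.md)
- [Enflame S60](./docs/zh/get_started/installation/Enflame_gcu.md)
- [Hygon DCU](./docs/zh/get_started/installation/hygon_dcu.md)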
- [MetaX GPU](./docs/zh/get_started/installation/metax_gpu.md)
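- **✨👥✨ Community**
**Note:** We are actively working on expanding hardware support. Additional hardware platforms, including Ascend NPU, are currently under development and testing. Stay tuned for updates!
- **Slack**: Join our [Slack community](https://join.slack.com/t/fastdeployworkspace/shared_invite/zt-1o50e4voz-zbiIneCNRf_eH99eS2NVLg) and chat with other community members about ideas
## Get Started
- **WeChat**: Scan the QR code and fill out the questionnaire to join the technical community and discuss deployment and industrial adoption pain points with other developers
Learn how to use FastDeploy through our documentation:
- [10-Minute Quick Deployment](./docs/zh/get_started/quick_start.md)
- [ERNIE-4.5 Deployment](./docs/zh/get_started/ernie-4.5.md)
- [ERNIE-4.5-VL Deployment](./docs/zh/get_started/ernie-4.5-vl.md)
- [Offline Inference](./docs/zh/offline_inference.md)
- [Online Serving](./docs/zh/online_serving/README.md)
- [Best Practices](./docs/zh/best_practices/README.md)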
<div align="center">
<img src="https://user-images.githubusercontent.com/54695910/216615983-bbb78319-0231-4635-86d1-f2ebf9eac85d.jpg" width = "150" height = "150" />
</div>
## Supported Models
Learn how to download models, enable the torch format, and more:
- [Full Supported Models List](./docs/zh/supported_models.md)
<div id="fastdeploy-acknowledge"></div>
## Advanced Usage
## 🌌 Inference Backends and Capabilities
- [Quantization](./docs/zh/quantization/README.md)
- [PD Disaggregation Deployment](./docs/zh/features/disaggregated.md)
- [Speculative Decoding](./docs/zh/features/speculative_decoding.md)
- [Prefix Caching](./docs/zh/features/prefix_caching.md)
- [Chunked Prefill](./docs/zh/features/chunked_prefill.md)
<font size=0.5em>
## Acknowledgement
| | <img src="https://user-images.githubusercontent.com/54695910/212475832-f32502e2-4be2-42fc-a380-2ae265417938.png" height = "26" /> | <img src="https://user-images.githubusercontent.com/54695910/212475828-240036b0-f06c-4c44-830a-d8b136099b09.png" height = "27" /> |<img src="https://user-images.githubusercontent.com/54695910/212475827-b73a1191-b3a8-4ad5-b6f6-855b3d1ffc09.png" height = "26" />| <img src="https://user-images.githubusercontent.com/54695910/212475826-f52b0ef3-e512-49fe-9b52-e1b9d1e8b6c2.png" height = "30" /> | <img src="https://user-images.githubusercontent.com/54695910/212475825-9686ae78-bad9-4be9-852e-6ad23be209da.png" height = "30" /> | <img src="https://user-images.githubusercontent.com/54695910/212475822-067349d2-8c4a-4431-bf02-05387e2962a8.png" height = "30" /> |<img src="https://user-images.githubusercontent.com/54695910/212475820-5210efe0-3e9a-429a-ad9d-48e8da2ffd0b.png" height = "30" /> |
|:----------|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| X86_64&nbsp;CPU | |&nbsp;&nbsp;&nbsp;<img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/>&nbsp;&nbsp;&nbsp; | <img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |
| NVIDIA&nbsp;GPU | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212474106-a297aa0d-9225-458e-b5b7-e31aec7cfa79.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | |
|Phytium CPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> | | | |
| Kunlunxin XPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| Huawei Ascend NPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/>| <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
|Graphcore&nbsp;IPU | | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/> | | | |
| Sophgo | | | | <img src="https://user-images.githubusercontent.com/54695910/212473382-e3e9063f-c298-4b61-ad35-a114aa6e6555.svg" height = "17"/> | | | |
|Intel GPU | | | | <img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/> | | | |
|Jetson | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> |<img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474106-a297aa0d-9225-458e-b5b7-e31aec7cfa79.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | |
|ARM&nbsp;CPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/>| <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> |
|RK3588, etc. | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473387-2559cc2a-024b-4452-806c-6105d8eb2339.svg" height = "17"/> | | | |
|RV1126, etc. | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| Amlogic | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| NXP | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
</font>
## 🔮 Documentation and Tutorials
- [✴️ Python SDK Quick Start](#fastdeploy-quick-start-python)
- [✴️ C++ SDK Quick Start](#fastdeploy-quick-start-cpp)
- **Installation**
    - [Download and install prebuilt libraries](docs/cn/build_and_install/download_prebuilt_libraries.md)
    - [Build and install for GPU deployment](docs/cn/build_and_install/gpu.md)
    - [Build and install for CPU deployment](docs/cn/build_and_install/cpu.md)
    - [Build and install for IPU deployment](docs/cn/build_and_install/ipu.md)
    - [Build and install for Kunlunxin XPU deployment](docs/cn/build_and_install/kunlunxin.md)
    - [Build and install for Rockchip RV1126 deployment](docs/cn/build_and_install/rv1126.md)
    - [Build and install for Rockchip RK3588 deployment](docs/cn/build_and_install/rknpu2.md)
    - [Build and install for Amlogic A311D deployment](docs/cn/build_and_install/a311d.md)
    - [Build and install for Huawei Ascend deployment](docs/cn/build_and_install/huawei_ascend.md)
    - [Build and install for Jetson deployment](docs/cn/build_and_install/jetson.md)
    - [Build and install for Android deployment](docs/cn/build_and_install/android.md)
- **Quick Start**
    - [PP-YOLOE Python deployment example](docs/cn/quick_start/models/python.md)
    - [PP-YOLOE C++ deployment example](docs/cn/quick_start/models/cpp.md)
- **Using Different Backends**
    - [Runtime Python usage example](docs/cn/quick_start/runtime/python.md)
    - [Runtime C++ usage example](docs/cn/quick_start/runtime/cpp.md)
    - [How to configure the inference backend for model deployment](docs/cn/faq/how_to_change_backend.md)
- **Serving Deployment**
    - [Build and install the serving deployment image](serving/docs/zh_CN/compile.md)
    - [Serving deployment](serving)
- **API Documentation**
    - [Python API documentation](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/python/html/)
    - [C++ API documentation](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/cpp/html/)
    - [Android Java API documentation](java/android)
- **Performance Tuning**
    - [Quantization acceleration](docs/cn/quantize.md)
    - [Multi-threading and multi-processing](/tutorials/multi_thread)
- **FAQ**
    - [1. How to use the C++ SDK on Windows](docs/cn/faq/use_sdk_on_windows.md)
    - [2. How to use the FastDeploy C++ SDK on Android](docs/cn/faq/use_cpp_sdk_on_android.md)
    - [3. Tips for using TensorRT](docs/cn/faq/tensorrt_tricks.md)
- **More FastDeploy Deployment Modules**
    - [Benchmark testing](benchmark)
- **Supported Models**
    - [🖥️ Server-side model support list](#fastdeploy-server-models)
    - [📳 Mobile and edge model support list](#fastdeploy-edge-models)
    - [⚛️ Web and Mini Program model support list](#fastdeploy-web-models)
- **💕 Developer Contributions**
    - [Add a new model](docs/cn/faq/develop_a_new_model.md)
<div id="fastdeploy-quick-start-python"></div>
## Quick Start 💨
<details Open>
<summary><b>Python SDK Quick Start (click to collapse)</b></summary><div>
### 🎆 Quick Installation
#### 🔸 Prerequisites
- CUDA >= 11.2, cuDNN >= 8.0, Python >= 3.6
- OS: Linux x86_64/macOS/Windows 10
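
The minimum versions above can be checked with a few lines of Python before installing. This is only a minimal sketch: the helper names `parse_version` and `meets_minimum` are illustrative and not part of FastDeploy, and the CUDA and cuDNN version strings have to be supplied by you (for example from your driver or toolkit installation).

```python
import sys

# Minimum versions copied from the prerequisites list above.
MIN_VERSIONS = {"cuda": (11, 2), "cudnn": (8, 0), "python": (3, 6)}


def parse_version(text):
    """Turn a dotted version string such as '11.2' into a tuple of ints."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad short strings such as '11' to at least major.minor.
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def meets_minimum(name, version_text):
    """Return True if version_text satisfies the minimum listed for name."""
    return parse_version(version_text) >= MIN_VERSIONS[name]


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    status = "ok" if meets_minimum("python", current) else "too old"
    print("python", current, status)
```

Versions are compared as integer tuples rather than strings, so that `3.10` is correctly treated as newer than `3.6`.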
#### 🔸 Install the GPU version
```bash
pip install numpy opencv-python fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
#### [🔸 Conda installation (recommended ✨)](docs/cn/build_and_install/download_prebuilt_libraries.md)
```bash
conda config --add channels conda-forge && conda install cudatoolkit=11.2 cudnn=8.2
```
#### 🔸 Install the CPU version
```bash
pip install numpy opencv-python fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
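
The GPU and CPU install commands differ only in the package name. If installation is scripted, the choice can be captured as below; this is a sketch in which the function name `pip_install_args` is hypothetical, while the package names and the wheel index URL are taken from the commands above.

```python
# Wheel index URL taken from the install commands above.
WHEEL_INDEX = "https://www.paddlepaddle.org.cn/whl/fastdeploy.html"


def pip_install_args(use_gpu):
    """Build the pip argument list for the GPU or CPU FastDeploy package."""
    package = "fastdeploy-gpu-python" if use_gpu else "fastdeploy-python"
    return ["pip", "install", "numpy", "opencv-python", package, "-f", WHEEL_INDEX]


# Print the command for a CPU-only machine.
print(" ".join(pip_install_args(use_gpu=False)))
```

Returning a list rather than a single string makes it straightforward to pass the result to `subprocess.run` without shell quoting concerns.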
### 🎇 Python Inference Example
* Prepare the model and image
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
tar xvf ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
```
* Test the inference result
```python
# For GPU/TensorRT deployment, see examples/vision/detection/paddledetection/python
import cv2
import fastdeploy.vision as vision
model = vision.detection.PPYOLOE("ppyoloe_crn_l_300e_coco/model.pdmodel",
                                 "ppyoloe_crn_l_300e_coco/model.pdiparams",
                                 "ppyoloe_crn_l_300e_coco/infer_cfg.yml")
im = cv2.imread("000000014439.jpg")
result = model.predict(im)
print(result)
vis_im = vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("vis_image.jpg", vis_im)
```
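
The `score_threshold=0.5` argument above only affects what is drawn. To apply the same cut-off to the raw detections, something like the following could be used. It is a sketch on plain Python lists: `filter_detections` is a hypothetical helper, and the assumption that the result object exposes `boxes`, `scores` and `label_ids` should be checked against the Python API documentation. Whether FastDeploy treats the boundary value as inclusive is also an assumption here.

```python
def filter_detections(boxes, scores, label_ids, score_threshold=0.5):
    """Keep only detections whose score is at least score_threshold."""
    kept = []
    for box, score, label_id in zip(boxes, scores, label_ids):
        if score >= score_threshold:
            kept.append({"box": list(box), "score": score, "label_id": label_id})
    return kept


# Stand-in values; with FastDeploy these would come from
# result.boxes, result.scores and result.label_ids (assumed field names).
boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]]
scores = [0.92, 0.31]
label_ids = [0, 17]

for det in filter_detections(boxes, scores, label_ids):
    print(det)
```

Keeping the filter separate from visualization lets downstream code (counting, tracking, exporting) work on the same set of detections that appears in the saved image.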
</div></details>
<div id="fastdeploy-quick-start-cpp"></div>
<details close>
<summary><b>C++ SDK Quick Start (click to expand)</b></summary><div>
### 🎆 Installation
- See the [C++ prebuilt library download](docs/cn/build_and_install/download_prebuilt_libraries.md) document
#### 🎇 C++ Inference Example
* Prepare the model and image
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
tar xvf ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
```
* Test the inference result
```C++
// For GPU/TensorRT deployment, see examples/vision/detection/paddledetection/cpp
#include "fastdeploy/vision.h"
int main(int argc, char* argv[]) {
  namespace vision = fastdeploy::vision;
  auto model = vision::detection::PPYOLOE("ppyoloe_crn_l_300e_coco/model.pdmodel",
                                          "ppyoloe_crn_l_300e_coco/model.pdiparams",
                                          "ppyoloe_crn_l_300e_coco/infer_cfg.yml");
  auto im = cv::imread("000000014439.jpg");
  vision::DetectionResult res;
  model.Predict(im, &res);
  auto vis_im = vision::VisDetection(im, res, 0.5);
  cv::imwrite("vis_image.jpg", vis_im);
  return 0;
}
```
</div></details>
For more deployment examples, see [Model Deployment Examples](examples).
<div id="fastdeploy-server-models"></div>
## ✴️ ✴️ Server-side Model Support List ✴️ ✴️
Legend: (1) ✅: supported; (2) ❔: in progress; (3) N/A: not yet supported. <br>
<details open><summary><b> Server-side model support list (click to collapse)</b></summary><div>
<div align="center">
<img src="https://user-images.githubusercontent.com/115439700/212800663-894e9a7a-6d68-4b0b-bcd2-045732d08887.png" height ="40"/>
</div>
| Task | Model | Linux | Linux | Win | Win | Mac | Mac | Linux | Linux | Linux | Linux | Linux | Linux | Linux |
|:----------------------:|:--------------------------------------------------------------------------------------------:|:------------------------------------------------:|:----------:|:-------:|:----------:|:-------:|:-------:|:-----------:|:---------------:|:-------------:|:-------------:|:-------:|:-------:|:-------:|
| --- | --- | X86 CPU | NVIDIA GPU | X86 CPU | NVIDIA GPU | X86 CPU | Arm CPU | AArch64 CPU | Phytium D2000 aarch64 | [NVIDIA Jetson](./docs/cn/build_and_install/jetson.md) | [Graphcore IPU](./docs/cn/build_and_install/ipu.md) | [Kunlunxin XPU](./docs/cn/build_and_install/kunlunxin.md) |[Huawei Ascend](./docs/cn/build_and_install/huawei_ascend.md) | [Serving](./serving) |
| Classification | [PaddleClas/ResNet50](./examples/vision/classification/paddleclas) | [✅](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ |
| Classification | [TorchVision/ResNet](examples/vision/classification/resnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Classification | [ultralytics/YOLOv5Cls](examples/vision/classification/yolov5cls) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |❔ |
| Classification | [PaddleClas/PP-LCNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/PP-LCNetv2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/EfficientNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/GhostNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV1](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV3](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/ShuffleNetV2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/SqueezeNetV1.1](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/Inceptionv3](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/PP-HGNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Detection | 🔥🔥[PaddleDetection/PP-YOLOE+](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Detection | [🔥PaddleDetection/YOLOv8](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [🔥ultralytics/YOLOv8](./examples/vision/detection/yolov8) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [PaddleDetection/PicoDet](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |
| Detection | [PaddleDetection/YOLOX](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/YOLOv3](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/PP-YOLO](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/PP-YOLOv2](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/Faster-RCNN](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ✅ |
| Detection | [PaddleDetection/Mask-RCNN](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ✅ |
| Detection | [Megvii-BaseDetection/YOLOX](./examples/vision/detection/yolox) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [WongKinYiu/YOLOv7](./examples/vision/detection/yolov7) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [WongKinYiu/YOLOv7end2end_trt](./examples/vision/detection/yolov7end2end_trt) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [WongKinYiu/YOLOv7end2end_ort](./examples/vision/detection/yolov7end2end_ort) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [meituan/YOLOv6](./examples/vision/detection/yolov6) | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |
| Detection | [ultralytics/YOLOv5](./examples/vision/detection/yolov5) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ |✅ |
| Detection | [WongKinYiu/YOLOR](./examples/vision/detection/yolor) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ✅ | ❔ |
| Detection | [WongKinYiu/ScaledYOLOv4](./examples/vision/detection/scaledyolov4) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [ppogg/YOLOv5Lite](./examples/vision/detection/yolov5lite) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ |❔ |
| Detection | [RangiLyu/NanoDetPlus](./examples/vision/detection/nanodet_plus) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Perception | [Paddle3D/Smoke](./examples/vision/perception/paddle3d/smoke) | ❔ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ | ❔ | ❔ | ❔ |❔ | ✅ |
| KeyPoint | [PaddleDetection/TinyPose](./examples/vision/keypointdetection/tiny_pose) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ❔ |
| KeyPoint | [PaddleDetection/PicoDet + TinyPose](./examples/vision/keypointdetection/det_keypoint_unite) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |❔ |
| HeadPose | [omasaht/headpose](examples/vision/headpose) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Tracking | [PaddleDetection/PP-Tracking](examples/vision/tracking/pptracking) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| OCR | [PaddleOCR/PP-OCRv2](./examples/vision/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |✅ | ❔ |
| OCR | [PaddleOCR/PP-OCRv3](./examples/vision/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Segmentation | [PaddleSeg/PP-LiteSeg](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |❔ | ❔ |
| Segmentation | [PaddleSeg/PP-HumanSegLite](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |✅ | ❔ |
| Segmentation | [PaddleSeg/HRNet](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/PP-HumanSegServer](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/Unet](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/Deeplabv3](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| FaceDetection | [biubug6/RetinaFace](./examples/vision/facedet/retinaface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceDetection | [Linzaer/UltraFace](./examples/vision/facedet/ultraface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceDetection | [deepcam-cn/YOLOv5Face](./examples/vision/facedet/yolov5face) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceDetection | [insightface/SCRFD](./examples/vision/facedet/scrfd) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceAlign | [Hsintao/PFLD](examples/vision/facealign/pfld) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceAlign | [Single430/FaceLandmark1000](./examples/vision/facealign/face_landmark_1000) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceAlign | [jhb86253817/PIPNet](./examples/vision/facealign) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/ArcFace](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/CosFace](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/PartialFC](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceRecognition | [insightface/VPL](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| Matting | [ZHKKKe/MODNet](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Matting | [PeterL1n/RobustVideoMatting]() | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ |
| Matting | [PaddleSeg/PP-Matting](./examples/vision/matting/ppmatting) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Matting | [PaddleSeg/PP-HumanMatting](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ |❔ |
| Matting | [PaddleSeg/ModNet](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Video Super-Resolution | [PaddleGAN/BasicVSR](./) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Video Super-Resolution | [PaddleGAN/EDVR](./examples/vision/sr/edvr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Video Super-Resolution | [PaddleGAN/PP-MSVSR](./examples/vision/sr/ppmsvsr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Information Extraction | [PaddleNLP/UIE](./examples/text/uie) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | |
| NLP | [PaddleNLP/ERNIE-3.0](./examples/text/ernie-3.0) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ✅ |❔ | ✅ |
| Speech | [PaddleSpeech/PP-TTS](./examples/audio/pp-tts) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | -- |❔ |❔ | ✅ |
</div></details>
<div id="fastdeploy-edge-models"></div>
## 📳 Mobile and Edge Model Support List
<details open><summary><b> Edge model support list (click to collapse)</b></summary><div>
<div align="center">
<img src="https://user-images.githubusercontent.com/115439700/212800663-894e9a7a-6d68-4b0b-bcd2-045732d08887.png" height ="40"/>
</div>
| Task | Model | Size (MB) | Linux | Android | Linux | Linux | Linux | Linux | Linux | More coming... |
|:------------------:|:-----------------------------------------------------------------------------------------:|:--------:|:-------:|:-------:|:-------:|:-----------------------:|:------------------------------:|:---------------------------:|:--------------------------------:|:-------:|
| --- | --- | --- | ARM CPU | [ARM CPU](./java/android) | [Rockchip NPU<br>RK3588/RK3568/RK3566](./docs/cn/build_and_install/rknpu2.md) | [Rockchip NPU<br>RV1109/RV1126/RK1808](./docs/cn/build_and_install/rv1126.md) | [Amlogic NPU <br>A311D/S905D/C308X](./docs/cn/build_and_install/a311d.md) | NXP NPU<br>i.MX&nbsp;8M&nbsp;Plus | More coming... |
| Classification | [PaddleClas/ResNet50](examples/vision/classification/paddleclas) | 98 | ✅ | ✅ | [](./examples/vision/classification/paddleclas/rknpu2) | ✅ | | | |
| Classification | [PaddleClas/PP-LCNet](examples/vision/classification/paddleclas) | 11.9 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/PP-LCNetv2](examples/vision/classification/paddleclas) | 26.6 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/EfficientNet](examples/vision/classification/paddleclas) | 31.4 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/GhostNet](examples/vision/classification/paddleclas) | 20.8 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV1](examples/vision/classification/paddleclas) | 17 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV2](examples/vision/classification/paddleclas) | 14.2 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV3](examples/vision/classification/paddleclas) | 22 | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | -- |
| Classification | [PaddleClas/ShuffleNetV2](examples/vision/classification/paddleclas) | 9.2 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/SqueezeNetV1.1](examples/vision/classification/paddleclas) | 5 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/Inceptionv3](examples/vision/classification/paddleclas) | 95.5 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/PP-HGNet](examples/vision/classification/paddleclas) | 59 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Detection | [PaddleDetection/PicoDet_s](examples/vision/detection/paddledetection) | 4.9 | ✅ | ✅ | [](./examples/vision/detection/paddledetection/rknpu2) | ✅ | ✅ | ✅ | -- |
| Detection | [YOLOv5](./examples/vision/detection/rkyolo) | | ❔ | ❔ | [](./examples/vision/detection/rkyolo) | ❔ | ❔ | ❔ | -- |
| Face Detection | [deepinsight/SCRFD](./examples/vision/facedet/scrfd) | 2.5 | ✅ | ✅ | [](./examples/vision/facedet/scrfd/rknpu2) | -- | -- | -- | -- |
| Keypoint Detection | [PaddleDetection/PP-TinyPose](examples/vision/keypointdetection/tiny_pose) | 5.5 | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ | -- |
| Segmentation | [PaddleSeg/PP-LiteSeg(STDC1)](examples/vision/segmentation/paddleseg) | 32.2 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/PP-HumanSeg-Lite](examples/vision/segmentation/paddleseg) | 0.556 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/HRNet-w18](examples/vision/segmentation/paddleseg) | 38.7 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/PP-HumanSeg](examples/vision/segmentation/paddleseg) | 107.2 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/Unet](examples/vision/segmentation/paddleseg) | 53.7 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/Deeplabv3](examples/vision/segmentation/paddleseg) | 150 | ❔ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | | | | |
| OCR | [PaddleOCR/PP-OCRv2](examples/vision/ocr/PP-OCRv2) | 2.3+4.4 | ✅ | ✅ | ❔ | -- | -- | -- | -- |
| OCR | [PaddleOCR/PP-OCRv3](examples/vision/ocr/PP-OCRv3) | 2.4+10.6 | ✅ | ❔ | ❔ | ❔ | ❔ | ❔ | -- |
</div></details>
## ⚛️ Web and Mini Program Model Support List
<div id="fastdeploy-web-models"></div>
<details open><summary><b> Web and Mini Program deployment support list (click to collapse)</b></summary><div>
| Task | Model | [web_demo](examples/application/js/web_demo) |
|:------------------:|:-------------------------------------------------------------------------------------------:|:--------------------------------------------:|
| --- | --- | [Paddle.js](examples/application/js) |
| Detection | [FaceDetection](examples/application/js/web_demo/src/pages/cv/detection) | ✅ |
| Detection | [ScrewDetection](examples/application/js/web_demo/src/pages/cv/detection) | ✅ |
| Segmentation | [PaddleSeg/HumanSeg](./examples/application/js/web_demo/src/pages/cv/segmentation/HumanSeg) | ✅ |
| Object Recognition | [GestureRecognition](examples/application/js/web_demo/src/pages/cv/recognition) | ✅ |
| Object Recognition | [ItemIdentification](examples/application/js/web_demo/src/pages/cv/recognition) | ✅ |
| OCR | [PaddleOCR/PP-OCRv3](./examples/application/js/web_demo/src/pages/cv/ocr) | ✅ |
</div></details>
## 💐 Acknowledge
SDK generation and download in this project use the free, open capabilities of [EasyEdge](https://ai.baidu.com/easyedge/app/openSource), for which we express our thanks.
## ©️ License
<div id="fastdeploy-license"></div>
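FastDeploy is released under the [Apache-2.0 open-source license](./LICENSE).
FastDeploy is licensed under the [Apache-2.0 open-source license](./LICENSE). During development, we referenced and drew on portions of the [vLLM](https://github.com/vllm-project/vllm) code to maintain interface compatibility, and we sincerely thank its authors.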

View File

@@ -1,406 +0,0 @@
English | [简体中文](README_CN.md) | [हिन्दी](./docs/docs_i18n/README_हिन्दी.md) | [日本語](./docs/docs_i18n/README_日本語.md) | [한국인](./docs/docs_i18n/README_한국인.md) | [Pу́сский язы́к](./docs/docs_i18n/README_Pу́сский_язы́к.md)
![FastDeploy](https://user-images.githubusercontent.com/31974251/185771818-5d4423cd-c94c-4a49-9894-bc7a8d1c29d0.png)
</p>
<p align="center">
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-Apache%202-dfd.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><img src="https://img.shields.io/github/v/release/PaddlePaddle/FastDeploy?color=ffa"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.7+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-pink.svg"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/graphs/contributors"><img src="https://img.shields.io/github/contributors/PaddlePaddle/FastDeploy?color=9ea"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/commits"><img src="https://img.shields.io/github/commit-activity/m/PaddlePaddle/FastDeploy?color=3af"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/issues"><img src="https://img.shields.io/github/issues/PaddlePaddle/FastDeploy?color=9cc"></a>
<a href="https://github.com/PaddlePaddle/FastDeploy/stargazers"><img src="https://img.shields.io/github/stars/PaddlePaddle/FastDeploy?color=ccf"></a>
</p>
<p align="center">
<a href="/docs/en/build_and_install"><b> Installation </b></a>
|
<a href="docs/README_EN.md"><b> Documents </b></a>
| <a href="/README_EN.md#Quick-Start"><b> Quick Start </b></a> |
<a href="https://baidu-paddle.github.io/fastdeploy-api/"><b> API Docs </b></a>
|
<a href="https://github.com/PaddlePaddle/FastDeploy/releases"><b> Release Notes </b></a>
</p>
<div align="center">
[<img src='https://user-images.githubusercontent.com/54695910/200465949-da478e1b-21ce-43b8-9f3f-287460e786bd.png' height="80px" width="110px">](examples/vision/classification)
[<img src='https://user-images.githubusercontent.com/54695910/188054680-2f8d1952-c120-4b67-88fc-7d2d7d2378b4.gif' height="80px" width="110px">](examples/vision/detection)
[<img src='https://user-images.githubusercontent.com/54695910/188054711-6119f0e7-d741-43b1-b273-9493d103d49f.gif' height="80px" width="110px">](examples/vision/segmentation/paddleseg)
[<img src='https://user-images.githubusercontent.com/54695910/188054718-6395321c-8937-4fa0-881c-5b20deb92aaa.gif' height="80px" width="110px">](examples/vision/segmentation/paddleseg)
[<img src='https://user-images.githubusercontent.com/54695910/188058231-a5fe1ce1-0a38-460f-9582-e0b881514908.gif' height="80px" width="110px">](examples/vision/matting)
[<img src='https://user-images.githubusercontent.com/54695910/188054691-e4cb1a70-09fe-4691-bc62-5552d50bd853.gif' height="80px" width="110px">](examples/vision/matting)
[<img src='https://user-images.githubusercontent.com/54695910/188054669-a85996ba-f7f3-4646-ae1f-3b7e3e353e7d.gif' height="80px" width="110px">](examples/vision/ocr)<br>
[<img src='https://user-images.githubusercontent.com/54695910/188059460-9845e717-c30a-4252-bd80-b7f6d4cf30cb.png' height="80px" width="110px">](examples/vision/facealign)
[<img src='https://user-images.githubusercontent.com/54695910/188054671-394db8dd-537c-42b1-9d90-468d7ad1530e.gif' height="80px" width="110px">](examples/vision/keypointdetection)
[<img src='https://user-images.githubusercontent.com/48054808/173034825-623e4f78-22a5-4f14-9b83-dc47aa868478.gif' height="80px" width="110px">](https://user-images.githubusercontent.com/54695910/200162475-f5d85d70-18fb-4930-8e7e-9ca065c1d618.gif)
[<img src='https://user-images.githubusercontent.com/54695910/200162475-f5d85d70-18fb-4930-8e7e-9ca065c1d618.gif' height="80px" width="110px">](examples/text)
[<img src='https://user-images.githubusercontent.com/54695910/212314909-77624bdd-1d12-4431-9cca-7a944ec705d3.png' height="80px" width="110px">](https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/parakeet_espnet_fs2_pwg_demo/tn_g2p/parakeet/001.wav)
</div>
**FastDeploy** is an **easy-to-use**, **high-performance** AI model deployment toolkit for cloud, mobile and edge. It offers a 📦**unified, out-of-the-box experience** and 🔚**end-to-end optimization** for **🔥160+ text, vision, speech and cross-modal AI models**.
Supported tasks include [image classification](examples/vision/classification), [object detection](examples/vision/detection), [OCR](./examples/vision/ocr), [face detection](./examples/vision/facedet), [matting](./examples/vision/matting), [pp-tracking](./examples/vision/tracking/pptracking), [NLP](./examples/text), [stable diffusion](./examples/multimodal/stable_diffusion), [TTS](./examples/audio/pp-tts) and more, meeting developers' industrial deployment needs across **multiple scenarios**, **multiple hardware targets** and **multiple platforms**.
<div align="center">
<img src="https://user-images.githubusercontent.com/54695910/213087724-7175953a-0e07-4af8-a4a1-5304163da2e0.png" >
</div>
## 🌠 Recent updates
- ✨✨✨ On **2023.01.17** we released [**YOLOv8**](./examples/vision/detection/paddledetection/) for deployment on the hardware FastDeploy supports, covering both [**Paddle YOLOv8**](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) and [**ultralytics YOLOv8**](https://github.com/ultralytics/ultralytics).
    - You can deploy [**Paddle YOLOv8**](https://github.com/PaddlePaddle/PaddleYOLO/tree/release/2.5/configs/yolov8) on [**Intel CPU**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**NVIDIA GPU**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**Jetson**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**Phytium**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**Kunlunxin**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**HUAWEI Ascend**](./examples/vision/detection/paddledetection/python/infer_yolov8.py), [**ARM CPU**](./examples/vision/detection/paddledetection/cpp/infer_yolov8.cc), [**RK3588**](./examples/vision/detection/paddledetection/rknpu2) and [**Sophgo TPU**](./examples/vision/detection/paddledetection/sophgo). Both **Python** and **C++** deployments are included.
    - You can deploy [**ultralytics YOLOv8**](https://github.com/ultralytics/ultralytics) on [**Intel CPU**](./examples/vision/detection/yolov8), [**NVIDIA GPU**](./examples/vision/detection/yolov8) and [**Jetson**](./examples/vision/detection/yolov8). Both **Python** and **C++** deployments are included.
    - FastDeploy supports quick deployment of multiple models, including **YOLOv8**, **PP-YOLOE+**, **YOLOv5** and others.
- Serving deployment now integrates with VisualDL for visual management. Once the VDL service is started in the FastDeploy container, you can modify the model configuration, start and manage model services, view performance data, and send requests from the VDL interface. For details, see:
    - [Serving deployment visualization](https://github.com/PaddlePaddle/FastDeploy/blob/develop/serving/docs/EN/vdl_management-en.md)
    - [Serving request visualization](https://github.com/PaddlePaddle/FastDeploy/blob/develop/serving/docs/EN/client-en.md#use-visualdl-as-fastdeploy-client-for-request-visualization)
- **✨👥✨ Community**
- **Slack**: Join our [Slack community](https://join.slack.com/t/fastdeployworkspace/shared_invite/zt-1m88mytoi-mBdMYcnTF~9LCKSOKXd6Tg) and chat with other community members about ideas.
- **WeChat**: Scan the QR code below using WeChat, follow the PaddlePaddle official account and fill out the questionnaire to join the WeChat group, where you can discuss real-world deployment pain points with community developers.
<div align="center">
<img src="https://user-images.githubusercontent.com/54695910/216615983-bbb78319-0231-4635-86d1-f2ebf9eac85d.jpg" width = "150" height = "150" />
</div>
## 🌌 Inference Backend and Abilities
<font size=0.5em>
| | <img src="https://user-images.githubusercontent.com/54695910/213093175-052c3e47-75dc-4be8-9be9-6565532efa1c.png" width = "60" height = "40" /> | <img src="https://user-images.githubusercontent.com/54695910/213093173-27847120-bbb0-47b0-947f-8cf87142ed52.png" width = "75" height = "50" /> |<img src="https://user-images.githubusercontent.com/54695910/213096791-8b47c875-6c89-4e1d-8c67-e226636844e1.png" width = "85" height = "60" />| <img src="https://user-images.githubusercontent.com/54695910/212475826-f52b0ef3-e512-49fe-9b52-e1b9d1e8b6c2.png" height = "30" /> | <img src="https://user-images.githubusercontent.com/54695910/212475825-9686ae78-bad9-4be9-852e-6ad23be209da.png" height = "30" /> | <img src="https://user-images.githubusercontent.com/54695910/212475822-067349d2-8c4a-4431-bf02-05387e2962a8.png" height = "30" /> |<img src="https://user-images.githubusercontent.com/54695910/212475820-5210efe0-3e9a-429a-ad9d-48e8da2ffd0b.png" height = "30" /> |
|:----------|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| X86_64&nbsp;CPU | |&nbsp;&nbsp;&nbsp;<img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/>&nbsp;&nbsp;&nbsp; | <img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |
| NVIDIA&nbsp;GPU | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212474106-a297aa0d-9225-458e-b5b7-e31aec7cfa79.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | |
|Phytium CPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> | | | |
| KunlunXin XPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| Huawei Ascend NPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/>| <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
|Graphcore&nbsp;IPU | | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/> | | | |
| Sophgo | | | | <img src="https://user-images.githubusercontent.com/54695910/212473382-e3e9063f-c298-4b61-ad35-a114aa6e6555.svg" height = "17"/> | | | |
|Intel graphics card | | | | <img src="https://user-images.githubusercontent.com/54695910/212473392-9df374d4-5daa-4e2b-856b-6e50ff1e4282.svg" height = "17"/> | | | |
| Jetson | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> | <img src="https://user-images.githubusercontent.com/54695910/212545467-e64ee45d-bf12-492c-b263-b860cb1e172b.png" height = "25"/> |<img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474106-a297aa0d-9225-458e-b5b7-e31aec7cfa79.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473391-92c9f289-a81a-4927-9f31-1ab3fa3c2971.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473556-d2ebb7cc-e72b-4b49-896b-83f95ae1fe63.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473190-fdf3cee2-5670-47b5-85e7-6853a8dd200a.svg" height = "17"/> | | |
|ARM&nbsp;CPU | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212474104-d82f3545-04d4-4ddd-b240-ffac34d8a920.svg" height = "17"/>| <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/><br><img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> | | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473393-ae1958bd-ab7d-4863-94b9-32863e600ba1.svg" height = "17"/> |
|RK3588 etc. | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473387-2559cc2a-024b-4452-806c-6105d8eb2339.svg" height = "17"/> | | | |
|RV1126 etc. | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| Amlogic | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> | <img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
| NXP | | | <img src="https://user-images.githubusercontent.com/54695910/212474105-38051192-9a1c-4b24-8ad1-f842fb0bf39d.svg" height = "17"/> |<img src="https://user-images.githubusercontent.com/54695910/212473389-8c341bbe-30d4-4a28-b50a-074be4e98ce6.svg" height = "17"/> | | | |
</font>
## 🔮 Contents
- [✴️ A Quick Start for Python SDK](#fastdeploy-quick-start-python)
- [✴️ A Quick Start for C++ SDK](#fastdeploy-quick-start-cpp)
- **Installation**
- [How to Install Prebuilt Library](docs/en/build_and_install/download_prebuilt_libraries.md)
- [How to Build GPU Deployment Environment](docs/en/build_and_install/gpu.md)
- [How to Build CPU Deployment Environment](docs/en/build_and_install/cpu.md)
- [How to Build IPU Deployment Environment](docs/en/build_and_install/ipu.md)
- [How to Build KunlunXin XPU Deployment Environment](docs/en/build_and_install/kunlunxin.md)
- [How to Build RV1126 Deployment Environment](docs/en/build_and_install/rv1126.md)
- [How to Build RKNPU2 Deployment Environment](docs/en/build_and_install/rknpu2.md)
- [How to Build A311D Deployment Environment](docs/en/build_and_install/a311d.md)
- [How to Build Huawei Ascend Deployment Environment](docs/en/build_and_install/huawei_ascend.md)
- [How to Build FastDeploy Library on Nvidia Jetson Platform](docs/en/build_and_install/jetson.md)
- [How to Build FastDeploy Android C++ SDK](docs/en/build_and_install/android.md)
- **Quick Start**
- [PP-YOLOE Python Deployment Example](docs/en/quick_start/models/python.md)
- [PP-YOLOE C++ Deployment Example](docs/en/quick_start/models/cpp.md)
- **Demos on Different Backends**
- [Runtime Python Inference](docs/en/quick_start/runtime/python.md)
- [Runtime C++ Inference](docs/en/quick_start/runtime/cpp.md)
- [How to Change Model Inference Backend](docs/en/faq/how_to_change_backend.md)
- **Serving Deployment**
- [FastDeploy Serving Deployment Image Compilation](serving/docs/EN/compile-en.md)
- [Serving Deployment](serving)
- **API Documents**
- [Python API](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/python/html/)
- [C++ API](https://www.paddlepaddle.org.cn/fastdeploy-api-doc/cpp/html/)
- [Android Java API](java/android)
- **Performance Tune-up**
- [Quantization Acceleration](docs/en/quantize.md)
- [Multi-thread](/tutorials/multi_thread)
- **FAQ**
- [1. Using the FastDeploy C++ SDK on Windows Platform](docs/en/faq/use_sdk_on_windows.md)
- [2. FastDeploy to deploy on Android Platform](docs/en/faq/use_cpp_sdk_on_android.md)
- [3. TensorRT Q&As](docs/en/faq/tensorrt_tricks.md)
- **More FastDeploy Deploy Modules**
- [Benchmark Testing](benchmark)
- **Model list**
- [🖥️ Supported Server-side and Cloud Model List](#fastdeploy-server-models)
- [📳 Supported Mobile and Edge Model List](#fastdeploy-edge-models)
- [⚛️ Supported Web and Mini Program Model List](#fastdeploy-web-models)
- **💕 Developer Contributions**
- [Develop a new model](docs/en/faq/develop_a_new_model.md)
## Quick Start💨
<div id="fastdeploy-quick-start-python"></div>
<details Open>
<summary><b>A Quick Start for Python SDK(click to fold)</b></summary><div>
#### 🎆 Installation
##### 🔸 Prerequisites
- CUDA >= 11.2, cuDNN >= 8.0, Python >= 3.6
- OS: Linux x86_64/macOS/Windows 10
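Before installing, it can help to confirm that the local interpreter and operating system match the prerequisites above. The following is a minimal sketch using only the Python standard library. The helper name `check_prerequisites` is hypothetical and not part of FastDeploy, and the sketch does not check CUDA, cuDNN, the CPU architecture, or the Windows version.

```python
import platform
import sys

# Operating systems as reported by platform.system();
# "Darwin" is what Python reports on macOS.
SUPPORTED_SYSTEMS = {"Linux", "Darwin", "Windows"}
MIN_PYTHON = (3, 6)


def check_prerequisites(python_version=None, system=None):
    """Return a list of human-readable problems (empty if none found).

    Hypothetical helper for illustration; not part of the FastDeploy API.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if system is None:
        system = platform.system()

    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, but >= %d.%d is required"
            % (python_version[0], python_version[1], MIN_PYTHON[0], MIN_PYTHON[1])
        )
    if system not in SUPPORTED_SYSTEMS:
        problems.append("Unsupported OS: %s" % system)
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```

Both arguments default to the running interpreter and OS, so they only need to be passed when checking a different target environment.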
##### 🔸 Install FastDeploy SDK with both CPU and GPU support
```bash
pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
##### [🔸 Conda Installation (Recommended✨)](docs/en/build_and_install/download_prebuilt_libraries.md)
```bash
conda config --add channels conda-forge && conda install cudatoolkit=11.2 cudnn=8.2
```
##### 🔸 Install FastDeploy SDK with only CPU support
```bash
pip install fastdeploy-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
```
#### 🎇 Python Inference Example
* Prepare model and picture
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
tar xvf ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
```
* Test inference results
```python
# For deployment of GPU/TensorRT, please refer to examples/vision/detection/paddledetection/python
import cv2
import fastdeploy.vision as vision

# Load the test image downloaded above
im = cv2.imread("000000014439.jpg")

# Build the PP-YOLOE detector from the model, parameter, and config files
model = vision.detection.PPYOLOE("ppyoloe_crn_l_300e_coco/model.pdmodel",
                                 "ppyoloe_crn_l_300e_coco/model.pdiparams",
                                 "ppyoloe_crn_l_300e_coco/infer_cfg.yml")

# Run inference and print the detection result
result = model.predict(im)
print(result)

# Visualize the detections using a 0.5 score threshold and save the image
vis_im = vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("vis_image.jpg", vis_im)
```
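The `score_threshold=0.5` argument above controls which detections are drawn. As a rough illustration of that filtering idea, the sketch below applies it to plain Python data. The `Detection` tuple and the `filter_by_score` helper are hypothetical stand-ins, not FastDeploy types, and whether FastDeploy itself includes or excludes a score exactly at the threshold is an assumption here.

```python
from collections import namedtuple

# Hypothetical stand-in for one detected object: a box as
# [xmin, ymin, xmax, ymax], a confidence score, and a class id.
Detection = namedtuple("Detection", ["box", "score", "label_id"])


def filter_by_score(detections, score_threshold=0.5):
    """Keep detections whose score is at least score_threshold."""
    return [d for d in detections if d.score >= score_threshold]


# Made-up values for illustration only, not real model output.
detections = [
    Detection([10.0, 20.0, 110.0, 220.0], 0.92, 0),
    Detection([30.0, 40.0, 80.0, 90.0], 0.31, 2),
    Detection([5.0, 5.0, 50.0, 60.0], 0.50, 1),
]
kept = filter_by_score(detections, score_threshold=0.5)
print([d.label_id for d in kept])
```

Raising the threshold trades recall for fewer low-confidence boxes in the output image.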
</div></details>
<div id="fastdeploy-quick-start-cpp"></div>
<details>
<summary><b>A Quick Start for C++ SDK(click to expand)</b></summary><div>
#### 🎆 Installation
- Please refer to [C++ Prebuilt Libraries Download](docs/en/build_and_install/download_prebuilt_libraries.md)
#### 🎇 C++ Inference Example
* Prepare models and pictures
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/ppyoloe_crn_l_300e_coco.tgz
tar xvf ppyoloe_crn_l_300e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
```
* Test inference results
```C++
// For GPU/TensorRT deployment, please refer to examples/vision/detection/paddledetection/cpp
#include "fastdeploy/vision.h"

int main(int argc, char* argv[]) {
  namespace vision = fastdeploy::vision;
  auto im = cv::imread("000000014439.jpg");
  auto model = vision::detection::PPYOLOE("ppyoloe_crn_l_300e_coco/model.pdmodel",
                                          "ppyoloe_crn_l_300e_coco/model.pdiparams",
                                          "ppyoloe_crn_l_300e_coco/infer_cfg.yml");

  vision::DetectionResult res;
  model.Predict(&im, &res);

  auto vis_im = vision::VisDetection(im, res, 0.5);
  cv::imwrite("vis_image.jpg", vis_im);
  return 0;
}
```
</div></details>
For more deployment models, please refer to [Vision Model Deployment Examples](examples/vision).
<div id="fastdeploy-server-models"></div>
## ✴️ ✴️ Server-side and Cloud Model List ✴️ ✴️
Notes: ✅: already supported; ❔: to be supported in the future; N/A: not available.
<details open><summary><b> Server-side and cloud model list(click to fold)</b></summary><div>
<div align="center">
<img src="https://user-images.githubusercontent.com/115439700/212801271-5621419f-3997-4f00-94d5-63c8b6474aa8.png" height = "40"/>
</div>
| Task | Model | Linux | Linux | Win | Win | Mac | Mac | Linux | Linux | Linux | Linux | Linux | Linux | Linux |
|:----------------------:|:--------------------------------------------------------------------------------------------:|:------------------------------------------------:|:----------:|:-------:|:----------:|:-------:|:-------:|:-----------:|:---------------:|:-------------:|:-------------:|:-------:|:-------:|:-------:|
| --- | --- | X86 CPU | NVIDIA GPU | X86 CPU | NVIDIA GPU | X86 CPU | Arm CPU | AArch64 CPU | Phytium D2000 aarch64 | [NVIDIA Jetson](./docs/en/build_and_install/jetson.md) | [Graphcore IPU](./docs/en/build_and_install/ipu.md) | [kunlunxin XPU](./docs/en/build_and_install/kunlunxin.md) |[Huawei Ascend](./docs/en/build_and_install/huawei_ascend.md) | [Serving](./serving) |
| Classification | [PaddleClas/ResNet50](./examples/vision/classification/paddleclas) | [](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ |
| Classification | [TorchVision/ResNet](examples/vision/classification/resnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Classification | [ultralytics/YOLOv5Cls](examples/vision/classification/yolov5cls) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |❔ |
| Classification | [PaddleClas/PP-LCNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/PP-LCNetv2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/EfficientNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/GhostNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV1](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/MobileNetV3](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/ShuffleNetV2](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/SqueezeNetV1.1](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/Inceptionv3](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Classification | [PaddleClas/PP-HGNet](./examples/vision/classification/paddleclas) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ |
| Detection | 🔥🔥[PaddleDetection/PP-YOLOE+](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Detection | [🔥PaddleDetection/YOLOv8](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [🔥ultralytics/YOLOv8](./examples/vision/detection/yolov8) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [PaddleDetection/PicoDet](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |
| Detection | [PaddleDetection/YOLOX](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/YOLOv3](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/PP-YOLO](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/PP-YOLOv2](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ |
| Detection | [PaddleDetection/Faster-RCNN](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ✅ |
| Detection | [PaddleDetection/Mask-RCNN](./examples/vision/detection/paddledetection) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ✅ |
| Detection | [Megvii-BaseDetection/YOLOX](./examples/vision/detection/yolox) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [WongKinYiu/YOLOv7](./examples/vision/detection/yolov7) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Detection | [WongKinYiu/YOLOv7end2end_trt](./examples/vision/detection/yolov7end2end_trt) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [WongKinYiu/YOLOv7end2end_ort](./examples/vision/detection/yolov7end2end_ort) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [meituan/YOLOv6](./examples/vision/detection/yolov6) | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |
| Detection | [ultralytics/YOLOv5](./examples/vision/detection/yolov5) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ |✅ |
| Detection | [WongKinYiu/YOLOR](./examples/vision/detection/yolor) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ✅ | ❔ |
| Detection | [WongKinYiu/ScaledYOLOv4](./examples/vision/detection/scaledyolov4) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Detection | [ppogg/YOLOv5Lite](./examples/vision/detection/yolov5lite) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ |❔ |
| Detection | [RangiLyu/NanoDetPlus](./examples/vision/detection/nanodet_plus) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Perception | [Paddle3D/Smoke](./examples/vision/perception/paddle3d/smoke) | ❔ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ | ❔ | ❔ | ❔ |❔ | ✅ |
| KeyPoint | [PaddleDetection/TinyPose](./examples/vision/keypointdetection/tiny_pose) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |❔ | ❔ |
| KeyPoint | [PaddleDetection/PicoDet + TinyPose](./examples/vision/keypointdetection/det_keypoint_unite) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ |❔ |
| HeadPose | [omasaht/headpose](examples/vision/headpose) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Tracking | [PaddleDetection/PP-Tracking](examples/vision/tracking/pptracking) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| OCR | [PaddleOCR/PP-OCRv2](./examples/vision/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |✅ | ❔ |
| OCR | [PaddleOCR/PP-OCRv3](./examples/vision/ocr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ✅ |
| Segmentation | [PaddleSeg/PP-LiteSeg](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |❔ | ❔ |
| Segmentation | [PaddleSeg/PP-HumanSegLite](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ |✅ | ❔ |
| Segmentation | [PaddleSeg/HRNet](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/PP-HumanSegServer](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/Unet](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ✅ | ✅ | ✅ |❔ |
| Segmentation | [PaddleSeg/Deeplabv3](./examples/vision/segmentation/paddleseg) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ✅ | ✅ |❔ |
| FaceDetection | [biubug6/RetinaFace](./examples/vision/facedet/retinaface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceDetection | [Linzaer/UltraFace](./examples/vision/facedet/ultraface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceDetection | [deepcam-cn/YOLOv5Face](./examples/vision/facedet/yolov5face) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceDetection | [insightface/SCRFD](./examples/vision/facedet/scrfd) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceAlign | [Hsintao/PFLD](examples/vision/facealign/pfld) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceAlign | [Single430/FaceLandmark1000](./examples/vision/facealign/face_landmark_1000) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceAlign | [jhb86253817/PIPNet](./examples/vision/facealign) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/ArcFace](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/CosFace](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ |❔ |
| FaceRecognition | [insightface/PartialFC](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| FaceRecognition | [insightface/VPL](./examples/vision/faceid/insightface) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ |
| Matting | [ZHKKKe/MODNet](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Matting | [PeterL1n/RobustVideoMatting]() | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ | ❔ |
| Matting | [PaddleSeg/PP-Matting](./examples/vision/matting/ppmatting) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ | ❔ |
| Matting | [PaddleSeg/PP-HumanMatting](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ |✅ |❔ |
| Matting | [PaddleSeg/ModNet](./examples/vision/matting/modnet) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ |❔ | ❔ |
| Video Super-Resolution | [PaddleGAN/BasicVSR](./) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | ❔ |❔ |
| Video Super-Resolution | [PaddleGAN/EDVR](./examples/vision/sr/edvr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Video Super-Resolution | [PaddleGAN/PP-MSVSR](./examples/vision/sr/ppmsvsr) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | ❔ |
| Information Extraction | [PaddleNLP/UIE](./examples/text/uie) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ |❔ | |
| NLP | [PaddleNLP/ERNIE-3.0](./examples/text/ernie-3.0) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | ❔ | ✅ |❔ | ✅ |
| Speech | [PaddleSpeech/PP-TTS](./examples/audio/pp-tts) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❔ | -- |❔ |❔ | ✅ |
</div></details>
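When the support matrix above needs to be checked from a script, each row can be split on its pipe delimiters. This is a minimal sketch under stated assumptions: the helper `parse_support_row` is hypothetical, the platform names are supplied by the caller, and any cell that holds neither ✅ nor ❔ (for example `--` or an empty cell) is reported as "unknown" rather than interpreted.

```python
# Status markers as defined in the notes above the table.
STATUS = {"✅": "supported", "❔": "planned"}


def parse_support_row(row, platforms):
    """Map each platform column of one Markdown table row to a status.

    Hypothetical helper for illustration. The first two cells are taken
    as the task and the model; the rest must line up with `platforms`.
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    task, model, marks = cells[0], cells[1], cells[2:]
    if len(marks) != len(platforms):
        raise ValueError(
            "expected %d platform cells, got %d" % (len(platforms), len(marks))
        )
    support = {p: STATUS.get(m, "unknown") for p, m in zip(platforms, marks)}
    return task, model, support


# A shortened, made-up row with three platform columns for illustration.
row = "| Detection | [meituan/YOLOv6](./examples/vision/detection/yolov6) | ✅ |❔ | -- |"
platforms = ["Linux X86 CPU", "Graphcore IPU", "Serving"]
task, model, support = parse_support_row(row, platforms)
print(task, support)
```

The length check guards against rows whose cell count drifts from the header, which is easy to miss when editing a wide table by hand.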
<div id="fastdeploy-edge-doc"></div>
## 📳 Mobile and Edge Device Deployment
<div id="fastdeploy-edge-models"></div>
<details open><summary><b> Mobile and Edge Model List(click to fold)</b></summary><div>
<div align="center">
<img src="https://user-images.githubusercontent.com/115439700/212801271-5621419f-3997-4f00-94d5-63c8b6474aa8.png" height = "40"/>
</div>
| Task | Model | Size(MB) | Linux | Android | Linux | Linux | Linux | Linux | Linux | TBD ... |
|:------------------:|:-----------------------------------------------------------------------------------------:|:--------:|:-------:|:-------:|:-------:|:-----------------------:|:------------------------------:|:---------------------------:|:--------------------------------:|:-------:|
| --- | --- | --- | ARM CPU | [ARM CPU](./java/android) | [Rockchip NPU<br>RK3588/RK3568/RK3566](./docs/en/build_and_install/rknpu2.md) | [Rockchip NPU<br>RV1109/RV1126/RK1808](./docs/en/build_and_install/rv1126.md) | [Amlogic NPU <br>A311D/S905D/C308X](./docs/en/build_and_install/a311d.md) | NXP NPU<br>i.MX&nbsp;8M&nbsp;Plus | TBD... |
| Classification | [PaddleClas/ResNet50](examples/vision/classification/paddleclas) | 98 | ✅ | ✅ | [](./examples/vision/classification/paddleclas/rknpu2) | ✅ | | | |
| Classification | [PaddleClas/PP-LCNet](examples/vision/classification/paddleclas) | 11.9 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/PP-LCNetv2](examples/vision/classification/paddleclas) | 26.6 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/EfficientNet](examples/vision/classification/paddleclas) | 31.4 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/GhostNet](examples/vision/classification/paddleclas) | 20.8 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV1](examples/vision/classification/paddleclas) | 17 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV2](examples/vision/classification/paddleclas) | 14.2 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/MobileNetV3](examples/vision/classification/paddleclas) | 22 | ✅ | ✅ | ❔ | ✅ | ❔ | ❔ | -- |
| Classification | [PaddleClas/ShuffleNetV2](examples/vision/classification/paddleclas) | 9.2 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/SqueezeNetV1.1](examples/vision/classification/paddleclas) | 5 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/Inceptionv3](examples/vision/classification/paddleclas) | 95.5 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Classification | [PaddleClas/PP-HGNet](examples/vision/classification/paddleclas) | 59 | ✅ | ✅ | ❔ | ✅ | -- | -- | -- |
| Detection | [PaddleDetection/PicoDet_s](examples/vision/detection/paddledetection) | 4.9 | ✅ | ✅ | [](./examples/vision/detection/paddledetection/rknpu2) | ✅ | ✅ | ✅ | -- |
| Detection | [YOLOv5](./examples/vision/detection/rkyolo) | | ❔ | ❔ | [](./examples/vision/detection/rkyolo) | ❔ | ❔ | ❔ | -- |
| Face Detection | [deepinsight/SCRFD](./examples/vision/facedet/scrfd) | 2.5 | ✅ | ✅ | [](./examples/vision/facedet/scrfd/rknpu2) | -- | -- | -- | -- |
| Keypoint Detection | [PaddleDetection/PP-TinyPose](examples/vision/keypointdetection/tiny_pose) | 5.5 | ✅ | ✅ | ❔ | ❔ | ❔ | ❔ | -- |
| Segmentation | [PaddleSeg/PP-LiteSeg(STDC1)](examples/vision/segmentation/paddleseg) | 32.2 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/PP-HumanSeg-Lite](examples/vision/segmentation/paddleseg) | 0.556 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/HRNet-w18](examples/vision/segmentation/paddleseg) | 38.7 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/PP-HumanSeg](examples/vision/segmentation/paddleseg) | 107.2 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/Unet](examples/vision/segmentation/paddleseg) | 53.7 | ✅ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | -- | -- | -- | -- |
| Segmentation | [PaddleSeg/Deeplabv3](examples/vision/segmentation/paddleseg) | 150 | ❔ | ✅ | [](./examples/vision/segmentation/paddleseg/rknpu2) | | | | |
| OCR | [PaddleOCR/PP-OCRv2](examples/vision/ocr/PP-OCRv2) | 2.3+4.4 | ✅ | ✅ | ❔ | -- | -- | -- | -- |
| OCR | [PaddleOCR/PP-OCRv3](examples/vision/ocr/PP-OCRv3) | 2.4+10.6 | ✅ | ❔ | ❔ | ❔ | ❔ | ❔ | -- |
</div></details>
## ⚛️ Web and Mini Program Model List
<div id="fastdeploy-web-models"></div>
<details open><summary><b> Web and mini program model list(click to fold)</b></summary><div>
| Task | Model | [web_demo](examples/application/js/web_demo) |
|:------------------:|:-------------------------------------------------------------------------------------------:|:--------------------------------------------:|
| --- | --- | [Paddle.js](examples/application/js) |
| Detection | [FaceDetection](examples/application/js/web_demo/src/pages/cv/detection) | ✅ |
| Detection | [ScrewDetection](examples/application/js/web_demo/src/pages/cv/detection) | ✅ |
| Segmentation | [PaddleSeg/HumanSeg](./examples/application/js/web_demo/src/pages/cv/segmentation/HumanSeg) | ✅ |
| Object Recognition | [GestureRecognition](examples/application/js/web_demo/src/pages/cv/recognition) | ✅ |
| Object Recognition | [ItemIdentification](examples/application/js/web_demo/src/pages/cv/recognition) | ✅ |
| OCR | [PaddleOCR/PP-OCRv3](./examples/application/js/web_demo/src/pages/cv/ocr) | ✅ |
</div></details>
## 💐 Acknowledgements
<div id="fastdeploy-acknowledge"></div>
We sincerely appreciate the open-source capabilities of [EasyEdge](https://ai.baidu.com/easyedge/app/openSource), which this project adopts for SDK generation and download.
## ©️ License
<div id="fastdeploy-license"></div>
FastDeploy is provided under the [Apache-2.0](./LICENSE) license.

@@ -1 +0,0 @@
0.0.0

@@ -1,121 +0,0 @@
PROJECT(infer_demo C CXX)
CMAKE_MINIMUM_REQUIRED (VERSION 3.10)
# specify the decompress directory of FastDeploy SDK
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
include(${FASTDEPLOY_INSTALL_DIR}/utils/gflags.cmake)
include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
include_directories(${FASTDEPLOY_INCS})
add_executable(benchmark ${PROJECT_SOURCE_DIR}/benchmark.cc)
add_executable(benchmark_yolov5 ${PROJECT_SOURCE_DIR}/benchmark_yolov5.cc)
add_executable(benchmark_ppyolov5 ${PROJECT_SOURCE_DIR}/benchmark_ppyolov5.cc)
add_executable(benchmark_ppyolov6 ${PROJECT_SOURCE_DIR}/benchmark_ppyolov6.cc)
add_executable(benchmark_ppyolov7 ${PROJECT_SOURCE_DIR}/benchmark_ppyolov7.cc)
add_executable(benchmark_ppyolov8 ${PROJECT_SOURCE_DIR}/benchmark_ppyolov8.cc)
add_executable(benchmark_ppyolox ${PROJECT_SOURCE_DIR}/benchmark_ppyolox.cc)
add_executable(benchmark_ppyoloe ${PROJECT_SOURCE_DIR}/benchmark_ppyoloe.cc)
add_executable(benchmark_picodet ${PROJECT_SOURCE_DIR}/benchmark_picodet.cc)
add_executable(benchmark_ppcls ${PROJECT_SOURCE_DIR}/benchmark_ppcls.cc)
add_executable(benchmark_ppseg ${PROJECT_SOURCE_DIR}/benchmark_ppseg.cc)
add_executable(benchmark_ppmatting ${PROJECT_SOURCE_DIR}/benchmark_ppmatting.cc)
add_executable(benchmark_ppocr_det ${PROJECT_SOURCE_DIR}/benchmark_ppocr_det.cc)
add_executable(benchmark_ppocr_cls ${PROJECT_SOURCE_DIR}/benchmark_ppocr_cls.cc)
add_executable(benchmark_ppocr_rec ${PROJECT_SOURCE_DIR}/benchmark_ppocr_rec.cc)
add_executable(benchmark_structurev2_table ${PROJECT_SOURCE_DIR}/benchmark_structurev2_table.cc)
add_executable(benchmark_structurev2_layout ${PROJECT_SOURCE_DIR}/benchmark_structurev2_layout.cc)
add_executable(benchmark_ppyoloe_r ${PROJECT_SOURCE_DIR}/benchmark_ppyoloe_r.cc)
add_executable(benchmark_ppyoloe_r_sophgo ${PROJECT_SOURCE_DIR}/benchmark_ppyoloe_r_sophgo.cc)
add_executable(benchmark_ppyolo ${PROJECT_SOURCE_DIR}/benchmark_ppyolo.cc)
add_executable(benchmark_yolov3 ${PROJECT_SOURCE_DIR}/benchmark_yolov3.cc)
add_executable(benchmark_fasterrcnn ${PROJECT_SOURCE_DIR}/benchmark_fasterrcnn.cc)
add_executable(benchmark_maskrcnn ${PROJECT_SOURCE_DIR}/benchmark_maskrcnn.cc)
add_executable(benchmark_ssd ${PROJECT_SOURCE_DIR}/benchmark_ssd.cc)
add_executable(benchmark_rtmdet ${PROJECT_SOURCE_DIR}/benchmark_rtmdet.cc)
add_executable(benchmark_cascadercnn ${PROJECT_SOURCE_DIR}/benchmark_cascadercnn.cc)
add_executable(benchmark_fcos ${PROJECT_SOURCE_DIR}/benchmark_fcos.cc)
add_executable(benchmark_gfl ${PROJECT_SOURCE_DIR}/benchmark_gfl.cc)
add_executable(benchmark_retinanet ${PROJECT_SOURCE_DIR}/benchmark_retinanet.cc)
add_executable(benchmark_tood ${PROJECT_SOURCE_DIR}/benchmark_tood.cc)
add_executable(benchmark_ttfnet ${PROJECT_SOURCE_DIR}/benchmark_ttfnet.cc)
add_executable(benchmark_ppdet ${PROJECT_SOURCE_DIR}/benchmark_ppdet.cc)
add_executable(benchmark_dino ${PROJECT_SOURCE_DIR}/benchmark_dino.cc)
add_executable(benchmark_ppshituv2_rec ${PROJECT_SOURCE_DIR}/benchmark_ppshituv2_rec.cc)
add_executable(benchmark_ppshituv2_det ${PROJECT_SOURCE_DIR}/benchmark_ppshituv2_det.cc)
if(UNIX AND (NOT APPLE) AND (NOT ANDROID))
target_link_libraries(benchmark ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_yolov5 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolov5 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolov6 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolov7 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolov8 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolox ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyoloe ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyoloe_r ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyoloe_r_sophgo ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_picodet ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppcls ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppseg ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppmatting ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppocr_det ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppocr_cls ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppocr_rec ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_structurev2_table ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_structurev2_layout ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppyolo ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_yolov3 ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_fasterrcnn ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_maskrcnn ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ssd ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_rtmdet ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_cascadercnn ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_fcos ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_gfl ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_retinanet ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_tood ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ttfnet ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppdet ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_dino ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppshituv2_rec ${FASTDEPLOY_LIBS} gflags pthread)
target_link_libraries(benchmark_ppshituv2_det ${FASTDEPLOY_LIBS} gflags pthread)
else()
target_link_libraries(benchmark ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_yolov5 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolov5 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolov6 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolov7 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolov8 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolox ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyoloe ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyoloe_r ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyoloe_r_sophgo ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_picodet ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppcls ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppseg ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppmatting ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppocr_det ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppocr_cls ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppocr_rec ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_structurev2_table ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_structurev2_layout ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppyolo ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_yolov3 ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_fasterrcnn ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_maskrcnn ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ssd ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_rtmdet ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_cascadercnn ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_fcos ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_gfl ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_retinanet ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_tood ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ttfnet ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppdet ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_dino ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppshituv2_rec ${FASTDEPLOY_LIBS} gflags)
target_link_libraries(benchmark_ppshituv2_det ${FASTDEPLOY_LIBS} gflags)
endif()
# only for Android ADB test
if(ANDROID)
install_fastdeploy_libraries(${CMAKE_CURRENT_BINARY_DIR})
endif()

View File

@@ -1,217 +0,0 @@
# FastDeploy C++ Benchmarks
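## 1. Build options
The following build options are benchmark-related and must be enabled when building the SDK used to run benchmarks.
| Option | Required value | Description |
|---|---|---|
| ENABLE_BENCHMARK | ON | Default OFF. Whether to enable benchmark mode |
| ENABLE_VISION | ON | Default OFF. Whether to build the deployment module for vision models |
| ENABLE_TEXT | ON | Default OFF. Whether to build the deployment module for text (NLP) models |
To run the FastDeploy C++ benchmarks, first prepare the environment and build the FastDeploy C++ SDK from source with ENABLE_BENCHMARK=ON. The sections below describe the system requirements per hardware platform. For the detailed requirements of each environment, see the [FastDeploy environment requirements](../../docs/cn/build_and_install).
## 2. Benchmark settings
flags.h provides the following options: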
<div id="选项设置说明"></div>
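| Option | Description |
| -------------------- | ------------------------------------------ |
| --model | Path to the model |
| --image | Path to the image |
| --config_path | Path to config.txt, which specifies the device, backend, and other settings |
config.txt contains the following parameters: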
<div id="参数设置说明"></div>
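| Parameter | Description |
| -------------------- | ------------------------------------------ |
| device | CPU/GPU/XPU; defaults to CPU |
| device_id | GPU/XPU card ID; defaults to 0 |
| cpu_thread_nums | Number of CPU threads; defaults to 1 |
| warmup | Number of warmup iterations before benchmarking; defaults to 200 |
| repeat | Number of benchmark iterations; defaults to 1000 |
| backend | Backend type: default, ort, ov, trt, paddle, paddle_trt, lite, etc. With default, the best backend is selected automatically; setting the backend explicitly is recommended. Defaults to default |
| profile_mode | Profiling mode, one of `[runtime, end2end]`; defaults to runtime |
| include_h2d_d2h | Whether to include H2D+D2H time in the measurement; only takes effect when profile_mode is runtime. Defaults to false |
| use_fp16 | Whether to enable fp16; currently only takes effect for the trt, paddle-trt, and lite backends. Defaults to false |
| collect_memory_info | Whether to record CPU/GPU memory information; defaults to false |
| sampling_interval | Sampling interval for CPU/GPU memory information, in ms; defaults to 50 |
| precision_compare | Whether to run a precision comparison; defaults to false |
| result_path | Path to the txt file where benchmark data is recorded |
| xpu_l3_cache | XPU L3 cache size; defaults to 0. Guidance: for the Kunlun 2 XPU R200, the maximum usable L3 cache is 62914560; for the Kunlun 1 XPU, it is 16776192 |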
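As a rough illustration of how a file with these parameters might be consumed, the sketch below reads `key: value` lines into a map and falls back to the defaults listed in the table. The `key: value` line format and the helper names `Trim`, `ParseConfigText`, and `GetOr` are assumptions made for this example; the real loader is `benchmark::ResultManager::LoadBenchmarkConfig`, whose file format is not shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: strips leading/trailing spaces, tabs and '\r'.
static std::string Trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t\r");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t\r");
  return s.substr(first, last - first + 1);
}

// Hypothetical parser: assumes one "key: value" pair per line.
// Lines without a ':' (including blank lines) are skipped.
std::unordered_map<std::string, std::string> ParseConfigText(
    const std::string& text) {
  std::unordered_map<std::string, std::string> config;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto pos = line.find(':');
    if (pos == std::string::npos) continue;
    const std::string key = Trim(line.substr(0, pos));
    if (key.empty()) continue;
    config[key] = Trim(line.substr(pos + 1));
  }
  return config;
}

// Returns the configured value, or `fallback` (for example the default
// from the table above) when the key is absent.
std::string GetOr(const std::unordered_map<std::string, std::string>& config,
                  const std::string& key, const std::string& fallback) {
  assert(!key.empty());
  const auto it = config.find(key);
  return it == config.end() ? fallback : it->second;
}
```

With this sketch, `GetOr(cfg, "warmup", "200")` yields the documented default whenever `warmup` is missing from the file.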
## 3. Running benchmarks on x86_64 CPU and NVIDIA GPU
### 3.1 Environment setup
Building on Linux requires:
- gcc/g++ >= 5.4 (8.2 recommended)
- cmake >= 3.18.0
- CUDA >= 11.2
- cuDNN >= 8.2
- TensorRT >= 8.5
Building FastDeploy for GPU requires a matching CUDA environment and TensorRT. For details, see the [GPU build documentation](https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/build_and_install/gpu.md).
### 3.2 Build the FastDeploy C++ SDK
```bash
# Build the SDK from source; -DENABLE_BENCHMARK=ON enables benchmark mode
git clone https://github.com/PaddlePaddle/FastDeploy.git -b develop
cd FastDeploy
mkdir build && cd build
cmake .. -DWITH_GPU=ON \
-DENABLE_ORT_BACKEND=ON \
-DENABLE_PADDLE_BACKEND=ON \
-DENABLE_OPENVINO_BACKEND=ON \
-DENABLE_TRT_BACKEND=ON \
-DENABLE_VISION=ON \
-DENABLE_TEXT=ON \
-DENABLE_BENCHMARK=ON \
-DTRT_DIRECTORY=/Paddle/TensorRT-8.5.2.2 \
-DCUDA_DIRECTORY=/usr/local/cuda \
-DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk
make -j12
make install
# Set the SDK path
cd ..
export FD_GPU_SDK=${PWD}/build/compiled_fastdeploy_sdk
```
### 3.3 Build the benchmark examples
```bash
cd benchmark/cpp
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=${FD_GPU_SDK}
make -j4
```
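### 3.4 Run the benchmark examples
On x86 CPU + NVIDIA GPU, FastDeploy currently supports multiple inference backends. The following uses PaddleYOLOv8 as an example to collect benchmark data for several backends on CPU/GPU.
- Download the model files and the test image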
```bash
wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov8_s_500e_coco.tgz
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg
tar -zxvf yolov8_s_500e_coco.tgz
```
- Run the yolov8 benchmark example
```bash
# Measure performance; edit config.txt as needed (see the table above for the meaning of each parameter)
# e.g. to test the Paddle GPU backend, set device to gpu and backend to paddle
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path config.txt
```
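Note: to avoid skewing the timing results, it is best not to enable CPU/GPU memory collection while measuring performance. When collect_memory_info is set to true, only the memory figures are stable and reliable. For more parameters, see [parameter settings](#参数设置说明).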
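The `warmup` and `repeat` settings describe a common measure-after-warmup pattern. The sketch below shows that pattern in isolation: the helper name `MeanLatencyMs` is hypothetical, and this is not FastDeploy's profiling code, only an illustration of why untimed warmup runs precede the timed loop.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `fn` `warmup` times without timing, then
// `repeat` times with timing, and returns the mean latency per call in ms.
double MeanLatencyMs(const std::function<void()>& fn, int warmup, int repeat) {
  assert(warmup >= 0 && repeat > 0);
  for (int i = 0; i < warmup; ++i) fn();  // untimed: caches, lazy init, etc.
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeat; ++i) fn();
  const auto stop = std::chrono::steady_clock::now();
  const std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

Any extra work done inside the timed loop, such as sampling memory usage, is folded into the reported mean, which is one way to read the note above about keeping memory collection off while timing.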
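## 4. One-click scripts for each hardware platform
Once the environment and the SDK are ready, the scripts in this directory can be used to run the benchmarks and collect the data in one step.
- Fetch the models and resource files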
```bash
./get_models.sh
```
- Run the benchmark scripts
```bash
# x86 CPU Paddle backend fp32
./benchmark_x86.sh config/config.x86.paddle.fp32.txt
# x86 CPU ONNXRuntime backend fp32
./benchmark_x86.sh config/config.x86.ort.fp32.txt
# x86 CPU OpenVINO backend fp32
./benchmark_x86.sh config/config.x86.ov.fp32.txt
# NVIDIA GPU Paddle backend fp32
./benchmark_gpu.sh config/config.gpu.paddle.fp32.txt
# NVIDIA GPU ONNXRuntime backend fp32
./benchmark_gpu.sh config/config.gpu.ort.fp32.txt
# NVIDIA GPU Paddle-TRT backend fp32
./benchmark_gpu_trt.sh config/config.gpu.paddle_trt.fp32.txt
# NVIDIA GPU Paddle-TRT backend fp16
./benchmark_gpu_trt.sh config/config.gpu.paddle_trt.fp16.txt
# NVIDIA GPU TRT backend fp32
./benchmark_gpu_trt.sh config/config.gpu.trt.fp32.txt
# NVIDIA GPU TRT backend fp16
./benchmark_gpu_trt.sh config/config.gpu.trt.fp16.txt
# Arm CPU Paddle Lite backend fp32
./benchmark_arm.sh config/config.arm.lite.fp32.txt
# Arm CPU Paddle Lite backend fp16
./benchmark_arm.sh config/config.arm.lite.fp16.txt
# XPU Paddle Lite backend fp32
./benchmark_xpu.sh config/config.xpu.lite.fp32.txt
```
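## 5. Using the benchmark tool
Besides the benchmark_xxx programs, which include model pre- and post-processing, FastDeploy also provides a general-purpose benchmark tool that can benchmark any model. Building the sources in the benchmark directory produces a `benchmark` executable. It supports all options listed in [option settings](#选项设置说明) and adds some extra parameters for convenience, described below. Note: the tool only measures pure model inference time, or inference + H2D + D2H time (when include_h2d_d2h is true in config.txt); it cannot measure time that includes pre- and post-processing.
| Parameter | Description |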
| -------------------- | ------------------------------------------ |
| shapes | Set input shape for model, default "1,3,224,224" |
| names | Set input names for model, default "DEFAULT" |
| dtypes | Set input dtypes for model, default "FP32" |
| trt_shapes | Set min/opt/max shape for trt/paddle_trt backend. eg: --trt_shapes "1,3,224,224:1,3,224,224:1,3,224,224", default "1,3,224,224:1,3,224,224:1,3,224,224" |
| batch | TensorRT max batch size, default=1 |
| dump | Whether to dump output tensors, default false. |
| info | Only check the input infos of model, default false. |
| diff | Check the diff between two tensors, default false. |
| tensors | The paths to dumped tensors, should look like "tensor_a.txt:tensor_b.txt"|
| mem | Whether to force to collect memory info, default false. |
| model_file | Optional, set specific model file, eg, model.pdmodel, model.onnx, default "UNKNOWN" |
| params_file | Optional, set specific params file, eg, model.pdiparams, default "" |
| model_format | Optional, set specific model format, eg, PADDLE/ONNX/RKNN/TORCHSCRIPT/SOPHGO, default "PADDLE" |
| disable_mkldnn | Whether to disable mkldnn for paddle backend, default false. |
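The `shapes` and `trt_shapes` strings use `:` between shapes and `,` between dimensions, and benchmark.cc reads `trt_shapes` three entries at a time (min, opt, max) for each input. The sketch below illustrates that layout; `Split`, `ParseShapes`, `TrtShapeRange`, and `GroupTrtShapes` are hypothetical names for this example, not the `benchmark::ResultManager` API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `s` on `delim`; empty pieces are dropped.
static std::vector<std::string> Split(const std::string& s, char delim) {
  std::vector<std::string> parts;
  std::istringstream in(s);
  std::string piece;
  while (std::getline(in, piece, delim)) {
    if (!piece.empty()) parts.push_back(piece);
  }
  return parts;
}

// Parses "1,2:1,3,640,640" into {{1, 2}, {1, 3, 640, 640}}.
std::vector<std::vector<int>> ParseShapes(const std::string& text) {
  std::vector<std::vector<int>> shapes;
  for (const auto& shape_text : Split(text, ':')) {
    std::vector<int> dims;
    for (const auto& dim : Split(shape_text, ',')) {
      dims.push_back(std::stoi(dim));
    }
    shapes.push_back(dims);
  }
  return shapes;
}

// One min/opt/max triple for a single model input.
struct TrtShapeRange {
  std::vector<int> min_shape;
  std::vector<int> opt_shape;
  std::vector<int> max_shape;
};

// Groups a flat shape list so that input i uses entries 3*i, 3*i+1, 3*i+2.
std::vector<TrtShapeRange> GroupTrtShapes(
    const std::vector<std::vector<int>>& flat) {
  assert(flat.size() % 3 == 0);
  std::vector<TrtShapeRange> ranges;
  for (std::size_t i = 0; i + 2 < flat.size(); i += 3) {
    ranges.push_back({flat[i], flat[i + 1], flat[i + 2]});
  }
  return ranges;
}
```

Under this layout, a model with three inputs needs nine shapes in `--trt_shapes`, listed in the same order as the entries of `--names`.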
### 5.1 Benchmark tool usage examples
- Usage
```bash
./benchmark --helpshort
benchmark: ./benchmark -[info|diff|check|dump|mem] -model xxx -config_path xxx -[shapes|dtypes|names|tensors] -[model_file|params_file|model_format]
...
```
- Single-input example: --model specifies the model directory, which contains the *.pdmodel/pdiparams files
```bash
./benchmark --model ResNet50_vd_infer --config_path config/config.x86.ov.fp32.txt --shapes 1,3,224,224 --names inputs --dtypes FP32
```
- Single-input example: --model_file and --params_file specify the exact model file and parameter file
```bash
./benchmark --model_file MobileNetV1_ssld_infer/inference.pdmodel --params_file MobileNetV1_ssld_infer/inference.pdiparams --config_path config/config.x86.ov.fp32.txt --shapes 1,3,224,224 --names inputs --dtypes FP32
```
- Multi-input example:
```bash
./benchmark --model yolov5_s_300e_coco --config_path config/config.arm.lite.fp32.txt --shapes 1,3,640,640:1,2 --names image:scale_factor --dtypes FP32:FP32
```
- Paddle-TRT example:
```bash
./benchmark --model ResNet50_vd_infer --config_path config/config.gpu.paddle_trt.fp16.txt --trt_shapes 1,3,224,224:1,3,224,224:1,3,224,224 --names inputs --dtypes FP32
```
- TensorRT/Paddle-TRT multi-input example:
```bash
./benchmark --model rtdetr_r50vd_6x_coco --trt_shapes 1,2:1,2:1,2:1,3,640,640:1,3,640,640:1,3,640,640:1,2:1,2:1,2 --names im_shape:image:scale_factor --shapes 1,2:1,3,640,640:1,2 --config_path config/config.gpu.paddle_trt.fp32.txt --dtypes FP32:FP32:FP32
```
- All FastDeploy backends and model formats are supported: --model_file, --params_file (optional), --model_format
```bash
# ONNX model example
./benchmark --model ResNet50_vd_infer --model_file inference.onnx --model_format ONNX --config_path config/config.gpu.trt.fp16.txt --trt_shapes 1,3,224,224:1,3,224,224:1,3,224,224 --names inputs --dtypes FP32
```
- Collect CPU/GPU memory usage: --mem, or enable it in config.txt
```bash
./benchmark --mem --model ResNet50_vd_infer --config_path config/config.x86.ov.fp32.txt --shapes 1,3,224,224 --names inputs --dtypes FP32
```
- Run inference and dump the output tensors for comparison: --dump
```bash
./benchmark --dump --model ResNet50_vd_infer --config_path config/config.x86.ov.fp32.txt --shapes 1,3,224,224 --names inputs --dtypes FP32
```
- Compare two dumped tensors: --diff
```bash
./benchmark --diff --tensors ov_linear_77.tmp_1.txt:lite_linear_77.tmp_1.txt
```
- Show the model's input information: --info
```bash
./benchmark --info --model picodet_l_640_coco_lcnet --config_path config/config.arm.lite.fp32.txt
```

View File

@@ -1,404 +0,0 @@
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <cassert>
#include <functional>
#include <iostream>
#include <numeric>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
#include "fastdeploy/function/functions.h"
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(shapes, "1,3,224,224",
"Required, set input shape for model, "
"default 1,3,224,224");
DEFINE_string(names, "DEFAULT", "Required, set input names for model.");
DEFINE_string(dtypes, "FP32",
"Required, set input dtypes for model, "
"default FP32.");
DEFINE_string(trt_shapes, "1,3,224,224:1,3,224,224:1,3,224,224",
"Optional, set min/opt/max shape for trt/paddle_trt, "
"default 1,3,224,224:1,3,224,224:1,3,224,224");
DEFINE_int32(batch, 1,
"Optional, set trt max batch size, "
"default 1");
DEFINE_bool(dump, false,
"Optional, whether to dump output tensors, "
"default false.");
DEFINE_bool(info, false,
"Optional, only check the input infos of model, "
"default false.");
DEFINE_bool(diff, false,
"Optional, check the diff between two tensors, "
"default false.");
DEFINE_string(tensors, "tensor_a.txt:tensor_b.txt",
"Optional, the paths to dumped tensors, "
"default tensor_a.txt:tensor_b.txt");
DEFINE_bool(mem, false,
"Optional, whether to force to collect memory info, "
"default false.");
DEFINE_int32(interval, -1,
"Optional, sampling interval for collect memory info, "
"default -1 (use sampling_interval from the config file).");
DEFINE_string(model_format, "PADDLE",
"Optional, set specific model format, "
"eg, PADDLE/ONNX/RKNN/TORCHSCRIPT/SOPHGO, "
"default PADDLE.");
DEFINE_bool(disable_mkldnn, false,
"Optional, disable mkldnn for paddle backend. "
"default false.");
DEFINE_string(optimized_model_dir, "",
"Optional, set optimized model dir for lite, "
"eg: model.opt.nb, "
"default ''");
DEFINE_bool(collect_trt_shape_by_device, false,
"Optional, whether collect trt shape by device. "
"default false.");
DEFINE_double(custom_tensor_value, 1.0,
"Optional, set the value for fd tensor, "
"default 1.0");
DEFINE_bool(collect_trt_shape_by_custom_tensor_value, false,
"Optional, whether collect trt shape by custom tensor value. "
"default false.");
#if defined(ENABLE_BENCHMARK)
static std::vector<int64_t> GetInt64Shape(const std::vector<int>& shape) {
std::vector<int64_t> new_shape;
new_shape.resize(shape.size());
for (int i = 0; i < shape.size(); ++i) {
new_shape[i] = static_cast<int64_t>(shape[i]);
}
return new_shape;
}
static fastdeploy::ModelFormat GetModelFormat(const std::string& model_format) {
if (model_format == "PADDLE") {
return fastdeploy::ModelFormat::PADDLE;
} else if (model_format == "ONNX") {
return fastdeploy::ModelFormat::ONNX;
} else if (model_format == "RKNN") {
return fastdeploy::ModelFormat::RKNN;
} else if (model_format == "TORCHSCRIPT") {
return fastdeploy::ModelFormat::TORCHSCRIPT;
} else if (model_format == "SOPHGO") {
return fastdeploy::ModelFormat::SOPHGO;
} else {
return fastdeploy::ModelFormat::PADDLE;
}
}
static void CheckTensorDiff(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
std::cout << "Check tensor diff ..." << std::endl;
std::vector<std::string> tensor_paths =
benchmark::ResultManager::SplitStr(FLAGS_tensors);
assert(tensor_paths.size() == 2);
fastdeploy::FDTensor tensor_a, tensor_b;
benchmark::ResultManager::LoadFDTensor(&tensor_a, tensor_paths[0]);
benchmark::ResultManager::LoadFDTensor(&tensor_b, tensor_paths[1]);
auto tensor_diff =
benchmark::ResultManager::CalculateDiffStatis(tensor_a, tensor_b);
std::cout << "Tensor diff: mean=" << tensor_diff.data.mean
<< ", max=" << tensor_diff.data.max
<< ", min=" << tensor_diff.data.min << std::endl;
}
static void RuntimeProfiling(int argc, char* argv[]) {
// Init runtime option
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return;
}
if (FLAGS_disable_mkldnn) {
option.paddle_infer_option.enable_mkldnn = false;
}
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
UpdateBaseCustomFlags(config_info); // see flags.h
// Init log recorder
std::stringstream ss;
ss.precision(6);
// Memory resource monitor
int sampling_interval = FLAGS_interval >= 1
? FLAGS_interval
: std::stoi(config_info["sampling_interval"]);
benchmark::ResourceUsageMonitor resource_moniter(
sampling_interval, std::stoi(config_info["device_id"]));
// Check model path and model format
std::string model_name, params_name, config_name;
std::string model_file, params_file;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (FLAGS_model_file != "UNKNOWN") {
// Set model file/param/format via command line
if (FLAGS_model != "") {
model_file = FLAGS_model + sep + FLAGS_model_file;
params_file = FLAGS_model + sep + FLAGS_params_file;
} else {
model_file = FLAGS_model_file;
params_file = FLAGS_params_file;
}
model_format = GetModelFormat(FLAGS_model_format);
if (model_format == fastdeploy::ModelFormat::PADDLE &&
FLAGS_params_file == "") {
if (config_info["backend"] != "lite") {
std::cout << "[ERROR] params_file cannot be empty for the PADDLE"
<< " format; please set params_file explicitly."
<< std::endl;
return;
} else {
std::cout << "[INFO] Will use the lite light api for: " << model_file
<< std::endl;
}
}
} else {
// Set model file/param/format via model dir (only support
// for Paddle model format now)
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return;
}
model_file = FLAGS_model + sep + model_name;
params_file = FLAGS_model + sep + params_name;
}
option.SetModelPath(model_file, params_file, model_format);
// Set opt model dir
if (config_info["backend"] == "lite") {
if (FLAGS_optimized_model_dir != "") {
option.paddle_lite_option.optimized_model_dir = FLAGS_optimized_model_dir;
} else {
option.paddle_lite_option.optimized_model_dir = FLAGS_model;
}
}
// Get input shapes/names/dtypes
std::vector<std::vector<int32_t>> input_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_shapes);
std::vector<std::string> input_names =
benchmark::ResultManager::GetInputNames(FLAGS_names);
std::vector<fastdeploy::FDDataType> input_dtypes =
benchmark::ResultManager::GetInputDtypes(FLAGS_dtypes);
// Set tensorrt shapes
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
option.paddle_infer_option.collect_trt_shape_by_device =
FLAGS_collect_trt_shape_by_device;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.max_batch_size = FLAGS_batch;
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shapes);
if (input_names[0] == "DEFAULT") {
std::cout << "Please set the input names for TRT/Paddle-TRT backend!"
<< std::endl;
return;
}
assert(input_names.size() == (trt_shapes.size() / 3));
for (int i = 0; i < input_names.size(); ++i) {
option.trt_option.SetShape(input_names[i], trt_shapes[i * 3],
trt_shapes[i * 3 + 1], trt_shapes[i * 3 + 2]);
// Set custom input data for collect trt shapes
if (FLAGS_collect_trt_shape_by_custom_tensor_value) {
int min_shape_num = std::accumulate(trt_shapes[i * 3].begin(),
trt_shapes[i * 3].end(), 1,
std::multiplies<int>());
int opt_shape_num = std::accumulate(trt_shapes[i * 3 + 1].begin(),
trt_shapes[i * 3 + 1].end(), 1,
std::multiplies<int>());
int max_shape_num = std::accumulate(trt_shapes[i * 3 + 2].begin(),
trt_shapes[i * 3 + 2].end(), 1,
std::multiplies<int>());
std::vector<float> min_input_data(min_shape_num, FLAGS_custom_tensor_value);
std::vector<float> opt_input_data(opt_shape_num, FLAGS_custom_tensor_value);
std::vector<float> max_input_data(max_shape_num, FLAGS_custom_tensor_value);
option.trt_option.SetInputData(input_names[i], min_input_data,
opt_input_data, max_input_data);
}
}
}
// Init runtime
fastdeploy::Runtime runtime;
if (!runtime.Init(option)) {
std::cout << "Initial Runtime failed!" << std::endl;
return;
}
// Check default input names
if (input_names[0] == "DEFAULT") {
input_names.clear();
for (int i = 0; i < runtime.NumInputs(); ++i) {
input_names.push_back(runtime.GetInputInfo(i).name);
}
}
assert(runtime.NumInputs() == input_shapes.size());
assert(runtime.NumInputs() == input_names.size());
assert(runtime.NumInputs() == input_dtypes.size());
// Feed inputs, all values set to FLAGS_custom_tensor_value (default 1.0).
std::vector<fastdeploy::FDTensor> inputs(runtime.NumInputs());
for (int i = 0; i < inputs.size(); ++i) {
fastdeploy::function::Full(
FLAGS_custom_tensor_value, GetInt64Shape(input_shapes[i]),
&inputs[i], input_dtypes[i]);
inputs[i].name = input_names[i];
}
// Start memory resource monitor
if (config_info["collect_memory_info"] == "true" || FLAGS_mem) {
resource_moniter.Start();
}
// Run runtime profiling
std::vector<fastdeploy::FDTensor> outputs;
if (!runtime.Infer(inputs, &outputs)) {
std::cerr << "Failed to predict." << std::endl;
ss << "Runtime(ms): Failed" << std::endl;
if (config_info["collect_memory_info"] == "true" || FLAGS_mem) {
ss << "cpu_rss_mb: Failed" << std::endl;
ss << "gpu_rss_mb: Failed" << std::endl;
ss << "gpu_util: Failed" << std::endl;
resource_moniter.Stop();
}
benchmark::ResultManager::SaveBenchmarkResult(ss.str(),
config_info["result_path"]);
return;
}
double profile_time = runtime.GetProfileTime() * 1000.0;
std::cout << "Runtime(ms): " << profile_time << "ms." << std::endl;
ss << "Runtime(ms): " << profile_time << "ms." << std::endl;
// Collect memory info
if (config_info["collect_memory_info"] == "true" || FLAGS_mem) {
float cpu_mem = resource_moniter.GetMaxCpuMem();
float gpu_mem = resource_moniter.GetMaxGpuMem();
float gpu_util = resource_moniter.GetMaxGpuUtil();
std::cout << "cpu_rss_mb: " << cpu_mem << "MB." << std::endl;
ss << "cpu_rss_mb: " << cpu_mem << "MB." << std::endl;
std::cout << "gpu_rss_mb: " << gpu_mem << "MB." << std::endl;
ss << "gpu_rss_mb: " << gpu_mem << "MB." << std::endl;
std::cout << "gpu_util: " << gpu_util << std::endl;
ss << "gpu_util: " << gpu_util << std::endl;
resource_moniter.Stop();
}
benchmark::ResultManager::SaveBenchmarkResult(ss.str(),
config_info["result_path"]);
// Dump output tensors
if (FLAGS_dump) {
for (int i = 0; i < outputs.size(); ++i) {
auto name_tokens =
benchmark::ResultManager::SplitStr(outputs[i].name, '/');
std::string out_name = name_tokens[0];
for (int j = 1; j < name_tokens.size(); ++j) {
out_name += "_";
out_name += name_tokens[j];
}
std::string out_file = config_info["backend"] + "_" + out_name + ".txt";
benchmark::ResultManager::SaveFDTensor(outputs[i], out_file);
outputs[i].PrintInfo();
std::cout << "Saved: " << out_file << std::endl;
}
}
}
static void showInputInfos(int argc, char* argv[]) {
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return;
}
if (FLAGS_disable_mkldnn) {
option.paddle_infer_option.enable_mkldnn = false;
}
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
std::string model_file, params_file;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (FLAGS_model_file != "UNKNOWN") {
// Set model file/param/format via command line
if (FLAGS_model != "") {
model_file = FLAGS_model + sep + FLAGS_model_file;
params_file = FLAGS_model + sep + FLAGS_params_file;
} else {
model_file = FLAGS_model_file;
params_file = FLAGS_params_file;
}
model_format = GetModelFormat(FLAGS_model_format);
if (model_format == fastdeploy::ModelFormat::PADDLE &&
FLAGS_params_file == "") {
if (config_info["backend"] != "lite") {
std::cout << "[ERROR] params_file cannot be empty for the PADDLE"
<< " format; please set params_file explicitly."
<< std::endl;
return;
} else {
std::cout << "[INFO] Will use the lite light api for: " << model_file
<< std::endl;
}
}
} else {
// Set model file/param/format via model dir (only support
// for Paddle model format now)
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return;
}
model_file = FLAGS_model + sep + model_name;
params_file = FLAGS_model + sep + params_name;
}
option.SetModelPath(model_file, params_file, model_format);
// Init runtime
fastdeploy::Runtime runtime;
if (!runtime.Init(option)) {
std::cout << "Initial Runtime failed!" << std::endl;
return;
}
// Show input tensor infos
auto input_infos = runtime.GetInputInfos();
for (int i = 0; i < input_infos.size(); ++i) {
std::cout << input_infos[i] << std::endl;
}
}
#endif
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK)
google::SetVersionString("0.0.0");
google::SetUsageMessage(
"./benchmark -[info|diff|check|dump|mem] -model xxx -config_path xxx "
"-[shapes|dtypes|names|tensors] -[model_file|params_file|model_format]");
google::ParseCommandLineFlags(&argc, &argv, true);
if (FLAGS_diff) {
CheckTensorDiff(argc, argv);
return 0;
} else if (FLAGS_info) {
showInputInfos(argc, argv);
return 0;
} else {
RuntimeProfiling(argc, argv);
return 0;
}
#endif
return 0;
}

View File

@@ -1,122 +0,0 @@
# Run all models on the specified hardware and backend
export LD_LIBRARY_PATH=${PWD}:$LD_LIBRARY_PATH
CONFIG_PATH="config.arm.lite.fp32.txt"
if [ ! "$1" = "$CONFIG_PATH" ]; then
if [ -f "$1" ]; then
CONFIG_PATH="$1"
fi
fi
DEFAULT_INTERVAL=30
if [ ! "$2" = "" ]; then
DEFAULT_INTERVAL=$2
fi
sleep_seconds() {
sleep_interval=$DEFAULT_INTERVAL
if [ ! "$1" = "" ]; then
sleep_interval=$1
fi
echo "[INFO][SLEEP] --- Sleep $sleep_interval seconds for Arm CPU mobile to prevent the phone from overheating ..."
sleep $sleep_interval
}
# PaddleClas
./benchmark_ppcls --model PPLCNet_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model PPLCNetV2_base_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model EfficientNetB7_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model EfficientNetB0_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model GhostNet_x0_5_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model GhostNet_x1_3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model GhostNet_x1_3_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV1_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV2_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV3_small_x0_35_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV3_large_x1_0_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ShuffleNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ShuffleNetV2_x2_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model SqueezeNet1_1_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model InceptionV3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model PPHGNet_tiny_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model PPHGNet_base_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ResNet50_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model EfficientNetB0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV2_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model MobileNetV3_small_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ViT_large_patch16_224_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model ResNeXt50_32x4d_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model DenseNet121_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model PPHGNet_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppcls --model person_exists_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH && sleep_seconds
# PaddleOCR
./benchmark_ppocr_det --model ch_PP-OCRv3_det_infer --image 12.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppocr_cls --model ch_ppocr_mobile_v2.0_cls_infer --image rec_img.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppocr_rec --model ch_PP-OCRv3_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppocr_det --model ch_PP-OCRv2_det_infer --image 12.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppocr_rec --model ch_PP-OCRv2_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH && sleep_seconds
# PaddleDetection
./benchmark_ppyolov5 --model yolov5_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppyolov6 --model yolov6_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppyolox --model yolox_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_picodet --model picodet_l_640_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppyolov5 --model yolov5_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov7 --model yolov7_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov8 --model yolov8_s_500e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolox --model yolox_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_picodet --model picodet_l_640_coco_lcnet_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolo --model ppyolo_r50vd_dcn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_yolov3 --model yolov3_darknet53_270e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolov2_r101vd_dcn_365e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_320_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_r50_vd_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_maskrcnn --model mask_rcnn_r50_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_vd_fpn_ssld_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fcos --model fcos_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_gfl --model gfl_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r101_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_tood --model tood_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ttfnet --model ttfnet_darknet53_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_400e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_x_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_enhance_3x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_mobilenet_v1_300_120e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_vgg16_300_240e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssdlite_mobilenet_v1_300_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_x_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_l_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_m_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_n_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
# PaddleSeg
./benchmark_ppseg --model Portrait_PP_HumanSegV2_Lite_256x144_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppseg --model PP_HumanSegV2_Lite_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppseg --model PP_HumanSegV1_Lite_infer --image portrait_heng.jpg --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppseg --model PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppseg --model FCN_HRNet_W18_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH && sleep_seconds
./benchmark_ppseg --model SegFormer_B0-cityscapes-with-argmax --image cityscapes_demo.png --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppseg --model Deeplabv3_ResNet101_OS8_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppseg --model Unet_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppseg --model PP_HumanSegV1_Server_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppmatting --model PP-Matting-512 --image matting_input.jpg --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppmatting --model PPHumanMatting --image matting_input.jpg --config_path $CONFIG_PATH && sleep_seconds 60
./benchmark_ppmatting --model PPModnet_MobileNetV2 --image matting_input.jpg --config_path $CONFIG_PATH && sleep_seconds 60
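
The script above repeats one command line per model, which is how the two missing `--config_path` flags slipped in. As a rough sketch of an alternative, the classification runs could be driven from a list. The names `CLS_MODELS`, `run_cls`, `run_all_cls`, `DRY_RUN`, and `PAUSE_SECONDS`, and the `config.txt` default, are assumptions made for illustration and are not part of the original script. Only three of the models are listed.

```shell
# Sketch only: CLS_MODELS, run_cls, run_all_cls, DRY_RUN and PAUSE_SECONDS are
# illustrative names that do not appear in the original script.
CONFIG_PATH="${CONFIG_PATH:-config.txt}"  # placeholder default, not a real config name
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print each command instead of running it
PAUSE_SECONDS="${PAUSE_SECONDS:-0}"       # cool-down after each successful run

CLS_MODELS="PPLCNet_x1_0_infer PPLCNetV2_base_infer EfficientNetB7_infer"

run_cls() {
  # Replace the positional parameters with the full command line for model $1.
  set -- ./benchmark_ppcls --model "$1" \
    --image ILSVRC2012_val_00000010.jpeg --config_path "$CONFIG_PATH"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    # Same pattern as the original: sleep only when the benchmark succeeded.
    "$@" && sleep "$PAUSE_SECONDS"
  fi
}

run_all_cls() {
  for model in $CLS_MODELS; do
    run_cls "$model"
  done
}

run_all_cls
```

With `DRY_RUN=1` the loop only prints the commands, so the list can be checked before anything runs on a device. Setting `DRY_RUN=0` executes them, with an optional pause after each successful run.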

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set to true if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_cascade_rcnn = vision::detection::CascadeRCNN(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_cascade_rcnn.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "cascade_rcnn_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_cascade_rcnn.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_cascade_rcnn, model_cascade_rcnn.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,118 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set to true if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("im_shape", {1, 2}, {1, 2}, {1, 2});
option.trt_option.SetShape("image", {1, 3, 320, 320}, {1, 3, 640, 640},
{1, 3, 1280, 1280});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppdet = vision::detection::PaddleDetectionModel(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppdet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppdet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
// 2. Test tensor diff
std::cout << "=============== Test tensor diff =================\n";
std::vector<vision::DetectionResult> batch_res;
std::vector<fastdeploy::FDTensor> input_tensors, output_tensors;
std::vector<cv::Mat> imgs;
imgs.push_back(im);
std::vector<vision::FDMat> fd_images = vision::WrapMat(imgs);
model_ppdet.GetPreprocessor().Run(&fd_images, &input_tensors);
input_tensors[0].name = "image";
input_tensors[1].name = "scale_factor";
input_tensors[2].name = "im_shape";
input_tensors.pop_back();
model_ppdet.Infer(input_tensors, &output_tensors);
model_ppdet.GetPostprocessor().Run(output_tensors, &batch_res);
// Save tensor to -> disk.
auto& tensor_dump = output_tensors[0];
std::string det_tensor_path = "ppdet_tensor.txt";
benchmark::ResultManager::SaveFDTensor(tensor_dump, det_tensor_path);
// Load tensor from <- disk.
fastdeploy::FDTensor tensor_loaded;
benchmark::ResultManager::LoadFDTensor(&tensor_loaded, det_tensor_path);
// Calculate diff between two tensors.
auto det_tensor_diff = benchmark::ResultManager::CalculateDiffStatis(
tensor_dump, tensor_loaded);
std::cout << "Tensor diff: mean=" << det_tensor_diff.data.mean
<< ", max=" << det_tensor_diff.data.max
<< ", min=" << det_tensor_diff.data.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppdet.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppdet, model_ppdet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res, 0.3);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set to true if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_fasterrcnn = vision::detection::FasterRCNN(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_fasterrcnn.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "fasterrcnn_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_fasterrcnn.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_fasterrcnn, model_fasterrcnn.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set to true if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_fcos = vision::detection::FCOS(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_fcos.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "fcos_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_fcos.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_fcos, model_fcos.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set to true if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_gfl = vision::detection::GFL(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_gfl.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "gfl_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_gfl.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_gfl, model_gfl.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,111 +0,0 @@
# Run all models on the specified hardware and backend
CONFIG_PATH="config.gpu.paddle.fp32.txt"
if [ ! "$1" = "$CONFIG_PATH" ]; then
if [ -f "$1" ]; then
CONFIG_PATH="$1"
fi
fi
# PaddleClas
./benchmark_ppcls --model PPLCNet_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPLCNetV2_base_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB7_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x0_5_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x0_35_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_large_x1_0_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x2_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model SqueezeNet1_1_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model InceptionV3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_tiny_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_base_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ViT_large_patch16_224_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNeXt50_32x4d_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model DenseNet121_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model person_exists_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
# PaddleOCR
./benchmark_ppocr_det --model ch_PP-OCRv3_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_cls --model ch_ppocr_mobile_v2.0_cls_infer --image rec_img.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv3_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
./benchmark_ppocr_det --model ch_PP-OCRv2_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv2_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
./benchmark_ppocr_table --model en_ppstructure_mobile_v2.0_SLANet_infer --image table.jpg --table_char_dict_path table_structure_dict.txt --config_path $CONFIG_PATH
# PaddleDetection
./benchmark_ppyolov5 --model yolov5_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolox --model yolox_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_640_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov7 --model yolov7_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov8 --model yolov8_s_500e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolox --model yolox_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_picodet --model picodet_l_640_coco_lcnet_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolo --model ppyolo_r50vd_dcn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_yolov3 --model yolov3_darknet53_270e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolov2_r101vd_dcn_365e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_320_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_r50_vd_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_maskrcnn --model mask_rcnn_r50_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_vd_fpn_ssld_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fcos --model fcos_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_gfl --model gfl_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r101_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_tood --model tood_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ttfnet --model ttfnet_darknet53_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_400e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_x_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_enhance_3x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_mobilenet_v1_300_120e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_vgg16_300_240e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssdlite_mobilenet_v1_300_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_x_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_l_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_m_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_n_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
# PaddleSeg
./benchmark_ppseg --model Portrait_PP_HumanSegV2_Lite_256x144_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV2_Lite_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV1_Lite_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV2_Mobile_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model Deeplabv3_ResNet101_OS8_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model SegFormer_B0-cityscapes-with-argmax --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppmatting --model PP-Matting-512 --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPHumanMatting --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPModnet_MobileNetV2 --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model Unet_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV1_Server_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model FCN_HRNet_W18_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH

@@ -1,103 +0,0 @@
# Run all models on the specified hardware with the specified backend
CONFIG_PATH="config.gpu.paddle_trt.fp32.txt"
if [ ! "$1" = "$CONFIG_PATH" ]; then
  if [ -f "$1" ]; then
    CONFIG_PATH="$1"
  fi
fi
# PaddleClas
./benchmark_ppcls --model PPLCNet_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model PPLCNetV2_base_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model EfficientNetB7_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,600,600:1,3,600,600:1,3,600,600" --input_name "x"
./benchmark_ppcls --model EfficientNetB0_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model GhostNet_x0_5_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model GhostNet_x1_3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model GhostNet_x1_3_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV1_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV2_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV3_small_x0_35_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model MobileNetV3_large_x1_0_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model ShuffleNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model ShuffleNetV2_x2_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model SqueezeNet1_1_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model InceptionV3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,299,299:1,3,299,299:1,3,299,299" --input_name "x"
./benchmark_ppcls --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model PPHGNet_tiny_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model PPHGNet_base_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model ResNet50_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model EfficientNetB0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model MobileNetV2_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model MobileNetV3_small_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model ViT_large_patch16_224_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model ResNeXt50_32x4d_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model DenseNet121_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "inputs"
./benchmark_ppcls --model PPHGNet_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
./benchmark_ppcls --model person_exists_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH --trt_shape "1,3,224,224:1,3,224,224:1,3,224,224" --input_name "x"
# PaddleOCR
./benchmark_ppocr_det --model ch_PP-OCRv3_det_infer --image 12.jpg --config_path $CONFIG_PATH --trt_shape 1,3,960,608:1,3,960,608:1,3,960,608
./benchmark_ppocr_cls --model ch_ppocr_mobile_v2.0_cls_infer --image rec_img.jpg --config_path $CONFIG_PATH --trt_shape 1,3,48,192:1,3,48,192:1,3,48,192
./benchmark_ppocr_rec --model ch_PP-OCRv3_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH --trt_shape 1,3,48,572:1,3,48,572:1,3,48,572
./benchmark_ppocr_det --model ch_PP-OCRv2_det_infer --image 12.jpg --config_path $CONFIG_PATH --trt_shape 1,3,960,608:1,3,960,608:1,3,960,608
./benchmark_ppocr_rec --model ch_PP-OCRv2_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH --trt_shape 1,3,32,10:1,3,32,320:1,3,32,2304
# PaddleDetection
./benchmark_ppyolov5 --model yolov5_s_300e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_300e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_s_500e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolox --model yolox_s_300e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_640_coco_lcnet_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_trt_nms --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolox --model yolox_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_640_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov7 --model yolov7_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov8 --model yolov8_s_500e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolox --model yolox_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_picodet --model picodet_l_640_coco_lcnet_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_yolov3 --model yolov3_darknet53_270e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_320_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_400e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_x_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_x_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_l_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_m_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_n_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
# PaddleSeg
./benchmark_ppseg --model Portrait_PP_HumanSegV2_Lite_256x144_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --trt_shape 1,3,144,256:1,3,144,256:1,3,144,256
./benchmark_ppseg --model PP_HumanSegV2_Lite_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --trt_shape 1,3,192,192:1,3,192,192:1,3,192,192
./benchmark_ppseg --model PP_HumanSegV1_Lite_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --trt_shape 1,3,192,192:1,3,192,192:1,3,192,192
./benchmark_ppseg --model PP_HumanSegV2_Mobile_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --trt_shape 1,3,192,192:1,3,192,192:1,3,192,192
./benchmark_ppseg --model Deeplabv3_ResNet101_OS8_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppseg --model PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppseg --model SegFormer_B0-cityscapes-with-argmax --image cityscapes_demo.png --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppmatting --model PP-Matting-512 --image matting_input.jpg --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppmatting --model PPHumanMatting --image matting_input.jpg --config_path $CONFIG_PATH --trt_shape 1,3,2048,2048:1,3,2048,2048:1,3,2048,2048
./benchmark_ppmatting --model PPModnet_MobileNetV2 --image matting_input.jpg --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppseg --model Unet_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppseg --model PP_HumanSegV1_Server_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512
./benchmark_ppseg --model FCN_HRNet_W18_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set this flag if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2}, {1, 2});
}
auto model_maskrcnn = vision::detection::MaskRCNN(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_maskrcnn.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "maskrcnn_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// The exported model has no NMS, so apply NMS in the postprocessor.
if (FLAGS_no_nms) {
model_maskrcnn.GetPostprocessor().ApplyNMS();
}
// Run profiling
BENCHMARK_MODEL(model_maskrcnn, model_maskrcnn.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,88 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set this flag if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2}, {1, 2});
}
auto model_picodet = vision::detection::PicoDet(
model_file, params_file, config_file, option, model_format);
// The exported model has no NMS, so apply NMS in the postprocessor.
if (FLAGS_no_nms) {
model_picodet.GetPostprocessor().ApplyNMS();
}
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_picodet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "picodet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
BENCHMARK_MODEL(model_picodet, model_picodet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res, 0.5f);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,93 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,224,224:1,3,224,224:1,3,224,224",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,224,224:1,3,224,224:1,3,224,224");
DEFINE_string(input_name, "x",
"Set input name for trt/paddle_trt backend. "
"e.g. --input_name x");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
// Set max_batch_size to 1 for best performance
if (config_info["backend"] == "paddle_trt") {
option.trt_option.max_batch_size = 1;
}
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape(FLAGS_input_name, trt_shapes[0],
trt_shapes[1], trt_shapes[2]);
}
auto model_ppcls = vision::classification::PaddleClasModel(
model_file, params_file, config_file, option, model_format);
vision::ClassifyResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppcls.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string cls_result_path = "ppcls_result.txt";
benchmark::ResultManager::SaveClassifyResult(res, cls_result_path);
// Load result from <- disk.
vision::ClassifyResult res_loaded;
benchmark::ResultManager::LoadClassifyResult(&res_loaded, cls_result_path);
// Calculate diff between two results.
auto cls_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Labels diff: mean=" << cls_diff.labels.mean
<< ", max=" << cls_diff.labels.max
<< ", min=" << cls_diff.labels.min << std::endl;
std::cout << "Scores diff: mean=" << cls_diff.scores.mean
<< ", max=" << cls_diff.scores.max
<< ", min=" << cls_diff.scores.min << std::endl;
}
BENCHMARK_MODEL(model_ppcls, model_ppcls.Predict(im, &res))
#endif
return 0;
}
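
The `--trt_shape` flag used above packs three shapes into one string in the order min:opt:max, with dimensions separated by commas. The following is a minimal sketch of how such a string could be split into three integer vectors. `SplitBy` and `ParseTrtShape` are hypothetical names introduced here for illustration; this is not the implementation of `benchmark::ResultManager::GetInputShapes`, whose behavior on malformed input is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on `delim` and returns the pieces.
static std::vector<std::string> SplitBy(const std::string& text, char delim) {
  std::vector<std::string> parts;
  std::stringstream ss(text);
  std::string item;
  while (std::getline(ss, item, delim)) {
    parts.push_back(item);
  }
  return parts;
}

// Hypothetical parser for a "min:opt:max" shape string such as
// "1,3,32,10:1,3,32,320:1,3,32,2304". Returns {min, opt, max}.
// Throws std::invalid_argument if the string does not hold exactly three
// shapes or if a dimension is not an integer (std::stoi throws in that case).
std::vector<std::vector<int32_t>> ParseTrtShape(const std::string& spec) {
  std::vector<std::vector<int32_t>> shapes;
  for (const std::string& group : SplitBy(spec, ':')) {
    std::vector<int32_t> dims;
    for (const std::string& dim : SplitBy(group, ',')) {
      dims.push_back(static_cast<int32_t>(std::stoi(dim)));
    }
    shapes.push_back(dims);
  }
  if (shapes.size() != 3) {
    throw std::invalid_argument("expected min:opt:max, got: " + spec);
  }
  assert(shapes.size() == 3);  // Postcondition: index 0 = min, 1 = opt, 2 = max.
  return shapes;
}
```

With a dynamic-shape string like the one passed for `ch_PP-OCRv2_rec_infer`, the three returned vectors differ only in the last dimension. With a static string such as `1,3,224,224:1,3,224,224:1,3,224,224`, all three are identical.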

@@ -1,117 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Set this flag if the exported model does not contain NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2}, {1, 2});
}
auto model_ppdet = vision::detection::PaddleDetectionModel(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppdet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppdet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
// 2. Test tensor diff
std::cout << "=============== Test tensor diff =================\n";
std::vector<vision::DetectionResult> batch_res;
std::vector<fastdeploy::FDTensor> input_tensors, output_tensors;
std::vector<cv::Mat> imgs;
imgs.push_back(im);
std::vector<vision::FDMat> fd_images = vision::WrapMat(imgs);
model_ppdet.GetPreprocessor().Run(&fd_images, &input_tensors);
input_tensors[0].name = "image";
input_tensors[1].name = "scale_factor";
input_tensors[2].name = "im_shape";
input_tensors.pop_back();
model_ppdet.Infer(input_tensors, &output_tensors);
model_ppdet.GetPostprocessor().Run(output_tensors, &batch_res);
// Save tensor to -> disk.
auto& tensor_dump = output_tensors[0];
std::string det_tensor_path = "ppdet_tensor.txt";
benchmark::ResultManager::SaveFDTensor(tensor_dump, det_tensor_path);
// Load tensor from <- disk.
fastdeploy::FDTensor tensor_loaded;
benchmark::ResultManager::LoadFDTensor(&tensor_loaded, det_tensor_path);
// Calculate diff between two tensors.
auto det_tensor_diff = benchmark::ResultManager::CalculateDiffStatis(
tensor_dump, tensor_loaded);
std::cout << "Tensor diff: mean=" << det_tensor_diff.data.mean
<< ", max=" << det_tensor_diff.data.max
<< ", min=" << det_tensor_diff.data.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppdet.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppdet, model_ppdet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res, 0.3);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}


@@ -1,90 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,512,512:1,3,512,512:1,3,512,512",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,512,512:1,3,512,512:1,3,512,512");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("img", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppmatting = vision::matting::PPMatting(
model_file, params_file, config_file, option, model_format);
vision::MattingResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppmatting.Predict(&im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string matting_result_path = "ppmatting_result.txt";
benchmark::ResultManager::SaveMattingResult(res, matting_result_path);
// Load result from <- disk.
vision::MattingResult res_loaded;
benchmark::ResultManager::LoadMattingResult(&res_loaded,
matting_result_path);
// Calculate diff between two results.
auto matting_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Alpha diff: mean=" << matting_diff.alpha.mean
<< ", max=" << matting_diff.alpha.max
<< ", min=" << matting_diff.alpha.min << std::endl;
if (res_loaded.contain_foreground) {
std::cout << "Foreground diff: mean=" << matting_diff.foreground.mean
<< ", max=" << matting_diff.foreground.max
<< ", min=" << matting_diff.foreground.min << std::endl;
}
}
BENCHMARK_MODEL(model_ppmatting, model_ppmatting.Predict(&im, &res))
auto vis_im = vision::VisMatting(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}
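
Note on the `--trt_shape` format: the benchmark above passes the flag string to `benchmark::ResultManager::GetInputShapes` and uses the three resulting vectors as the min/opt/max shapes for `SetShape`. The diff does not show that function, so the following is only a minimal sketch of how such a string could be parsed, assuming `:` separates the three shapes and `,` separates dimensions. `ParseTrtShapes` is a hypothetical name, not the FastDeploy implementation.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not FastDeploy code): splits a string such as
// "1,3,512,512:1,3,512,512:1,3,512,512" into one vector per shape.
// Assumes ':' separates shapes and ',' separates dimensions.
std::vector<std::vector<int32_t>> ParseTrtShapes(const std::string& spec) {
  std::vector<std::vector<int32_t>> shapes;
  std::stringstream groups(spec);
  std::string group;
  while (std::getline(groups, group, ':')) {
    std::vector<int32_t> dims;
    std::stringstream fields(group);
    std::string field;
    while (std::getline(fields, field, ',')) {
      // std::stoi throws on non-numeric input; callers may want to catch that.
      dims.push_back(static_cast<int32_t>(std::stoi(field)));
    }
    shapes.push_back(dims);
  }
  return shapes;
}
```

With this sketch, the caller would still need to check that exactly three shapes were parsed before indexing `[0]`, `[1]` and `[2]`, which the benchmark code above does not do.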


@@ -1,78 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,48,10:4,3,48,320:8,3,48,1024",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,48,10:4,3,48,320:8,3,48,1024");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return -1;
}
// Classification Model
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("x", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppocr_cls =
vision::ocr::Classifier(model_file, params_file, option, model_format);
int32_t res_label;
float res_score;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppocr_cls.Predict(im, &res_label, &res_score);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
int32_t res_label_expect = 0;
float res_score_expect = 1.0;
// Calculate diff between two results.
auto ppocr_cls_label_diff = res_label - res_label_expect;
auto ppocr_cls_score_diff = res_score - res_score_expect;
std::cout << "PPOCR Cls label diff: " << ppocr_cls_label_diff << std::endl;
// Use std::fabs so the float difference is never truncated by an integer abs.
std::cout << "PPOCR Cls score diff: " << std::fabs(ppocr_cls_score_diff)
<< std::endl;
}
BENCHMARK_MODEL(model_ppocr_cls,
model_ppocr_cls.Predict(im, &res_label, &res_score));
#endif
return 0;
}


@@ -1,82 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,64,64:1,3,640,640:1,3,960,960",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,64,64:1,3,640,640:1,3,960,960");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
// Detection Model
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return -1;
}
// Detection model files
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("x", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppocr_det =
vision::ocr::DBDetector(model_file, params_file, option, model_format);
std::vector<std::array<int, 8>> res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppocr_det.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string ppocr_det_result_path = "ppocr_det_result.txt";
benchmark::ResultManager::SaveOCRDetResult(res, ppocr_det_result_path);
// Load result from <- disk.
std::vector<std::array<int, 8>> res_loaded;
benchmark::ResultManager::LoadOCRDetResult(&res_loaded,
ppocr_det_result_path);
// Calculate diff between two results.
auto ppocr_det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "PPOCR Boxes diff: mean=" << ppocr_det_diff.boxes.mean
<< ", max=" << ppocr_det_diff.boxes.max
<< ", min=" << ppocr_det_diff.boxes.min << std::endl;
}
BENCHMARK_MODEL(model_ppocr_det, model_ppocr_det.Predict(im, &res));
#endif
return 0;
}


@@ -1,83 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(rec_label_file, "", "Path of Recognition label file of PPOCR.");
DEFINE_string(trt_shape, "1,3,48,10:4,3,48,320:8,3,48,2304",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,48,10:4,3,48,320:8,3,48,2304");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("x", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppocr_rec = vision::ocr::Recognizer(
model_file, params_file, FLAGS_rec_label_file, option, model_format);
std::vector<std::string> model_names;
fastdeploy::benchmark::Split(FLAGS_model, model_names, sep);
if (model_names[model_names.size() - 1] == "ch_PP-OCRv2_rec_infer") {
model_ppocr_rec.GetPreprocessor().SetRecImageShape({3, 32, 320});
}
std::string text;
float rec_score;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppocr_rec.Predict(im, &text, &rec_score);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
std::string text_expect = "上海斯格威铂尔大酒店";
float res_score_expect = 0.993308;
// Calculate diff between two results.
auto ppocr_rec_text_diff = text.compare(text_expect);
auto ppocr_rec_score_diff = rec_score - res_score_expect;
std::cout << "PPOCR Rec text diff: " << ppocr_rec_text_diff << std::endl;
// Use std::fabs so the float difference is never truncated by an integer abs.
std::cout << "PPOCR Rec score diff: " << std::fabs(ppocr_rec_score_diff)
<< std::endl;
}
BENCHMARK_MODEL(model_ppocr_rec,
model_ppocr_rec.Predict(im, &text, &rec_score));
#endif
return 0;
}
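
Note on the expected-value check: the recognizer benchmark above prints the raw `std::string::compare` result and the absolute score difference, and leaves the pass/fail judgment to the reader. A minimal sketch of turning that into a boolean check follows. `MatchesExpected` and its `tolerance` parameter are hypothetical and do not exist in FastDeploy.

```cpp
#include <cmath>
#include <string>

// Hypothetical helper (not FastDeploy code): returns true when the recognized
// text is identical to the expected text and the score is within `tolerance`
// of the expected score.
bool MatchesExpected(const std::string& text, float score,
                     const std::string& expect_text, float expect_score,
                     float tolerance) {
  const bool same_text = (text == expect_text);
  // std::fabs keeps the comparison in floating point.
  const float score_diff = std::fabs(score - expect_score);
  return same_text && score_diff <= tolerance;
}
```

A call such as `MatchesExpected(text, rec_score, text_expect, res_score_expect, 1e-3f)` could then gate the benchmark's exit code; the tolerance value here is an arbitrary illustration.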


@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,192,192:1,3,192,192:1,3,192,192",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,192,192:1,3,192,192:1,3,192,192");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("x", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppseg = vision::segmentation::PaddleSegModel(
model_file, params_file, config_file, option, model_format);
vision::SegmentationResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppseg.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string seg_result_path = "ppseg_result.txt";
benchmark::ResultManager::SaveSegmentationResult(res, seg_result_path);
// Load result from <- disk.
vision::SegmentationResult res_loaded;
benchmark::ResultManager::LoadSegmentationResult(&res_loaded,
seg_result_path);
// Calculate diff between two results.
auto seg_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Labels diff: mean=" << seg_diff.labels.mean
<< ", max=" << seg_diff.labels.max
<< ", min=" << seg_diff.labels.min << std::endl;
if (res_loaded.contain_score_map) {
std::cout << "Scores diff: mean=" << seg_diff.scores.mean
<< ", max=" << seg_diff.scores.max
<< ", min=" << seg_diff.scores.min << std::endl;
}
}
BENCHMARK_MODEL(model_ppseg, model_ppseg.Predict(im, &res))
auto vis_im = vision::VisSegmentation(im, res, 0.5);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}


@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Set to true if the model was exported without NMS; NMS is then "
"applied in postprocessing.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2}, {1, 2});
option.trt_option.SetShape("im_shape", {1, 2}, {1, 2}, {1, 2});
}
auto model = vision::classification::PPShiTuV2Detector(
model_file, params_file, config_file, option, model_format);
if (FLAGS_no_nms) {
model.GetPostprocessor().ApplyNMS();
}
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppshituv2_det_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
BENCHMARK_MODEL(model, model.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res, 0.5f);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}


@@ -1,93 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(trt_shape, "1,3,224,224:1,3,224,224:1,3,224,224",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,224,224:1,3,224,224:1,3,224,224");
DEFINE_string(input_name, "x",
"Set input name for trt/paddle_trt backend. "
"e.g. --input_name x");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
// Set max_batch_size 1 for best performance
if (config_info["backend"] == "paddle_trt") {
option.trt_option.max_batch_size = 1;
}
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape(FLAGS_input_name, trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model = vision::classification::PPShiTuV2Recognizer(
model_file, params_file, config_file, option, model_format);
vision::ClassifyResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string cls_result_path = "ppcls_result.txt";
benchmark::ResultManager::SaveClassifyResult(res, cls_result_path);
// Load result from <- disk.
vision::ClassifyResult res_loaded;
benchmark::ResultManager::LoadClassifyResult(&res_loaded, cls_result_path);
// Calculate diff between two results.
auto cls_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Labels diff: mean=" << cls_diff.labels.mean
<< ", max=" << cls_diff.labels.max
<< ", min=" << cls_diff.labels.min << std::endl;
std::cout << "Scores diff: mean=" << cls_diff.scores.mean
<< ", max=" << cls_diff.scores.max
<< ", min=" << cls_diff.scores.min << std::endl;
}
BENCHMARK_MODEL(model, model.Predict(im, &res))
#endif
return 0;
}


@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Set to true if the model was exported without NMS; NMS is then "
"applied in postprocessing.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolo = vision::detection::PPYOLO(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppyolo.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolo_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolo.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolo, model_ppyolo.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}


@@ -1,88 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Set to true if the model was exported without NMS; NMS is then "
"applied in postprocessing.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyoloe = vision::detection::PPYOLOE(
model_file, params_file, config_file, option, model_format);
if (FLAGS_no_nms) {
model_ppyoloe.GetPostprocessor().ApplyNMS();
}
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ppyoloe.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyoloe_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
BENCHMARK_MODEL(model_ppyoloe, model_ppyoloe.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}


@@ -1,60 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <fstream>
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Set to true if the model was exported without NMS "
"(unused in this benchmark).");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
auto model_ppyoloe_r = vision::detection::PPYOLOER(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
// Run profiling
BENCHMARK_MODEL(model_ppyoloe_r, model_ppyoloe_r.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,61 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <fstream>
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::SOPHGO;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
auto model_ppyoloe_r = vision::detection::PPYOLOER(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
// Run profiling
BENCHMARK_MODEL(model_ppyoloe_r, model_ppyoloe_r.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolov5 = vision::detection::PaddleYOLOv5(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppyolov5.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolov5_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolov5.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolov5, model_ppyolov5.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolov6 = vision::detection::PaddleYOLOv6(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppyolov6.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolov6_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolov6.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolov6, model_ppyolov6.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolov7 = vision::detection::PaddleYOLOv7(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppyolov7.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolov7_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolov7.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolov7, model_ppyolov7.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,117 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolov8 = vision::detection::PaddleYOLOv8(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppyolov8.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolov8_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
// 2. Test tensor diff
std::cout << "=============== Test tensor diff =================\n";
std::vector<vision::DetectionResult> batch_res;
std::vector<fastdeploy::FDTensor> input_tensors, output_tensors;
std::vector<cv::Mat> imgs;
imgs.push_back(im);
std::vector<vision::FDMat> fd_images = vision::WrapMat(imgs);
model_ppyolov8.GetPreprocessor().Run(&fd_images, &input_tensors);
input_tensors[0].name = "image";
input_tensors[1].name = "scale_factor";
input_tensors[2].name = "im_shape";
input_tensors.pop_back();
model_ppyolov8.Infer(input_tensors, &output_tensors);
model_ppyolov8.GetPostprocessor().Run(output_tensors, &batch_res);
// Save tensor to -> disk.
auto& tensor_dump = output_tensors[0];
std::string det_tensor_path = "ppyolov8_tensor.txt";
benchmark::ResultManager::SaveFDTensor(tensor_dump, det_tensor_path);
// Load tensor from <- disk.
fastdeploy::FDTensor tensor_loaded;
benchmark::ResultManager::LoadFDTensor(&tensor_loaded, det_tensor_path);
// Calculate diff between two tensors.
auto det_tensor_diff = benchmark::ResultManager::CalculateDiffStatis(
tensor_dump, tensor_loaded);
std::cout << "Tensor diff: mean=" << det_tensor_diff.data.mean
<< ", max=" << det_tensor_diff.data.max
<< ", min=" << det_tensor_diff.data.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolov8.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolov8, model_ppyolov8.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}
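
The precision-compare branch in the file above saves a result, reloads it, and prints mean/max/min difference statistics. The implementation of `CalculateDiffStatis` is not part of this diff, so the following is only a minimal sketch of how such statistics could be computed over two flat float buffers. `DiffStats` and `ComputeDiffStats` are hypothetical names introduced for illustration and are not FastDeploy APIs; signed element-wise differences are an assumption as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the three statistics printed by the benchmark.
struct DiffStats {
  double mean = 0.0;
  double max = 0.0;
  double min = 0.0;
};

// Summarizes the signed element-wise difference lhs[i] - rhs[i].
// Assumption: both buffers have the same length. Empty input yields zeros.
DiffStats ComputeDiffStats(const std::vector<float>& lhs,
                           const std::vector<float>& rhs) {
  assert(lhs.size() == rhs.size());
  DiffStats stats;
  if (lhs.empty()) {
    return stats;
  }
  const double first =
      static_cast<double>(lhs[0]) - static_cast<double>(rhs[0]);
  stats.max = first;
  stats.min = first;
  double sum = 0.0;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const double d =
        static_cast<double>(lhs[i]) - static_cast<double>(rhs[i]);
    sum += d;
    stats.max = std::max(stats.max, d);
    stats.min = std::min(stats.min, d);
  }
  stats.mean = sum / static_cast<double>(lhs.size());
  return stats;
}
```

Comparing a buffer with a copy of itself that has been written to disk and read back should give zeros for all three values if the serialization is lossless. Any nonzero value would then point to rounding in the text format rather than to a change in the model.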

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ppyolox = vision::detection::PaddleYOLOX(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ppyolox.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ppyolox_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ppyolox.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ppyolox, model_ppyolox.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_retinanet = vision::detection::RetinaNet(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_retinanet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "retinanet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_retinanet.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_retinanet, model_retinanet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_rtmdet = vision::detection::RTMDet(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_rtmdet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "rtmdet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_rtmdet.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_rtmdet, model_rtmdet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false, "Whether the model was exported without NMS.");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ssd = vision::detection::SSD(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run at least once
model_ssd.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ssd_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ssd.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ssd, model_ssd.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,93 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 800, 608}, {1, 3, 800, 608},
{1, 3, 800, 608});
}
auto layout_model = vision::ocr::StructureV2Layout(model_file, params_file,
option, model_format);
// 5 for publaynet, 10 for cdla
layout_model.GetPostprocessor().SetNumClass(5);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
layout_model.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string layout_result_path = "layout_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, layout_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded,
layout_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
BENCHMARK_MODEL(layout_model, layout_model.Predict(im, &res))
std::vector<std::string> labels = {"text", "title", "list", "table",
"figure"};
if (layout_model.GetPostprocessor().GetNumClass() == 10) {
labels = {"text", "title", "figure", "figure_caption",
"table", "table_caption", "header", "footer",
"reference", "equation"};
}
auto vis_im =
vision::VisDetection(im, res, labels, 0.3, 2, .5f, {255, 0, 0}, 2);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}
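
The precision check above saves a result, reloads it, and prints mean/max/min differences. As a minimal sketch of what such a summary could compute, the snippet below reduces two equally sized float vectors to those three numbers. `DiffSummary` and `SummarizeDiff` are hypothetical names for illustration, and the use of signed differences is an assumption; this is not the body of `benchmark::ResultManager::CalculateDiffStatis`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary type; not FastDeploy's own diff struct.
struct DiffSummary {
  double mean = 0.0;
  double max = 0.0;
  double min = 0.0;
};

// Summarises the element-wise signed difference a[i] - b[i].
// Assumption: both inputs have the same length (checked by assert).
DiffSummary SummarizeDiff(const std::vector<float>& a,
                          const std::vector<float>& b) {
  assert(a.size() == b.size());
  DiffSummary s;
  if (a.empty()) {
    return s;  // Nothing to compare: all fields stay at zero.
  }
  double total = 0.0;
  s.max = s.min = static_cast<double>(a[0]) - static_cast<double>(b[0]);
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
    total += d;
    s.max = std::max(s.max, d);
    s.min = std::min(s.min, d);
  }
  s.mean = total / static_cast<double>(a.size());
  return s;
}
```

For a save/load round trip that is lossless, all three values would be expected to be zero; non-zero values would point at serialization precision rather than at the model.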

@@ -1,161 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_string(table_char_dict_path, "",
"Path of table character dict of PPOCR.");
DEFINE_string(trt_shape, "1,3,48,10:4,3,48,320:8,3,48,2304",
"Set min/opt/max shape for trt/paddle_trt backend. "
"e.g. --trt_shape 1,3,48,10:4,3,48,320:8,3,48,2304");
int main(int argc, char *argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info, false)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
std::vector<std::vector<int32_t>> trt_shapes =
benchmark::ResultManager::GetInputShapes(FLAGS_trt_shape);
option.trt_option.SetShape("x", trt_shapes[0], trt_shapes[1],
trt_shapes[2]);
}
auto model_ppocr_table = vision::ocr::StructureV2Table(
model_file, params_file, FLAGS_table_char_dict_path, option,
model_format);
fastdeploy::vision::OCRResult result;
if (config_info["precision_compare"] == "true") {
std::string expect_structure_html =
"<html><body><table><thead><tr><td></td><td></td><td></td><td></"
"td><td></td></tr></thead><tbody><tr><td></td><td></td><td></td><td></"
"td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></"
"tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></"
"td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></"
"td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></"
"td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></"
"td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></"
"tr><tr><td></td><td></td><td></td><td></td><td></td></tr><tr><td></"
"td><td></td><td></td><td></td><td></td></tr><tr><td></td><td></"
"td><td></td><td></td><td></td></tr><tr><td></td><td></td><td></"
"td><td></td><td></td></tr><tr><td></td><td></td><td></td><td></"
"td><td></td></tr><tr><td></td><td></td><td></td><td></td><td></td></"
"tr><tr><td></td><td></td><td></td><td></td><td></td></tr></tbody></"
"table></body></html>";
std::vector<int> expect_box_coord{
41, 4, 97, 18, 161, 4, 173, 18, 216, 4, 225, 17, 272, 4,
283, 17, 321, 4, 348, 18, 33, 20, 106, 38, 150, 22, 180, 38,
202, 22, 235, 38, 262, 21, 293, 38, 326, 23, 343, 37, 27, 38,
109, 56, 150, 39, 179, 56, 204, 39, 236, 56, 263, 39, 292, 55,
329, 40, 343, 54, 22, 57, 118, 74, 152, 58, 176, 74, 204, 58,
236, 75, 262, 58, 291, 74, 326, 58, 344, 74, 27, 75, 119, 92,
150, 75, 177, 92, 204, 75, 235, 92, 260, 75, 292, 92, 326, 75,
346, 92, 44, 92, 102, 110, 150, 92, 177, 110, 205, 92, 236, 110,
262, 92, 290, 110, 329, 93, 339, 110, 41, 109, 102, 128, 151, 110,
175, 128, 205, 110, 236, 128, 262, 110, 291, 127, 329, 110, 338, 127,
42, 128, 102, 146, 149, 128, 177, 146, 205, 128, 237, 146, 262, 128,
291, 146, 329, 128, 339, 145, 31, 145, 110, 163, 150, 145, 178, 163,
206, 145, 237, 164, 262, 145, 292, 163, 324, 145, 342, 162, 40, 162,
108, 180, 154, 162, 175, 180, 209, 162, 231, 180, 266, 162, 286, 180,
325, 162, 341, 179, 38, 180, 105, 197, 152, 180, 177, 197, 207, 180,
236, 197, 262, 180, 291, 197, 329, 181, 339, 196, 42, 196, 102, 214,
151, 197, 179, 214, 205, 197, 236, 214, 263, 197, 291, 214, 320, 197,
349, 214, 46, 215, 100, 233, 149, 216, 179, 233, 204, 216, 238, 233,
262, 216, 291, 233, 321, 216, 345, 232, 42, 233, 104, 251, 147, 234,
179, 251, 203, 233, 237, 251, 260, 233, 294, 251, 326, 234, 341, 250,
19, 251, 120, 269, 148, 253, 180, 270, 202, 252, 240, 270, 259, 252,
294, 270, 324, 252, 347, 268, 16, 270, 123, 286, 146, 270, 182, 287,
200, 270, 238, 287, 256, 270, 294, 286, 319, 270, 353, 286};
// Run once at least
if (!model_ppocr_table.Predict(im, &result)) {
std::cerr << "Failed to predict." << std::endl;
return -1;
}
// 1. Test result diff
std::cout << "=============== Test Table Result diff =================\n";
// Calculate diff between two results.
std::string result_table_structure;
for (auto &structure : result.table_structure) {
result_table_structure += structure;
}
if (expect_structure_html == result_table_structure) {
std::cout << "PPOCR Table structure has no diff" << std::endl;
} else {
std::cout << "PPOCR Table structure has diff" << std::endl;
std::cout << "expected: " << expect_structure_html << std::endl;
std::cout << "result: " << result_table_structure << std::endl;
}
std::vector<int> table_box_coord;
for (auto &box : result.table_boxes) {
// x1 y1 x2 y1 x2 y2 x1 y2 => x1 y1 x2 y2
table_box_coord.push_back(box[0]);
table_box_coord.push_back(box[1]);
table_box_coord.push_back(box[2]);
table_box_coord.push_back(box[5]);
}
if (expect_box_coord.size() == table_box_coord.size()) {
std::cout << "table boxes num matched with expected: "
<< table_box_coord.size() << std::endl;
int max_diff = 0;
int total_diff = 0;
for (size_t i = 0; i < table_box_coord.size(); i++) {
int diff = std::abs(table_box_coord[i] - expect_box_coord[i]);
if (diff > max_diff) {
max_diff = diff;
}
total_diff += diff;
}
std::cout << "box coords, max_diff: " << max_diff
<< ", total diff: " << total_diff << ", average diff: "
<< total_diff / float(table_box_coord.size()) << std::endl;
} else {
std::cout << "boxes num has diff, expect box num: "
<< expect_box_coord.size() / 4
<< ", result box num: " << table_box_coord.size() / 4
<< std::endl;
}
}
BENCHMARK_MODEL(model_ppocr_table, model_ppocr_table.Predict(im, &result));
#endif
return 0;
}
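
The `--trt_shape` flag in this file packs the min/opt/max shapes into one string, and the code indexes the parsed result at `[0]`, `[1]`, and `[2]` without checking its size. The sketch below shows one way such a string could be split and validated before use. `ParseShapes` and `IsValidMinOptMax` are hypothetical helpers written for illustration; they are not the implementation of `benchmark::ResultManager::GetInputShapes`, and they assume every field is a plain decimal integer.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "1,3,48,10:4,3,48,320:8,3,48,2304" into
// one vector per ':'-separated group, each holding the ','-separated dims.
std::vector<std::vector<int32_t>> ParseShapes(const std::string& spec) {
  std::vector<std::vector<int32_t>> shapes;
  std::stringstream groups(spec);
  std::string group;
  while (std::getline(groups, group, ':')) {
    std::vector<int32_t> dims;
    std::stringstream fields(group);
    std::string field;
    while (std::getline(fields, field, ',')) {
      // std::stoi throws on non-numeric input; callers may want to catch.
      dims.push_back(static_cast<int32_t>(std::stoi(field)));
    }
    shapes.push_back(dims);
  }
  return shapes;
}

// Returns true only when exactly three non-empty shapes of equal rank
// (min, opt, max) were parsed.
bool IsValidMinOptMax(const std::vector<std::vector<int32_t>>& shapes) {
  if (shapes.size() != 3) {
    return false;
  }
  return !shapes[0].empty() && shapes[0].size() == shapes[1].size() &&
         shapes[1].size() == shapes[2].size();
}
```

Checking `IsValidMinOptMax(shapes)` before calling `SetShape` would turn a malformed flag value into an early, explicit failure instead of an out-of-range access.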

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Whether the model was exported without NMS (NMS is then applied in "
"postprocessing).");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_tood = vision::detection::TOOD(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_tood.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "tood_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_tood.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_tood, model_tood.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,89 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"
namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;
DEFINE_bool(no_nms, false,
"Whether the model was exported without NMS (NMS is then applied in "
"postprocessing).");
int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
// Initialization
auto option = fastdeploy::RuntimeOption();
if (!CreateRuntimeOption(&option, argc, argv, true)) {
return -1;
}
auto im = cv::imread(FLAGS_image);
std::unordered_map<std::string, std::string> config_info;
benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
&config_info);
std::string model_name, params_name, config_name;
auto model_format = fastdeploy::ModelFormat::PADDLE;
if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
&model_format, config_info)) {
return -1;
}
auto model_file = FLAGS_model + sep + model_name;
auto params_file = FLAGS_model + sep + params_name;
auto config_file = FLAGS_model + sep + config_name;
if (config_info["backend"] == "paddle_trt") {
option.paddle_infer_option.collect_trt_shape = true;
}
if (config_info["backend"] == "paddle_trt" ||
config_info["backend"] == "trt") {
option.trt_option.SetShape("image", {1, 3, 640, 640}, {1, 3, 640, 640},
{1, 3, 640, 640});
option.trt_option.SetShape("scale_factor", {1, 2}, {1, 2},
{1, 2});
}
auto model_ttfnet = vision::detection::TTFNet(
model_file, params_file, config_file, option, model_format);
vision::DetectionResult res;
if (config_info["precision_compare"] == "true") {
// Run once at least
model_ttfnet.Predict(im, &res);
// 1. Test result diff
std::cout << "=============== Test result diff =================\n";
// Save result to -> disk.
std::string det_result_path = "ttfnet_result.txt";
benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
// Load result from <- disk.
vision::DetectionResult res_loaded;
benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
// Calculate diff between two results.
auto det_diff =
benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
<< ", max=" << det_diff.boxes.max
<< ", min=" << det_diff.boxes.min << std::endl;
std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
<< ", max=" << det_diff.labels.max
<< ", min=" << det_diff.labels.min << std::endl;
}
// Run profiling
if (FLAGS_no_nms) {
model_ttfnet.GetPostprocessor().ApplyNMS();
}
BENCHMARK_MODEL(model_ttfnet, model_ttfnet.Predict(im, &res))
auto vis_im = vision::VisDetection(im, res);
cv::imwrite("vis_result.jpg", vis_im);
std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
return 0;
}

@@ -1,109 +0,0 @@
# Run all models on the specified hardware and backend
CONFIG_PATH="config.x86.paddle.fp32.txt"
if [ ! "$1" = "$CONFIG_PATH" ]; then
if [ -f "$1" ]; then
CONFIG_PATH="$1"
fi
fi
# PaddleClas
./benchmark_ppcls --model PPLCNet_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPLCNetV2_base_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB7_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x0_5_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x0_35_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_large_x1_0_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x2_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model SqueezeNet1_1_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model InceptionV3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_tiny_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_base_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ViT_large_patch16_224_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNeXt50_32x4d_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model DenseNet121_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model person_exists_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
# PaddleOCR
./benchmark_ppocr_det --model ch_PP-OCRv3_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_cls --model ch_ppocr_mobile_v2.0_cls_infer --image rec_img.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv3_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
./benchmark_ppocr_det --model ch_PP-OCRv2_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv2_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
# PaddleDetection
./benchmark_ppyolov5 --model yolov5_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov7 --model yolov7_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov8 --model yolov8_s_500e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolox --model yolox_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_picodet --model picodet_l_640_coco_lcnet_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolox --model yolox_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_640_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolo_r50vd_dcn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_yolov3 --model yolov3_darknet53_270e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolov2_r101vd_dcn_365e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_320_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_r50_vd_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_maskrcnn --model mask_rcnn_r50_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_vd_fpn_ssld_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fcos --model fcos_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_gfl --model gfl_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r101_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_tood --model tood_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ttfnet --model ttfnet_darknet53_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_400e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_x_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_enhance_3x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_mobilenet_v1_300_120e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_vgg16_300_240e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssdlite_mobilenet_v1_300_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_x_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_l_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_m_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_n_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
# PaddleSeg
./benchmark_ppseg --model Portrait_PP_HumanSegV2_Lite_256x144_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV2_Lite_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV1_Lite_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV2_Mobile_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model SegFormer_B0-cityscapes-with-argmax --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model Deeplabv3_ResNet101_OS8_cityscapes_with_argmax_infer --image cityscapes_demo.png --warmup 10 --repeat 50 --config_path $CONFIG_PATH
./benchmark_ppseg --model Unet_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH
./benchmark_ppseg --model PP_HumanSegV1_Server_with_argmax_infer --image portrait_heng.jpg --warmup 10 --repeat 50 --config_path $CONFIG_PATH
./benchmark_ppmatting --model PP-Matting-512 --image matting_input.jpg --warmup 10 --repeat 50 --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPHumanMatting --image matting_input.jpg --warmup 10 --repeat 50 --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPModnet_MobileNetV2 --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppseg --model FCN_HRNet_W18_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH

@@ -1,109 +0,0 @@
# Run all models on the specified hardware and backend
CONFIG_PATH="config.xpu.lite.fp32.txt"
if [ ! "$1" = "$CONFIG_PATH" ]; then
if [ -f "$1" ]; then
CONFIG_PATH="$1"
fi
fi
# PaddleClas
./benchmark_ppcls --model PPLCNet_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPLCNetV2_base_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB7_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x0_5_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model GhostNet_x1_3_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV1_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x0_35_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_large_x1_0_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x0_25_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ShuffleNetV2_x2_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model SqueezeNet1_1_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model InceptionV3_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_vd_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_tiny_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_base_ssld_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNet50_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model EfficientNetB0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV2_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model MobileNetV3_small_x1_0_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ViT_large_patch16_224_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model ResNeXt50_32x4d_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model DenseNet121_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model PPHGNet_small_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
./benchmark_ppcls --model person_exists_infer --image ILSVRC2012_val_00000010.jpeg --config_path $CONFIG_PATH
# PaddleOCR
./benchmark_ppocr_det --model ch_PP-OCRv3_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_cls --model ch_ppocr_mobile_v2.0_cls_infer --image rec_img.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv3_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
./benchmark_ppocr_det --model ch_PP-OCRv2_det_infer --image 12.jpg --config_path $CONFIG_PATH
./benchmark_ppocr_rec --model ch_PP-OCRv2_rec_infer --image rec_img.jpg --rec_label_file ppocr_keys_v1.txt --config_path $CONFIG_PATH
# PaddleDetection
./benchmark_ppyolov5 --model yolov5_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov6 --model yolov6_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov7 --model yolov7_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov8 --model yolov8_s_500e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolox --model yolox_s_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_picodet --model picodet_l_640_coco_lcnet_no_nms --image 000000014439.jpg --config_path $CONFIG_PATH --no_nms
./benchmark_ppyolov5 --model yolov5_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_s_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolox --model yolox_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_plus_crn_m_80e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_640_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolo_r50vd_dcn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_yolov3 --model yolov3_darknet53_270e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolo --model ppyolov2_r101vd_dcn_365e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_picodet --model picodet_l_320_coco_lcnet --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_r50_vd_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_maskrcnn --model mask_rcnn_r50_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_rtmdet --model rtmdet_s_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_cascadercnn --model cascade_rcnn_r50_vd_fpn_ssld_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fcos --model fcos_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_gfl --model gfl_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r101_fpn_2x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_retinanet --model retinanet_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_tood --model tood_r50_fpn_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ttfnet --model ttfnet_darknet53_1x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov5 --model yolov5_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov6 --model yolov6_s_400e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_l_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov7 --model yolov7_x_300e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_fasterrcnn --model faster_rcnn_enhance_3x_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyoloe --model ppyoloe_crn_l_80e_sliced_visdrone_640_025 --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_mobilenet_v1_300_120e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssd_vgg16_300_240e_voc --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ssd --model ssdlite_mobilenet_v1_300_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_x_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_l_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_m_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
./benchmark_ppyolov8 --model yolov8_n_500e_coco --image 000000014439.jpg --config_path $CONFIG_PATH
# PaddleSeg
./benchmark_ppseg --model Portrait_PP_HumanSegV2_Lite_256x144_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model PP_HumanSegV2_Lite_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model PP_HumanSegV1_Lite_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model PP_HumanSegV2_Mobile_192x192_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model FCN_HRNet_W18_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model SegFormer_B0-cityscapes-with-argmax --image cityscapes_demo.png --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model Deeplabv3_ResNet101_OS8_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model Unet_cityscapes_with_argmax_infer --image cityscapes_demo.png --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppseg --model PP_HumanSegV1_Server_with_argmax_infer --image portrait_heng.jpg --config_path $CONFIG_PATH --xpu_l3_cache 0
./benchmark_ppmatting --model PP-Matting-512 --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPHumanMatting --image matting_input.jpg --config_path $CONFIG_PATH
./benchmark_ppmatting --model PPModnet_MobileNetV2 --image matting_input.jpg --config_path $CONFIG_PATH

@@ -1,79 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"

namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;

DEFINE_bool(no_nms, false, "Whether the model contains nms.");

int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
  // Initialization
  auto option = fastdeploy::RuntimeOption();
  if (!CreateRuntimeOption(&option, argc, argv, true)) {
    return -1;
  }
  auto im = cv::imread(FLAGS_image);
  std::unordered_map<std::string, std::string> config_info;
  benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
                                                &config_info);
  std::string model_name, params_name, config_name;
  auto model_format = fastdeploy::ModelFormat::PADDLE;
  if (!UpdateModelResourceName(&model_name, &params_name, &config_name,
                               &model_format, config_info)) {
    return -1;
  }
  auto model_file = FLAGS_model + sep + model_name;
  auto params_file = FLAGS_model + sep + params_name;
  auto config_file = FLAGS_model + sep + config_name;
  auto model_yolov3 = vision::detection::YOLOv3(
      model_file, params_file, config_file, option, model_format);
  vision::DetectionResult res;
  if (config_info["precision_compare"] == "true") {
    // Run once at least
    model_yolov3.Predict(im, &res);
    // 1. Test result diff
    std::cout << "=============== Test result diff =================\n";
    // Save result to -> disk.
    std::string det_result_path = "yolov3_result.txt";
    benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
    // Load result from <- disk.
    vision::DetectionResult res_loaded;
    benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
    // Calculate diff between two results.
    auto det_diff =
        benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
    std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
              << ", max=" << det_diff.boxes.max
              << ", min=" << det_diff.boxes.min << std::endl;
    std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
              << ", max=" << det_diff.labels.max
              << ", min=" << det_diff.labels.min << std::endl;
  }
  // Run profiling
  if (FLAGS_no_nms) {
    model_yolov3.GetPostprocessor().ApplyNMS();
  }
  BENCHMARK_MODEL(model_yolov3, model_yolov3.Predict(im, &res))
  auto vis_im = vision::VisDetection(im, res);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
  return 0;
}

@@ -1,62 +0,0 @@
// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "flags.h"
#include "macros.h"
#include "option.h"

namespace vision = fastdeploy::vision;
namespace benchmark = fastdeploy::benchmark;

int main(int argc, char* argv[]) {
#if defined(ENABLE_BENCHMARK) && defined(ENABLE_VISION)
  // Initialization
  auto option = fastdeploy::RuntimeOption();
  if (!CreateRuntimeOption(&option, argc, argv, true)) {
    return -1;
  }
  auto im = cv::imread(FLAGS_image);
  std::unordered_map<std::string, std::string> config_info;
  benchmark::ResultManager::LoadBenchmarkConfig(FLAGS_config_path,
                                                &config_info);
  auto model_yolov5 = vision::detection::YOLOv5(FLAGS_model, "", option);
  vision::DetectionResult res;
  if (config_info["precision_compare"] == "true") {
    // Run once at least
    model_yolov5.Predict(im, &res);
    // 1. Test result diff
    std::cout << "=============== Test result diff =================\n";
    // Save result to -> disk.
    std::string det_result_path = "yolov5_result.txt";
    benchmark::ResultManager::SaveDetectionResult(res, det_result_path);
    // Load result from <- disk.
    vision::DetectionResult res_loaded;
    benchmark::ResultManager::LoadDetectionResult(&res_loaded, det_result_path);
    // Calculate diff between two results.
    auto det_diff =
        benchmark::ResultManager::CalculateDiffStatis(res, res_loaded);
    std::cout << "Boxes diff: mean=" << det_diff.boxes.mean
              << ", max=" << det_diff.boxes.max
              << ", min=" << det_diff.boxes.min << std::endl;
    std::cout << "Label_ids diff: mean=" << det_diff.labels.mean
              << ", max=" << det_diff.labels.max
              << ", min=" << det_diff.labels.min << std::endl;
  }
  BENCHMARK_MODEL(model_yolov5, model_yolov5.Predict(im, &res))
  auto vis_im = vision::VisDetection(im, res);
  cv::imwrite("vis_result.jpg", vis_im);
  std::cout << "Visualized result saved in ./vis_result.jpg" << std::endl;
#endif
  return 0;
}

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 10
repeat: 20
backend: lite
profile_mode: end2end
include_h2d_d2h: false
use_fp16: true
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp16_e2e_mem.txt

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: lite
profile_mode: end2end
include_h2d_d2h: false
use_fp16: true
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp16_e2e.txt

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: lite
profile_mode: runtime
include_h2d_d2h: false
use_fp16: true
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp16.txt

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 10
repeat: 20
backend: lite
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp32_e2e_mem.txt

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: lite
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp32_e2e.txt

@@ -1,14 +0,0 @@
device: cpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: lite
profile_mode: runtime
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_arm_lite_fp32.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: ort
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_ort_fp32_e2e_mem.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: ort
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_ort_fp32_e2e.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: ort
profile_mode: runtime
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_ort_fp32.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: paddle
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_fp32_e2e_mem.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: paddle
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_fp32_e2e.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: paddle
profile_mode: runtime
include_h2d_d2h: false
use_fp16: false
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_fp32.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: paddle_trt
profile_mode: end2end
include_h2d_d2h: false
use_fp16: true
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_trt_fp16_e2e_mem.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: paddle_trt
profile_mode: end2end
include_h2d_d2h: false
use_fp16: true
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_trt_fp16_e2e.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: paddle_trt
profile_mode: runtime
include_h2d_d2h: true
use_fp16: true
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_trt_fp16_h2d.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 200
repeat: 1000
backend: paddle_trt
profile_mode: runtime
include_h2d_d2h: false
use_fp16: true
collect_memory_info: false
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_trt_fp16.txt

@@ -1,14 +0,0 @@
device: gpu
device_id: 0
cpu_thread_nums: 1
warmup: 20
repeat: 100
backend: paddle_trt
profile_mode: end2end
include_h2d_d2h: false
use_fp16: false
collect_memory_info: true
sampling_interval: 1
precision_compare: false
xpu_l3_cache: 0
result_path: benchmark_gpu_paddle_trt_fp32_e2e_mem.txt

Some files were not shown because too many files have changed in this diff.