1024 Commits

Author SHA1 Message Date
Zhang Yulong
5532e8a323 [FD CLI] Add bench cli (#4160)
* add bench cli

* Update test_main.py
2025-09-22 20:37:30 +08:00
chenjian
918ccdb123 [Feature] Support pd ep deployment with yiyan adapter (#4029)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* Support pd ep deployment with yiyan adapter

* Support pd ep deployment with yiyan adapter

* refactor cache messager

* support scheduler v1 in PD

* support pd v1 + chunk prefill

* support pd v1 + chunk prefill

* add eplb

* support eplb

* support eplb

* support eplb

* support v1

* fix

* fix

* fix bug

* remove eplb support

* support prefix cache in P

* fix bug

* fix bug

* support one stop in V1

* fix bug

* fix ci

* fix ci

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-22 16:41:38 +08:00
lizexu123
c86945ef49 [Feature] support pool (#3827)
* support pool

* update pooling

* add pooler_config and check

* update

* support AutoWeightsLoader load weight

* fix

* update

* delete print

* update pre-commit

* fix

* fix xpu

* fix ModelRegistry->model_registry

* fix Copilot review

* fix pooler.py

* delete StepPooler

* fix abstract

* fix default_loader_v1

* fix Pre Commit

* support torch qwen3 dense

* add test and fix torch-qwen

* fix

* fix

* adapter ci:

* fix review

* fix pooling_params.py

* fix

* fix tasks.py 2025

* fix print and logger

* Modify ModelRegistry and delete AutoWeightsLoader

* fix logger

* fix test_embedding

* fix ci bug

* ernie4_5 model_registry

* fix test

* support Qwen3-Embedding-0.6B tp=1 load

* fix extra code

* fix

* delete fix vocab_size

* delete prepare_params_dict

* fix:
2025-09-22 14:09:09 +08:00
chen
da74a5f0b3 fix glm all_reduce tp group (#4187) 2025-09-22 10:56:55 +08:00
RichardWooSJTU
91912cc2e1 fix t2i (#4163)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-19 18:07:13 +08:00
YuanRisheng
24180fba0a [FDConfig]Remove splitwise_role and engine_worker_queue_port in FDConfig (#4147)
* remove splitwise_role and engine_worker_queue_port

* fix xpu

* fix xpu

* fix xpu

* fix unittest

* resolve conflict
2025-09-19 17:01:52 +08:00
luukunn
ee9d8a840a [fix]Modify follow-up push parameters and Modify the verification method for thinking length (#4086)
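* Rename the follow-up push (continuation) parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length

* Rename the follow-up push (continuation) parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length

* Rename the follow-up push (continuation) parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length

* Rename the follow-up push (continuation) parameter generated_token_ids to completion_token_ids; modify the verification method for thinking length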

* add completion_token_ids

* add logger

* fix reasoning_max_tokens ParameterError

* add unittest

* add unittest

* add unittest

* add unittest

* add unittest

* add unit test
2025-09-19 14:26:01 +08:00
chen
66a98b44ed ep support logprob (#4089) (#4151) 2025-09-19 14:07:31 +08:00
Yuanle Liu
a685e5ad35 Each module should have its own plugins_loaded (#4164) 2025-09-19 14:06:10 +08:00
xiaolei373
ddf5606263 Bugfix test exception (#4171)
* feat(log):add_request_and_response_log

* modify default error type
2025-09-19 11:48:49 +08:00
Sunny-bot1
c3b8ebeb18 [Optimize] Machete using group scale default (#4121)
2025-09-18 13:51:11 +08:00
xiaolei373
98447beb4d Add param valid log (#4113)
* feat(log):add_request_and_response_log

* [bugfix] add param valid log

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-18 10:39:24 +08:00
chenjian
618ccdbfba [Feature] Support mixed deployment with yiyan adapter in develop (#3976)
* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* fix ci

* fix for eb5

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-18 01:52:20 +08:00
gaoziyuan
896e3bb606 [NewFeature]add ep rollout model init and update/clear ep buffer (#4039)
* fix gid

* merge

* fix test

* fix bug

* fix

* fix ci
2025-09-17 20:24:53 +08:00
qw86972190
b52971749c Print KV Cache available memory and block memory usage in GB format (#4148) 2025-09-17 20:01:55 +08:00
RichardWooSJTU
2adca04f1f Reconstruct streaming data transfer with zmq (#3836)
* reconstruct USE_GET_SAVE_OUTPUT_V1

* fix ut

* use dp rank

* fix ci
2025-09-17 14:30:39 +08:00
Jiang-Jia-Jun
f9766f917b [BugFix] Forbid FD_DISABLED_RECOVER while ENABLE_V1_KVCACHE_SCHEDULER (#4142)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-09-17 14:11:44 +08:00
YuanRisheng
2e9e53ff7e [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)
* remove max_num_batched_tokens in parallel config

* remove max_num_seqs

* update test case

* fix test

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-17 10:43:35 +08:00
chenjian
67e6d8c691 [Feature] Set prefix caching as default (#3814)
* Set prefix caching as default

* Set prefix caching as default

* Set prefix caching as default

* skip dynamic load scene

* fix kill bug

* fix kill bug

* fix kill bug

* fix

* fix

* fix ci
2025-09-16 20:34:27 +08:00
Jiang-Jia-Jun
a04365a0c7 Update api_server.py 2025-09-15 21:31:33 +08:00
YuanRisheng
03b3d6175d fix mtp (#4105) 2025-09-15 20:26:07 +08:00
co63oc
17a27170bc fix typos (#4093) 2025-09-15 18:33:30 +08:00
bukejiyu
113e330030 fix bf16 and add comments (#4106) 2025-09-15 17:23:07 +08:00
freeliuzc
69aa2781a1 [MTP]Support mtp reshard (#4099)
* support rl reshard

* modify model name
2025-09-15 17:13:53 +08:00
freeliuzc
46911f903d [MTP]update hybrid-mtp-with-ngram (#4047) 2025-09-15 17:13:31 +08:00
Yuanle Liu
b1b33211e8 [CUDAGraph] Support multi output buffers and merge some fixes from feature/exp_0908 (#4062)
* refine cudagraph

* refine cudagraph

* typo

* fix

* fix plugins

* fix

* update

* update

* update
2025-09-15 16:21:30 +08:00
zhupengyang
9409665713 [xpu] support ep (#4067)
2025-09-15 13:53:11 +08:00
bukejiyu
29ed617f0f [v1 loader]qwen Offline fp8 (#4036)
* support offline fp8

* update ut

* update ut

* update ut

* fix

* update

* update
2025-09-15 13:44:11 +08:00
Sunny-bot1
b1a5b756a3 [Optimize] Support WINT8 and group scale for Machete (#3905) 2025-09-15 12:01:34 +08:00
Zero Rains
f213ae1e86 [Bug Fix]fix the bug for cache_messager signal loss (#3879)
* fix the bug for real size 0 in cudagraph

* fix cache_messager
2025-09-15 11:16:24 +08:00
qwes5s5
553adb299e [FastDeploy CLI] collect-env subcommand (#4044)
* collect-env subcommand

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
2025-09-15 10:31:23 +08:00
zhouchong
958abebeab Support offline inference with streaming output (#4071)
* Support offline inference with streaming output

* add unit test

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-15 10:27:03 +08:00
Ayakouji
987609c894 [BugFix] Fix image_feature 0-Size causing insert failed (#4042)
* update

* fix image_feature
2025-09-12 19:13:08 +08:00
xiaolei373
9ac539471d [format] Valid para format error info (#4035)
* feat(log):add_request_and_response_log

* Align error messages with OpenAI
2025-09-12 19:05:17 +08:00
YuanRisheng
88ea565aba [BugFix]Fix load kv cache quant scale (#4077)
* fix kv cache

* fix kv_cache

* fix kv cache
2025-09-12 17:44:03 +08:00
SuperNova
805f29a06c [Feature] refactor metax_gpu attention and moe and remove some useless code (#3688)
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-09-12 14:40:25 +08:00
qwes5s5
58e0785bab [metrics] update metrics markdown file (#4061)
* adjust md

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
2025-09-12 11:13:43 +08:00
co63oc
8466219ec8 fix typos (#3840)
* fix typos

* ci

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-12 11:04:38 +08:00
RichardWooSJTU
82dab8a91a Add token processor plugin support (#4059)
* Add token processor plugin support

* fix import

* fix import
2025-09-12 10:17:23 +08:00
chenjian
37f1632732 [Optimize] optimize prefix cache in develop (#3890)
* optimize prefix cache in release22

* fix

* fix

* fix

* add ci for v1

* add unit test

---------

Co-authored-by: xiegegege <46314656+xiegegege@users.noreply.github.com>
2025-09-12 10:15:59 +08:00
chen
4859f40b20 [Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051)
2025-09-11 20:08:09 +08:00
lddfym
2056a428bd [bug fix] Fix the placeholder in qwen prompt and add some unittests (#4065)
* fix the placeholder in qwen prompt

* fix the placeholder in qwen prompt

* add some unittests for qwen_vl_processor
2025-09-11 20:00:02 +08:00
memoryCoderC
850465e8ed [Feature] add cli command chat,complete (#4037) 2025-09-11 19:53:14 +08:00
zhuzixuan
a47976e82d [Echo] Support more types of prompt echo (#4022)
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.

---------

Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
2025-09-11 19:34:44 +08:00
xiaoxiaohehe001
abdcef30aa [BugFix] mm_post_fix (#4005)
* mm_post_fix

* mm_post_fix_1
2025-09-11 19:09:46 +08:00
CSWYF3634076
e4c64a71cc [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix (#3921) 2025-09-11 15:08:24 +08:00
bukejiyu
2650f58740 [docs] Update environment variables documentation (#3957)
2025-09-10 21:17:06 -07:00
AIbin
a7392a0ff9 [Inference Optimize] DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
chen
637d96c6ae [Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928)
* support glm45_air
2025-09-10 19:36:10 +08:00
ltd0924
684e93269b [Fix] fix multi api server log dir (#3967)
* [BugFix] fix max streaming tokens invalid

* fix scheduler bug

* fix scheduler bug

* Update multi_api_server.py
2025-09-10 17:15:30 +08:00