Commit Graph

511 Commits

Author SHA1 Message Date
Ryan
e58fed3665 [Graph Optimization][BugFix][CI] Fix 0size bug && add unitest (#5495) 2025-12-11 16:25:26 +08:00
YuBaoku
9f4512c932 [CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test 2025-12-11 14:12:49 +08:00
qwes5s5
d79438bb86 add detoken switch (#5463)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-12-10 21:44:02 +08:00
zccjjj
03819f30c3 [CI][XPU] ep+prefix cache+chunk prefill (#5489) 2025-12-10 19:39:49 +08:00
luukunn
fbc9bce1e9 [Feature]Optimization of Thinking Pattern Framework (#4302)
* add model status in vl

* add x1 parser

* add model_status

* fix parser

* fix parser

* fix parser

* fix parser

* Revert "fix parser"

This reverts commit 300f446d8a.

* fix parser

* fix

* fix

* fix

* fix

* fix parser

* fix unit test

* fix unit test

* add unit test

* fix

* fix

* add unit test

* fix unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix bug

* fix unit test

* x1 tool parser

* fix unit test

* fix unit test

* fix unit test

* fix n

* fix unit test

* add unit test

* add unit test

* remove pring
2025-12-10 16:17:06 +08:00
ming1753
9e15191cce [BugFix] fix audio end bug (#5464) 2025-12-10 13:37:26 +08:00
Echo-Nie
1b1bfab341 [CI] Add unittest (#5328)
* add test_worker_eplb

* remove tesnsor_wise_fp8

* add copyright
2025-12-09 19:19:42 +08:00
lizexu123
95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
Haonan Luo
e397c4fba6 [Others] remove add_bias option (#5425) 2025-12-09 17:39:35 +08:00
lizexu123
b0cf2c4b7a [Feature] Support prefill batch inference for pooling models. (#5436)
* fix multi-inputs

* fix threshold

* fix threshold

* fix

* support multi-batch

* add tests

* fix test

* test

* fix
2025-12-09 16:21:00 +08:00
Juncai
83ea9646f9 [PD Disaggregation] Unify the disaggregation info and the pd communication (#5438)
* Unify the disaggregation info and the pd communication

* up

* up

* fix

* fix conflict

* fix unittest
2025-12-09 14:44:59 +08:00
Nyakku Shigure
e1c4a12e34 [Graph Optimization][CINN] Use CINN in PaddleOCR-VL ViT part (#5223)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-09 14:37:00 +08:00
K11OntheBoat
8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
kevin
f7e832efaf [BugFix] fix mm cudagraph (#5266)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix mm cudagraph

* fix test_prompt_ids bug

* update code

* update ci code

* update ci code

* update ci code
2025-12-09 11:51:00 +08:00
zhouchong
5d9b5e4a5b [Engine] [Feature] Refactor async_llm:cross-process with EngineService,based on zmq communication (#4868)
* Refactor async_llm:cross-process with EngineService

* fix: async_llm output process

* fix: return prompt_token_ids and prompt_tokens in first res

* optimize common_engine start func
2025-12-09 10:53:40 +08:00
SunLei
5fb93d84f5 [Feature] [Benchmark]: add ZMQ-based FMQ implementation and benchmark tools (#5418)
* feat(fmq): add ZMQ-based FMQ implementation and benchmark tools

* move FMQ_CONFIG_JSON to envs

* fix top_p_candidates (#5400)

Co-authored-by: freeliuzc <lzc842650834@gmail.com>

* [RL] Support Rollout Routing Replay (#5321)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374)

* fix multi-inputs

* fix threshold

* fix threshold

* fix

* [BugFix]remove _execute_empty_input (#5396)

* Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)

This reverts commit 96d2d4877b.

* [New][RL] Support Rollout Routing Replay (#5405)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* bf16 deepseek (#5379)

* fix deepseek (#5410)

* Update tests/inter_communicator/test_fmq_factory.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/benchmark_fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/inter_communicator/fmq.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: GoldPancake <56388518+Deleter-D@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: RAM <gstian5555@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: 周周周 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-12-08 22:04:49 +08:00
kesmeey
d1bd40d44c [CI]【Hackathon 9th Sprint Example NO 16】功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充 (#5264)
* test: add unit tests for process.py (NO.16)

* update

* update filename

* update filename

* update

* update

* fix failed  testcases

* simplify the code

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-08 14:30:15 +08:00
周周周
2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
Juncai
80efe98f8d [PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317)
* Add timestamp for analyzing splitwise deployment

* up

* up

* up

* up

* up

* up

* fix format

* fix
2025-12-08 10:08:44 +08:00
RAM
b2908b8e82 [New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun
c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
2025-12-05 20:19:39 +08:00
lizexu123
d4979347ca [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374)
* fix multi-inputs

* fix threshold

* fix threshold

* fix
2025-12-05 20:18:17 +08:00
RAM
96d2d4877b [RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 20:01:33 +08:00
kevin
c9d7f9e7c3 [BugFix] fix async download bug (#5349)
* fix async download bug

* update log

* Revert "update log"

This reverts commit 5816e602f4.

* update code

* fix mtp bug
2025-12-05 18:59:12 +08:00
zccjjj
5b900667e3 [XPU] support ep4tp1+v1 loader (#5398) 2025-12-05 18:51:15 +08:00
zccjjj
e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-12-05 10:03:45 +08:00
YuBaoku
1b5fd79d6b [CI] disable test_schedule_output.py in unit_test (#5377) 2025-12-04 23:18:23 +08:00
chenjian
3878a99b69 [Fearture] Support cache kv cache for output tokens (#4535)
* [Fearture] Support cache kv cache for output tokens

* fix bug

* fix ci bug

* improve coverage

* enable output caching by default

* fix ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-04 20:53:08 +08:00
Longzhi Wang
5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked MoE releated in config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
Juncai
f5bdb36e9b Reduce timeout in unittest (#5366) 2025-12-04 13:19:02 +08:00
lizexu123
946025480e [Bug fix] fix pooling models (#5358)
* fix

* fix

* fix test

* fix gpu_model_runner

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-04 11:06:30 +08:00
qwes5s5
a52aea073c fix logprobs (#5335) 2025-12-04 10:38:51 +08:00
ming1753
5f8d4aedea [Feature] support audio tts (#5333) 2025-12-03 21:06:48 +08:00
Daci
83dbc4e5dd [Feature] Guided Decoding add LLguidance backend (#5124)
* llguidance

* add requirements_guided_decoding.txt and doc

* fix test_guidance_*.py

* fix test_guidance_*.py && mv

* fix llguidance choice

* test_guidance_*

* rm lazy loader

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 20:23:57 +08:00
lzy
f458cc5ba4 [Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353)
* [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM

* fix test_chunked_moe

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 16:42:10 +08:00
YuBaoku
dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
YuBaoku
3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) 2025-12-02 18:57:29 +08:00
Sunny-bot1
3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
周周周
fb7f951612 [UNITEST] add test (#5305) 2025-12-02 17:59:01 +08:00
Jiaxin Sui
8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252)
* add xpu ci

* add case

* add case

* fix ci bug

* Update Docker image tag to 'latest' in CI workflow

* Fix set -e usage in run_xpu_ci_pytest.sh

* add pd case

* add case

* Configure pip to use Tsinghua mirror for dependencies

Set the global pip index URL to Tsinghua mirror.

* fix ci bug

* fix bug

* fix bug

---------

Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-02 17:15:51 +08:00
YuBaoku
69e003abcb [CI] Fix return_code check in test_chunked_moe.py (#5326) 2025-12-02 15:41:26 +08:00
lizexu123
c563eca791 [Feature] support reward model (#5301)
* Your commit message here

* add test

* update develop

* support reward

* support enable_chunk_prefill

* support bingfa

* support convert is reward

* update test

* delete print

* fix enable_thinking

* add document

* fix place

* fix test

* fix

* support enable_prefix_caching

* add no-enable_prefix-caching test

* fix

* support enable_prefix_caching

* delete print

* fix document

* fix

* fix test

* fix document and delete chinese

* udpate

* enable_thinking

* fix test
2025-12-02 14:55:31 +08:00
qwes5s5
117980dd4e [LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089)
* add prompt logprobs

* Merge prompt_logprobs_tensors and prompt_logprobs

* fix param check

* trigger ci

* fix unitest

* fix logprobs bug
2025-12-02 13:49:51 +08:00
YuanRisheng
af39819fcd Revert "[CI] 【Hackathon 9th Sprint No.18】NO.18 功能模块单测补充 (#5064)" (#5290)
This reverts commit 7bac016c77.
2025-12-02 13:43:36 +08:00
YuanRisheng
ded7765dec Revert "[CI] 【Hackathon 9th Sprint No.41】NO.41 功能模块单测补充 (#5062)" (#5291)
This reverts commit 373b5c3807.
2025-12-02 13:43:13 +08:00
YuBaoku
68533ebd95 [CI] disable test_chunked_moe.py in unit_test (#5322) 2025-12-02 10:39:50 +08:00
xiaolei373
84e2f6aa75 [CI]add clear to run-batch ci (#5307)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-12-01 21:18:19 +08:00
Jiaxin Sui
b0113cb0fc [XPU][CI] Change XPU CI Base Value (#5318)
* Add '小度' keyword to assertion in run_w4a8.py

* Add keywords to assertion in run_ep_online.py

* Add keywords to assertion in run_w4a8.py

* Update run_45T.py

* Update run_ep_online.py

* Refactor assertion for response content keywords

* Update run_w4a8.py

* Update run_w4a8.py
2025-12-01 21:02:09 +08:00
Juncai
0925d44f18 [PD Disaggregation] support different tp_size for prefill and decode (#5296)
* up

* up

* up

* fix
2025-12-01 17:50:20 +08:00
Jiaxin Sui
b467e9dadc [XPU][CI]Change W4A8 Case Base Value (#5309) 2025-12-01 15:25:33 +08:00