kxz2002
24b85b752b
[Cherry-Pick] Unify the registration name recognition for tool_parser and reasoning_parser to “-” ( #4668 ) ( #4737 )
...
* [Feature] add a new reasoning parser (#4571 )
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
* [BugFix] Fix finish reason in _create_chat_completion_choice (#4582 )
* fix n_param _create_chat_completion_choice
* fix unit test
* fix final_res
* modify unit tests
* [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686 )
* fix enable_thinking
* recover ernie4_5_vl_processor
* [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668 )
* parser register name unify
* change ernie_x1 to ernie-x1
* change ernie4_5_vl to ernie-45-vl
* fix unit test
2025-10-31 23:27:21 +08:00
Longzhi Wang
d11e27a188
[Bugfix] fix test_get_save_output_v1 ( #4732 )
2025-10-31 22:52:04 +08:00
ming1753
96a44c8574
Skip building native architecture when specifying arch list ( #4728 )
2025-10-31 20:45:49 +08:00
plusNew001
e4463c37fe
[XPU][CI] Release ci fix bug ( #4742 )
...
* Clean up XVLLM_PATH setup in run_ci_xpu.sh
Removed XVLLM_PATH setup and related wget command.
* Update xvllm version in download_dependencies.sh
* Update run_ci_xpu.sh
2025-10-31 20:30:35 +08:00
ddchenhao66
ce53cdccd2
[XPU] xpu support neox style ROPE ( #4723 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-31 18:17:21 +08:00
RAM
00d0da0c18
[Graph Optimization] Add the CUDAGraph usage switch for Draft Model ( #4669 )
...
* add draft model using cudagraph switch
* set default as false
* capture draft model in ci
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-31 17:34:09 +08:00
Ryan
9a647cb61c
[OP] Add InferShape&InferDtype for per_token_quant_padding ( #4667 ) ( #4683 )
...
* add InferShape&InferDtype for per_token_quant_padding
* fix codestyle
2025-10-31 16:42:52 +08:00
plusNew001
7b013c63e2
[XPU][CI] Release XPU ci update ( #4722 )
...
* Refactor CI script for paddlepaddle-xpu installation
Updated the CI script to install specific paddlepaddle-xpu version and modified the testing commands for better performance and error handling.
* Add test script for OpenAI client interaction
* Remove empty line at the beginning of run_45vl.py
2025-10-31 15:36:14 +08:00
kevin
139342d953
fix bug ( #4680 )
2025-10-31 15:23:33 +08:00
李泳桦
9cf4005e62
[Cherry-pick] Fix profile run in pd-disaggregated deployment ( #4693 )
...
* [fix] fix pd+dp+ep bug
* [fix] fix again
* [ci] fix code style
2025-10-31 14:41:35 +08:00
chen
802dfa6524
fix --logprobs-mode raw_logits ( #4681 ) ( #4712 )
2025-10-31 10:50:31 +08:00
ddchenhao66
2e7b7a42c2
[XPU] xpu currently disable prefix cache for VL model ( #4694 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-31 10:37:41 +08:00
ming1753
40b87065cc
fix docs bug ( #4703 )
2025-10-30 19:53:40 +08:00
ming1753
9defdaed6b
[BugFix] Fix PaddleOCRVL bug ( #4678 )
2025-10-30 13:49:08 +08:00
ApplEOFDiscord
52a6e0be41
[Cherry-Pick] add mm token usage ( #4648 )
...
* [Feature] add mm token usage (#4570 )
* add mm token usage
* fix unit test
* fix unit test
* fix unit test
* fix model path
* fix unit test
* fix unit test
* fix unit test
* remove uncomment
* change var name
* fix code style
* fix code style
* fix code style
* fix code style
* fix unit test
* update doc
* update doc
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-30 09:58:07 +08:00
kxz2002
895ca7694e
[Feature] add a new reasoning parser ( #4571 ) ( #4664 )
...
* add new reasoning_parser initial commit
* add parser file content
* add register
* ernie_test_reasoning_parser
* support <tool_call> token and add tool_parser
* add and fix unit tests
* modify reasoning_parser
* modify reasoning parser and tool parser
* modify unit tests
* modify reasoning_parser and tool_parser
* modify unit tests
* fix tool_parser
* modify the logic of reasoning_parser and tool_parser
* add and modify unit tests
* standardize code style
* simplify reasoning_parser and tool_parser
* modify unit test
2025-10-30 09:49:53 +08:00
Lucas
df72033adb
[XPU] fix pos_emb_type bug ( #4639 )
2025-10-29 17:14:47 +08:00
ming1753
7b275efc59
[Docs] Add PaddleOCR-VL-0.9B best practices ( #4661 )
2025-10-29 16:58:38 +08:00
ddchenhao66
21bb2d69d1
[XPU] Update the return value of TextImageGatherScatter ( #4646 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-29 16:17:23 +08:00
RAM
a0d5426ab6
fix ci test case ( #4635 )
2025-10-29 13:26:38 +08:00
xiaolei373
14e7d88ea4
[feature] support reward api ( #4518 )
...
Co-authored-by: SunLei <sunlei5788@gmail.com >
2025-10-29 00:20:28 +08:00
李泳桦
a012e3608b
[Feature] support logits processors ( #4515 )
...
* [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor
* [chore] fix code style
* [fix] add unit test & fix existing bugs
* [feat] add engine/worker arg --logits-processors
* [fix] redefine user args as logits_processors_args and fix some bugs
* [fix] fix test_sampler
* Update fastdeploy/model_executor/logits_processor/builtin.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/model_executor/logits_processor/__init__.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/model_executor/test_logits_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix typo
* Update fastdeploy/engine/sampling_params.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* [fix] fix bracket
* [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits
* [doc] add docs
* [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests
* [fix] fix redundant code
* [feat] skip apply() if no bias is specified
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-10-29 00:08:53 +08:00
zhang-prog
24b9505971
add einops dependency ( #4633 )
2025-10-28 22:17:13 +08:00
Yuanle Liu
20756cd2bb
fix import jit.marker.unified ( #4622 )
2025-10-28 22:11:03 +08:00
ming1753
561b9f38d3
[BugFix] fix paddleocr prefix cache bug ( #4625 )
...
* fix paddleocr prefix cache bug
* disable prefix-caching in ocr
2025-10-28 21:38:12 +08:00
RAM
fff5fb5e39
[Graph Optimization] Refactor default capture list ( #4617 )
...
* fix bug and refine code
* add debug count
* refine code
2025-10-28 21:31:02 +08:00
Lucas
0a0c74e717
[XPU] Support PaddleOCR-VL model for XPU ( #4529 )
...
* [XPU] support PaddleOCR-VL in XPU
* [XPU] fix PaddleOCR-VL pos_emb_type
2025-10-28 20:35:04 +08:00
SunLei
2a9ed72533
feat: add support for API usage with multimodal models ( #4548 )
...
* feat: add support for API usage with multimodal models
* completion_tokens contains num_image_tokens
* remove test_request.py
* fix: paddle.device.is_compiled_with_cuda()
* fix test_unstream_without_logprobs
2025-10-28 20:23:46 +08:00
YuBaoku
e1ac90d787
[CI] Revert test_rollout_model directory change ( #4626 )
2025-10-28 20:14:00 +08:00
zhouchong
567f61072c
[CI][BugFix] fix port conflicts in concurrent ci test and add more unit test on async_llm ( #4616 )
...
* fix:port conflicts in concurrent ci test
* add more unit test on async_llm
2025-10-28 19:04:24 +08:00
yyssys
cd6d1f633c
[XPU]add xpu ci w4a8 case ( #4501 )
2025-10-28 19:02:29 +08:00
Ryan
07956a87b3
[Graph Optimization] Fix IR graph dependency error exposed after enabling SOT by updating the return value of TextImageGatherScatter ( #4610 )
...
* fix TextImageGatherScatter in sot
* fix codestyle
2025-10-28 18:31:23 +08:00
lizhenyun01
4d2f478d53
[BugFix] fix TPDP mix parallel infer ( #4583 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-10-28 16:58:20 +08:00
freeliuzc
c63361fd1d
[Speculative Decoding][MTP]Support mtp in epdptp mode ( #4614 )
...
* support mtp many features
* support mtp reshard in rl mode
* fix function
* support mtp ep
* support mtp in hybrid-dp-tp mode
* default open scheduler_v1 in mtp
2025-10-28 16:02:47 +08:00
Zhang Yulong
b4014834a9
Extend sleep time to 10 seconds in switch_service ( #4618 )
...
Increase sleep duration before switching services.
2025-10-28 15:19:21 +08:00
RAM
86d5006a57
[Graph Optimization][Speculative Decoding] Update yaml and fix typo ( #4612 )
2025-10-28 11:43:26 +08:00
YuBaoku
b2c6c41447
[CI] Relocate server test cases from ci_use directory to e2e ( #4608 )
2025-10-28 11:37:30 +08:00
xiaolei373
31180a6a13
fix_run_batch_unittest ( #4613 )
2025-10-28 10:38:06 +08:00
xiaolei373
0b196d82f3
[docs] add cli usage to docs ( #4569 )
2025-10-28 10:35:11 +08:00
Daci
6426414a0f
[Feature] EngineWorkerQueue anonymous port ( #4597 )
...
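* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration
* EngineWorkerQueue supports anonymous port configuration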
2025-10-28 10:22:37 +08:00
ming1753
7681375a19
[BugFix] PaddleOCR-VL fix FD_DEBUG type and support v1 loader ( #4605 )
...
* [Bug Fix] PaddleOCRVL fix FD_DEBUG type and support HF model
* fix bug
* fix bug
* fix bug
2025-10-28 09:47:47 +08:00
zhouchong
6dcf5a3e87
fix: resolve decode bug in offline stream output ( #4603 )
2025-10-27 20:17:10 +08:00
周周周
3729e910a6
remove dev sync in prefill ( #4598 )
2025-10-27 19:54:43 +08:00
K11OntheBoat
64d1aa973b
[Unittest] Add unit test of Attention Layer ( #4494 )
2025-10-27 19:18:50 +08:00
ophilia-lee
70aa7423f8
Adapt the benchmark tool to the SGLang framework ( #4607 )
...
* Adapt the benchmark tool to the SGLang framework
* Adapt the benchmark tool to the SGLang framework
* Adapt the benchmark tool to the SGLang framework
2025-10-27 18:52:56 +08:00
ddchenhao66
c91c5040c4
[XPU] update kunlun doc about supported models ( #4586 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-27 18:31:51 +08:00
RAM
25a983ba9c
1. fix the bug of draft model with ep 2. fix sampler bug ( #4589 )
2025-10-27 17:47:34 +08:00
kevin
8aab4e367f
[Feature] mm support prefix cache ( #4134 )
...
* support mm prefix caching
* update code
* fix mm_hashes
* support encoder cache
* add encoder cache
* update code
* update encoder cache
* fix features bug
* fix worker bug
* support processor cache, need to optimize yet
* refactor multimodal data cache
* update code
* update code
* update v1 scheduler
* update code
* update code
* update codestyle
* support turn off processor cache and encoder cache
* update pre-commit
* fix code
* solve review
* update code
* update code
* update test case
* set processor cache in GiB
* update test case
* support mm prefix caching for qwen model
* fix code style check
* update pre-commit
* fix unit test
* fix unit test
* add ci test case
* fix rescheduled bug
* change text_after_process to prompt_tokens
* fix unit test
* fix chat template
* change model path
* [EP] fix adapter bugs (#4572 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* fix v1 hang bug (#4573 )
* fix import image_ops error on some platforms (#4559 )
* [CLI]Update parameters in bench latency cli tool and fix collect-env cli tool (#4558 )
* add collect-env
* del files
* [Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching (#4578 )
* add new branch for sot
* reorder
* fix batch bug
* [XPU]Moe uses a new operator (#4585 )
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
* [Feature] Support Paddle-OCR (#4396 )
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normalize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
* [DataProcessor] add reasoning_tokens into usage info (#4520 )
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move stream usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
* perf: Optimize task queue communication from engine to worker (#4531 )
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Clean up ports after processing results (#4587 )
* [CI] Add /re-run command in PR comments to restart failed CI workflows (#4593 )
* [Others] api server exits when worker process is dead (#3271 )
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
* update docs
---------
Co-authored-by: ApplEOFDiscord <wwy640130@163.com >
Co-authored-by: ApplEOFDiscord <31272106+ApplEOFDiscord@users.noreply.github.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: qwes5s5 <45442318+qwes5s5@users.noreply.github.com >
Co-authored-by: Ryan <zihaohuang@aliyun.com >
Co-authored-by: yyssys <atyangshuang@foxmail.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: kxz2002 <115912648+kxz2002@users.noreply.github.com >
Co-authored-by: SunLei <sunlei5788@gmail.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com >
2025-10-27 17:39:51 +08:00
YuBaoku
a4fb3d4ff0
[CI] Fix path error of /re-run ( #4606 )
...
* [CI] Add /re-run command in PR comments to restart failed CI workflows
* [CI] Fix /re-run command
2025-10-27 17:03:11 +08:00
chen
5c63a089f6
[Feature] Support logprobs_mode ( #4567 )
2025-10-27 14:27:48 +08:00
CSWYF3634076
acd331780c
[V1 loader] Qwen25 VL support v1 loader and torch style safetensors load ( #4388 )
...
* [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix
* [Docs]offline infer add apply_chat_template add_generation_prompt parameter
* [Model]qwen2.5VL support --use-cudagraph
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v2
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v3
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v4
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v5
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v6
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v7
* qwen25vl v1 loader
* qwen25vl v1 loader v2
* qwen25vl v1 loader v3
* qwen25vl v1 loader fix tp2 weight PySafeSlice
* qwen25vl v1 loader no test
* qwen25vl v1 loader add unit test
* qwen25vl v1 loader add unit test v2
* qwen25vl v1 loader add torch unit test v3
* qwen25vl v1 loader add torch unit test v4
* qwen25vl v1 loader add torch unit test v5
* qwen25vl v1 loader add torch unit test v6
2025-10-27 10:54:15 +08:00
Lucas
5c6105f4a2
[XPU] bind some OPs for VL model with pybind ( #4522 )
2025-10-27 10:50:08 +08:00
李泳桦
cdc40cdc2a
[Others] api server exits when worker process is dead ( #3271 )
...
* [fix] fix terminal hangs when worker process is dead
* [chore] change sleep time of monitor
* [chore] remove redundant comments
2025-10-27 10:23:48 +08:00
YuBaoku
ebae69b1f8
[CI] Add /re-run command in PR comments to restart failed CI workflows ( #4593 )
2025-10-27 10:18:56 +08:00
Zhang Yulong
83b720804b
Clean up ports after processing results ( #4587 )
2025-10-27 10:13:24 +08:00
SunLei
dc1a9c7287
perf: Optimize task queue communication from engine to worker ( #4531 )
...
* perf: Optimize task queue communication from engine to worker
* perf: get_tasks to numpy
* perf: get_tasks remove to_numpy
* fix: request & replace ENV
* remove test_e2w_perf.py
* fix code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-25 22:45:38 +08:00
kxz2002
327fa4c255
[DataProcessor] add reasoning_tokens into usage info ( #4520 )
...
* add reasoning_tokens into usage info initial commit
* add unit tests
* modify unit test
* modify and add unit tests
* fix unit test
* move stream usage to processor
* modify processor
* modify test_logprobs
* modify test_logprobs.py
* modify stream reasoning tokens accumulation
* fix unit test
2025-10-25 16:57:58 +08:00
ming1753
e4e3cede7f
[Feature] Support Paddle-OCR ( #4396 )
...
* init
* update code
* fix code style & disable thinking
* adapt for common_engine.update_mm_requests_chunk_size
* use 3d rope
* use flash_attn_unpadded
* opt siglip
* update to be compatible with the latest codebase
* fix typo
* optim OCR performance
* fix bug
* fix bug
* fix bug
* fix bug
* normalize name
* modify xpu rope
* revert logger
* fix bug
* fix bug
* fix bug
* support default_v1
* optim performance
* fix bug
---------
Co-authored-by: root <root@szzj-acg-tge1-fdda9.szzj.baidu.com >
Co-authored-by: zhangyue66 <zhangyue66@baidu.com >
2025-10-24 23:34:30 +08:00
yyssys
822dea8d5f
[XPU]Moe uses a new operator ( #4585 )
...
* [XPU]Moe uses a new operator
* [XPU]Moe uses a new operator
* update response
2025-10-24 23:01:46 +08:00
Ryan
f42ed6d5f2
[Graph Optimization] Add dy_runnable and introduce cudagraph_switch_threshold for cudagraph mode switching ( #4578 )
...
* add new branch for sot
* reorder
* fix batch bug
2025-10-24 18:36:52 +08:00
qwes5s5
e02a812880
[CLI]Update parameters in bench latency cli tool and fix collect-env cli tool ( #4558 )
...
* add collect-env
* del files
2025-10-24 16:46:45 +08:00
JYChen
83d45af1f3
fix import image_ops error on some platforms ( #4559 )
2025-10-24 16:09:20 +08:00
yinwei
5fbc653238
fix v1 hang bug ( #4573 )
2025-10-24 15:35:10 +08:00
ltd0924
b60ce4922b
[EP] fix adapter bugs ( #4572 )
...
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
2025-10-24 12:30:08 +08:00
李泳桦
8edc5cca91
[BugFix] fix create_cache_tensor for ep ( #4542 )
...
* [fix] fix create_cache_tensor for ep
* [fix] fix again
2025-10-24 11:31:13 +08:00
xiaozude
f7069b8057
[Metax] adapt DeepSeek ( #4498 )
2025-10-24 10:14:53 +08:00
Sunny-bot1
8718fa34b2
support static C8 ( #4568 )
2025-10-23 22:01:03 +08:00
RAM
e36343d807
[FDConfig]Turn on the CUDAGraph + PD Disaggregation switch ( #4530 )
2025-10-23 21:05:14 +08:00
RAM
9dc5c3e370
[Graph Optimization] Support CUDAGraph Padding + MTP ( #4545 )
...
* Support CUDAGraph Padding + MTP
* support other write cache kernel
2025-10-23 20:57:26 +08:00
RichardWooSJTU
5a8c60454e
[BugFix] Fix decode_type which has been deleted in req and optimize token client retry scheme ( #4564 )
2025-10-23 05:08:10 -07:00
zhupengyang
3a43dbf82d
[XPU] merge apply_tp, ops support token_num = 0 ( #4507 )
2025-10-23 19:09:58 +08:00
Sunny-bot1
4ffe41a747
WINT4/WINT8 dense gemm default use Machete ( #4451 )
2025-10-23 17:57:59 +08:00
YuBaoku
a240425db9
[CI] Optimize coverage upload reporting ( #4547 )
...
* [CI] Optimize coverage upload reporting
* [CI] fix upload reporting
* [CI] fix code style
2025-10-23 17:01:48 +08:00
ddchenhao66
5443b2cffb
[XPU] xpu support think length limit ( #4539 )
...
* [XPU] xpu support think length limit
* [XPU] xpu c++ code files format
---------
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-23 15:58:11 +08:00
tianlef
2676a918f0
[Doc]fix deepseek ce ( #4560 )
2025-10-23 14:09:11 +08:00
luukunn
bbf06b9ff7
[BugFix]Fix finish reason ( #4543 )
...
* fix finish reason
* add unit test
* add unit test
* fix unit test
* fix unit test
2025-10-23 14:04:43 +08:00
YuanRisheng
ac4f5ca272
delete useless code ( #4544 )
...
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-23 13:40:34 +08:00
RAM
8a02ab43a8
[FDConfig]Turn on the CUDAGraph + RL switch ( #4508 )
...
* Turn on the CUDAGraph + RL switch
* reduce max_num_seqs and number of request
2025-10-23 11:08:07 +08:00
plusNew001
918e4e9850
[XPU] Change XPU stable third-party version ( #4524 )
...
* add xpu ci case
* Add xDeepEP download and build steps
Download and build xDeepEP before running tests.
* Fix formatting and add missing sleep command
* Update Docker image version in CI workflow
* Modify run_ci_xpu.sh for log cleanup and error handling
Clean up log files before running tests and output worker log on failure.
* Enhance test_ep.py with process management and assertions
Refactor test function to include process cleanup and assertions.
* Replace test_fastdeploy_llm with test_fd_ep
* Fix conditional statement in run_ci_xpu.sh
* Update test_ep.py for string handling and formatting
Fix string encoding issues and improve readability.
* Rename test_ep.py to run_ep.py
* Change test script from test_ep.py to run_ep.py
* Update dependency versions for stable release
* Install pytest-timeout and modify test execution
Added pytest-timeout installation and updated test command.
2025-10-22 19:43:03 +08:00
zhupengyang
3a6883ac1a
c++ code format ( #4527 )
2025-10-22 17:59:50 +08:00
周周周
d7bcedf421
small change in test_fusedmoe.py ( #4538 )
2025-10-22 17:49:18 +08:00
Yuanle Liu
8e02a509c3
[CI] stable test_rollout_model.py ( #4536 )
...
* stable test_rollout_model.py
* update baseline
* update baseline
2025-10-22 01:59:44 -07:00
zhouchong
dce988824d
[Feature] Support AsyncLLM ( #4458 )
...
* add async_llm
* apply review
* update engine config
* Adapt to latest engine.py changes
* add more unit tests
* Increase unit test coverage
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-22 15:50:12 +08:00
guozhuangzhuang
b6cd3aec70
[Feature] support fd return decode response ( #4407 )
...
* [Feature] support fd return decode response
* Resolving conflicts
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-10-22 14:22:08 +08:00
yyssys
cd9195d54c
[XPU]Modify the xpu memory display unit of log ( #4534 )
2025-10-22 12:46:01 +08:00
YuBaoku
f69c9cd122
[CI] Remove redundant .coveragerc file ( #4521 )
2025-10-21 23:24:05 +08:00
Yuanle Liu
3b58310c26
enhance set_stop_value_multi_ends and standardize the registration of some operators ( #4525 )
...
* fix custom_ops
* paddleformers>=0.3.1
2025-10-21 22:06:06 +08:00
yzwu
dc7facaa7f
[Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph ( #4504 )
2025-10-21 21:41:41 +08:00
RAM
d70aacfbdc
[FDConfig] Turn on the CUDAGraph + MultiModel switch ( #4512 )
2025-10-21 06:21:26 -07:00
SunLei
809c1ac7ec
feat: add post-processing step for pool_output ( #4462 )
...
* feat: add post-processing step for pool_output
* bugfix
* fix: test_serving_embedding
* fix test_request_to_batch_dicts
* fix: code style
2025-10-21 20:24:26 +08:00
plusNew001
2bd3fb6315
[XPU]add xpu ci ep case ( #4432 )
...
* add xpu ci case
* Add xDeepEP download and build steps
Download and build xDeepEP before running tests.
* Fix formatting and add missing sleep command
* Update Docker image version in CI workflow
* Modify run_ci_xpu.sh for log cleanup and error handling
Clean up log files before running tests and output worker log on failure.
* Enhance test_ep.py with process management and assertions
Refactor test function to include process cleanup and assertions.
* Replace test_fastdeploy_llm with test_fd_ep
* Fix conditional statement in run_ci_xpu.sh
* Update test_ep.py for string handling and formatting
Fix string encoding issues and improve readability.
* Rename test_ep.py to run_ep.py
* Change test script from test_ep.py to run_ep.py
2025-10-21 19:19:40 +08:00
Copilot
175391389f
Add comprehensive unit tests for limit_thinking_content_length operators ( #4510 )
...
* Initial plan
* Add comprehensive unit tests for limit_thinking_content_length functions
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
* fix (#4514 )
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-10-21 18:55:57 +08:00
RAM
7cbe6b2472
[FDConfig] Turn on the CUDAGraph + Speculative Decoding switch ( #4511 )
2025-10-21 03:34:16 -07:00
tianlef
153f15db39
[Doc]add deepseek wint4 ce ( #4517 )
2025-10-21 16:41:51 +08:00
ltd0924
fb76cdfb4f
[Feature] Support mm model close prefix cache ( #4459 )
...
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* [BugFix] fix workers more than 1
* fix
* Update api_server.py
* fix
* Update api_server.py
* fix
* [Feature] Support mm model close prefix cache
* Update api_server.py
* Update engine_client.py
* Update engine_client.py
* add test
* Update test_chat.py
* fix
* fix
* Update test_chat.py
* Update test_chat.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 15:37:59 +08:00
Divano
2b53c4d684
【CI】Add test cases for n parameter and streaming validation ( #4503 )
...
* add repetition early stop cases
* add repetition early stop cases
* add structure test openai
* add n parameters case
2025-10-21 15:33:29 +08:00
SunLei
ee915220bd
[Speculative Decoding] Add draft_logprobs Support for Speculative Decode MTP ( #4467 )
...
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* feat: add draft_logprobs for Speculative Decode MTP
* fix: postprocess for speculative decode
* test: test_speculative_decoding_use_logprobs
* fix: test_completion_echo
* fix test_max_streaming_tokens
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-21 14:57:50 +08:00
RAM
775edcc09a
[Executor] Default use CUDAGraph ( #3594 )
...
* add start intercept
* Adjustment GraphOptConfig
* pre-commit
* default use cudagraph
* set default value
* default use cuda graph
* pre-commit
* fix test case bug
* disable rl
* fix moba attention
* only support gpu
* Temporarily disable PD Disaggregation
* set max_num_seqs of test case as 1
* set max_num_seqs and temperature
* fix max_num_batched_tokens bug
* close cuda graph
* success run wint2
* profile run with max_num_batched_tokens
* 1.add c++ memchecker 2.success run wint2
* update a800 yaml
* update docs
* 1. delete check 2. fix plas attn test case
* default use use_unique_memory_pool
* add try-except for warmup
* ban mtp, mm, rl
* fix test case mock
* fix ci bug
* fix form_model_get_output_topp0 bug
* fix ci bug
* refine deepseek ci
* refine code
* Disable PD
* fix sot yaml
2025-10-21 14:25:45 +08:00
Lucas
99564349a7
[XPU] bind block_attn kernel with pybind ( #4499 )
2025-10-21 10:58:52 +08:00
gaoziyuan
d85ef5352a
【BugFix】fix ep buffer clear ( #4450 )
...
* fix
* fix
2025-10-21 10:56:00 +08:00
YuBaoku
70a29ec49e
[CI] update ernie-4_5-vl baseline ( #4495 )
...
* [CI] update ernie-4_5-vl baseline
* [CI] update Qwen2.5-VL-7B-Instruct baseline
2025-10-21 10:18:29 +08:00
ltd0924
a498736af5
[APIServer] support define gunicorn timeout ( #4496 )
...
* [BUGFIX] clear request #4286
* [BugFix] support define gunicorn timeout
* Update utils.py
* Update utils.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-20 23:36:07 +08:00
Yuanle Liu
cef3164c3b
Optimizing the performance of think length limit using custom operators ( #4279 )
...
* delete impl
* delete min_length&max_length
* support limit thinking content strategy
* fix
* fix
* fix
* update
* fix set_value_by_flags_and_idx
* fix
* fix
* fix
* fix
* update
* fix
* fix
* fix typo
* fix ci
* fix
* fix
* support mtp
* fix
* fix
* update
* update
2025-10-20 21:09:13 +08:00
Ryan
36af88ff3f
[BugFix][CI] Clean up SOT code cache using tearDown in CINN unitest ( #4491 )
...
* fix CINN BUG
* 1e-3 -> 1e-2
2025-10-20 20:45:00 +08:00
yinwei
bf03b6fcea
fix vl bug ( #4485 )
2025-10-20 20:13:34 +08:00
yyssys
97ee3c403a
[XPU]Fix w4a8 garbled code issue ( #4493 )
2025-10-20 19:41:11 +08:00
Zhang Yulong
10e85daf15
update benchmark scripts ( #4497 )
2025-10-20 17:03:10 +08:00
李泳桦
b8d235445e
[fix] remove cache tensor creation for cache_transfer_manager ( #4420 )
...
* [fix] remove cache tensor creation for cache_transfer_manager
* [fix] fix code style
* [fix] fix code style
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-20 16:19:56 +08:00
bukejiyu
de2eaf4f81
add qwen-2.5-7B-PRM/ernie-rm ( #4319 )
2025-10-20 15:31:03 +08:00
GoldPancake
47595a2480
[Feature] support mtp logprob ( #4464 )
...
* support mtp logprob
* fix unitest
2025-10-20 15:18:12 +08:00
Haonan Luo
1b9f351d21
Support GPT-OSS-BF16 ( #4240 )
...
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
2025-10-20 14:44:58 +08:00
SuperNova
80a16c4c87
[fix] adjust mctlass moe api ( #4474 )
2025-10-20 14:23:54 +08:00
zhuzixuan
1e59905e34
Optimization of ‘tools’ in request fields ( #4380 )
...
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
* Remove multiple 'tools'
2025-10-20 11:04:08 +08:00
RAM
528c55776e
[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP ( #4456 )
...
* Fix MTP dummy run bug
* Target Model and Draft Model using the same flag
* In mtp replace use_cudagraph as step_use_cudagraph
2025-10-20 10:38:55 +08:00
YuBaoku
c4fc0073cf
[CI] Handle unit test issues ( #4483 )
2025-10-20 10:13:21 +08:00
周周周
817210e47f
[ATTN]delete code and add ffn and moe layer level test ( #4440 )
...
* delete code
* delete code
* delete code
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
* commit
2025-10-19 16:23:11 +08:00
kxz2002
b5b993e48e
【feature】support n parameter ( #4273 )
...
* support n parameter
* pre-commit check
* pre-commit check
* restore format_and_add_data
* update n_param
* bug fix index - str to int
* bug fix del child_task
* bug fix metrics
* add debug info
* add debug info2
* remove debug info
* change connecting symbol to '-'
* bugfix change connecting symbol
* bugfix change connecting symbol2
* unit tests fix
* unit test fix2
* unittest add param n=2
* n param add unit tests and adapt to echo
* pre-commit fix
* resolve review
* adjust stop reason
* add unittest for _create_chat_completion_choice
* modify unittest
* solve conflict
* solve conflict
* resolve conflict
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: gaoziyuan <m13689897706@163.com >
2025-10-17 20:51:59 +08:00
kxz2002
8ccfd975b5
LLM.chat add "tools" param ( #4415 )
...
* llm add tools param initial commit
* llm add tools param bugfix
* offline add tools add unittests
* fix preprocessor
* move tools parameter into tasks
* change variable name
2025-10-17 20:25:03 +08:00
yangjianfengo1
329d074326
[Docx] fix the broken link ( #4479 )
...
* modify the docs
* modify the docs
2025-10-17 18:28:50 +08:00
yinwei
a64c0408b9
[XPU]Fix w4a8 precision bug && rollback moe algo ( #4463 )
...
* fix w4a8 precision bug
* add env
* code style check
2025-10-17 18:27:53 +08:00
chen
63ef593450
check paddle version for v1 loader ( #4473 )
2025-10-17 17:25:03 +08:00
yzwu
4b661512ca
[Iluvatar GPU] Adapt VL model ( #4313 )
2025-10-17 16:13:38 +08:00
yangjianfengo1
ba5c2b7e37
[Docx] add language (en/cn) switch links ( #4470 )
...
* add install docs
* modify the docs
* modify the docs
2025-10-17 15:47:41 +08:00
Ayakouji
a3e0a15495
fix seqlen sync ( #4442 )
2025-10-17 14:37:52 +08:00
xiaolei373
720697e265
add environment variables ( #4466 )
2025-10-17 14:20:01 +08:00
YuBaoku
01510876ab
[CI] Fix partial instability issues ( #4461 )
2025-10-17 14:17:06 +08:00
ddchenhao66
14785eb65d
[XPU] abstract a hardware-agnostic operator wrapper for prefix cache and specify xpu device id definition ( #4455 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-17 14:05:33 +08:00
lizexu123
c234b995ab
[Feature] support pooling model dummy_run ( #4345 )
...
* support qwen3-embedding
* fix ci bug
* support pooling dummy_run
* fix
* delete print
* parallel_config.max_model_len
* delete is_pooling_model in dummy_run
* fix
* fd_model
* fix embedding load
* fix
* fix post_process
2025-10-17 13:30:55 +08:00
Ryan
15b6b8dc25
[CINN] Remove the restriction of automatically falling back to SOT after enabling CINN ( #4411 )
...
* remove CINN limitation
* fix unitest
* fix codestyle
2025-10-17 12:51:07 +08:00
chen
b134e6afe6
[BugFix]Dev fix custom ar unstable result ( #4437 )
2025-10-17 11:47:16 +08:00
Ryan
6160145f82
[SOT] Change warnings to errors and remove fallback operations ( #4378 )
...
* Change warnings to errors and remove fallback operations
* fix unitest
* fix codestyle
2025-10-17 11:27:04 +08:00
chenjian
0413c32b8f
[Optimize] Set preempted schedule log as info level ( #4453 )
2025-10-17 11:25:46 +08:00
Zero Rains
5885953211
[Others] add PR Template ( #4452 )
...
* add PR Template
* update
* update
* update
* update
* update
* update
2025-10-17 11:09:51 +08:00
Sunny-bot1
930f7b781c
[Optimization] Put get_block_shape_and_split_kv_block in cuda graph for append attention backend ( #4443 )
...
* get block in cuda graph
* fix sot
2025-10-17 10:59:56 +08:00
Ryan
49cea8fb1c
[SOT][Cudagraph] Remove BreakGraph of #3302 && update CustomOp ( #3694 )
...
* rm inplace info && to(gpu)
* update append_attention
* unpin paddle version
* add full_cuda_graph=False
* add blank line
---------
Co-authored-by: SigureMo <sigure.qaq@gmail.com >
2025-10-17 10:57:55 +08:00
YuanRisheng
a37c9416ac
[FDConfig]Remove reasoning_parser/guided_decoding_backend/disable_any_whitespace/device_ids in FDConfig ( #4362 )
...
* remove devices id
* fix unittest
* fix ce
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-17 10:40:59 +08:00
xiaolei373
d1637db86a
modify_comment ( #4460 )
2025-10-17 10:10:09 +08:00
chen
db82e9a022
[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True ( #4449 )
...
* fix group_topk renormalized=True
* check test
2025-10-16 23:17:48 +08:00
xiaolei373
dbca63f862
[bugfix] kill cache_transfer_manager process ( #4401 )
2025-10-16 20:45:24 +08:00
YuanRisheng
0355235fb9
[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig ( #4400 )
...
* delete some attr in parallel config
* delete comment
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-16 20:00:37 +08:00
Ryan
b87e2c6184
[CUDAGraph]Add support for custom all-reduce operators under SOT mode ( #4386 )
2025-10-16 19:31:19 +08:00
zhupengyang
26ff2f8683
[XPU] refine fused moe ( #4219 )
2025-10-16 19:04:07 +08:00
Jianyu Li
3bbe99eae7
[Intel HPU] Enable dist sampler on intel hpu platform ( #4445 )
2025-10-16 19:02:27 +08:00
LiqinruiG
4251ac5e95
【Fix】 remove text_after_process & raw_prediction ( #4421 )
...
* remove text_after_process & raw_prediction
* remove text_after_process & raw_prediction
2025-10-16 19:00:18 +08:00
Zhang Yulong
8f77adc381
Add data dictionary for API response processing ( #4454 )
...
Initialize data dictionary for response handling.
2025-10-16 17:23:11 +08:00
Zhenghai Zhang
6adfbe07ad
【Hackathon 9th No.86】autogen MultiQueryDecoderAttention template_instantiation -part ( #4383 )
...
* split MultiQueryDecoderAttention template_instantiation
* update comment
* CI
2025-10-16 17:08:19 +08:00
kevin
f72be7a2c8
[BUG] fix ep bug ( #4275 )
...
* fix ep bug
* update code
* update code
* update code
* [BugFix] fix config bugs (#4370 )
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* Update expert_service.py
* Update expert_service.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* update code
---------
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-16 16:46:40 +08:00
SunLei
5abf59715d
perf: optimize ZMQ communication with async queue and single-threaded… ( #4444 )
...
* perf: optimize ZMQ communication with async queue and single-threaded model
* perf: _async_output_busy_loop
* fix: async_output_queue init
2025-10-16 15:46:26 +08:00
Zhang Yulong
98f8c3703a
Add filtering for failed requests in benchmark outputs ( #4448 )
...
Filter out requests with end_timestamp == 0.0
2025-10-16 14:57:47 +08:00
Zhang Yulong
9dc3968c13
[benchmark] Fix benchmark duration calculation logic ( #4446 )
...
* Fix benchmark duration calculation logic
Calculate benchmark duration using filtered outputs.
* Fix benchmark duration calculation using benchmark_outputs
2025-10-16 14:36:29 +08:00
Lucas
a5063b96c8
[XPU] moe support VL 0-dim input ( #4408 )
2025-10-16 14:01:01 +08:00
gaoziyuan
fd5dd1a0f1
[Bugfix]fix ep clear buffer perf ( #4389 )
...
* fix
* Update fused_moe_backend_base.py
2025-10-16 13:05:39 +08:00
chenjian
670aaa3f83
[Bug fix] Fix pd for x1 thinking ( #4433 )
2025-10-16 12:03:45 +08:00
ddchenhao66
8e392f0ea6
[XPU] support prefix cache ( #4423 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-10-16 11:27:41 +08:00
ltd0924
5bde20b0c9
[BugFix] fix config bugs ( #4370 )
...
* Update expert_service.py
* Update common_engine.py
* Update expert_service.py
* Update expert_service.py
* Update expert_service.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-10-16 10:25:21 +08:00
Zhang Yulong
7f94f063ff
Update benchmark_serving.py ( #4438 )
...
Discarded requests are still saved for result analysis
2025-10-15 20:36:19 +08:00
SunLei
b4b579a7ed
Feature:Add support for Pooling Model Embedding and provide an OpenAI-compatible API. ( #4344 )
...
* feat: add OpenAIServing
* feat: add ZmqOpenAIServing & OpenAIServingEmbedding
* feat: Refine the basic ServingEngine class and introduce ServingContext
* fix: codestyle
* fix: request
* fix: pooling_params
* feat: _process_chat_template_kwargs
* feat: support batch request
* feat: pooling_params verify & default parameters
---------
Co-authored-by: sunlei1024 <sunlei1024@example.com >
2025-10-15 19:42:59 +08:00
freeliuzc
744287e1a9
fix param ( #4419 )
2025-10-15 18:44:24 +08:00
ltd0924
fbdb056de0
[BUGFIX] clear request #4286 ( #4402 )
...
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 17:43:28 +08:00
Lucas
bdc0207277
[XPU] fix VL multi-batch accuracy issue ( #4394 )
2025-10-15 17:27:43 +08:00
ltd0924
d8841b7b40
[BugFix] fix workers=1 ( #4364 )
...
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* [BugFix] fix workers more than 1
* fix
* Update api_server.py
* fix
* Update api_server.py
* fix
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 17:06:25 +08:00
bukejiyu
bcaa98ff9c
V1 loader default ( #4251 )
...
* v1 loader
* update
* update
2025-10-15 16:49:17 +08:00
tianshuo78520a
e98c1c2f47
Disable gcu ci ( #4427 )
...
* Disable GCU CI
* Disable GCU CI
* Update _ci_gcu.yml
2025-10-15 16:06:25 +08:00
AIbin
6938df9c23
【Fix CI Bug】Fix ci bug ( #4413 )
...
* Support DSK-v3.2 model
* Support DSK-v3.2
* Support DSK-v3.2
* Support DSK-3.2
* fix CI bug
* fix_CI_BUG
* update ci bug
2025-10-15 14:19:04 +08:00
chen
4efd073a41
fix block_wise_fp8_v1_loader_moe_shape ( #4384 )
2025-10-15 14:08:53 +08:00
freeliuzc
582aebd48b
[MTP]support mtp chunk_prefill_v1 ( #4366 )
...
* support mtp chunk_prefill_v1
* fix mtp chunkprefill output, fix unit test
* fix unit test
* fix save_output
2025-10-15 13:21:32 +08:00
李泳桦
ffe7af8a97
[fix] fix requests & block metrics ( #4404 )
...
* [fix] fix requests & block metrics
* [chore] rename variables
2025-10-15 11:49:24 +08:00
qwes5s5
abb62624b8
[fix] Fixed the issue of excessive/redundant spans being returned for streaming requests. ( #4375 )
...
* fix stream span
* fix stream span
2025-10-15 11:47:47 +08:00
ltd0924
28d1b6cd97
[BugFix] fix multinode bugs ( #4377 )
...
* [BugFix] fix multinode bugs
* Update test_config.py
* Update test_config.py
* Update test_config.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-15 11:43:39 +08:00
zhupengyang
d6f775e33b
[XPU] fix ep ( #4393 )
2025-10-15 11:41:05 +08:00
Sunny-bot1
6d0cc0dd9c
[Optimization] Optimize split_q_block kernel ( #4367 )
2025-10-15 11:28:00 +08:00
Zhang Yulong
c4f866c457
update benchmark tools ( #4416 )
2025-10-15 11:15:25 +08:00
YuBaoku
4b647d17de
[CI] Fix partial instability issues ( #4418 )
2025-10-15 11:13:43 +08:00
yangjianfengo1
c1a2e78b18
add install docs ( #4414 )
2025-10-14 20:17:29 +08:00
ApplEOFDiscord
7f85f00a7d
fix offline inference doc ( #4412 )
2025-10-14 19:25:21 +08:00
tianlef
14eb8b4f8b
add x1 a3b quantization ( #4397 )
2025-10-14 15:04:06 +08:00
co63oc
73c8e0849f
【Hackathon 9th No.67】add speculate_verify ( #4326 )
...
* add speculate_verify
* fix
* fix
2025-10-14 14:13:17 +08:00
YuBaoku
6f53b67f6c
[CI] fix diff_error temporarily ( #4390 )
2025-10-14 11:13:40 +08:00
Sunny-bot1
a751d977bc
[Optimization] Fuse get_max_len and get_kv_max_len ( #4369 )
...
* opt split_q_block
* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
YuBaoku
425205b03c
[Doc] fix the port conflict issue in the usage example ( #4379 )
2025-10-13 20:17:06 +08:00
ooo oo
2d641078c3
【Hackathon 9th No.20】add unit tests for masked_per_token_quant ( #4111 )
...
* test: add unit tests for masked_per_token_quant
* apply review
2025-10-13 14:51:11 +08:00
yyssys
584d116889
[Doc] fix document navigation link paths ( #4368 )
2025-10-13 11:01:35 +08:00
plusNew001
a21e16ee5f
[XPU] fix XPU CI bug ( #4358 )
...
* Update assertions for response content in test_45t
fix XPU CI bug
* Comment out base_response print statement
Comment out the print statement for base_response.
* Refactor assertion for clarity in run_45T.py
* Add blank line before main function call
2025-10-11 14:48:14 +08:00
YuanRisheng
a2ec2c4152
[FDConfig]Remove max_model_len in FDConfig ( #4350 )
...
* modify max_model_len
* fix unittest
* fix unittest
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com >
2025-10-11 14:04:17 +08:00
freeliuzc
365601ea5a
[MTP]support more branchs in topp kernel ( #4352 )
2025-10-11 11:33:52 +08:00
gaoziyuan
b463a41a06
Update rollout_model.py ( #4348 )
2025-10-11 10:48:09 +08:00
ltd0924
3f535b45a2
[Feature] support prefix cache in DP ( #4359 )
...
* [Feature] support prefix cache in DP
* fix
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
* Update common_engine.py
---------
Co-authored-by: ltd0924 <luotingdan@baidu.com >
2025-10-11 10:12:12 +08:00
AIbin
368049673b
Add DeepSeek model end-to-end CI ( #4360 )
...
Add DeepSeek-v3-5layers end-to-end CI
2025-10-11 08:33:37 +08:00
AIbin
533896fd63
fix paddle_peak_increase size ( #4355 )
2025-10-10 21:31:38 +08:00
AIbin
f7eaca3971
【Bug Fix】mla enables tensorcore by default ( #4354 )
...
* mla tensor-core kernel is enabled by default
2025-10-10 20:45:16 +08:00
YUNSHEN XIE
245931f53d
add release images build job ( #4265 )
2025-10-10 16:35:49 +08:00
qwes5s5
6fd3e72da1
[FastDeploy Cli] Bench Command eval and throughput ( #4239 )
...
* bench command
* bench command
* bench command
* bench command
* bench command
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
2025-10-10 16:17:44 +08:00
lzy
3aa04fbf21
[MTP][Cfp8]supports spec dynamic cfp8 ( #4290 )
...
* supports spec dynamic cfp8
* supports spec dynamic cfp8
---------
Co-authored-by: freeliuzc <lzc842650834@gmail.com >
2025-10-10 16:08:10 +08:00
yinwei
20c7b741f4
[XPU] Support W4A8C8-TP4-300B Model ( #4068 )
...
* support w4a8
* delete ep block attn
* delete moe_topk_select
* update note
* update
* delete useless info
* update
* add some note
* fix some format
* update scale info
* add ans baseline
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-10-10 15:41:32 +08:00
Zhenghai Zhang
c46d5e48f8
【Hackathon 9th No.86】autogen MultiQueryAppendC8Attention template_instantiation -part ( #4330 )
...
* split MultiQueryAppendC8Attention template_instantiation
* update setup_ops.py
* fix ci
* fix bug
2025-10-10 15:07:48 +08:00
AIbin
c4ebaf8a07
【Inference Optimize】MLA Tensor-Core is enabled by default ( #4335 )
2025-10-10 10:54:56 +08:00
Nyakku Shigure
5f80862578
Remove redundant inplace outputs for append_attention ( #4340 )
2025-10-10 10:21:27 +08:00
RAM
aa27b03bc0
[Executor]CUDAGraph support Speculate Decode ( #3769 )
...
* success run ngram
* Revert "[Code Simplification] remove cum_offsets (#3410 )"
This reverts commit 32b39620bc .
* success run ngram5 tp4 42bs
* success run ngram5 tp4 42bs
* mtp draft commit
* add decorator for target model
* enable draft model in cudagraph v0.5
* revert revrt cum_offset
* enable target model in cudagraph v0.9 And clean debug code
* Revert "success run ngram"
This reverts commit 8351e83993 .
* add reverted code
* enable target model in cudagraph v0.9
* solve comment
* fix bid < 0
* Enable Target Model Padding And Draft Model in cudagraph
* solve problem
* delete rebuild padding debug note
* fast compile
* Add capture list for mtp
* success run 256 tp1 mtp
* Enable Lite TP2 Bsz256
* really enable tp2 bsz 256
* fix problem
* Solve problem for Draft model in cudagraph
* Solve comment
* replace emptytensor as zeros
* Solve comments
* Revert "fast compile"
This reverts commit 834639a7ff .
* fix bug
* fix merge bug
* fix typo
* fix bug
---------
Co-authored-by: lizexu <2694294196@qq.com >
Co-authored-by: littledgg <1658565283@qq.com >
Co-authored-by: zeroRains <linjunlu@zerorains.top >
Co-authored-by: gongshaotian <gstain5555@outlook.com >
2025-10-09 21:18:29 +08:00
AIbin
7b1689f437
schedule_bugfix ( #4336 )
2025-10-09 20:40:10 +08:00
yyssys
3cb4b4d7d4
[Doc] Update xpu fastdeploy version to 2.2.1 ( #4338 )
2025-10-09 20:14:07 +08:00
yangjianfengo1
b650867fff
Update documentation ( #4339 )
2025-10-09 20:10:58 +08:00
AIbin
48fd5d757d
Support MLA_CACHE & Fix V1_Schedule Bug ( #4318 )
...
Support MLA_CACHE & Fix V1_Schedule Bug
2025-10-09 12:11:25 +08:00
RichardWooSJTU
791b101195
revert worker process ipc signal suffix ( #4323 )
2025-09-30 03:56:41 -07:00
fangfangssj
af3872215e
[BugFix]remove redundant includes ( #4312 )
...
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-09-30 17:54:19 +08:00
Zero Rains
d14aadf70e
[FIx] CI Approve fix ( #4316 )
...
* check approve
* fix the bug
* update
2025-09-29 11:38:24 -07:00
chen
81959c7d88
[NewFeature]custom_allreduce support cudagraph recapture ( #4305 )
...
* custom_allreduce support cudagraph recapture
* add shut_down/restart default group
2025-09-29 15:56:54 +08:00
xiaozude
7c919070f7
[Metax] support cutlass moe & optimize flash attention ( #4208 )
2025-09-29 11:22:43 +08:00
K11OntheBoat
2b2b645296
Fix bugs of splitwise_complete_prefilled_step IPCsignal clear ( #4309 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-09-29 11:21:22 +08:00
RichardWooSJTU
3740e33fea
【Feature】ResourceManagerV1 support need block num notifying ( #4220 )
...
* support need block num notifying
* adapt t2i
* fix unexpected change
2025-09-29 11:11:51 +08:00
李泳桦
70633c6641
[fix] fix gpu_caches key ( #4311 )
2025-09-28 21:32:57 +08:00
xiaolei373
1282ebe1b1
add_cli_tokenizer ( #4278 )
2025-09-28 20:47:35 +08:00
李泳桦
6265f4385f
[feat] support prefix cache clearing when /clear_load_weight is called ( #4008 )
...
* [feat] support clearing prefix cache (cherry-picked from release/2.1)
* [fix] fix ipc suffix, use port instead
* [fix] fix prefix caching not enabled
* [fix] fix key/value_cache_scales indent
* [fix] fix ep group all-reduce
* [fix] fix clear/update lock not working when workers > 1
* [chore] add preemption triggered info log
* [fix] fix code style
* [fix] fix max_num_seqs config
* [fix] do not force enable_prefix_caching=False in dynamic loading
* [fix] fix ci
* Revert "[fix] fix ci"
This reverts commit 0bc6d55cc8.
* [fix] initialize available_gpu_block_num with max_gpu_block_num
* [fix] fix config splitwise_role
* [fix] fix clearing caches synchronization and add more logs
* [chore] print cache_ready_signal in log
* [fix] fix scheduler_config.splitwise_role
* [fix] fix cache_messager cache_ready_signal create=True
* [fix] stop cache messager from launching in mixed deployment
2025-09-28 19:42:53 +08:00
Lucas
59313ed7f9
[XPU] fix VL thinking mode ( #4266 )
2025-09-28 17:37:37 +08:00
Sunny-bot1
aa1cc09c5b
fix machete pre quant ( #4295 )
2025-09-28 16:11:09 +08:00
K11OntheBoat
7b6cb72ab2
Fix wrong batch size of thinking_mask ( #4296 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: xiegegege <46314656+xiegegege@users.noreply.github.com>
2025-09-28 14:56:42 +08:00
chenjian
3cef851468
[Bug fix] Fix bug for running ep ( #4245 )
...
* fix bug for ep
* fix bug
2025-09-28 14:56:18 +08:00
luukunn
17e00d9f5d
fix reasoning_max_tokens ( #4277 )
2025-09-28 14:05:29 +08:00
Zhenghai Zhang
aa045aa84f
fix typos ( #4274 )
2025-09-27 09:25:43 +08:00
GoldPancake
79c2c52756
deepgemm pre-compile tool support mixed parallel ( #4282 )
2025-09-26 18:43:39 +08:00
YUNSHEN XIE
5c6e859681
increase ccache size ( #4255 )
2025-09-26 17:40:07 +08:00
yyssys
f40d7c6d65
[Docs]When XPU starts the service, the model loader uses the default version ( #4292 )
2025-09-26 15:58:12 +08:00
Zero Rains
331c4d2a74
Set approve checking for config.py, worker, model and cudagraph ( #4276 )
...
* set approve checking for config.py and worker files
* update
* update
* update file list
* check worker
* update
* check graph
* check model_loader
* check models
* update
2025-09-26 14:50:54 +08:00
GoldPancake
838de53de8
Add speculative decoding approval check ( #4284 )
2025-09-26 14:47:45 +08:00
xiaolei373
55124f8491
Add cli run batch ( #4237 )
...
* feat(log):add_request_and_response_log
* [cli] add run batch cli
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-26 14:27:25 +08:00
tianlef
8a964329f4
add glm benchmark yaml ( #4289 )
2025-09-26 14:23:29 +08:00
Zhong Hui
67e693b18b
fix ernie vl distributed attr. ( #4215 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-26 14:18:49 +08:00
zhuzixuan
12a3587cca
[Supplements and upgrades]Improvement of X1 parsers ( #4172 )
...
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
* reasoning_parser
2025-09-26 13:37:37 +08:00
YuBaoku
dd2e844ea3
[CI] fix base_test error temporarily ( #4283 )
2025-09-26 11:24:55 +08:00
memoryCoderC
4ec00df2b0
[Feature] add config api ( #4254 )
2025-09-26 11:21:02 +08:00
kxz2002
83d41d23b0
initial commit ( #4248 )
2025-09-25 21:42:05 +08:00
yyssys
c415885a94
[Docs]Add ENABLE_V1_KVCACHE_SCHEDULER=0 to docs ( #4268 )
2025-09-25 20:09:03 +08:00
K11OntheBoat
4515ad21e9
Support limit thinking lengths ( #4069 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-09-25 19:55:56 +08:00
Yuanle Liu
0c6f1932c5
delete_moe_phase_in_parallel_config ( #4264 )
2025-09-25 17:14:37 +08:00
Lucas
87179cb744
[XPU] support XPU VL model inference ( #4030 )
...
* [XPU] support XPU VL model inference
* fix image op import and device check
* rebase develop
* fix perf
2025-09-25 14:34:15 +08:00
ooo oo
e36eccfdad
【Hackathon 9th No.21、23】add unit tests for fused_hadamard_quant_fp8, moe_fused_hadamard_quant_fp8 ( #4094 )
...
* test: add unit tests for fused_hadamard_quant_fp8
* test: add unit tests for moe_fused_hadamard_quant_fp8
* tests: simulate CUDA kernel's hadamard32_warp using butterfly operations
* apply review
* apply review
2025-09-25 12:15:00 +08:00
Zero Rains
b433a93d9a
fix the bug for prefilled_step_idx signal of cache_messager in cudagraph and PD ( #4235 )
2025-09-24 19:46:52 +08:00
RAM
870364b547
[CUDAGraph]CUDA Graph support unique memory pool ( #4230 )
...
* cuda graph use unique memory pool
* fix custom device import bug
* refine code
* refine code
* refine code
2025-09-24 19:45:22 +08:00
CSWYF3634076
5ff10c8ced
[Model] Qwen2.5VL support --use-cudagraph and unit testing ( #4087 )
...
* [BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix
* [Docs]offline infer add apply_chat_template add_generation_prompt parameter
* [Model]qwen2.5VL support --use-cudagraph
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v2
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v3
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v4
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v5
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v6
* [Model]qwen2.5VL support --use-cudagraph buffer and qwenvl test v7
2025-09-24 19:45:01 +08:00
luukunn
18f4977aec
[fix]update apply_chat_template ( #4137 )
...
* update apply_chat_template
* fix unittest
* fix unittest
* fix
* fix
* fix unit test
* fix
* fix unit test
* add unit test
2025-09-24 18:56:32 +08:00
chen
7c1fd19f0f
[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 ( #4238 )
2025-09-24 16:39:51 +08:00
memoryCoderC
8b0ce8e3ab
[Feature] add cli command serve ( #4226 )
2025-09-24 14:50:45 +08:00
ApplEOFDiscord
9566ae8827
[Bug Fix] disable prefix caching in mm model ( #4167 )
...
* add http get retry
* fix comments
* disable prefix caching in mm model
* fix unit test
---------
Co-authored-by: zhangjunjun04 <zhangjunjun04@baidu.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-24 14:43:46 +08:00
lizexu123
e8318b7477
[BugFix] fix qwen3-embedding model tp>1 ( #4223 )
...
* support qwen3-embedding
* fix ci bug
* fix
* fix ci bug
* fix ci bug
* fix
* fix qwen3-embedding
* fix
* fix
* fix
2025-09-24 14:13:26 +08:00
chen
3161014e49
[BugFix]fix v1 loader moe bf16, and support dynamic_load_weight create quant param ( #4229 )
...
* fix v1 loader moe bf16, and support dynamic_load_weight create quant param
* include_stop_str_in_output=False not return eos text
2025-09-24 14:12:05 +08:00
Yohanna
44010cee13
[FIX] Fix CUDA error(700): 'cudaErrorIllegalAddress' in CascadeAppendWriteCacheKVQKV cache_kernel(). Continue when batch_id_per_token[token_idx] is default value -1. ( #4218 )
2025-09-24 14:08:49 +08:00
fmiao2372
f1b5392e20
[Intel HPU] Support intel hpu platform ( #4161 )
...
* [Intel HPU] Support intel hpu platform
* fix some issues
* apply precommit and move AttentionBackend_HPU
* fix format issue
* correct ops import
* fix ci issue
* update code in layers
* fix code style issue
* remove dense tp moe ep mode
* fix enc_dec_block_num
* fix rebase issue
* rename hpu to gaudi in readme
* rename ForwardMeta_HPU to HPUForwardMeta
2025-09-24 12:27:50 +08:00
co63oc
a1c5d930bb
【Hackathon 9th No.24】add rebuild_padding ( #4107 )
2025-09-24 12:08:17 +08:00
Yuanle Liu
b455fd39f3
register_model_class compatible with plugins ( #4236 )
2025-09-24 11:17:12 +08:00
yyssys
d6e59447f5
[XPU] Enable XPU V1 mode based on environment variable ( #4213 )
...
* Enable XPU V1 mode based on environment variable
* add default param to xft_moe_fc_block_eb for latest xvllm compatibility; update run_ci_xpu to use latest xvllm
2025-09-24 10:29:48 +08:00
chen
ec99474e71
[Test]add glm45_air logprob test and rollout model ( #4175 )
...
* add glm45_air logprob test
* add glm rollout model and pretrainedmodel for rl
* add glm rollout model and test
* check
* delete cudagraph in glm45
* add UT for glm rollout model
* revert glm UT
2025-09-23 21:06:07 +08:00
bukejiyu
62d1c48363
[v1 loader]code style ( #4204 )
...
* code style
* update
2025-09-23 19:36:00 +08:00
chen
1a6283424e
Fix noaux_tc cuda Error 700 in CUDAGraph ( #4174 )
2025-09-23 18:41:33 +08:00
lizexu123
c96a535a5d
[Feature] support qwen3-embedding model load ( #4202 )
...
* support qwen3-embedding
* fix ci bug
* fix
* fix ci bug
* fix ci bug
* fix
2025-09-23 00:14:35 -07:00
zhupengyang
9082f625ba
[xpu] use cpu barrier ( #4181 )
2025-09-23 12:19:03 +08:00
plusNew001
813befadfa
Update run_ci_xpu.sh to lock xvllm version ( #4210 )
...
Temporarily lock xvllm version due to compilation errors and update XVLLM_PATH.
2025-09-23 11:20:08 +08:00
plusNew001
c32aae901f
[XPU] update XPU CI ( #4209 )
...
* change xpu ci model
* change xpu ci model
* change xpu ci model
* change xpu ci model
* Update model path and XPU settings in run_ci_xpu.sh
* Increase health check timeout to 10 minutes
Increased the timeout duration for health checks from 5 minutes to 10 minutes in two places.
* Implement test for OpenAI chat completion
Add a test function for the OpenAI client chat response.
* Change script to use pytest for running tests
* Update health check timeout to 15 minutes
Increase the timeout for health checks from 10 minutes to 15 minutes.
* Add pytest installation to CI script
* Modify base response in test_45t function
Updated the base response message for the test.
* Add V0 and V1 mode test echo statements
* Set ENABLE_V1_KVCACHE_SCHEDULER to 0
Disable V1 KVCACHE SCHEDULER for V0 mode testing.
---------
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-aa24-0591.yq01.baidu.com>
2025-09-23 10:28:49 +08:00
yangjianfengo1
4325b737e7
【FIX】Change the name of sparse attn from moba to plas ( #4006 ) ( #4076 )
...
* 【FIX】Change the name of sparse attn from moba to plas (#4006 )
* update docs
* 【docs】 update readme (#4000 )
* update docs
* update readme
* update docs
* 【FIX】Change the name of sparse attn from moba to plas (#3845 )
* update docs
* update docs
* update docs
* update docs
* change moba to plas
* code style
* update ci
* code style
* update ci
* code style
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* fix max_num_seqs
* fix test load attn
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-23 10:26:40 +08:00
plusNew001
2c34a557f4
[XPU]change xpu ci model ( #4117 )
...
* change xpu ci model
* change xpu ci model
* change xpu ci model
* change xpu ci model
* Update model path and XPU settings in run_ci_xpu.sh
* Increase health check timeout to 10 minutes
Increased the timeout duration for health checks from 5 minutes to 10 minutes in two places.
* Implement test for OpenAI chat completion
Add a test function for the OpenAI client chat response.
* Change script to use pytest for running tests
* Update health check timeout to 15 minutes
Increase the timeout for health checks from 10 minutes to 15 minutes.
* Add pytest installation to CI script
* Modify base response in test_45t function
Updated the base response message for the test.
* Add V0 and V1 mode test echo statements
---------
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-aa24-0591.yq01.baidu.com>
2025-09-23 10:21:17 +08:00
ltd0924
83720da79f
[Feature] support clear data ( #3601 )
...
* [Feature] support clear data
* update
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-09-23 10:20:02 +08:00
Jiang-Jia-Jun
772f0156f3
Remove useless code ( #4195 )
2025-09-22 21:18:19 +08:00
yzwu
504461b6b5
[Iluvatar GPU] Optimize attention performance and fix moe load ckpt error ( #3651 )
2025-09-22 21:13:59 +08:00
Zhang Yulong
5532e8a323
[FD CLI] Add bench cli ( #4160 )
...
* add bench cli
* Update test_main.py
2025-09-22 20:37:30 +08:00
Echo-Nie
5e1f13bd3b
add test_set_value_by_flags_and_idx.py ( #4186 )
2025-09-22 20:21:34 +08:00
co63oc
c5671d7c09
[MTP][Unit Test]add test_top_p_candidates ( #4046 )
...
* add test_top_p_candidates
* fix
* fix
* fix
2025-09-22 17:06:38 +08:00
chenjian
918ccdb123
[Feature] Support pd ep deployment with yiyan adapter ( #4029 )
...
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* Support pd ep deployment with yiyan adapter
* Support pd ep deployment with yiyan adapter
* refactor cache messager
* support scheduler v1 in PD
* support pd v1 + chunk prefill
* support pd v1 + chunk prefill
* add eplb
* support eplb
* support eplb
* support eplb
* support v1
* fix
* fix
* fix bug
* remove eplb support
* support prefix cache in P
* fix bug
* fix bug
* support one stop in V1
* fix bug
* fix ci
* fix ci
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-22 16:41:38 +08:00
Echo-Nie
9845f0d010
【Hackathon 9th No.30】add test_tritonmoe_preprocess ( #3891 )
...
* add test_tritonmoe_preprocess
* add value check
* del test_support_all...
2025-09-22 15:31:32 +08:00
co63oc
c4830ef24c
fix typos ( #4176 )
...
* fix typos
* fix
2025-09-22 14:27:17 +08:00
Divano
0b62648924
test xly ci
2025-09-22 14:13:00 +08:00
lizexu123
c86945ef49
[Feature] support pool ( #3827 )
...
* support pool
* update pooling
* add pooler_config and check
* update
* support AutoWeightsLoader load weight
* fix
* update
* delete print
* update pre-commit
* fix
* fix xpu
* fix ModelRegistry->model_registry
* fix Copilot review
* fix pooler.py
* delete StepPooler
* fix abstract
* fix default_loader_v1
* fix Pre Commit
* support torch qwen3 dense
* add test and fix torch-qwen
* fix
* fix
* adapter ci:
* fix review
* fix pooling_params.py
* fix
* fix tasks.py 2025
* fix print and logger
* Modify ModelRegistry and delete AutoWeightsLoader
* fix logger
* fix test_embedding
* fix ci bug
* ernie4_5 model_registry
* fix test
* support Qwen3-Embedding-0.6B tp=1 load
* fix extra code
* fix
* delete fix vocab_size
* delete prepare_params_dict
* fix:
2025-09-22 14:09:09 +08:00
chen
da74a5f0b3
fix glm all_reduce tp group ( #4187 )
2025-09-22 10:56:55 +08:00
co63oc
718f32a6b0
fix nul ( #4191 )
2025-09-22 10:55:33 +08:00
Lucas
5c33be5a7d
[TEST] init first commit ( #4192 )
2025-09-22 10:51:27 +08:00
RichardWooSJTU
91912cc2e1
fix t2i ( #4163 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-19 18:07:13 +08:00
Echo-Nie
cc6e14d2ec
【Hackathon 9th No.46】add test_fused_rotary_position_encoding ( #3848 )
...
* add test_fused_rotary_position_encoding
* add copyright notice
* fix according to the review
2025-09-19 17:50:19 +08:00
YuanRisheng
24180fba0a
[FDConfig]Remove splitwise_role and engine_worker_queue_port in FDConfig ( #4147 )
...
* remove splitwise_role and engine_worker_queue_port
* fix xpu
* fix xpu
* fix xpu
* fix unittest
* resolve conflict
2025-09-19 17:01:52 +08:00
luukunn
ee9d8a840a
[fix]Modify follow-up push parameters and Modify the verification method for thinking length ( #4086 )
...
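* rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length
* rename the follow-up push parameter generated_token_ids to completion_token_ids; change the verification method for thinking length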
* add completion_token_ids
* add logger
* fix reasoning_max_tokens ParameterError
* add unittest
* add unittest
* add unittest
* add unittest
* add unittest
* add unit test
2025-09-19 14:26:01 +08:00
chen
66a98b44ed
ep support logprob ( #4089 ) ( #4151 )
2025-09-19 14:07:31 +08:00
Yuanle Liu
a685e5ad35
Each module should have its own plugins_loaded ( #4164 )
2025-09-19 14:06:10 +08:00
xiaolei373
ddf5606263
Bugfix test exception ( #4171 )
...
* feat(log):add_request_and_response_log
* modify default error type
2025-09-19 11:48:49 +08:00
Sunny-bot1
c3b8ebeb18
[Optimize] Machete using group scale default ( #4121 )
2025-09-18 13:51:11 +08:00
qwes5s5
62b8b02e08
fix_unitest ( #4159 )
...
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-18 11:17:15 +08:00
xiaolei373
98447beb4d
Add param valid log ( #4113 )
...
* feat(log):add_request_and_response_log
* [bugfix] add param valid log
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-18 10:39:24 +08:00
chenjian
618ccdbfba
[Feature] Support mixed deployment with yiyan adapter in develop ( #3976 )
...
* [Feature] Support mixed deployment with yiyan adapter in release2.2
* fix metrics
* add unit test
* add unit test
* add unit test
* fix ci
* fix for eb5
* fix ci
* fix ci
* fix ci
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-18 01:52:20 +08:00
YuBaoku
2745f37017
[CI] enhance clean port and add waiting time ( #4152 )
2025-09-17 20:31:49 +08:00
gaoziyuan
896e3bb606
[NewFeture]add ep rollout model init and update/clear ep buffer ( #4039 )
...
* fix gid
* merge
* fix test
* fix bug
* fix
* fix ci
2025-09-17 20:24:53 +08:00
YuanRisheng
0d3a57a2c6
fix unittest ( #4155 )
2025-09-17 20:20:26 +08:00
qw86972190
b52971749c
Print KV Cache available memory and block memory usage in GB format ( #4148 )
2025-09-17 20:01:55 +08:00
RichardWooSJTU
2adca04f1f
Reconstruct streaming data transfer with zmq ( #3836 )
...
* reconstruct USE_GET_SAVE_OUTPUT_V1
* fix ut
* use dp rank
* fix ci
2025-09-17 14:30:39 +08:00
Jiang-Jia-Jun
f9766f917b
[BugFix] Forbid FD_DISABLED_RECOVER while ENABLE_V1_KVCACHE_SCHEDULER ( #4142 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-09-17 14:11:44 +08:00
YuanRisheng
2e9e53ff7e
[FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config ( #4116 )
...
* remove max_num_batched_tokens in parallel config
* remove max_num_seqs
* update test case
* fix test
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-17 10:43:35 +08:00
YUNSHEN XIE
c01a756912
mv test to tests ( #4129 )
2025-09-16 20:45:40 +08:00
Zhang Yulong
cd09913552
Update test_w4a8_model.py ( #4125 )
2025-09-16 20:43:10 +08:00
chenjian
67e6d8c691
[Feature] Set prefix caching as default ( #3814 )
...
* Set prefix caching as default
* Set prefix caching as default
* Set prefix caching as default
* skip dynamic load scene
* fix kill bug
* fix kill bug
* fix kill bug
* fix
* fix
* fix ci
2025-09-16 20:34:27 +08:00
Yuan Xiaolan
de8638b1e9
fix dynamic Cfp8 computing error ( #4119 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-16 20:21:49 +08:00
YUNSHEN XIE
4f8901489c
ci: Increase compilation task time limit ( #4098 )
...
* ci: Increase compilation task time limit
* update
* update
* rename
* update
* update
2025-09-16 20:05:45 +08:00
tianlef
e79a1a7938
x1_a3b config ( #4135 )
2025-09-16 19:44:46 +08:00
xiegegege
d682c97dd3
[benchmark]add lite-vl and x1 yaml ( #4130 )
2025-09-16 16:38:36 +08:00
Divano
8e49d99009
Addcase ( #4112 )
...
logprob was not run, which has no impact; add a case that checks the error output format fields for OpenAI exception scenarios
2025-09-16 16:12:14 +08:00
tianlef
83bf1fd5aa
[Doc]add plas attention config ( #4128 )
2025-09-16 15:55:12 +08:00
co63oc
b70ca35c0b
【Hackathon 9th No.52】add test_dynamic_per_token_scaled_fp8_quant ( #4015 )
...
* add test_dynamic_per_token_scaled_fp8_quant
* fix
* add bfloat16
* ci
2025-09-16 14:11:29 +08:00
Echo-Nie
befe463f01
【Hackathon 9th No.37】add test_top_k_renorm_probs ( #3755 )
...
* add test_top_k_renorm_probs.py
* add size=2,3
2025-09-16 11:12:46 +08:00
Sunny-bot1
442543cd6b
fix ep wint8 ( #4102 )
2025-09-16 11:05:33 +08:00
Yuanle Liu
ed2dcec829
add ignore=all for deepgemm ( #4118 )
2025-09-15 21:52:00 +08:00
Jiang-Jia-Jun
a04365a0c7
Update api_server.py
2025-09-15 21:31:33 +08:00
YuanRisheng
03b3d6175d
fix mtp ( #4105 )
2025-09-15 20:26:07 +08:00
co63oc
17a27170bc
fix typos ( #4093 )
2025-09-15 18:33:30 +08:00
bukejiyu
113e330030
fix bf16 and add comments ( #4106 )
2025-09-15 17:23:07 +08:00
freeliuzc
69aa2781a1
[MTP]Support mtp reshard ( #4099 )
...
* support rl reshard
* modify model name
2025-09-15 17:13:53 +08:00
freeliuzc
46911f903d
[MTP]update hybrid-mtp-with-ngram ( #4047 )
2025-09-15 17:13:31 +08:00
Yuanle Liu
b1b33211e8
[CUDAGraph] Support multi output buffers and merge some fixes from feature/exp_0908 ( #4062 )
...
* refine cudagraph
* refine cudagraph
* typo
* fix
* fix plugins
* fix
* update
* update
* update
2025-09-15 16:21:30 +08:00
zhupengyang
9409665713
[xpu] support ep ( #4067 )
2025-09-15 13:53:11 +08:00
bukejiyu
29ed617f0f
[v1 loader]qwen Offline fp8 ( #4036 )
...
* support offline fp8
* update ut
* update ut
* update ut
* fix
* update
* update
2025-09-15 13:44:11 +08:00
Sunny-bot1
b1a5b756a3
[Optimize] Support WINT8 and group scale for Machete ( #3905 )
2025-09-15 12:01:34 +08:00
Echo-Nie
4408dc7f67
【Hackathon 9th No.49】add test_pre_cache_len_concat ( #3847 )
...
* add test_pre_cache_len_concat
* fix according review, add ref_pre_cache_len_concat
2025-09-15 11:20:14 +08:00
co63oc
ef4a1aa2da
【Hackathon 9th No.61、65】add test_draft_model_update ( #3940 )
...
* add draft_model_update test
* fix
* fix
* fix
* fix
* fix
2025-09-15 11:19:50 +08:00
Zero Rains
f213ae1e86
[Bug Fix]fix the bug for cache_messager signal loss ( #3879 )
...
* fix the bug for real size 0 in cudagraph
* fix cache_messager
2025-09-15 11:16:24 +08:00
qwes5s5
553adb299e
【FastDeploy CLI】collect-env subcommand ( #4044 )
...
* collect-env subcommand
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
2025-09-15 10:31:23 +08:00
zhouchong
958abebeab
Support offline inference with streaming output ( #4071 )
...
* Support offline inference with streaming output
* add unit test
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-15 10:27:03 +08:00
YUNSHEN XIE
4871f18dad
fix(CE): update concurrency to stop CE tasks from canceling each other ( #4083 )
2025-09-12 19:16:26 +08:00
Ayakouji
987609c894
[BugFix] Fix image_feature 0-Size causing insert failed ( #4042 )
...
* update
* fix image_feature
2025-09-12 19:13:08 +08:00
xiaolei373
9ac539471d
[format] Valid para format error info ( #4035 )
...
* feat(log):add_request_and_response_log
* align error messages with OpenAI
2025-09-12 19:05:17 +08:00
YuanRisheng
88ea565aba
[BugFix]Fix load kv cache quant scale ( #4077 )
...
* fix kv cache
* fix kv_cache
* fix kv cache
2025-09-12 17:44:03 +08:00
co63oc
c86b3357ce
【Hackathon 9th No.78】add test_chat.py ( #3958 )
2025-09-12 16:53:27 +08:00
Echo-Nie
06f4b49ca3
【Hackathon 9th No.25】add test_fused_get_rotary_embedding ( #3892 )
...
* add test_fused_get_rotary_embedding
* add a NumPy-based reference implementation
* add the open-source software copyright and license notice
2025-09-12 15:38:43 +08:00
SuperNova
805f29a06c
[Feature] refactor metax_gpu attention and moe and remove some useless code ( #3688 )
...
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-09-12 14:40:25 +08:00
ltd0924
cab7a633fe
[CI] add multi api server test ( #4049 )
...
* [BugFix] fix max streaming tokens invalid
* fix scheduler bug
* fix scheduler bug
* Update multi_api_server.py
* Create test_multi_api_server.py
* fix
2025-09-12 11:18:38 +08:00
qwes5s5
58e0785bab
[metrics] update metrics markdown file ( #4061 )
...
* adjust md
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
2025-09-12 11:13:43 +08:00
co63oc
8466219ec8
fix typos ( #3840 )
...
* fix typos
* ci
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-12 11:04:38 +08:00
RichardWooSJTU
82dab8a91a
Add token processor plugin support ( #4059 )
...
* Add token processor plugin support
* fix import
* fix import
2025-09-12 10:17:23 +08:00
chenjian
37f1632732
[Optimize] optimize prefix cache in develop ( #3890 )
...
* optimize prefix cache in release22
* fix
* fix
* fix
* add ci for v1
* add unit test
---------
Co-authored-by: xiegegege <46314656+xiegegege@users.noreply.github.com >
2025-09-12 10:15:59 +08:00
chen
4859f40b20
[Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) ( #4051 )
2025-09-11 20:08:09 +08:00
lddfym
2056a428bd
[bug fix] Fix the placeholder in qwen prompt and add some unittests ( #4065 )
...
* fix the placeholder in qwen prompt
* fix the placeholder in qwen prompt
* add some unittests for qwen_vl_processor
2025-09-11 20:00:02 +08:00
memoryCoderC
850465e8ed
[Feature] add cli command chat,complete ( #4037 )
2025-09-11 19:53:14 +08:00
zhuzixuan
a47976e82d
[Echo] Support more types of prompt echo ( #4022 )
...
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
* wenxin-tools-700 When the prompt type is list[int] or list[list[int]], it needs to support echoing after decoding.
---------
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com >
2025-09-11 19:34:44 +08:00
xiaoxiaohehe001
abdcef30aa
[BugFix] mm_post_fix ( #4005 )
...
* mm_post_fix
* mm_post_fix_1
2025-09-11 19:09:46 +08:00
Zhang Yulong
d2ec7f6aa2
update ci ( #4064 )
...
* update ci
* update ci
2025-09-11 18:36:25 +08:00
YuBaoku
fec58639db
[CI] skip test_structured_outputs* temporarily ( #4055 )
2025-09-11 18:07:50 +08:00
YuanRisheng
d2d04c2d5e
[setup optimize]Support git submodule ( #4033 )
...
* support git submodule
* update setup
* fix ci network
* fix clone
* revert clone linux
* delete args
* fix ci
* update
2025-09-11 17:41:16 +08:00
SuperNova
d60f7c4661
fix import tests.utils error in tests/model_loader/test_load_mtp.py ( #4027 )
...
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-09-11 16:47:16 +08:00
CSWYF3634076
e4c64a71cc
[BugFix] qwen2.5vl enable_thinking=true and image_patch_id bug fix ( #3921 )
2025-09-11 15:08:24 +08:00
bukejiyu
2650f58740
[docs] Update environment variables documentation ( #3957 )
2025-09-10 21:17:06 -07:00
co63oc
2af0f671b1
【Hackathon 9th No.55】add test_update_inputs_v1.py ( #3992 )
2025-09-11 11:34:22 +08:00
AIbin
a7392a0ff9
【Inference Optimize】DeepSeek-V3-model MLA Optimize ( #3886 )
...
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
chen
637d96c6ae
[Feature] Support zai-org/GLM-4.5-Air BF16 model ( #3928 )
...
* support glm45_air
2025-09-10 19:36:10 +08:00
freeliuzc
7ee100903f
support rope_3d in spec mode ( #4034 )
2025-09-10 03:15:05 -07:00
ltd0924
684e93269b
[Fix] fix multi api server log dir ( #3967 )
...
* [BugFix] fix max streaming tokens invalid
* fix scheduler bug
* fix scheduler bug
* Update multi_api_server.py
2025-09-10 17:15:30 +08:00
wanrui
276f73cf83
【Hackathon 9th No.28】add test_cutlass_fp8_fp8_fp8_dual_gemm_fused ( #3935 )
...
* add test_cutlass_fp8_fp8_fp8_dual_gemm_fused
* fix the version
* fix code style
---------
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-09-10 14:57:49 +08:00
RAM
d3e4ae3d49
[Executor] Adjust signal sending order in RL training ( #3773 )
...
* Adjust processing order
* fix bug
* fix update_parameters bug
* refine code
2025-09-10 13:24:20 +08:00
Ayakouji
453487d5b0
[Feat] ernie4_5_vl_moe support CudaGraph ( #3226 )
...
* delete dynamic control flow for decode
* coda-style
* fix scatter/gather typos and use input stream instead default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* open cudagraph in vlci
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-10 13:11:57 +08:00
zhupengyang
9d0074a91a
[xpu] add ep custom ops ( #3911 )
2025-09-10 12:22:50 +08:00
Yuanle Liu
c3b2a60fb8
[BugFix] Fix the abnormal memory usage caused by shape errors in the triton moe backend ( #4026 )
...
* fix device_id to in
* fix triton_moe bug
2025-09-09 20:05:54 -07:00
周周周
dbab579299
clean code ( #4020 )
2025-09-10 10:56:15 +08:00
guozhuangzhuang
f078a959b6
metrics shared folder naming ( #4007 )
...
* Fixed the issue of metrics file conflicts between multiple instances on a single machine
* Use uuid to name the metrics shared folder
* Use uuid to name the metrics shared folder
2025-09-10 10:47:20 +08:00
Sunny-bot1
3b1da6e4dd
support v1 loader for machete ( #3999 )
2025-09-10 10:21:33 +08:00
YuanRisheng
b3fac5bde1
[V1 Loader] Ernie kv cache quant support v1 loader ( #3899 )
...
* support c8 for ernie
* add unittest
* support vl
* fix c8
2025-09-09 05:25:08 -07:00
Zero Rains
98bfefea02
get org_vocab_size from args ( #3983 )
2025-09-09 15:08:03 +08:00
Jiang-Jia-Jun
c60adf4281
Revert "【FIX】Change the name of sparse attn from moba to plas ( #3845 )" ( #4001 )
...
This reverts commit e31c8f7336 .
2025-09-09 11:08:23 +08:00
Jiang-Jia-Jun
bbd548ceb6
Revert "【Fix】Change the name of sparse attn from moba to plas ( #3993 )" ( #4002 )
...
This reverts commit a553d1896c .
2025-09-09 11:07:46 +08:00
yangjianfengo1
f556561584
【docs】 update readme ( #4000 )
...
* update docs
* update readme
* update docs
2025-09-09 11:04:08 +08:00
yangjianfengo1
a553d1896c
【Fix】Change the name of sparse attn from moba to plas ( #3993 )
...
* update docs
* update docs
* update docs
* update docs
* rename moba to plas
* code style
* update ci
* code style
* update ci
2025-09-09 10:57:07 +08:00
yangjianfengo1
e31c8f7336
【FIX】Change the name of sparse attn from moba to plas ( #3845 )
...
* update docs
* update docs
* update docs
* update docs
* rename moba to plas
* code style
* update ci
* code style
* update ci
2025-09-09 10:56:50 +08:00
yangjianfengo1
de34222842
update docs ( #3998 )
2025-09-09 10:44:15 +08:00
JYChen
8e8a5913da
add a3b-thinking doc ( #3994 )
2025-09-09 10:27:01 +08:00
Jiang-Jia-Jun
9f0e2a6854
Update README_CN.md
2025-09-09 10:11:25 +08:00
Jiang-Jia-Jun
30ddcc9115
Update README.md
2025-09-09 10:10:45 +08:00
Zhang Yulong
2359c8d21c
update ci ( #3962 )
...
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-09 10:09:13 +08:00
Jiang-Jia-Jun
1dc1397ef6
Update docs for thinking model support
2025-09-09 10:08:05 +08:00
ming1753
12326b60e1
[Docs] update VL best_practices for release/2.2 ( #3965 )
...
* [Docs] update VL best_practices for release/2.2
* fix bug
* modify
2025-09-08 22:07:37 +08:00
lzy
f12159b630
del batch id per token ( #3963 )
...
* Update decoder_write_cache_with_rope_kernel.cu
del batch_id_per_token
* Update decoder_write_cache_with_rope_impl.cuh
* Update test_append_attention.py
* Update test_append_attention.py
2025-09-08 21:58:34 +08:00
bukejiyu
08b3153661
update doc ( #3990 )
...
Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0214.tjdm.baidu.com >
2025-09-08 21:04:26 +08:00
AIbin
d00faeec69
update dsk doc ( #3989 )
2025-09-08 20:42:48 +08:00
yinwei
7e0bfd024f
update release note ( #3986 )
2025-09-08 19:03:14 +08:00
JYChen
1f056a7469
[docs] update best practice docs ( #3969 )
...
* update best practice docs
* add version and v1 loader info
2025-09-08 17:39:38 +08:00
Echo-Nie
319a4bf75f
【Hackathon 9th No.36】add test_extract_text_token_output( #3862 )
2025-09-08 17:31:58 +08:00
co63oc
f884cd4f62
[UnitTest][MTP]add test_speculate_set_stop_value_multi_seqs.py ( #3941 )
2025-09-08 17:11:00 +08:00
co63oc
f32327661c
[UnitTest][MTP]add test_eagle_get_hidden_states ( #3876 )
2025-09-08 17:10:01 +08:00
co63oc
976aa88e66
【Hackathon 9th No.69】add test_draft_model_preprocess ( #3832 )
...
* add test_draft_model_preprocess
* fix
* ci
2025-09-08 17:08:50 +08:00
co63oc
ed462cf238
[UnitTest][MTP] add test_speculate_get_token_penalty_multi_scores.py ( #3742 )
...
* add test_speculate_get_token_penalty_multi_scores
* fix
2025-09-08 17:07:11 +08:00
Echo-Nie
20495f927e
[UnitTest][MTP] supplementary unit test for ngram_match ( #3732 )
...
* supplement unittest for custom_ops: ngram_match
* add annotation
* use the step_idx information to check equality at the specific position instead
* del anno
* del print
---------
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-09-08 17:06:06 +08:00
ooo oo
0c46318b34
【Hackathon 9th No.22】add unit tests for share_external_data ( #3744 )
2025-09-08 17:05:48 +08:00
yangjianfengo1
9ead10e1bc
update docs ( #3975 )
2025-09-08 16:53:37 +08:00
xiaolei373
571ddc677b
Modify markdown ( #3896 )
...
* feat(log):add_request_and_response_log
* modify markdown graceful shutdown
2025-09-08 16:42:34 +08:00
AIbin
316ac546d3
update_wint2_doc ( #3968 )
2025-09-08 15:53:09 +08:00
zhuzixuan
83bd55100b
[Optimize]Error messages about Model api. ( #3839 )
...
* add v1/models interface related
* add model parameters
* default model verification
* unit test
* check model err_msg
* unit test
* type annotation
* model parameter in response
* modify document description
* modify document description
* unit test
* verification
* verification update
* model_name
* pre-commit
* update test case
* update test case
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/openai/serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* improve error messages
---------
Co-authored-by: yangzichao01 <yangzichao01@baidu.com >
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com >
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-09-08 15:52:26 +08:00
co63oc
aadd6a94d8
fix typos ( #3951 )
2025-09-08 15:22:41 +08:00
co63oc
2033450391
rename ep_moe_prefill_func ep_moe_expert_dispatch ( #3938 )
2025-09-08 15:19:28 +08:00
Sunny-bot1
ed5133f704
update env docs for Machete ( #3959 )
2025-09-08 14:44:31 +08:00
qwes5s5
17169a14f2
[metrics] Add several observability metrics ( #3868 )
...
* Add several observability metrics
* [wenxin-tools-584] [Observability] support viewing this node's concurrency, remaining block_size, number of queued requests, and related information
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-08 14:13:13 +08:00
Jundong Liu
3d0aaa5923
[Executor] Experiment Feature-Support Prefill in cudagraph ( #3459 )
...
* Support prefill in Cudagraph
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4
* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5
* Solve problem about encoder_num_blocks_x_cpu
* Add early-exit mechanism for attention kernel
* fix test case about append-attention
* Update testcode, Add annotations to related tensors
* move get_input_length_list
* solve test_code
* Add annotations about early-exit for attention kernel
* Add annotations about early-exit for attention kernel2
* solve comment
* solve mtp
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-09-08 13:12:24 +08:00
yangjianfengo1
472402bf4e
Update sparse attn documentation ( #3954 )
...
* update docs
* update docs
* update docs
* update docs
2025-09-08 12:23:18 +08:00
lzy
af49b81ffd
supports dynamic Cfp8 ( #3767 )
...
* supports dynamic Cfp8
* add unittest
2025-09-07 20:41:29 -07:00
chenjian
b5e20e3015
[Bug fix] Fix prompt token ids dtype in v1 ( #3860 )
2025-09-08 11:34:13 +08:00
yinwei
7833f2f6cb
[XPU]Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER ( #3897 )
...
* fix bug
* fix bug
* update
* update
* update
2025-09-08 10:34:46 +08:00
ApplEOFDiscord
b649494655
[Feature] add HTTP GET retry ( #3838 )
...
* add http get retry
* fix comments
---------
Co-authored-by: zhangjunjun04 <zhangjunjun04@baidu.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-08 10:11:14 +08:00
bukejiyu
7c268693ed
ignore ci ( #3950 )
2025-09-07 23:58:52 +08:00
bukejiyu
e52ce1c4b1
cache feature ( #3857 )
2025-09-07 18:52:46 +08:00
co63oc
30a1c1783f
rename eagle_get_base_model_hidden_states.cu ( #3753 )
2025-09-07 10:24:58 +08:00
Zhang Yulong
349aa6348b
add cache queue port ( #3904 )
...
* add cache queue port
* add cache queue port
* add cache queue port
2025-09-05 21:17:06 +08:00
ltd0924
0c45e225d3
mv connection_manager init ( #3901 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-05 21:11:48 +08:00
周周周
f6f726c773
clean code in sttantion ( #3917 )
2025-09-05 20:49:01 +08:00
chen
0d989829bb
Compatible with EB 0.3B torch model arch ( #3913 )
...
* fix
* check
2025-09-05 19:04:59 +08:00
ltd0924
bd7d15f7ea
[Feature] support controller port in multi api server ( #3898 )
...
* Update serving_chat.py
* Update serving_completion.py
* Update serving_completion.py
* Update multi_api_server.py
2025-09-05 17:16:31 +08:00
Yuan Xiaolan
2cf55168ca
load hadamard_block_size from config ( #3797 )
2025-09-05 17:07:58 +08:00
AIbin
41aee08982
【Inference Optimize】Update MergedReplicatedLinear for DSK qkv_a_proj_with_mqa. ( #3673 )
...
* support MergedReplicatedLinear
* update MergedReplicatedLinear to support DSK_wint4 V1_load
* update model name
* update linear class
* fix
* fix v0 moe_bias load
---------
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
2025-09-04 21:16:05 -07:00
ooo oo
b23fc654d9
【Hackathon 9th No.32】add unit tests for group_swiglu_with_masked ( #3748 )
2025-09-05 11:53:47 +08:00
gaoziyuan
ab1929f5ff
fix mem boom in ep ( #3854 )
2025-09-05 11:48:21 +08:00
Echo-Nie
fc3bc56e59
【Hackathon 9th No.35】add test_moe_redundant_topk_select ( #3867 )
2025-09-05 11:29:02 +08:00
ltd0924
7643e6e6b2
[Docs] add data parallel ( #3883 )
...
* [Docs] add data parallel
* [Docs] add data parallel
2025-09-04 20:33:50 +08:00
ltd0924
e0e7d68435
Update qwen_vl_processor.py ( #3808 )
2025-09-04 20:31:48 +08:00
Zhang Yulong
4c160aa4dd
Update test_ernie_21b_mtp.py ( #3885 )
2025-09-04 20:20:36 +08:00
YuBaoku
c7b7126b20
[CI] update paddleformers==0.2 in develop ( #3878 )
2025-09-04 20:12:41 +08:00
SunLei
29628de6a7
Support for async processor added. ( #3869 )
...
* Support for async processor added.
* remove yappi code
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-09-04 19:58:53 +08:00
xiaolei373
ed97cf8396
Graceful shut down ( #3785 )
...
* feat(log):add_request_and_response_log
* graceful shutdown: add a shutdown-duration parameter to the interface
2025-09-04 19:33:50 +08:00
freeliuzc
88d44a2c93
support mtp in v1_scheduler mode ( #3695 )
2025-09-04 17:39:59 +08:00
xiaoxiaohehe001
f265a26f8b
support mtp rope_3d ( #3791 )
...
* support mtp rope_3d
* Update speculate_write_cache_with_rope_kernel.cu
2025-09-04 17:18:05 +08:00
RichardWooSJTU
f36a388ffe
fix response processors ( #3826 )
...
* fix response processors
* fix ci
* fix ut
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-04 16:01:25 +08:00
chenjian
22c165d6dd
[Feature] Set v1 scheduler as default in develop ( #3807 )
...
* Set scheduler v1 as default
* Set scheduler v1 as default
* Set scheduler v1 as default
* Set scheduler v1 as default
* Set scheduler v1 as default
* close V1 in guided_decoding
* fix vl ci
* close V1 in guided_decoding
2025-09-04 15:16:56 +08:00
co63oc
e83251699f
【Hackathon 9th No.63】add test_draft_model_postprocess.py ( #3757 )
...
* add test_draft_model_postprocess.py
* fix
* fix
2025-09-04 15:00:48 +08:00
Echo-Nie
ac46ef403a
【Hackathon 9th No.34】add test_get_position_ids_and_mask_encoder_batch ( #3739 )
2025-09-04 14:54:30 +08:00
RichardWooSJTU
0989788b29
support extend block tables ( #3824 )
2025-09-04 14:39:04 +08:00
gaoziyuan
6ef3b611b0
add dp config ( #3822 )
2025-09-04 11:46:48 +08:00
ooo oo
460809070c
【Hackathon 9th No.54、57】 add unit tests for per_token_quant and per_token_quant_padding ( #3746 )
2025-09-04 11:46:38 +08:00
co63oc
7baf1b56e0
【Hackathon 9th No.27】add test_get_padding_offset ( #3708 )
...
* add test_get_padding_offset
* fix
* fix
* fix
2025-09-04 11:42:35 +08:00
co63oc
9ec4fa0f8e
fix typo EngineSevice EngineService ( #3841 )
2025-09-04 11:20:36 +08:00
yangjianfengo1
c870be6d27
fix port ( #3863 )
2025-09-04 10:01:38 +08:00
plusNew001
3790505319
[XPU] Update XPU stable xvllm and xtdk version for 2.2 ( #3853 )
...
* Add debug environment variable exports
Added debug environment variable exports for CLANG_PATH and XVLLM_PATH.
* Lock paddlepaddle-xpu version in CI script
Temporarily lock paddlepaddle-xpu version due to framework update issues.
* Update no_proxy environment variable in CI workflow
* Install lsof tool in run_ci_xpu.sh
* Update dependency versions for stable release
* Update paddlepaddle-xpu installation command
2025-09-03 23:21:00 +08:00
co63oc
e24b745d48
[UnitTest][MTP]add test_speculate_get_output_padding_offset ( #3740 )
2025-09-03 22:21:21 +08:00
co63oc
aaa2de1afa
[UnitTest][MTP]add test_speculate_get_padding_offset ( #3730 )
2025-09-03 22:21:02 +08:00
yyssys
abde903813
Automatically configure workers based on max-num-seqs ( #3846 )
...
Automatically configure workers based on max-num-seqs
2025-09-03 21:12:42 +08:00
YUNSHEN XIE
7dbd9412b0
reopen ut ( #3795 )
...
* reopen ut
* update
* update
* update ci dockerfile
2025-09-03 19:05:20 +08:00
luukunn
fc598d4c5a
add reasoning parser plugin ( #3811 )
...
* add reasoning parser plugin
* fix finish reason
2025-09-03 18:31:27 +08:00
Ayakouji
31313e0f3d
[Feature] ernie4_5_vl_moe support huggingface safetensor loading ( #3750 )
...
* update
* update
* update in tp
* add todo
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com >
2025-09-03 02:58:59 -07:00
lizexu123
4c998c3636
[Code Simplification] delete cum_offsets_out ( #3815 )
...
* fix
* fix
2025-09-03 16:15:33 +08:00
YuanRisheng
0a1ce612c2
V1 loader support ep ( #3801 )
2025-09-03 16:05:41 +08:00
Yuan Xiaolan
fa58a9fa8f
qk norm for speculate decode C16 ( #3637 )
2025-09-03 14:53:56 +08:00
plusNew001
d22d3de256
[XPU] Update XPU CI case ( #3837 )
...
* Add debug environment variable exports
Added debug environment variable exports for CLANG_PATH and XVLLM_PATH.
* Lock paddlepaddle-xpu version in CI script
Temporarily lock paddlepaddle-xpu version due to framework update issues.
* Update no_proxy environment variable in CI workflow
* Install lsof tool in run_ci_xpu.sh
2025-09-03 14:32:12 +08:00
lzy
2527eb0e4e
fix test_append_attention_with_output.py ( #3831 )
...
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com >
2025-09-03 14:07:50 +08:00
AIbin
54b458fd98
[Doc] update wint2 doc ( #3819 )
...
* update_wint2_doc
2025-09-03 11:27:43 +08:00
plusNew001
d81c57146f
[XPU] FIX XPU CI BUG ( #3829 )
...
* Add debug environment variable exports
Added debug environment variable exports for CLANG_PATH and XVLLM_PATH.
* Lock paddlepaddle-xpu version in CI script
Temporarily lock paddlepaddle-xpu version due to framework update issues.
2025-09-03 11:25:48 +08:00
ooo oo
2396e49f9e
【Hackathon 9th No.73】add unit tests for graph_opt_backend ( #3609 )
...
* test: add unit tests for graph_opt_backend
* refactor(tests): improve graph optimization test structure and readability
* fix(tests): correct CUDA graph related typos in test files
- Fix class name: TestCUDAGrpahSubgraph -> TestCUDAGraphSubgraph
* refactor(test): support attention layer and optimize graph optimization backend test to eliminate redundant baseline calculations
* remove some func call
---------
Co-authored-by: RAM <gstian5555@outlook.com >
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-09-03 11:18:00 +08:00
co63oc
94a61d505c
fix dcu_worker.py ( #3734 )
2025-09-03 10:57:42 +08:00
co63oc
ce998449e0
fix w8a8.py ( #3733 )
2025-09-03 10:57:26 +08:00
Echo-Nie
f7a4bea785
【Hackathon 9th No.84】Supplementary Unit Test for fastdeploy/reasoning ( #3570 )
...
Test content: verify that the base class's registration and retrieval functions work correctly
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-09-03 10:55:02 +08:00
co63oc
5441538173
rename fused_get_rope.cu ( #3752 )
...
* rename fused_get_rope.cu
* fix
* fix typos
* fix
* fix
2025-09-03 10:54:34 +08:00
ltd0924
2c9b169c0e
[BugFix] fix scheduler invalid ( #3803 )
...
* [BugFix] fix max streaming tokens invalid
* fix scheduler bug
* fix scheduler bug
2025-09-02 20:28:51 +08:00
Longzhi Wang
e0c9a6c76c
[Feat] Support streaming transfer data using ZMQ ( #3521 )
...
* Support streaming transfer data of ZMQ
* fix typo
* fix typo
* support tp
* add unittest
* update
* update
* fix typo
* fix typo
* fix tp_num in ci machine
---------
Co-authored-by: Wanglongzhi2001 <>
2025-09-02 19:52:19 +08:00
Echo-Nie
0fe1d62232
[MTP] add test_draft_model_set_value_by_flags.py ( #3741 )
2025-09-02 19:33:33 +08:00
Jiang-Jia-Jun
18e5d355a1
Update version in docs
2025-09-02 19:21:10 +08:00
yangjianfengo1
8e1b35a09b
[Fix bug] Pin w4afp8 nblock to 256, and add a mask parameter to fa3 append attn ( #3771 )
...
* fix w4afp8
* add centralized configuration
* codestyle
* fix fa3 append attn
2025-09-02 19:17:01 +08:00
bukejiyu
b6a4115369
[v1loader]Reduce EB300B model loading time ( #3700 )
...
* speed up eb45
* update
2025-09-02 19:13:57 +08:00
YUNSHEN XIE
693c7d781c
fix ce compile job ( #3768 )
...
* fix ce compile job
* update
* update
* update
* update
2025-09-02 18:37:13 +08:00
co63oc
aa067a3106
rename speculate_token_penalty_multi_scores.cu ( #3735 )
2025-09-02 18:12:11 +08:00
lzy
7a521bbf62
Modify mask_offset's format ( #3525 )
...
* modify mask_offset in decode
* modify mask_offset unittest
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-02 03:05:35 -07:00
co63oc
f296aff6cf
rename speculate_stop_generation_multi_stop_seqs ( #3743 )
2025-09-02 18:04:29 +08:00
RAM
205b706ef8
[Executor] Fix bug of import paddle with RLHF ( #3781 )
2025-09-02 17:32:13 +08:00
Yuanle Liu
306c024ff3
[BugFix] fix error of import paddle.base.core.Config ( #3761 )
...
* defer import of Config
* support chunked_prefill
* support chunked_prefill
2025-09-02 17:23:27 +08:00
ltd0924
905d89e42f
[Feature] support model weight update in ep ( #3765 )
...
* support model weight update in ep
* support model weight update in ep
* support model weight update in ep
* support model weight update in ep
* Update fused_moe_backend_base.py
* Update worker_process.py
* Update worker_process.py
* Update dynamic_weight_manager.py
2025-09-02 17:16:03 +08:00
kevin
1908465542
[Feature] mm and thinking model support structured output ( #2749 )
...
* mm support structured output
* update code
* update code
* update format
* update code
* update code
* add enable_thinking default
* update code
* add structured_outputs test case
* add ci install xgrammar
* add ci timeout time
* update test for structured_outputs
* update code
* add error traceback info
* update error msg
* update structured output code
* update code
* update code
* update config
* update torch version
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-02 16:21:09 +08:00
Jiang-Jia-Jun
0e4df5a6f4
[Feature] Setting number of apiserver workers automatically ( #3790 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-09-02 14:17:48 +08:00
ltd0924
bf0cf5167a
[BugFix] fix max streaming tokens invalid ( #3789 )
2025-09-02 13:57:32 +08:00
kevin
7e751c93ae
[BugFix] Fix chunked prefill ( #3759 )
...
* add error traceback info
* update error msg
* update code
* default enable chunked prefill
* update code
* update code
* add envs
* update code
* update enable chunked_prefill
* update code
* update code
* update code
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-02 13:40:45 +08:00
Jiang-Jia-Jun
27f2e7a6f1
Create faq.md
2025-09-02 11:07:37 +08:00
co63oc
6ac7cea81b
fix test_load_mtp ( #3780 )
2025-09-02 10:21:02 +08:00
Zhang Yulong
adc246127b
Update test_ernie_21b_mtp.py ( #3783 )
...
Temporarily skip the multi-GPU MTP case
2025-09-01 20:39:40 +08:00
lizexu123
6dd61a1bab
fix Document ( #3782 )
...
Co-authored-by: example_name <example_email>
2025-09-01 20:22:43 +08:00
YUNSHEN XIE
253f388372
add ci images build job ( #3749 )
...
update
update
2025-09-01 19:57:36 +08:00
co63oc
d6369b4d51
fix typos ( #3684 )
2025-09-01 17:50:17 +08:00
Jiang-Jia-Jun
0513a78ecc
Update docs for reasoning-parser
2025-09-01 17:42:58 +08:00
Jiang-Jia-Jun
0297127a93
Update FASTDEPLOY_VERSION to 2.3.0-dev
2025-09-01 16:48:42 +08:00
Jiang-Jia-Jun
2bd7d90929
Remove useless parameters
2025-09-01 14:43:56 +08:00
YuanRisheng
6566e29807
Add loader test for mtp ( #3724 )
...
* add test for mtp
* fix unittest
* fix
2025-09-01 10:55:49 +08:00
Zhang Yulong
085fe070f2
add CI cases ( #3714 )
2025-09-01 10:06:49 +08:00
ming1753
927e8ec55e
Add more runtime information to resource manager ( #3706 )
2025-09-01 00:25:28 +08:00
chenjian
465065cd19
[Bug fix] Fix prefix cache in V1 ( #3715 )
...
* [Bug fix] Fix prefix cache in V1
* fix code style
2025-08-31 21:29:33 +08:00
lizhenyun01
bed09ae8f8
fix mask_offset in append_attn ( #3745 )
...
* fix mask_offset in append_attn
* fix test
2025-08-31 15:03:16 +08:00
kevin
753772ace8
default enable chunked prefill ( #3731 )
...
* add error traceback info
* update error msg
* update code
* default enable chunked prefill
* update code
* update code
* add envs
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-31 13:15:13 +08:00
李泳桦
98e03fb4ea
[feat] add metrics for yiyan adapter ( #3219 ) ( #3614 )
...
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
2025-08-30 23:20:58 +08:00
Sunny-bot1
fe5d09f9ee
[FIX]Fix Machete compile via ENABLE_MACHETE ( #3727 )
...
* add ENABLE_MACHETE
* fix
* revert
* update
* pre_commit
* fix
* fix
---------
Co-authored-by: Ayakouji <yuhongh@qq.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: aquagull <hongyuh@qq.com >
2025-08-30 17:50:17 +08:00
SunLei
b9af95cf1c
[Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. ( #3674 )
...
* [Feature] add AsyncTokenizerClient
* add decode_image
* Add response_processors with remote decode support.
* [Feature] add tokenizer_base_url startup argument
* Revert comment removal and restore original content.
* [Feature] Non-streaming requests now support remote image decoding.
* Fix parameter type issue in decode_image call.
* Keep completion_token_ids when return_token_ids = False.
* add copyright
2025-08-30 17:06:26 +08:00
luukunn
9a7c231f2c
[Feature]support chat_template.jinja ( #3721 )
...
* add support chat_template.jinja
* add support chat_template.jinja
2025-08-30 17:05:34 +08:00
lizexu123
b21e085f3e
[Code Simplification] delete print ( #3729 )
2025-08-30 16:19:07 +08:00
chen
7568b20098
check ( #3720 )
2025-08-30 16:04:20 +08:00
lizexu123
455205f991
[Features] support hugging face qwen3 moe ( #3649 )
...
* split ut
* qwen3-30B-A3B
* fix
* add test
* add test_torch_model.py
* fix test_torch_model.py
* delete print
* fix moe
* delete init.py
* fix
* fix
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
2025-08-30 15:26:05 +08:00
Zero Rains
f206474cc7
fix the bug when num_key_value_heads < tensor_parallel_size ( #3717 )
2025-08-30 12:40:00 +08:00
chenjian
c4b1f6b0a5
[Optimize] Increase zmq buffer size to prevent apiserver too slowly to consume ( #3723 )
2025-08-30 10:45:26 +08:00
YUNSHEN XIE
a18afcfdd9
Optimize coverage jobs ( #3683 )
2025-08-30 00:12:40 +08:00
chen
cd252ec673
[Feature]support load eb 0.3B and 21B torch model ( #3660 )
2025-08-29 20:00:48 +08:00
yangjianfengo1
3754a9906d
[Feature] block sparse attention ( #3668 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit tests
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* move environment variables into config
* fix rollout config
* fix GPU memory
* add test server
* add test server
* fix mlp: use full attn for the last layer
2025-08-29 19:46:30 +08:00
zhouchong
ccd52b5596
[Model]support qwen2_5_vl ( #3557 )
...
* adapt qwen_2_5_vl model
* adapt qwen_2_5_vl VIT model
* adapt qwen2_5_vl images_embeds
* adapt qwen2_5_vl 3D rope
* adapt qwen2_5_vl 3D rope v2
* adapt qwen2_5_vl processor
* adapt qwen2_5_vl bypass resampler_model
* adapt qwen2_5_vl bypass part of the ernie logic
* adapt qwen2_5_vl bypass part of the ernie logic v2
* adapt qwen2_5_vl weight loading and naming changes
* adapt qwen2_5_vl make think_end_id optional
* adapt qwen2_5_vl distinguish extract_vision_features across multiple models
* fix:adapt qwen2_5_vl model
* adapt qwen2_5_vl norm
* adapt qwen2_5_vl processor update
* adapt qwen2_5_vl image and video success
* adapt qwen2_5_vl partial code cleanup
* adapt qwen2_5_vl support multi-GPU
* adapt qwen2_5_vl on latest develop
* adapt qwen2_5_vl RL
* adapt qwen2_5_vl code cleanup
* support noex rope3d
* adapt qwen2_5_vl add init.py
* adapt qwen2_5_vl add init.py v2
* adapt qwen2_5_vl remove space
* adapt qwen2_5_vl remove space v2
* adapt qwen2_5_vl pre-commit
* adapt qwen2_5_vl update
* adapt qwen2_5_vl pre-commit v2
* adapt qwen2_5_vl modify comments
* adapt qwen2_5_vl fix indentation
* adapt qwen2_5_vl fix indentation v2
---------
Co-authored-by: wangyafeng <wangyafeng@baidu.com >
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com >
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com >
2025-08-29 18:28:39 +08:00
YuBaoku
65425bf858
[CI] update paddle version to nightly ( #3698 )
2025-08-29 18:16:13 +08:00
Yuan Xiaolan
c71ee0831c
add w4afp8 offline script ( #3636 )
2025-08-29 17:56:05 +08:00
zyfncg
f677c032c0
[CudaGraph] [SOT] Support splitting static graph into piecewise graph with cuda_graph ( #3478 )
...
* support splitting static graph into piecewise graph with cuda_graph
* Update fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* fix merge conflict
* fix bug
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-29 16:28:01 +08:00
lzy
48d760539b
fix deepcopy(tp_group) in spec ( #3648 )
2025-08-29 16:08:21 +08:00
Ryan
45f81b34f0
add dtype int32 ( #3692 )
2025-08-29 14:56:35 +08:00
xiaoxiaohehe001
1bf4fc7f36
support w4afp8 eplb ( #3680 )
2025-08-29 14:43:06 +08:00
Yuanle Liu
68f87240da
fix key error in mm ( #3702 )
2025-08-29 14:35:12 +08:00
李泳桦
88297240e7
[feat] completion api supports passing input token ids in either prompt or prompt_token_ids ( #3311 )
...
* [feat] completion api supports passing input token ids in either `prompt` or `prompt_token_ids`
* [fix] update comment
* [fix] fix type error
* [test] add a unittest file for serving api test
* [test] try to fix ci error
* [chore] rename test function names
* [test] try to fix ci error
* [test] try to fix ci error
* [test] add tests for qwen
2025-08-29 14:19:42 +08:00
周周周
17b414c2df
MoE Default use triton's blockwise fp8 in TP Case ( #3678 )
2025-08-29 11:07:30 +08:00
co63oc
b6edd15d55
fix scaled_gemm_f8_i4_f16_weight_quantize input ( #3685 )
2025-08-29 11:04:04 +08:00
Yuanle Liu
2fb2c0f46a
fix MultimodalRegistry ( #3699 )
2025-08-29 11:01:30 +08:00
Echo-Nie
43d5bd62b4
【Hackathon 9th No.70】supplementary unit test for CPUPlatform and CUDAPlatform ( #3580 )
...
* add supplementary unit tests for the CUDAPlatform and CPUPlatform modules
* update the "is_cuda" to "is_cuda_and_available"
* fix pre-commit
---------
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-08-29 10:34:05 +08:00
lifulll
72094d4d82
enable dcu ci ( #3402 )
2025-08-29 10:23:08 +08:00
kevin
73d60fe64d
update ci envs for structured output ( #3687 )
...
* add error traceback info
* update error msg
* update code
* update ci envs for structured output
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-29 10:21:36 +08:00
bukejiyu
0b51b9c35b
fix qwen3 235B tp 8 ( #3697 )
2025-08-28 23:46:25 +08:00
Yuanle Liu
4957908275
add input_processor plugin ( #3657 )
...
* add input_processor plugin
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
* update
2025-08-28 22:53:57 +08:00
ming1753
02b3644903
[Bug Fix] VL Support w4a8/w4afp8 ( #3686 )
2025-08-28 21:38:35 +08:00
YuanRisheng
808b548761
support tmp ( #3675 )
2025-08-28 19:42:32 +08:00
Divano
368bbd9dc6
Update _base_test.yml ( #3690 )
...
Add CI case for testing the concurrency parameter
2025-08-28 19:15:19 +08:00
gaoziyuan
fc635acc47
[BugFix]fix dp&ep&tp and multi node infer ( #3629 )
...
* rm log
* fix bug
* fix bug
* fix dp&ep&tp and multi node infer
* fix
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-08-28 19:09:10 +08:00
Divano
17731a8acd
add concurrency cases ( #3689 )
2025-08-28 18:30:19 +08:00
Liumengyuan
2a73a6df03
fix_fp8_deepgemm_moe_tp_bug ( #3658 )
2025-08-28 17:19:02 +08:00
Liumengyuan
e93d4cfcdd
Add with_output version AppendAttention ( #3302 )
...
* get use_output from fd_config
* add clear TODO description
* add mask_offset para to align with develop
* fix bug
* fix use_output logic
* fix sot bug
2025-08-28 17:10:18 +08:00
ltd0924
94ded434bd
[BugFix] ep mixed offline exit ( #3661 )
...
* Update expert_service.py
* Update expert_service.py
2025-08-28 17:09:07 +08:00
ltd0924
e5015eea05
[BugFix] fix logger ( #3666 )
2025-08-28 17:08:00 +08:00
bukejiyu
73cf6096da
fix ( #3676 )
...
* fix
* update
2025-08-28 17:06:32 +08:00
ltd0924
98c217b428
Update config.py ( #3669 )
2025-08-28 15:30:51 +08:00
co63oc
d4fc893fe3
fix typos ( #3633 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-28 14:42:24 +08:00
co63oc
c294fc8139
Fix target_version ( #3159 )
...
* Fix
* fix
* fix
2025-08-28 14:17:54 +08:00
Mattheliu
108d989d9d
[Docs] add fastdeploy_unit_test_guide.md ( #3484 )
...
* docs:add fastdeploy_unit_test_guide.md
* docs:fix fastdeploy_unit_test_guide.md
* docs: add FastDeploy unit test spec (EN) and update usage nav
* fix codestyle
2025-08-28 14:12:25 +08:00
plusNew001
b791bea0c5
Update run_ci_xpu.sh to lock xvllm version ( #3671 )
...
Lock version due to xvllm update causing service errors.
2025-08-28 12:30:50 +08:00
Yuan Xiaolan
d37331fc71
fix w4afp8_gemm_scale_permute import error on A100 ( #3611 )
2025-08-28 11:42:23 +08:00
YuanRisheng
ad9b95e6dd
fix rl bugs ( #3654 )
2025-08-28 11:09:34 +08:00
yangjianfengo1
e81046fdad
【New Feature】Support w4afp8 in centralized deployment ( #3644 )
...
* support tp w4afp8
* code style
2025-08-28 10:53:24 +08:00
周周周
76513f6416
Support 45t fp8 8 GPU ( #3659 )
2025-08-28 10:52:53 +08:00
Echo-Nie
7afcd4b776
【Hackathon 9th No.77】supplementary unit test for get_filtered_metrics ( #3578 )
...
* supplementary unit tests for functional module fastdeploy/metrics/metrics/get_filtered_metrics
* fix pre-commit
---------
Co-authored-by: Tao Luo <luotao02@baidu.com >
2025-08-28 10:39:02 +08:00
ltd0924
3d92fb09f7
[BugFix] fix parameter is 0 ( #3592 )
...
* Update engine_client.py
* fix
* Update common_engine.py
2025-08-28 09:52:36 +08:00
Sunny-bot1
479c8b85d3
[Optimize]support machete weight only gemm ( #3561 )
...
* support machete weight only gemm
* add generate
* update
* fix
* change file location
* add sm_version limit
* fix
* fix
* fix ci
* fix coverage
* fix xpu
2025-08-28 09:49:58 +08:00
Zero Rains
e37e86b3b8
[V1 Loader]support param create and load for wint2 and xpu backend ( #3581 )
...
* support wint2 backend
* [V1 Loader]support param create and load for wint2 and xpu backend
* update weight shape name
* update
* update
* update baseline.txt
* update model name
* update baseline.txt
* fix codestyle
* remove debug code
2025-08-28 09:49:36 +08:00
lizexu123
b28a0343a6
fix ENABLE_V1_KVCACHE_SCHEDULER ( #3625 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-27 21:21:29 +08:00
ltd0924
2974016103
[BugFix] fix ce bugs ( #3641 )
...
* [BugFix] fix tp8 client refuse
* fix engine port bug
* Update utils.py
2025-08-27 20:38:15 +08:00
Yuanle Liu
836345a4dd
delete ernie4_5_vl_tokenizer ( #3631 )
2025-08-27 20:36:02 +08:00
Liumengyuan
11803e0907
fix undefined cuPointerGetAttribute symbol error ( #3628 )
2025-08-27 20:24:59 +08:00
Jiang-Jia-Jun
c694fa2879
Revert "[Feature] block sparse attention ( #3209 )" ( #3647 )
...
This reverts commit 646a0c2fd8 .
2025-08-27 17:35:04 +08:00
李泳桦
b2afdf4fc6
[fix] qwen output inconsistency when top_p=0 ( #3634 )
...
* [fix] qwen output inconsistency when top_p=0
* [fix] remove decode pre_id code
2025-08-27 17:16:23 +08:00
lzy
1265f6c192
deepgemm don't support tp+ep (for ci) ( #3638 )
...
* deepgemm don't support tp+ep (for ci)
* deepgemm don't support tp+ep (for ci)
2025-08-27 16:39:19 +08:00
plusNew001
f0140be1e1
Change paddlepaddle-xpu installation command ( #3646 )
...
Updated the installation command for paddlepaddle-xpu to use a specific wheel file.
2025-08-27 16:17:19 +08:00
JYChen
e645db348b
[docs] Update best practice doc ( #3539 )
...
* fix some docs error
* [docs] x1 best-practice
* update docs
* fix docs
2025-08-27 15:45:30 +08:00
xjkmfa
afb9f327ef
【CI case】for echo finish_reason text_after_process and raw_prediction check ( #3630 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
* echo&finish_reason&text_after_process&raw_prediction check
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-08-27 15:21:16 +08:00
chen
5ad8721506
check ( #3639 )
2025-08-27 14:32:13 +08:00
plusNew001
f8b70bf60c
update xpu ci ( #3632 )
...
* Update Docker image version in CI workflow
* Modify paddlepaddle-xpu installation and add dependencies
Updated installation source for paddlepaddle-xpu and added dependency download step.
* Fix no_proxy environment variable in CI workflow
2025-08-27 14:25:56 +08:00
chen
ce9c0917c5
[Precision] Support lm_head layer running in float32 ( #3597 )
...
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
2025-08-27 11:34:53 +08:00
xiaoxiaohehe001
ad319a87cc
support fa3 rope3d ( #3622 )
2025-08-27 11:31:29 +08:00
YUNSHEN XIE
85afa72763
fix publish task ( #3635 )
...
* fix publish task
* disable ut
2025-08-27 11:14:53 +08:00
yangjianfengo1
646a0c2fd8
[Feature] block sparse attention ( #3209 )
...
* support sparse attn
* fix bug
* code style
* fix moba attn get kv shape
* fix A100 compilation
* codestyle
* code style
* code style
* code style
* fix conflict
* add unit test
* code style
* increase eblite loading time
* fix bug
* for ci
* for ci
* for ci
* for ci
* support mlp block size 128
* add unit tests for small operators
* fix mlp unit test
* add environment variables to config
* fix rollout config
2025-08-26 07:16:04 -07:00
RAM
f0a362af18
[CUDAGraph]Switch the scope so that output buffer of CUDAGraph can automatically release ( #3612 )
...
* fix typo
* fix typo
* add print dot files
* fix bug
* Switch the scope so that output buffer of cudagraph can automatically release
* Revert "add print dot files"
This reverts commit dc21809eb5 .
2025-08-26 21:28:19 +08:00
gaoziyuan
82e64b13e1
[NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop ( #3598 )
...
* [Feature] update ep
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix queue ports idx
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* fix ci
* Update engine.py
* fix ci
* fix some bug in mixed ep
* add server fix and op fix
* rm some log
* fix code style
* ltd fix
* fix
* fix
* fix some bug
* fix bug
* fix bug
* fix style
* Update config.py
* Update splitwise_connector.py
* Update cache_messager.py
* Update __init__.py
* merge and fix
* Update engine.py
* Update common_engine.py
* Update run_ci_xpu.sh
* Update ernie_processor.py
* Update ernie_processor.py
---------
Co-authored-by: ltd0924 <ltd0924@sina.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-26 19:59:02 +08:00
Yuanle Liu
cbce94a00e
rename ernie_xxx to ernie4_5_xxx ( #3621 )
...
* rename ernie_xxx to ernie4_5_xxx
* ci fix
2025-08-26 19:29:27 +08:00
YuanRisheng
642480f5f6
[CI] Standard unittest ( #3606 )
...
* standard unittest
* fix bugs
* fix script
2025-08-26 19:03:11 +08:00
SunLei
2f28f40d90
fix: replace list * n initialization with list comprehension to avoid shared references ( #3618 )
2025-08-26 17:53:31 +08:00
bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
RAM
00898603c8
[CUDAGraph]Add debug func ( #3616 )
...
* add print dot files
* refine code
2025-08-26 16:43:48 +08:00
xiaoxiaohehe001
9afa236e39
[NewFeatures] support eplb ( #3547 )
...
* [NewFeatures] support eplb
* fix eplb
2025-08-26 16:19:30 +08:00
Yuanle Liu
56e2d7e668
adaptive rms_norm's dtype ( #3617 )
...
* adaptive rms_norm's dtype
* adaptive rms_norm's dtype
* add approve coverage
---------
Co-authored-by: liuyuanle <liuyuanle@baidu.com >
2025-08-26 15:29:15 +08:00
lzy
d339df2e90
Supports DP+TP+EP hybrid parallel deployment strategy ( #3489 )
...
* Support DP+TP+EP hybrid parallel deployment strategy
* Support DP+TP+EP hybrid parallel deployment strategy
* fix conflict
* add moe_tp_ep function split_allgather_out
* del tp_group in moe_cutlass_backend
* for ci
* fix parallel_config for ci
* del log
2025-08-26 00:04:01 -07:00
freeliuzc
52eda7fdb3
[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram ( #3610 )
2025-08-26 14:29:22 +08:00
AIbin
0a0d2959b9
qkv_a_proj horizontal fusion ( #3591 )
...
Support DSK qkv_a_proj horizontal fusion under V0 Loader
2025-08-26 14:25:57 +08:00
YuBaoku
75db0d1ae2
[CI] reopen sot test ( #3613 )
...
* [CI] change check_service time to 360s
* [CI] disable sot test temporarily
* [CI] reopen sot test
2025-08-26 14:23:38 +08:00
xiaoxiaohehe001
70c75798a7
[NewFeatures] support noex rope3d ( #3542 )
...
* [NewFeatures] support noex rope3d
* [NewFeatures] support noex rope3d encoder
2025-08-26 11:44:57 +08:00
tianlef
0bc7d076fc
[CE]add x1 w4a8c8 benchmark config ( #3607 )
...
* [CE]add x1 w4a8c8 benchmark config
* [CE]add x1 w4a8c8 benchmark config
* [CE]add x1 w4a8c8 benchmark config
2025-08-26 11:27:32 +08:00
Ryan
a5b4866ff1
[CudaGraph][SOT] Add unit tests for splitting the static graph into piecewise graphs that support cuda_graph ( #3590 )
...
* add unitest
* change sot_warmup_sizes
* wtf; add missed commit
2025-08-26 11:25:04 +08:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids ( #3608 )
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00
lizexu123
c43a4bec00
[Features] support hugging face qwen3 dense and qwen2 model ( #3574 )
...
* support qwen2 and qwen3 hugging face
* fix moe
* default_v1 loader
* hugging_face_format deprecated
* modify hugging_face_format to model_format
* model_format auto
* fix environment
* fix bug
* fix qwen3-0.6 bug
* model_format is str
* fix
2025-08-26 10:54:53 +08:00
ltd0924
66c5addce4
[Bugfix] fix api server control signal bugs ( #3531 )
...
* Update serving_chat.py
* Update serving_completion.py
* Update serving_completion.py
2025-08-25 21:13:04 +08:00
RAM
2fa173e327
[Executor] CUDAGraph support RL training ( #3265 )
...
* add clear graph opt backend
* cuda graph support rl
* add branch
* 1.fix dynamic_weight_manager bug 2.add clear api for CausalLM
* open test case
* fix typo
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
* update test case
* use unittest in graph test case
2025-08-25 20:59:30 +08:00
Kane2011
2ae7ab28d2
[MetaxGPU] adapt to the latest fastdeploy on metax gpu ( #3492 )
2025-08-25 17:44:20 +08:00
YuBaoku
c13c904971
[CI] temporarily disable sot test due to occasional timeout issue ( #3586 )
...
* [CI] change check_service time to 360s
* [CI] disable sot test temporarily
2025-08-25 14:34:27 +08:00
chen
9cab3f47ff
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing ( #3552 )
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-08-25 14:11:49 +08:00
YUNSHEN XIE
2410adb041
Add coverage skip ( #3553 )
...
* add coverage skip
* update
* fix
2025-08-25 14:08:24 +08:00
Yuan Xiaolan
9205c88da1
support w4afp8 EP inference ( #3044 )
2025-08-25 11:27:45 +08:00
YUNSHEN XIE
46664985fc
Modify the existing coverage collection method ( #3573 )
...
fix cov report
2025-08-25 10:35:35 +08:00
YuBaoku
7821534ff5
[CI] add sot test ( #3579 )
...
* [CI] add sot test
* [CI] add sot test
2025-08-25 10:14:50 +08:00
lengxia
137e539456
[Feature][XPU] add custom kernels for mtp ( #3537 )
2025-08-25 10:14:17 +08:00
bukejiyu
bdbac0aa3d
support qwen2 weight only ( #3571 )
2025-08-24 11:14:34 +08:00
bukejiyu
77514e3e1e
[V1 Loader] support weight_only ( #3413 )
...
* support wint4/wint8
* delete smoe case
* update ci
* print log
2025-08-23 13:13:41 +08:00
Jiang-Jia-Jun
93e1b63200
Revert "[UnitTest][Copilot] Improve unit test coverage for entrypoints module…" ( #3564 )
...
This reverts commit 36325e9ea7 .
2025-08-23 10:44:23 +08:00
YuanRisheng
e481b7a779
fix sot ( #3556 )
2025-08-23 08:37:06 +08:00
Zero Rains
79f0dbbb55
[V1 Loader] Support qwen2(bf16) ( #3502 )
...
* support qwen2(bf16)
* merge bias_loader and weight_loader
2025-08-23 01:08:23 +08:00
YUNSHEN XIE
cb166053ba
fix test name ( #3493 )
...
* fix test name
* update
* update
* fix
* fix
* update
* update
* update
* update
* update
* fix
* update
2025-08-22 23:43:47 +08:00
Copilot
36325e9ea7
[UnitTest][Copilot] Improve unit test coverage for entrypoints modules ( #3546 )
...
* Initial plan
* Add comprehensive unit tests for entrypoints utilities
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Complete entrypoints test coverage improvement with tool parser tests
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Apply pre-commit formatting to test files - fix trailing whitespace and long lines
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-22 19:20:51 +08:00
zhink
df7c31012b
Modified to support custom all reduce by default ( #3538 )
2025-08-22 16:59:05 +08:00
lddfym
27666ee586
[Feature] Add Qwen25-VL Processor ( #3501 )
...
* add qwen-2.5-vl processor
* add qwen25-vl processor
* add qwen25-vl processor
* add qwen25-vl processor
* add qwen25-vl processor position_ids
* add qwen25-vl processor
* add qwen25-vl processor
* position_ids
* add test for qwen25-vl
* organize comments
* formatted
* qwen_vl_processor
* add qwen_vl_processor unittest
* update model path
* update model path
* update qwen_vl_processor unittest
* add unittest and bug fix
* add unittest and bug fix
* Update fastdeploy/input/qwen_mm_processor/image_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/input/qwen_vl_processor.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-22 16:49:42 +08:00
YuanRisheng
5b66462f0e
Fix fdconfig bugs ( #3528 )
...
* fix config
* fix parallel
* fix ips
* fix rl
* open code
2025-08-22 16:17:15 +08:00
plusNew001
7ae41e9daf
[CI] fix xpu ci bug ( #3535 )
2025-08-22 15:08:39 +08:00
freeliuzc
76759108c9
[Feature][SpeculativeDecoding]Support tree-attention ( #3514 )
...
* support tree-attention
* fix merge bug
* fix unit-test api
* fix merge bug
2025-08-22 13:36:41 +08:00
YuBaoku
cc88671507
[CI] add container naming and cleanup logic in workflows ( #3526 )
2025-08-22 11:42:57 +08:00
YUNSHEN XIE
2630260616
disable stable test ( #3529 )
2025-08-22 11:38:18 +08:00
YuanRisheng
85fbf5455a
[V1 Loader]Ernie VL support loader v1 ( #3494 )
...
* ernie vl support new loader
* add unittest
* fix test
2025-08-22 11:16:57 +08:00
Zhang Yulong
3cc182236a
update ci ( #3519 )
2025-08-21 20:05:50 +08:00
YuanRisheng
c389a4013c
Unify server-side and model-side Config(Part-5) ( #3497 )
...
* move config
* fix xpu
* fix
* fix vl
* fix vl
* fix unitest
* fix args
* add unitest
* fix test
2025-08-21 19:00:21 +08:00
yangjianfengo1
e5aa7087db
【bug fix】Fix slow w4a8 compilation ( #3510 )
...
* fix w4a8 compilation
* code style
* fix tma copy
2025-08-21 18:50:14 +08:00
Zhang Yulong
a5692e8b7d
Add PD CI case ( #3490 )
...
* Create test_ernie_03b_pd.py
* Update test_ernie_03b_pd.py
2025-08-21 18:48:34 +08:00
李泳桦
8bea4b1e25
[fix] fix output tokens count in streaming completion api ( #3507 )
2025-08-21 18:19:13 +08:00
李泳桦
e4f0b755b4
[fix] setting disable_chat_template while passing prompt_token_ids led to response error ( #3228 )
...
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error
* [fix] code syntax
* [test] add test case for this bug
* [test] add test case for empty message list
* [test] fix test case for empty message list
2025-08-21 17:30:51 +08:00
luukunn
371fb3f853
[Feature] add tool parser ( #3483 )
...
* add tool parser
* add x1 enable_thinking
* restart ci
* fix vl reasoning parser
* modify call style
* modify call style
* add offline enablethinking
* fix completion
* fix
* fix unit test
* fix unit test
* fix unit test
* fix vl reasoning parser
* fix vl reasoning parser
2025-08-21 17:25:44 +08:00
Yzc216
466cbb5a99
[Feature] Models api ( #3073 )
...
* add v1/models interface related
* add model parameters
* default model verification
* unit test
* check model err_msg
* unit test
* type annotation
* model parameter in response
* modify document description
* modify document description
* unit test
* verification
* verification update
* model_name
* pre-commit
* update test case
* update test case
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update tests/entrypoints/openai/test_serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/entrypoints/openai/serving_models.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-08-21 17:02:56 +08:00
Zhang Yulong
b7eee3aec1
Update CI ( #3474 )
...
* update CI cases
* update CI cases
* update CI cases
* update CI cases
* Merge upstream/develop and resolve directory rename conflict
* Merge upstream/develop and resolve directory rename conflict
* Merge upstream/develop and resolve directory rename conflict
* update deploy
* update deploy
* update deploy
* update deploy
* update deploy
2025-08-21 16:49:20 +08:00
qw86972190
c83381d650
revert pr ( #3481 )
...
Co-authored-by: iosmers <yinwei_hust@163.com >
2025-08-21 14:19:50 +08:00
ltd0924
51f68ae593
[Feature] add dealer manager to reuse the connection ( #3471 )
...
* [BugFix] fix control signal release failed
* [BugFix] fix control signal release failed
* update
* update
* update
* [Feature] add dealer manager to reuse the connection
* fix
* fix
* fix
* fix
* fix
* fix
* Create test_dealer_connection_manager.py
* Delete test/entrypoints/openai directory
* Update test_dealer_connection_manager.py
* Update test_dealer_connection_manager.py
2025-08-21 13:11:13 +08:00
YUNSHEN XIE
985b1265c3
CE compile job (triggered on merge) ( #3491 )
...
* add ce compile job
* fix
* update
2025-08-21 11:33:26 +08:00
memoryCoderC
31f639f10b
[Feature] add prompt_tokens and completion_tokens ( #3504 )
2025-08-21 10:23:27 +08:00
Zero Rains
30b3f2dc07
[BugFix][V1 Loader] fix the bug in create weight for block_wise_fp8 ( #3486 )
2025-08-20 05:52:54 -07:00
Ryan
bcdfc1d6b9
Add custom op declaration for all_reduce ( #3473 )
...
* add custom op declaration
* roll back try except
2025-08-20 20:29:58 +08:00
Zhang Yulong
33ff0bfe38
Update disaggregated.md ( #3495 )
...
Fix documentation errors
2025-08-20 19:39:18 +08:00
YUNSHEN XIE
e197894977
add e2e cases ( #3476 )
...
* add e2e cases
* fix
2025-08-20 18:50:14 +08:00
Zhang Yulong
9ff2dfb162
Create eb45-8k-fp8-tp1-dp8_ep.yaml ( #3485 )
...
Hybrid-architecture EP parallel yaml
2025-08-20 14:33:54 +08:00
YuBaoku
33d369586b
[CI] remove useless case ( #3482 )
2025-08-20 14:20:30 +08:00
xiaolei373
5d131485d8
add error log to file ( #3431 )
...
* feat(log):add_request_and_response_log
* feat[log]:add error log to file
2025-08-20 09:52:34 +08:00
YUNSHEN XIE
3a6058e445
Add stable ci ( #3460 )
...
* add stable ci
* fix
* update
* fix
* rename tests dir;fix stable ci bug
* add timeout limit
* update
2025-08-20 08:57:17 +08:00
kevin
67298cf4c0
add error traceback info ( #3419 )
...
* add error traceback info
* update error msg
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-19 19:32:04 +08:00
yangjianfengo1
b047681c5d
【New Feature】Support Fp8 group Gemm 2:4 sparsity ( #3463 )
...
* support 2:4 sparsity
* code style
* add stmatrix macro definition check
* code style
2025-08-19 02:54:47 -07:00
ltd0924
d587fb257f
[CI] add test generation demo ( #3270 )
...
* Create test_generation.py
* update
* update
* format
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update test_generation.py
* Update setup.py
* Delete test/plugins/test_model_runner_register.py
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-19 17:12:40 +08:00
Zero Rains
fef447e350
[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend ( #3447 )
...
* support deepgemm backend
* support marlin backend
* remove print
* fix process_prequanted_weights
2025-08-19 14:15:53 +08:00
chen
6735626014
fix request_output sampling_params ( #3154 ) ( #3464 )
2025-08-19 13:52:50 +08:00
ltd0924
bca8905b40
[BugFix] fix control signal release failed ( #3390 )
...
* [BugFix] fix control signal release failed
* [BugFix] fix control signal release failed
* update
* update
* update
2025-08-19 13:51:38 +08:00
Zero Rains
8b12c80f90
[FixBug] compute early stopping with real batch size ( #3418 )
...
* [FixBug] compute early stopping with real batch size
* update
* fix test_sampler
2025-08-18 22:09:21 -07:00
luukunn
3a7a20d191
[Feature] Pass through the chat_template_kwargs to the data processing module ( #3421 )
...
* fix chat_template_args
* fix args
* add offline
* add offline
* fix
* fix
* fix default enable_thinking value
* fix default enable_thinking value
* modify condition
* Revert "modify condition"
This reverts commit 26430bdeb1 .
* fix unit test
2025-08-19 10:50:01 +08:00
lizexu123
a053ab889b
[BugFix] fix num_running_requests in cuda_graph ( #3457 )
...
* fix cuda_graph
* add note
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-19 10:47:22 +08:00
AIbin
beec24fd89
【Inference Optimize】DeepSeek-v3 model inference performance optimization ( #3455 )
...
* DSK_OPT_01
* update FA3
2025-08-19 10:42:42 +08:00
zhuzixuan
c95b3395e9
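【BugFix】Support echo in the completion API ( #3245 )
...
* wenxin-tools-511: fix v1/completion failing to echo.
* support echo for multiple prompts
* support streaming echo with multiple prompts
* add unit tests for echo support in the completion API
* pre-commit
* remove redundant test files
* fix the unit test method for echo support in the completion API
* add unit test files
* add unit tests
* unittest
* add unit tests
* fix unit tests
* remove unnecessary assert.
* resubmit
* update test method
* ut
* unit test to verify whether the approach is correct
* unit test to verify whether the approach is correct
* unit test to verify whether the approach is correct 3
* optimize unit test code, narrowing the unit test scope in a targeted way.
* optimize unit test code 2, narrowing the unit test scope in a targeted way.
* optimize unit test code 3, narrowing the unit test scope in a targeted way.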
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* add unit tests for tokenid
* update
* fix index error
* fix index error
2025-08-19 10:41:51 +08:00
lizexu123
32b39620bc
[Code Simplification] remove cum_offsets ( #3410 )
2025-08-18 20:21:25 +08:00
YUNSHEN XIE
2cf96ddd68
add publish workflow ( #3063 )
...
* add publish job
* update
* update
2025-08-18 16:42:36 +08:00
luukunn
9c129813f9
[Feature] add custom chat template ( #3251 )
...
* add custom chat_template
* add custom chat_template
* add unittest
* fix
* add docs
* fix comment
* add offline chat
* fix unit test
* fix unit test
* fix
* fix pre commit
* fix unit test
* add unit test
* add unit test
* add unit test
* fix pre_commit
* fix enable_thinking
* fix pre commit
* fix pre commit
* fix unit test
* add requirements
2025-08-18 16:34:08 +08:00
Jundong Liu
70ee910cd5
[Executor] Change cudagraph hashkey from batch size to num_tokens ( #3454 )
2025-08-18 16:16:48 +08:00
Jundong Liu
ea4a3b479c
[Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool ( #3404 )
...
* fix insufficient buffer allocation; add a tool to print forwardmetadata
* fix mistake
* Make CPU tensor in CPUPlace
* Add test about forward_meta_str and Add unitest_requirement
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-08-18 16:14:09 +08:00
chen
5585cf7aa5
fix mtp_rej_topp input ( #3450 )
2025-08-18 16:12:42 +08:00
Divano
246cd7b3a5
Perf ( #3453 )
...
* add repetition early stop cases
* add repetition early stop cases
* add stress tool
2025-08-18 15:37:46 +08:00
gaoziyuan
6fdd83da10
fix some bug ( #3434 )
2025-08-18 14:39:13 +08:00
freeliuzc
a12d0bc549
[Feature][MTP]update multi-draft-token strategy ( #3369 )
...
* update multi-draft-token strategy
* fix format
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-08-18 13:59:56 +08:00
Zhang Yulong
3ee6053e5d
Add ci case ( #3355 )
...
* add ci cases
* debug
debug H20 baseline
* Update run_pre_ce.sh
* Update test_EB_Lite_serving.py
* Update test_EB_VL_Lite_serving.py
* Update test_EB_Lite_serving_mtp.py
* Update test_Qwen3-MoE_serving.py
* Update test_Qwen2-7B-Instruct_serving.py
* Update run_pre_ce.sh
2025-08-18 11:35:56 +08:00
chen
e88f5552db
fix cpu __ini__.py ( #3448 )
2025-08-17 12:38:54 +08:00
RAM
33c0197ebe
[Docs] Update mkdocs.yml ( #3444 )
...
* Update docs of graph opt backend
* update best_practices
* update mkdocs.yaml
* [Docs]Update mkdocs.yml
2025-08-15 21:57:40 +08:00
RAM
154308102e
[Docs]Update docs of graph opt backend ( #3442 )
...
* Update docs of graph opt backend
* update best_practices
2025-08-15 21:30:32 +08:00
yongqiangma
5703d7aa0f
update installation readme ( #3429 )
2025-08-15 19:09:41 +08:00
yangjianfengo1
615930bc05
Update README ( #3426 )
...
* modify README
* code style
* code style
2025-08-15 18:46:28 +08:00
JYChen
6f11171478
fix some docs error ( #3439 )
2025-08-15 18:45:27 +08:00
yinwei
354575b6d1
[Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 ( #3428 )
...
* XPU Update 2.1 Release Documentation
* code style check
* Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95
2025-08-15 18:34:37 +08:00
YUNSHEN XIE
cc8ee50f27
add accuracy check ci ( #3389 )
...
* add accuracy ci
* fix
* fix
* update
* rename ci jobs
2025-08-15 15:17:43 +08:00
GoldPancake
4bd6a9fa7d
[Bugs] Fix DeepGEMM pre-compile tools. ( #3351 )
...
Fix some miss cache problems.
Add README.md.
2025-08-15 14:37:49 +08:00
ming1753
d4e3a20300
[Docs] Release 2.1 docs and fix some description ( #3424 )
2025-08-15 14:27:19 +08:00
yinwei
fbb6dcb9e4
[Docs]XPU Update 2.1 Release Documentation ( #3423 )
...
* XPU Update 2.1 Release Documentation
* code style check
2025-08-15 14:07:47 +08:00
JYChen
562e01c979
update docs ( #3420 )
2025-08-15 13:00:08 +08:00
Jiang-Jia-Jun
cca96ab1e4
Update Dockerfile.gpu
2025-08-15 12:29:20 +08:00
Jiang-Jia-Jun
7132fa9ec2
Update dockerfile
2025-08-15 12:28:08 +08:00
Sunny-bot1
6c1f3ff897
topk_gating_softmax support bias ( #3405 )
2025-08-15 11:57:45 +08:00
ltd0924
5a84324798
[Doc] Add multinode deployment documents ( #3417 )
...
* Create multi-node_deployment.md
* Create multi-node_deployment.md
* Update mkdocs.yml
2025-08-15 10:37:04 +08:00
chen
f0f00a6025
[OPs] Universal optimization and Fix early_stop cuda 700 ( #3375 )
...
* delete nonzero
* delete setup_ops_base.py
* check if
* check gcp infer_seed.cpu()
* fix repetition_early_stopper_kernel cuda 700
2025-08-14 22:40:44 +08:00
YuanRisheng
09c979f3dd
[V1 Loader] Support Ernie text(moe and dense) ( #3110 )
...
* new loader support 0.3B
* fix weight
* support parallel load
* support parallel load
* fix slice
* support moe
* delete code
* perfect code
* perfect code
2025-08-14 20:25:28 +08:00
xjkmfa
ab60292f89
【CI】 evil case ( #3359 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
* edge-case checks, adversarial tests
* edge-case checks, adversarial tests
* edge-case checks, adversarial tests
* edge-case checks, adversarial tests
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-08-14 20:00:47 +08:00
freeliuzc
cacc52bf21
modify readme ( #3409 )
2025-08-14 19:47:36 +08:00
Sunny-bot1
79d8ae4c38
[UT Fix] Fix bad_words test ( #3385 )
...
* fix bad_words test
* add streaming
* fix
* fix
2025-08-14 03:55:02 -07:00
lzy
1e06b9fa6d
make append_attn supports mask_offset ( #3138 )
...
* make append_attn supports mask_offset
* add unittest
2025-08-14 03:40:55 -07:00
memoryCoderC
6031f9a5f5
[BugFix] fix ErnieProcessor not set raw_prediction ( #3400 )
2025-08-14 18:07:49 +08:00
YUNSHEN XIE
f72db9386c
Add requirements for running unit tests ( #3350 )
...
* Add requirements for running unit tests
* update
2025-08-14 17:37:18 +08:00
lizexu123
7b596d0877
[BugFix] fix real_bsz in ep ( #3366 )
...
* Your commit message here
* fix ep
* delete cuda_graph
2025-08-14 17:31:19 +08:00
gaoziyuan
0ea8712018
fix op tests ( #3398 )
2025-08-14 16:45:25 +08:00
Sunny-bot1
2e7831185f
[Optimize]Add norm_weights feature for topk_gating_softmax ( #3372 )
2025-08-14 15:05:23 +08:00
Jiang-Jia-Jun
666ab65a51
[Polish Code] Remove useless notes
2025-08-14 14:04:52 +08:00
Jiang-Jia-Jun
dd583fb16a
[BugFix] Fix default log level of paddleformers ( #3376 )
...
* [BugFix] Fix default log level of paddleformers
* [BugFix] Fix default log level of paddleformers
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-08-14 11:36:24 +08:00
xiaolei373
d4f610e4cd
feat(log):add_request_and_response_log ( #3373 )
2025-08-13 23:27:41 +08:00
ming1753
396dba0d62
[Bug Fix] Fix V1 video bug ( #3388 )
2025-08-13 23:04:07 +08:00
YUNSHEN XIE
1ace375fc3
Optimize CI execution workflow ( #3371 )
...
* Optimize CI execution workflow
* fix
2025-08-13 18:47:31 +08:00
Zero Rains
be94bdd0b0
[Loader V1] modify layername for DeepSeekV3 ( #3336 )
...
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-13 15:47:06 +08:00
memoryCoderC
f702a675a1
fix TestOpenAIServingCompletion fail ( #3368 )
2025-08-13 15:45:07 +08:00
EnflameGCU
d1a92e3e17
[GCU] Enable gcu CI ( #3190 )
...
* [GCU] Update to the latest version
* [GCU] Enable CI
2025-08-13 11:48:24 +08:00
yzwu
ce9180241e
[Iluvatar GPU] Modify the names of some variables ( #3273 )
2025-08-13 11:38:02 +08:00
Kane2011
b4fef2cf29
[MetaxGPU] Support FastDeploy on metax gpu ( #3241 )
...
* [MetaxGPU] Support FastDeploy on metax gpu
* Update metax_worker.py
1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;
* Update __init__.py
1. remove metax's key word comment
* Update __init__.py
1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import
---------
Co-authored-by: yongqiangma <xing.wo@163.com >
2025-08-13 11:11:54 +08:00
Ryan
ed6bff215a
fix custom op order rms_norm_eps ( #3348 )
2025-08-13 10:12:49 +08:00
Sunny-bot1
8224b21525
Refactor moe_topk_select op to use apply_norm_weight as a template parameter ( #3345 )
...
* Refactor moe_topk_select op to use apply_norm_weight as a template parameter
* update test
2025-08-13 08:44:16 +08:00
luukunn
eda83ca672
add Tool Parser ( #3272 )
...
* add tool-parser
* add tool-parser
* add tool parser
* add tool parser
* fix
* add offline
* add offline
* fix
* parsers:tool&reasoning
* rename tool parser
* update
* fix reasoning-parser
* add requirements
* fix finish reason
* fix
* fix reasoning-parser
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com >
2025-08-13 01:06:55 +08:00
memoryCoderC
2d1a4cacdf
Completion add raw_prediction/text_after_process ( #3356 )
2025-08-12 23:06:45 +08:00
zhink
2c0d853067
add test for CustomAllreduce ( #3313 )
2025-08-12 20:44:47 +08:00
YUNSHEN XIE
8791ad4e61
Pre ce modified ( #3335 )
...
* update
* update
* fix
* fix
* update
* update
* update
* fix
* update
2025-08-12 20:25:03 +08:00
memoryCoderC
c575611a5b
[BugFix] v1/completions add finish_reason ( #3246 )
...
* [BugFix] v1/completions add finish_reason
* update TestOpenAIServingCompletion for merge
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 19:40:26 +08:00
Jiang-Jia-Jun
90bfa0be9c
Update envs.py
2025-08-12 16:24:47 +08:00
Jiang-Jia-Jun
5620bd12de
Update envs.py
2025-08-12 16:24:33 +08:00
YUNSHEN XIE
7d0d5a543a
Use latest PaddlePaddle package ( #3347 )
...
* Use latest PaddlePaddle package
* fix
2025-08-12 16:23:41 +08:00
gaoziyuan
ccc7f1beb3
fix mapping ( #3320 )
2025-08-12 16:15:59 +08:00
RichardWooSJTU
283da92bfa
fix ep lm head ( #3244 )
...
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-12 15:38:28 +08:00
ming1753
f5164215be
[Bug Fix] fix vl V1 schedule bug ( #3323 )
...
* [Bug Fix] fix vl V1 schedule bug
* fix format
2025-08-12 11:31:39 +08:00
yangjianfengo1
b808c49585
[Doc] Add Chinese/English language switch ( #3318 )
...
* add Chinese/English language switch
* add Chinese/English language switch
* modify readme
2025-08-12 11:20:45 +08:00
chenjian
b21272d9ff
[Bug fix] fix block num setting in scheduler v1 for develop ( #3303 )
...
* fix block num setting in scheduler v1
* fix block num setting in scheduler v1
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
Jiang-Jia-Jun
183e3863e8
Remove useless code ( #3337 )
2025-08-12 10:32:31 +08:00
Sunny-bot1
19fda4e912
fix docs ( #3332 )
2025-08-11 21:03:49 +08:00
JYChen
973ddad91e
fix unittest ( #3328 )
2025-08-11 20:58:24 +08:00
Divano
f27e879785
Update _base_test.yml ( #3331 )
2025-08-11 20:57:20 +08:00
Sunny-bot1
789dc67ff7
[Docs]fix sampling docs ( #3113 )
...
* fix sampling docs
* fix sampling docs
* update
2025-08-11 20:42:27 +08:00
Divano
8bf96217b4
Update test_evil_cases.py
2025-08-11 20:27:02 +08:00
YUNSHEN XIE
770b0aa3c5
fix ci pypi index error ( #3326 )
2025-08-11 20:21:08 +08:00
kevin
9627619235
fix uvicorn multi worker error ( #3300 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-11 19:39:41 +08:00
Zero Rains
b23af29d0b
Launch expert_service before kv_cache initialization in worker_process ( #3045 )
...
* launch expert_service before kv_cache initialization
* add two signals to make sure model loading and expert_service launching have finished
* fix the EP bug
* fix ep
* update launching way
* fix ep
* update
* rollback ep
* pre-commit all files
---------
Co-authored-by: RAM <gstian5555@outlook.com >
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-11 19:38:46 +08:00
Zhang Yulong
c27a3dc43b
Update deploy.py ( #3310 )
...
* Update deploy.py
Update the deployment tool
* Update deploy.py
2025-08-11 19:11:57 +08:00
Jiang-Jia-Jun
c56c99837a
Revert "[BugFix] num_seqs ( #3291 )" ( #3316 )
...
This reverts commit e0aeac58e1 .
2025-08-11 16:16:51 +08:00
Yuanle Liu
9571c458f0
enhance eos_tokens ( #3274 )
...
* enhance eos_tokens
* update
* update
2025-08-11 14:47:52 +08:00
Divano
21caa63794
update base test ( #3304 )
...
* update base test
Start the service one additional time to test repetition stop
* Update _base_test.yml
2025-08-11 14:15:45 +08:00
Zero Rains
42af0b4b64
[V1 Loader] Support DeepSeekV3(bf16) ( #3294 )
...
* Support new loader for DeepSeekV3(bf16)
* update paddle version
* remove useless attr
2025-08-11 13:39:28 +08:00
lizexu123
e0aeac58e1
[BugFix] num_seqs ( #3291 )
...
* fix num_seqs
* merge develop
2025-08-11 13:38:55 +08:00
chenjian
b88537a456
fix bug for scheduler v0 ( #3308 )
2025-08-11 13:07:04 +08:00
xjkmfa
71018fb62e
【CI case】include total_tokens in the last packet of completion interface stream output ( #3279 )
...
* Add ci case for min token and max token
* 【CI case】include total_tokens in the last packet of completion interface stream output
---------
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-08-11 10:59:47 +08:00
Divano
0b77d396ad
Acc ( #3301 )
...
* add repetition early stop cases
* add repetition early stop cases
* add accuracy cases
2025-08-11 10:22:06 +08:00
Divano
79868be220
Update _base_test.yml ( #3299 )
...
add more cases
2025-08-11 10:03:27 +08:00
chen
46c8491201
merge logprob into batch_output ( #3266 )
2025-08-11 10:03:00 +08:00
Divano
566badb83c
Update _base_test.yml ( #3298 )
2025-08-11 09:40:14 +08:00
Divano
eaae4a580d
Split cases ( #3297 )
...
* add repetition early stop cases
* add repetition early stop cases
* split repetition_early_stop from the base test
2025-08-11 09:38:35 +08:00
chenjian
c011cb8b16
[Bug Fix] Fix scheduler bug in develop ( #3292 )
...
* Fix scheduler bug in develop
* Fix scheduler bug in develop
* Fix scheduler bug in develop
2025-08-10 13:55:38 +08:00
Jundong Liu
1e4968e810
[Executor] Fixed the issue of CUDA graph execution failure caused by different branches during decoding ( #3223 )
...
* fully resolve the decoding chunking problem
* update C8 and C4 kernel
* fix problem
* fix with pre-commit
* retain branch for mtp
2025-08-09 07:37:19 +08:00
ltd0924
31d4fcb425
[BugFix] fix too many open files problem ( #3256 )
...
* Update cache_messager.py
* fix too many open files problem
* fix too many open files problem
* fix too many open files problem
* fix ci bugs
* Update api_server.py
* add parameter
* format
* format
* format
* format
* Update parameters.md
* Update parameters.md
* Update serving_completion.py
* Update serving_chat.py
* Update envs.py
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-08 20:10:11 +08:00
YUNSHEN XIE
22255a65aa
add base test ci ( #3225 )
2025-08-08 19:08:55 +08:00
gaoziyuan
a799d14df1
[Bugfix] Fix model accuracy in some ops ( #3231 )
...
* fix noaux_tc op
* fix
* update
* fix qk norm
* fix linear for prequant loader
* test
* fix
* fix
* rm some print
* fix noaux_tc op
* test
* Fix the confused enable_early_stop when only set early_stop_config (#3214 )
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
* Add ci case for min token and max token (#3229 )
Co-authored-by: xujing43 <xujing43@baidu.com >
* add some evil cases (#3240 )
* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
* add evil cases
* qwen3_moe (#3084 )
* [Feature] support seed parameter (#3161 )
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
* 【Fix Bug】 Fix fa3 centralized-mode support bug (#3235 )
* fix fa3 centralized-mode bug
* add qknorm parameter
* fix qk norm
* fix
* update
* fix linear for prequant loader
* fix
* fix
* rm some print
* fix
* fix moe init weight&scale
* fix moe init weight&scale
---------
Co-authored-by: bukejiyu <395822456@qq.com >
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: Zero Rains <linjunlu@zerorains.top >
Co-authored-by: xjkmfa <108254620+xjkmfa@users.noreply.github.com >
Co-authored-by: xujing43 <xujing43@baidu.com >
Co-authored-by: Divano <dddivano@outlook.com >
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com >
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com >
Co-authored-by: yangjianfengo1 <125249383+yangjianfengo1@users.noreply.github.com >
Co-authored-by: qingqing01 <dangqingqing@baidu.com >
2025-08-08 17:30:37 +08:00
Zero Rains
ce1f353c70
Move create_parameters to __init__ in FuseMOE for CutlassBackend and TritonBackend ( #3148 )
...
* w4a8 bug
* fix w4a8 bug
* remove code
* modify the triton backend
* fix ep
* fix the bug with tensor_wise_fp8 in triton backend
* fix the RL
* fix bug by merge
* fix the bug in w4a8
* fix the tensor_wise_fp8 bug
* fix RL
2025-08-08 15:55:47 +08:00
plusNew001
d0e9a70380
[CI] add CI logprobs case ( #3189 )
...
* [ci] add CI case
* [ci] add CI case
* [ci] add CI case
* [ci] add CI case
---------
Co-authored-by: ZhangYulongg <1272816783@qq.com >
2025-08-08 15:47:55 +08:00
freeliuzc
71267840f7
【Fix】fix mtp bug ( #3139 )
2025-08-08 13:30:12 +08:00
bukejiyu
b76b17fc1b
qwen3 0.3B fix ( #3255 )
2025-08-08 11:35:40 +08:00
Yuanle Liu
fac2f64837
delete parallel_state.py ( #3250 )
2025-08-08 11:03:29 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimize attention and moe performance ( #3234 )
2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86
[feat]add fast_weights_iterator ( #3258 )
...
* add fast_weights_iterator
* update
* update
2025-08-07 22:36:46 +08:00
chenjian
5f0b30f6d0
support logprob in scheduler v1 ( #3249 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-07 20:14:01 +08:00
Yzc216
6037dd5d9c
[fix] multi source download ( #3259 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
* add requirements
* error optimization
* fallback on connection failure
* fallback on connection failure
* fallback on connection failure
* unit test
* unit test
* unit test
* test
* test
* modify fallback
* Trigger CI
2025-08-07 19:30:39 +08:00
JYChen
9423c577fe
[stop_seq] fix out-bound value for stop sequence ( #3216 )
...
* fix out-bound value for stop sequence
* catch error if there are out-of-bounds value
* check in offline mode
* add ut tests
2025-08-07 15:40:21 +08:00
Divano
5885285e57
Ce add benchmark test ( #3262 )
...
* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
* add evil cases
* add benchmark gsm8k
2025-08-07 15:28:30 +08:00
YuBaoku
55ac449c31
[CI] remove useless case ( #3261 )
2025-08-07 15:09:40 +08:00
RAM
820798aec5
[Executor]Update graph test case and delete test_attention ( #3257 )
...
* 1.update graph test case 2.delete test_attention
* code style
* delete print
2025-08-07 14:05:15 +08:00
YuanRisheng
0074b423a9
fix ci bug ( #3239 )
2025-08-07 11:32:39 +08:00
hong19860320
93a1731891
[Doc] Update deps and fix dead links ( #3252 )
2025-08-07 11:04:31 +08:00
李泳桦
09cc4e2802
[fix] fix completion stream api output_tokens not in usage ( #3247 )
2025-08-07 10:36:00 +08:00
Yzc216
d9e3f88f9e
[Feature] multi source download ( #3125 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
* add requirements
* error optimization
* fallback on connection failure
* fallback on connection failure
* fallback on connection failure
* unit test
* unit test
* unit test
* test
* test
2025-08-07 00:40:27 +08:00
bukejiyu
9408e667a5
[bugfix]fix blockwisefp8 and all_reduce ( #3243 )
...
* fix
* update
* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e
【Fix Bug】 Fix fa3 centralized-mode support bug ( #3235 )
...
* fix fa3 centralized-mode bug
* add qknorm parameter
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter ( #3161 )
...
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf
qwen3_moe ( #3084 )
2025-08-06 14:45:27 +08:00
Divano
91dc87f1c5
add some evil cases ( #3240 )
...
* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
* add evil cases
2025-08-06 14:23:55 +08:00
xjkmfa
256a82b0b3
Add ci case for min token and max token ( #3229 )
...
Co-authored-by: xujing43 <xujing43@baidu.com >
2025-08-06 14:10:57 +08:00
Zero Rains
36dc73470d
Fix the confused enable_early_stop when only set early_stop_config ( #3214 )
...
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
2025-08-06 11:42:27 +08:00
YuanRisheng
a6e8b780f8
fix approve ( #3224 )
2025-08-06 10:36:01 +08:00
yangjianfengo1
89397516a8
[New Feature] Support W4Afp8 MoE GroupGemm ( #3171 )
...
* init
* add multi-threaded compilation
* fix bug
* fix bug
* code style
* add fp16
* replace print with assert
* fix stmatrix
* reduce unit test shape
* reduce unit test shape
2025-08-06 10:34:05 +08:00
sg263
841e831575
[Trace]add trace when fd start ( #3174 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
* fd start add trace
* fix pre-commit
* fix pre-commit
* change FD_JOB_ID
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: shige <shige@baidu.com >
2025-08-05 21:18:27 +08:00
YUNSHEN XIE
e0bbd3b6ca
fix approve ci ( #3212 )
2025-08-05 17:21:26 +08:00
Yuan Xiaolan
7ce00e597c
support qk norm ( #3145 )
2025-08-05 16:46:14 +08:00
RAM
4a10e29804
fix mla attention backend ( #3176 )
2025-08-05 16:43:15 +08:00
Yuan Xiaolan
af543b7f0f
revise get_moe_scores ( #3164 )
2025-08-05 16:43:07 +08:00
Divano
e24929efa3
Ce add bad cases ( #3215 )
...
* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
2025-08-05 16:37:28 +08:00
lizexu123
b01cfd6007
[BugFix] support real batch_size ( #3109 )
...
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun
55939f7942
Update engine.py
2025-08-05 16:10:36 +08:00
chen
04fc7eb931
fix test_air_top_p_sampling name ( #3211 )
2025-08-05 15:47:50 +08:00
Divano
9f1936ae28
Ce add repetition early stop cases ( #3213 )
...
* add repetition early stop cases
* add repetition early stop cases
2025-08-05 15:47:28 +08:00
RichardWooSJTU
1e9a8e8cef
fix lm head bias ( #3185 )
...
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c
[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization ( #3182 )
...
* Add support for mixed-ep across multi nodes
* code refine
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-05 15:40:11 +08:00
ming1753
14ed75f7d3
[Test] scaled_gemm_f8_i4_f16 skip test while sm != 89 ( #3210 )
2025-08-05 15:25:28 +08:00
yangjianfengo1
40f7f3e0d8
[New Feature] fa3 supports flash mask ( #3184 )
...
* support flash mask
* modify test_flash_mask
* modify test.sh
2025-08-05 12:20:48 +08:00
YUNSHEN XIE
b8f3c73aac
fix coverage report ( #3198 )
...
* fix coverage report
* fix
2025-08-05 11:24:55 +08:00
Divano
fb7a0689cc
add more cases ( #3207 )
2025-08-05 11:17:36 +08:00
RAM
c593e1a39c
[Bug Fix]Fix bug of append attention test case ( #3202 )
2025-08-05 11:04:45 +08:00
RichardWooSJTU
e39159f3bd
Add switch to apply fine-grained per token quant fp8 ( #3192 )
...
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-08-04 19:54:03 -07:00
Divano
88596c0c63
Add more base chat cases ( #3203 )
...
* add test base class
* fix codestyle
* fix codestyle
* add base chat
2025-08-05 10:24:12 +08:00
lizhenyun01
fe540f6caa
[plugin] Custom model_runner/model support ( #3186 )
...
* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl
2025-08-04 18:52:39 -07:00
Sunny-bot1
72ef5a9c93
[FIX]fix bad_words when sending requests consecutively ( #3197 )
...
* fix bad_words
* fix log
* fix log
2025-08-04 05:59:41 -07:00
Yuan Xiaolan
1f8289e106
fix expertwise_scale ( #3181 )
2025-08-04 20:06:15 +08:00
YuBaoku
3eb9a5df60
[CI] add test_compare_top_logprobs ( #3191 )
2025-08-04 19:49:24 +08:00
SunLei
68bc1d12c0
[Bugfix] Fix uninitialized decoded_token and add corresponding unit test. ( #3195 )
2025-08-04 19:23:58 +08:00
Longzhi Wang
01d7586661
[Bug fix] Fix cudagraph when use ep. ( #3130 )
...
* fix cudagraph when use ep
* fix typo
* reduce full length to adapt to large bsz such as 128/256
2025-08-04 18:06:18 +08:00
周周周
2bd8a50649
remove useless code ( #3166 )
2025-08-04 18:03:08 +08:00
gaoziyuan
0443587a57
【Feature】support qwen3 name_mapping ( #3179 )
...
* add fd plugins && rm model_classed
* fix reviews
* add docs
* fix
* fix unitest ci
* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
Zero Rains
17f51f0c92
[unitest] fix the bug in test_sampler ( #3157 )
2025-08-04 01:23:25 -07:00
YuanRisheng
79bbacc152
Fix approve shell scripts ( #3108 )
...
* fix approve
* fix
2025-08-04 15:51:33 +08:00
Divano
3bfb2eca92
Update test_base_chat.py ( #3183 )
2025-08-04 15:09:53 +08:00
ltd0924
c9e6ce1518
Update cache_messager.py ( #3172 )
2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5
【Feature】add fd plugins && rm model_classes ( #3123 )
...
* add fd plugins && rm model_classed
* fix reviews
* add docs
* fix
* fix unitest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905
fix load_pre_sharded_checkpoint ( #3152 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-04 10:44:20 +08:00
Divano
66d3bb89ad
Update __init__.py ( #3163 )
...
Improve compatibility of the test base class
2025-08-04 09:40:09 +08:00
AIbin
22fe695f1c
【Inference Optimize】Support automatic generation of marlin kernel ( #3149 )
...
* Support automatic generation of marlin kernel
2025-08-01 22:43:18 +08:00
ApplEOFDiscord
b71cbb466d
[Feature] remove dependency on enable_mm and refine multimodal's code ( #3014 )
...
* remove dependency on enable_mm
* fix codestyle check error
* fix codestyle check error
* update docs
* resolve conflicts on model config
* fix unit test error
* fix code style check error
---------
Co-authored-by: shige <1021937542@qq.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-01 20:01:18 +08:00
plusNew001
243394044d
[XPU]Update XPU dockerfiles ( #3144 )
...
* [CI] add xpu ci case
* [CI]Update run_ci_xpu.sh
* [XPU]Update Dockerfile.xpu
* Update Dockerfile.xpu
2025-08-01 19:41:59 +08:00
Zhang Yulong
0eb32bb9c8
add cases ( #3155 )
2025-08-01 18:38:57 +08:00
yangjianfengo1
64d7a3194d
Centralized mode supports fa3 ( #3112 )
2025-08-01 18:03:36 +08:00
YUNSHEN XIE
bdb83e007d
fix ci ( #3141 )
2025-08-01 17:42:26 +08:00
Divano
50db0d7ba9
add case ( #3150 )
...
* add test base class
* fix codestyle
* fix codestyle
* add base chat
2025-08-01 17:30:58 +08:00
Ryan
94264bbf60
[Code Simplification] Refactor Post-processing in VL Model Forward Method ( #2937 )
...
* rm sth useless
* refactor model forward
* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765
Fix out-of-memory issue during single-XPU deployment ( #3133 )
2025-08-01 17:12:03 +08:00
JYChen
c34088b0fd
fix stop seq unittest ( #3126 )
2025-08-01 16:50:05 +08:00
ming1753
fc5f43c6bc
[Docs] Optimal Deployment ( #2768 )
2025-08-01 11:56:27 +08:00
chen
a2f5cc54f8
moe preprocess op support 160 experts and fused_moe triton kernel name add K ( #3121 )
2025-08-01 10:46:20 +08:00
Divano
1d93565082
[CE] Add base test class for web server testing ( #3120 )
...
* add test base class
* fix codestyle
* fix codestyle
2025-07-31 23:28:50 +08:00
YUNSHEN XIE
e1011e92d9
disable test_cuda_graph.py ( #3124 )
2025-07-31 22:03:48 +08:00
plusNew001
8c63237cfa
[CI] add xpu ci case ( #3111 )
...
* [CI] add xpu ci case
* [CI]Update run_ci_xpu.sh
2025-07-31 22:03:34 +08:00
YUNSHEN XIE
ff6a109b4d
Describe PR diff coverage using JSON file ( #3114 )
...
* Refactored ci pipeline
* update
* Describe PR diff coverage using JSON file
* remove pip cache setting from Approve
* fix
* update
2025-07-31 21:59:20 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd ( #3034 )
...
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:17:27 +08:00
Zhang Yulong
1a543bca29
Fix test_EB_Lite_serving.py ( #3119 )
...
* Fix test_EB_Lite_serving.py
* fix test_EB_Lite_serving.py
2025-07-31 20:15:25 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwagrs and update params docs ( #3103 )
...
* add chat_template_kwagrs and update params docs
* add chat_template_kwagrs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching ( #3061 )
...
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 ( #3117 )
2025-07-31 17:54:24 +08:00
YUNSHEN XIE
583eae2fd1
fix ci ( #3106 )
...
* fix ci
* disable test_non_streaming_chat_with_min_tokens
2025-07-31 17:25:08 +08:00
JYChen
1ef38b1563
[doc] best practice for eb45 text models ( #3002 )
...
* [doc] best practice for eb45 text models
* fix docs
2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun
4498058722
Update README.md
2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun
66304cf921
Update sampling.md
2025-07-31 15:02:57 +08:00
yinwei
5b9aec1f10
xpu release 2.0.3 ( #3105 )
2025-07-31 14:26:07 +08:00
YUNSHEN XIE
66c3835a46
add approve ci ( #3093 )
...
* add approve ci
* fix
* fix
2025-07-31 10:10:10 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flas attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
Jiang-Jia-Jun
998968f1e8
[Doc] Update parameters of serving
2025-07-30 22:35:01 +08:00
chenjian
fe0e3f508b
[BUG FIX] Fix bug when preempted request rescheduled ( #3080 )
...
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api ( #3096 )
...
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
JYChen
bd29b2aaca
add stop_seqs doc ( #3090 )
2025-07-30 20:36:18 +08:00
Jiang-Jia-Jun
6ead7a3a49
Update setup.py
2025-07-30 20:21:41 +08:00
YUNSHEN XIE
e4ba9a0dde
debug use ( #3095 )
2025-07-30 20:18:36 +08:00
Zhida Hu
3f8a41e68c
[*] fix the memory leak when modify qp to rts failed ( #3051 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
AIbin
28fff1b035
Revert "Add uinttest for moe_ffn_wint2. ( #3037 )" ( #3085 )
...
This reverts commit 327e1943fa .
2025-07-30 19:04:07 +08:00
YuanRisheng
acc5c0aa85
add ci for custom op approve ( #3079 )
2025-07-30 16:50:20 +08:00
zhink
d89b6dd43f
adapter qwen3 moe attr for init ( #3066 )
...
adapter qwen3 moe attr for init
2025-07-30 16:49:28 +08:00
bukejiyu
8e203666d9
w4a8 offline ( #3074 )
...
* w4a8 offline
* update
* update
* update
2025-07-30 16:33:30 +08:00
ming1753
5acde4eb43
[Feature] Multimodal Scheduler V1 ( #3019 )
...
* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code
2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun
ffa0f4d99b
[Fix] Fix version function ( #3076 )
...
* [Fix] Fix version function
* Fix commit
* Fix commit
* fix code sync
* Update coverage_run.sh
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 16:05:24 +08:00
ltd0924
ecf2fd5b9a
[BugFix] vl encoder tokens dtype problem ( #3069 )
2025-07-30 15:20:53 +08:00
YuanRisheng
eeadbf332a
delete unused unittest ( #3065 )
2025-07-30 15:11:58 +08:00
Yiqun Liu
327e1943fa
Add uinttest for moe_ffn_wint2. ( #3037 )
...
Change-Id: Ifd452527eaf87ea96c3fa4fa9aeb17729b33c2de
2025-07-30 15:03:09 +08:00
Yuan Xiaolan
35935da9e5
support W4A8 EPLB ( #3075 )
2025-07-30 14:34:12 +08:00
Yzc216
159767717d
[Feature] multi source download ( #3072 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
2025-07-30 14:10:13 +08:00
Zero Rains
4dc130c5a9
[Doc] add repetition early stopping doc ( #3078 )
...
* add repetition early stop doc
* add the early_stop.md
2025-07-29 22:01:57 -07:00
YuanRisheng
99a70fc722
unify parallel config ( #3070 )
2025-07-30 11:41:23 +08:00
lddfym
5ca684c762
update doc: load_balance.md ( #3008 )
...
* update doc of load_balance
* update doc: load_balance.md
2025-07-30 10:27:56 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words ( #3055 )
...
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-30 09:31:29 +08:00
Sunny-bot1
9c962343f2
[Docs] add sampling docs ( #2973 )
...
* add sampling docs
* add minp sampling docs
* update sample docs
* update
* update
* add bad words desc
* update
2025-07-30 02:24:16 +08:00
zhuzixuan
ad7bb52a28
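Fix the error raised when max_tokens=1 is passed ( #3068 )
...
* Fix the error raised when max_tokens=1 is passed
* Fix the error raised when max_tokens=1 is passed
* Fix the error raised when max_tokens=1 is passed
* Fix the error raised when max_tokens=1 is passed
* Fix the error raised when max_tokens=1 is passed
* Fix the error raised when max_tokens=1 is passed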
2025-07-29 23:49:28 +08:00
Ryan
73cfe1fd37
[SOT] Extend SOT warmup support to new hardware ( #3032 )
...
* add new hardware
* add_sot_warmup4new_hardware
* fix conflict
* rm Optional
2025-07-29 22:45:20 +08:00
Zero Rains
b2f9a42d87
[Feature] Support repetition early stop ( #3024 )
...
* support repetition early stop and support user to set the parameter
* remove log
* fix codestyle
* add the early_stop_config to rollout_config
* update config and EarlyStopper class
* fix the bug for triton
* modify the stop method
* update description
* modify the usage for stop_flags
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-29 22:42:54 +08:00
Yuan Xiaolan
3214fb5393
support model loading for w4a8 offline quant ( #3064 )
...
Support loading offline-quantized weights for W4A8 EP
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2
fix argument error in ep when pd ( #3060 )
2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
Longzhi Wang
907d561523
fix ep when paddle version mismatch ( #3056 )
2025-07-29 15:06:49 +08:00
JYChen
dafe02a7b9
[stop sequence] support stop sequence ( #3025 )
...
* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0
2025-07-29 14:17:37 +08:00
YuanRisheng
1a815b7a2a
Fix Speculative Config bug ( #3049 )
...
* fix speculative bug
* fix rl
2025-07-29 10:50:48 +08:00
yinwei
f2a528f9ae
[XPU] Support kvblock centralized management ( #3017 )
2025-07-29 10:40:55 +08:00
Jiang-Jia-Jun
286802a070
Update ernie-4.5.md
2025-07-29 10:10:09 +08:00
Yuan Xiaolan
7d87aaace8
optimize w4a8 decoding ( #3050 )
2025-07-28 22:20:13 +08:00
lizhenyun01
e80ea8a71b
remove Synchronize in hadamard
2025-07-28 19:22:46 +08:00
Yuan Xiaolan
b1d787a272
[fix] w4a8 model loading and hadamard config ( #3013 )
2025-07-28 18:17:59 +08:00
YUNSHEN XIE
c8bf8b3913
add logprob ci test ( #3022 )
...
* add logprob ci test
2025-07-28 17:30:58 +08:00
K11OntheBoat
83048bbe55
[Feature] Deepseekv3 supports cudagraph ( #3041 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-07-28 17:12:54 +08:00
AIbin
ec52d39e68
【Inference Optimize】Update wint2 weight n-dim reorder ( #3042 )
2025-07-28 16:31:56 +08:00
YuanRisheng
bddf403576
Unify server-side and model-side Config (Part2) ( #3035 )
...
* merge speculative and graph opt config
* add attr
2025-07-28 15:31:48 +08:00
yinwei
776fb03250
add error info ( #3040 )
2025-07-28 15:10:28 +08:00
YUNSHEN XIE
60311956e4
fix(ci): correct diff coverage data download URL ( #3036 )
2025-07-28 14:44:02 +08:00
lizhenyun01
238766e403
fix c4 prompt_cache
2025-07-28 14:31:37 +08:00
chen
01485cd28b
MTP rejection_topp add topk input ( #3031 )
2025-07-28 13:58:45 +08:00
begin2023
dd877f38b1
[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )
...
* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call
2025-07-27 20:38:29 -07:00
Longzhi Wang
247010d298
fix argument error ( #3030 )
2025-07-28 11:03:29 +08:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
Yiqun Liu
8f426c1690
Optimize the performance of moe_expert_ffn_wint2 ( #2990 )
...
* Change wint2 to ColumnMajor.
Change-Id: I6b44d02946a685f8fe24d9f2c7be258b51e16da2
* Unify default_wint2x_mma.
Change-Id: I9e77b0e8e6cecab01fedc0b24b536ee0a1a89ff7
* Change wint2 to ColumnMajorTileInterleave.
Change-Id: I593cbe36f991c0c5044989d65f0014087587c624
* Enable async copy for B.
Change-Id: Ia3ac37ad162a8cf3ccce4f268e81bd06c8ac3c46
* Add wint2x Dequantizer
* Remove TileDequanterB related codes.
Change-Id: Id8e65703b72a8984d367f584ff41b7726017fbb8
* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.
Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca
* Implement Wint2ParamsAccessor to load extra quant params from global memory.
Change-Id: Ic3750cd9b767df8893501820880c3342a4b47233
* Implement FastInterleavedAndBiasedNumericArrayConverter for wint2.
Change-Id: I438f2b18ab964a04ae1cdb09d9e7d9f7b95eafca
* Use async copy for local_scale.
Change-Id: Ib882ba41c3d2354bda4d25b40e2408ad3b2f7893
* Check and correct the load and dequantize of weights.
Change-Id: Ie8dca505b39987144964fe6407d465b3b5953790
* Change for performance tuning.
Change-Id: I1da026fb1d1533a9d70350c7ba23c27e896cfc29
* Optimize the global memory access size of local_scale reading.
Change-Id: I4cbe3a2ef5951723d415c2d3252ce912394beaf5
* Specialize mma_tensor_op for wint2 to enable fine-grained pipeline.
Change-Id: Icbb4d48f90a41136f42d6ffff42d68de32f408da
* Minor fix.
Change-Id: I14d4ac9d267ee05442a3b47f00c26bee13d79e6f
* optimizing dequant performance with LOP3
* optimizing dequant performance with LOP3
* Avoid redundant dequantization of local_scale and use bf16 as computing type.
Change-Id: I63239ebc8f8e4a92d6281af59840ba50600b4334
* Add Multiplier and remove some logs.
Change-Id: Ifa199d81e6aeb472d2247c63f85ef30213684bcd
* optimizing dequant performance with LOP3
* Use __byte_perm to implement int8 to float32 conversion for performance improvement.
* Use lop3 to optimize the dequantize of local_scale.
Change-Id: I6189759970cb5b8dcbef769724784b8a7533b63c
* Minor fix and remove some logs.
Change-Id: I6279ba9926d5041093b1c6aea200acf2e4c49d46
* Fix stages for test.
Change-Id: I6f7b7cac612ef2c678e9d49f5ffa60eb53d3ae29
* Fix stages for test and add clock64 to profile.
Change-Id: Iffaf7324beaa910ce9ee56f47ae289de98f1a267
* Use __byte_perm to replace shift-and-or operations for faster integer merging.
* Split the uint2b convert.
Change-Id: I78da672ce8968e21f685285140ba546a161521b4
* Optimize convert of unscale.
Change-Id: I6795da1cdf5e8ab38ddaa9836240921b5312913a
* Minor optimization.
Change-Id: I1800aec34c3f4621abb02658208108f54da44d88
* Optimize mma pipeline and refine codes.
Change-Id: Id3075cf7b88f2813a11ccd1d3b49c62c978f36b8
* Add missing support.
Change-Id: Id65b7bc2c25fbb1a5b232c6bc9fb8c9093f691a8
* Accelerate FP16 dequantization performance
* Support tile shape as Xx64x64.
Change-Id: Ib8fd37e1ba1d06f7d11f2956e7f1367b0a92bcac
* Remove debugging codes and minor optimization.
Change-Id: I6b79bd56a6e8dd823efc169967ecd3cc9a43baf4
* Fix offset bug.
Change-Id: Id7aeb91e99d6f51836f2aff22187b4f79607395e
* Fix typo.
Change-Id: I19dde93fc1c1f7e19605905c90dc46298e203952
* Restore some codes and remove some debugging logs.
Change-Id: I8d44daf82ad1c6f8174134d195e7b3fe9a3afdfb
---------
Co-authored-by: baoqiwen <baoqiwen@baidu.com >
2025-07-28 10:32:43 +08:00
YUNSHEN XIE
fb410b5f4c
Add unit test run and coverage report generation ( #3011 )
...
* Add unit test run and coverage report generation
* fix
* fix: upload coverage report failure
* fix
* update
* fix
* fix
* update
2025-07-27 22:48:34 +08:00
YUNSHEN XIE
1d29dd80f7
modified dockerfile ( #3026 )
2025-07-25 21:10:23 +08:00
李泳桦
69996a40da
[feat] add disable_chat_template in chat api as a substitute for previous raw_request ( #3020 )
...
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Longzhi Wang
0700c90caa
[Feat] support mixed ep ( #2969 )
...
* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict
2025-07-25 15:29:30 +08:00
chen
332154f504
[feature] Support FA2 ( #3009 )
2025-07-25 14:09:00 +08:00
YuBaoku
4b02b96467
[CI] fix codestyle_check ( #3015 )
2025-07-25 14:02:34 +08:00
EnflameGCU
8c167e130c
[GCU] Update post_process ( #3012 )
2025-07-25 11:03:03 +08:00
EnflameGCU
7634ffb709
[GCU] Add CI ( #3006 )
2025-07-25 10:59:29 +08:00
Jiang-Jia-Jun
6ce3a8a497
Update index.md
2025-07-25 10:32:47 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856
[Model] Provide clearer error for missing KV cache quantization scales ( #3007 )
2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e
[GCU] Update to develop ( #2988 )
2025-07-24 19:30:52 +08:00
Yzc216
980126b83a
[Feature] multi source download ( #3005 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
Zhang Yulong
5151bc92c8
Update benchmark tools ( #3004 )
...
* update benchmark tools
* update benchmark tools
2025-07-24 15:19:23 +08:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a
[LLM] update function name ( #2985 )
...
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954
[Feature] multi-source download ( #2986 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
YUNSHEN XIE
87a2f4191d
add ci reuse action ( #2968 )
...
* add ci reuse action
* fix code formatting
* update
2025-07-24 14:24:10 +08:00
xiaoxiaohehe001
2c0ff068e2
[Fix] fix mm ep empty run ( #2999 )
2025-07-24 14:15:55 +08:00
xiegegege
e3a843f2c5
[benchmark] add quantization for benchmark yaml ( #2995 )
2025-07-24 13:26:34 +08:00
lizhenyun01
6235ef3881
fix chunk_prefill
2025-07-24 12:00:52 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
lizexu123
832d25334a
[Code Simplification] fix init_distributed_environment() ( #2982 )
2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8
update ( #2978 )
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD ( #2928 )
...
* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39
polish code for prefill restrictions ( #2991 )
2025-07-23 05:10:14 -07:00
chen
172e69fe17
FA3 fix bug ( #2987 )
2025-07-23 19:07:43 +08:00
zhink
1272c7ce98
Fix performance degradation bug of custom_all_reduce ( #2981 )
2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4
[BugFix] Add prefill restrictions for chunked_prefill+VL ( #2983 )
2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c
fix mtp bug in pd-split mode ( #2970 )
2025-07-23 15:31:16 +08:00
YuBaoku
6c4cfd9359
[CI] add codestyle_check action ( #2972 )
...
* [CI] add codestyle_check action
* [CI] Integrate codestyle check via pre-commit in GitHub Actions
2025-07-23 15:21:56 +08:00
lizexu123
9b22b8d2c3
delete max-len ( #2959 )
2025-07-23 15:11:39 +08:00
Jiang-Jia-Jun
5b59a97030
Update README.md
2025-07-23 13:52:14 +08:00
Jiang-Jia-Jun
475dc6d84e
Update README.md
2025-07-23 13:47:31 +08:00
chen
ad202272ed
【Infer】Improve the performance block_wise_fp8 of triton_moe_backend ( #2942 )
2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577
support chunk_prefill in fa3
2025-07-23 12:19:20 +08:00
Ryan
95b5af24db
[SOT] Add sot warmup (NVIDIA GPU Only) ( #2929 )
...
* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1
7c5e34e72d
[FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS ( #2967 )
...
* fix rejection sampling when topp=0
* fix
2025-07-22 05:53:37 -07:00
gaoziyuan
dbe6225b33
fix rl config local rank ( #2957 )
2025-07-22 04:39:54 -07:00
GoldPancake
9b84d51e25
[MTP Fix] Fix code and register cpp operators ( #2965 )
2025-07-22 19:36:24 +08:00
K11OntheBoat
93bb68aa71
[Feature] Marlin MoE backend supports DeepseekV3 ( #2962 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP]Support multi-step MTP ( #2952 )
2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60
[Fix]fix empty prompt_token_ids,update the parser's triggering condit… ( #2891 )
2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f
[Feature] Support using prefix-caching + cudagraph for inference ( #2924 )
...
* fix the bug in cudagraph+prefix-caching but still have some bug with profile
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* remove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com >
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations ( #2771 )
...
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
K11OntheBoat
e991777757
[Feature] DeepseekV3 use pd_build_static_op ( #2948 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-07-22 15:03:41 +08:00
李泳桦
2a8a2c06de
[fix] non-streaming api now returns full output ids if return_token_ids is enabled ( #2951 )
2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e
native top_p_sampling ( #2901 )
2025-07-22 14:09:59 +08:00
gaoziyuan
0eedbdaee0
fix import error ( #2944 )
2025-07-22 14:06:01 +08:00
K11OntheBoat
8020927f50
[BugFix] Rename attention params of deepseekv3 ( #2939 )
...
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request ( #2946 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
...
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135
remove some code in ep.py ( #2947 )
2025-07-21 22:44:57 +08:00
李泳桦
8a619e9db5
[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body ( #2940 )
...
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body
* [fix] return_token_ids not working in curl request
* [test] improve some test cases of return_token_ids and prompt_token_ids
* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg
2845bde964
[Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph ( #2936 )
...
* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph
* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
...
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572
[Feature] support min_p_sampling ( #2872 )
...
* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
gaoziyuan
95a214ae43
support trainer_degree in name_mapping ( #2935 )
2025-07-20 23:12:55 -07:00
YuanRisheng
bce2c6cd7c
rename test dir ( #2934 )
2025-07-21 14:05:45 +08:00
ltd0924
cc4cec0a74
Update engine_client.py ( #2931 )
2025-07-21 11:42:16 +08:00
liddk1121
17c5d3a241
[Iluvatar GPU] Add CI scripts ( #2876 )
2025-07-21 09:44:42 +08:00
周周周
8c5407d9e4
remove cum_offsets from ForwardMeta ( #2925 )
2025-07-19 23:57:27 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
ZhangYulongg
b8676d71a8
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
43976138de
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
e546e6b1b0
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
9c8292fb19
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
a5e95013b5
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
93481a5478
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
eb77b1be6d
update ci cases
2025-07-18 21:44:07 +08:00
ming1753
5328daa333
[Bug Fix] fix ep config bug ( #2920 )
2025-07-18 19:12:56 +08:00
xiaoxiaohehe001
a42fc3f40b
[Feature] Support 45tVL EP FP8 Infer. ( #2909 )
...
* support_mm_ep_fp8
* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun
fbe3547c95
[Feature] Support include_stop_str_in_output in chat/completion ( #2910 )
...
* [Feature] Support include_stop_str_in_output in chat/completion
* Add ci test for include_stop_str_in_output
* Update version of openai
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan
6efad14b95
support vl ori_vacab_size ( #2900 )
2025-07-18 16:26:14 +08:00
周周周
d306944f4f
remove cum_offsets from get_block_shape_and_split_kv_block ( #2913 )
...
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE
e81137e581
fix ci workflow ( #2896 )
2025-07-18 16:01:00 +08:00
RAM
cd52dc0f65
[Executor] Fix set capture sizes bug ( #2902 )
2025-07-18 15:12:19 +08:00
周周周
1339e56282
[XPU] Remove padding_offsets from get_padding_offset.cu ( #2911 )
2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3
[BugFix] Fix sample rejection ( #2908 )
...
* fix config
* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59
[Trace] fix opentelemetry not working in uvicorn ( #2906 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry not working in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM
bbe2c5c968
Update GraphOptimizationBackend docs ( #2898 )
2025-07-17 21:38:18 +08:00
ltd0924
4b14dca1d6
[LLM] delete fixed slots ( #2893 )
2025-07-17 19:19:54 +08:00
yulangz
c8c280c4d3
[XPU][Doc] fix typo ( #2892 )
2025-07-17 19:13:54 +08:00
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from attention, use batch_id_per_token ( #2880 )
...
* remove padding_offsets from attention
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
...
* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9
[Bug Fix] fix bug of prompt penalty ( #2888 )
2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f
[Fix] remove misleading variables ( #2841 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE
1cfba5ba3e
enable CI workflow for pull requests targeting release/* branches ( #2887 )
2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
b630031414
[LLM] fix several bugs ( #2878 )
2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b
[MM_PROCESS] add _extract_labels ( #2879 )
2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4
[Feature] support prompt repetition_penalty ( #2806 )
2025-07-17 12:05:52 +08:00
yulangz
7dfd2ea052
[XPU][doc] Update minimal fastdeploy required ( #2863 )
...
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
sg263
52aca233e8
[Trace] fix annotation when add opentelemetry ( #2869 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
...
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
...
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00