Commit Graph

3260 Commits

Author SHA1 Message Date
lizhenyun01
d40d3a5a4f fix DP&&TP (#3872)
2025-09-04 14:38:26 +08:00
luukunn
b8d0f1c081 [bug] fix finish reason (#3858)
* add reasoning parser plugin

* fix finish reason

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-04 14:36:03 +08:00
ltd0924
8550e19008 [bugfix] scheduler (#3871)
* fix scheduler bug

* fix

* Update api_server.py
2025-09-04 11:34:12 +08:00
chenjian
a0c03510c0 [Bug fix] Fix prompt token ids dtype in v1 (#3861) 2025-09-04 11:02:37 +08:00
chenjian
fb1e0d6a87 [Feature] Set scheduler v1 as default (#3812)
* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default
2025-09-04 11:02:10 +08:00
gaoziyuan
fbf0e9d2aa fix mem boom in ep (#3852) 2025-09-04 10:38:34 +08:00
SunLei
8c0e7d6fe9 Support for async processor added. (#3870)
* Support for async processor added.

* remove yappi code
2025-09-04 10:35:08 +08:00
yangjianfengo1
b56b015d85 fix port (#3865)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-04 10:02:08 +08:00
ming1753
1432e336d7 [Bug Fix] Fix bug of multimodal inputs only text (#3850)
2025-09-03 19:48:10 +08:00
yangjianfengo1
9213a58a06 [Fix bug] Fix w4afp8 nblock at 256, and add a mask parameter to fa3 append attn (#3771) (#3835)
* fix w4afp8

* add centralized configuration

* codestyle

* fix fa3 append attn
2025-09-03 19:36:45 +08:00
plusNew001
87ef0f5d30 [XPU] Update XPU stable xvllm and xtdk version for 2.2 & Change CI Case (#3855)
* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.

* Update dependency versions for stable release

* Update CI script to use stable dependencies
2025-09-03 19:33:06 +08:00
plusNew001
abcd2148c0 [XPU]Update XPU CI Case (#3844)
* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.
2025-09-03 15:29:47 +08:00
gaoziyuan
05b6591c23 [BugFix] add moe noaux_tc tactics in triton backend (#3821)
* add moe noaux_tc tactics in triton backend

* fix

* add dp config
2025-09-03 13:28:44 +08:00
plusNew001
42402c80e9 Update installation method for paddlepaddle-xpu (#3834) 2025-09-03 11:28:27 +08:00
luukunn
1968c65849 add reasoning parser plugin (#3820) 2025-09-03 11:17:13 +08:00
ltd0924
37cb37b7f2 [BugFix] fix scheduler (#3818)
* fix scheduler bug

* fix
2025-09-03 11:16:49 +08:00
bukejiyu
f975f7de2f [v1loader]Reduce EB300B model loading time (#3700) (#3810)
* speed up eb45

* update
2025-09-03 10:14:31 +08:00
Yuanle Liu
174510180a [BugFix] fix error of import paddle.base.core.Config (#3761) (#3804)
* defer import of Config

* support chunked_prefill

* support chunked_prefill
2025-09-03 10:14:03 +08:00
ltd0924
5cda326ba2 Update qwen_vl_processor.py (#3806)
2025-09-02 21:56:24 +08:00
RAM
a6c8f17431 [Executor] Fix bug of import paddle with RLHF (#3781) (#3817) 2025-09-02 21:42:59 +08:00
ltd0924
cd09384a14 [BugFix] fix max streaming tokens invalid (#3799)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-09-02 21:03:13 +08:00
ltd0924
0f42771a84 [Feature] support model weight update in ep (#3802)
* Update config.py

* Update ep.py

* Update fused_moe_backend_base.py

* Update dynamic_weight_manager.py

* Update worker_process.py

* fix ci
2025-09-02 20:52:47 +08:00
Jiang-Jia-Jun
d1d063e4af [Feature] Setting number of apiserver workers automatically (#3794)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-09-02 17:19:07 +08:00
kevin
a86b35ab49 Fix chunked prefill (#3778)
* update enable chunked_prefill

* update code

* update code

* update code
2025-09-02 13:41:55 +08:00
YUNSHEN XIE
0cdbc950b5 fix ce compile task upload error (#3788) 2025-09-02 11:52:50 +08:00
YUNSHEN XIE
2b0a745d57 fix ce build job (#3777) 2025-09-02 10:53:26 +08:00
Jiang-Jia-Jun
1953c7c759 Update FASTDEPLOY_VERSION to 2.2.0 2025-08-31 21:31:12 +08:00
chenjian
465065cd19 [Bug fix] Fix prefix cache in V1 (#3715)
* [Bug fix] Fix prefix cache in V1

* fix code style
2025-08-31 21:29:33 +08:00
lizhenyun01
bed09ae8f8 fix mask_offset in append_attn (#3745)
* fix mask_offset in append_attn

* fix test
2025-08-31 15:03:16 +08:00
kevin
753772ace8 default enable chunked prefill (#3731)
* add error traceback info

* update error msg

* update code

* default enable chunked prefill

* update code

* update code

* add envs

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-31 13:15:13 +08:00
李泳桦
98e03fb4ea [feat] add metrics for yiyan adapter (#3219) (#3614)
* [feat] add metrics for yiyan adapter

* [fix] fix metrics num_requests_waiting and num_requests_running

* [fix] fix metrics gpu_cache_usage_perc

* [refactor] change where requests_number increases

* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly

* [chore] delete useless code
2025-08-30 23:20:58 +08:00
Sunny-bot1
fe5d09f9ee [FIX]Fix Machete compile via ENABLE_MACHETE (#3727)
* add ENABLE_MACHETE

* fix

* revert

* update

* pre_commit

* fix

* fix

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: aquagull <hongyuh@qq.com>
2025-08-30 17:50:17 +08:00
SunLei
b9af95cf1c [Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. (#3674)
* [Feature] add AsyncTokenizerClient

* add decode_image

* Add response_processors with remote decode support.

* [Feature] add tokenizer_base_url startup argument

* Revert comment removal and restore original content.

* [Feature] Non-streaming requests now support remote image decoding.

* Fix parameter type issue in decode_image call.

* Keep completion_token_ids when return_token_ids = False.

* add copyright
2025-08-30 17:06:26 +08:00
luukunn
9a7c231f2c [Feature]support chat_template.jinja (#3721)
* add support chat_template.jinja

* add support chat_template.jinja
2025-08-30 17:05:34 +08:00
lizexu123
b21e085f3e [Code Simplification] delete print (#3729) 2025-08-30 16:19:07 +08:00
chen
7568b20098 check (#3720) 2025-08-30 16:04:20 +08:00
lizexu123
455205f991 [Features] support hugging face qwen3 moe (#3649)
* split ut

* qwen3-30B-A3B

* fix

* add test

* add test_torch_model.py

* fix test_torch_model.py

* delete print

* fix moe

* delete init.py

* fix

* fix

---------

Co-authored-by: bukejiyu <395822456@qq.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-08-30 15:26:05 +08:00
Zero Rains
f206474cc7 fix the bug when num_key_value_heads < tensor_parallel_size (#3717) 2025-08-30 12:40:00 +08:00
chenjian
c4b1f6b0a5 [Optimize] Increase zmq buffer size to prevent apiserver too slowly to consume (#3723) 2025-08-30 10:45:26 +08:00
YUNSHEN XIE
a18afcfdd9 Optimize coverage jobs (#3683)
2025-08-30 00:12:40 +08:00
chen
cd252ec673 [Feature]support load eb 0.3B and 21B torch model (#3660)
2025-08-29 20:00:48 +08:00
yangjianfengo1
3754a9906d [Feature] block sparse attention (#3668)
* support sparse attn

* fix bug

* code style

* fix moba attn get kv shape

* fix a100 compilation

* codestyle

* code style

* code style

* code style

* fix conflict

* add unit test

* code style

* increase eblite loading time

* fix bug

* for ci

* for ci

* for ci

* for ci

* support mlp block size 128

* add unit tests for small operators

* fix mlp unit test

* add environment variables to config

* fix rollout config

* fix GPU memory

* add test server

* add test server

* fix mlp: use full attn in the last layer
2025-08-29 19:46:30 +08:00
zhouchong
ccd52b5596 [Model]support qwen2_5_vl (#3557)
* adapt qwen_2_5_vl model

* adapt qwen_2_5_vl VIT model

* adapt qwen2_5_vl images_embeds

* adapt qwen2_5_vl 3D rope

* adapt qwen2_5_vl 3D rope v2

* adapt qwen2_5_vl processor

* adapt qwen2_5_vl bypass resampler_model

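* adapt qwen2_5_vl bypass part of the ernie logic

* adapt qwen2_5_vl bypass part of the ernie logic v2

* adapt qwen2_5_vl weight loading and naming changes

* adapt qwen2_5_vl make think_end_id optional

* adapt qwen2_5_vl distinguish extract_vision_features across multiple models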

* fix:adapt qwen2_5_vl model

* adapt qwen2_5_vl norm

* adapt qwen2_5_vl processor update

* adapt qwen2_5_vl image and video success

* adapt qwen2_5_vl partial code cleanup

* adapt qwen2_5_vl support multiple GPUs

* adapt qwen2_5_vl on latest develop

* adapt qwen2_5_vl RL

* adapt qwen2_5_vl code cleanup

* support noex rope3d

* adapt qwen2_5_vl add init.py

* adapt qwen2_5_vl add init.py v2

* adapt qwen2_5_vl remove space

* adapt qwen2_5_vl remove space v2

* adapt qwen2_5_vl pre-commit

* adapt qwen2_5_vl update

* adapt qwen2_5_vl pre-commit v2

* adapt qwen2_5_vl modify comments

* adapt qwen2_5_vl fix indentation

* adapt qwen2_5_vl fix indentation v2

---------

Co-authored-by: wangyafeng <wangyafeng@baidu.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com>
2025-08-29 18:28:39 +08:00
YuBaoku
65425bf858 [CI] update paddle version to nightly (#3698) 2025-08-29 18:16:13 +08:00
Yuan Xiaolan
c71ee0831c add w4afp8 offline script (#3636) 2025-08-29 17:56:05 +08:00
zyfncg
f677c032c0 [CudaGraph] [SOT] Support splitting static graph into piecewise graph with cuda_graph (#3478)
* support splitting static graph into piecewise graph with cuda_graph

* Update fastdeploy/model_executor/graph_optimization/cudagraph_piecewise_backend.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix merge conflict

* fix bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-29 16:28:01 +08:00
lzy
48d760539b fix deepcopy(tp_group) in spec (#3648) 2025-08-29 16:08:21 +08:00
Ryan
45f81b34f0 add dtype int32 (#3692)
2025-08-29 14:56:35 +08:00
xiaoxiaohehe001
1bf4fc7f36 support w4afp8 eplb (#3680) 2025-08-29 14:43:06 +08:00
Yuanle Liu
68f87240da fix key error in mm (#3702) 2025-08-29 14:35:12 +08:00