Commit Graph

90 Commits

Author SHA1 Message Date
ltd0924
3d92fb09f7 [BugFix] fix parameter is 0 (#3592)
* Update engine_client.py

* fix

* Update common_engine.py
2025-08-28 09:52:36 +08:00
lizexu123
b28a0343a6 fix ENABLE_V1_KVCACHE_SCHEDULER (#3625)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-27 21:21:29 +08:00
Jiang-Jia-Jun
c694fa2879 Revert "[Feature] block sparse attention (#3209)" (#3647)
This reverts commit 646a0c2fd8.
2025-08-27 17:35:04 +08:00
chen
ce9c0917c5 [Precision] Support lm_head layer running in float32 (#3597)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc
2025-08-27 11:34:53 +08:00
yangjianfengo1
646a0c2fd8 [Feature] block sparse attention (#3209)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* 支持稀疏attn

* fix bug

* code style

* fix moba attn get kv shape

* 修复a100编译

* codestyle

* code style

* code style

* code style

* fix conflict

* 增加单侧

* code style

* 增加eblite 加载时间

* fix bug

* for ci

* for ci

* for ci

* for ci

* 支持mlp block size 128

* 增加小算子单测

* fix 单测 mlp

* 将环境变量加入到config里面

* fix rollout config
2025-08-26 07:16:04 -07:00
gaoziyuan
82e64b13e1 [NewFeature]Support dp multi api server && Fix some bug in mixed ep && merge develop (#3598)
* [Feature] update ep

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix queue ports idx

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* fix ci

* Update engine.py

* fix ci

* fix some bug in mixed ep

* add server fix and op fix

* rm some log

* fix code style

* ltd fix

* fix

* fix

* fix some bug

* fix bug

* fix bug

* fix style

* Update config.py

* Update splitwise_connector.py

* Update cache_messager.py

* Update __init__.py

* merge and fix

* Update engine.py

* Update common_engine.py

* Update run_ci_xpu.sh

* Update ernie_processor.py

* Update ernie_processor.py

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-26 19:59:02 +08:00
Sunny-bot1
c68c3c4b8b [Feature] bad words support v1 scheduler and specifiy token ids (#3608)
* support bad_words_token_ids

* docs

* fix test

* fix

* bad words support kvcache v1 and token ids

* fix
2025-08-25 20:14:51 -07:00
lizexu123
c43a4bec00 [Features] support hugging face qwen3 dense and qwen2 model (#3574)
* support qwen2 and qwen3 hugging face

* fix moe

* defualt_v1 loader

* hugging_face_format deprecated

* modify hugging_face_foramt to model_format

* model_format auto

* fix environemt

* fix bug

* fix qwen3-0.6 bug

* model_format is str

* fix
2025-08-26 10:54:53 +08:00
chen
9cab3f47ff [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing

* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs

* delete some code

* code check

* code check and add doc

* fix tokenizer.decoder(-1), return 'Invalid Token'

* add ci for temp_scaled and top_p logprobs

* check test

* check seq len time shape

* logprob clip inf

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-08-25 14:11:49 +08:00
zhink
df7c31012b Modified to support custom all reduce by default (#3538) 2025-08-22 16:59:05 +08:00
YuanRisheng
5b66462f0e Fix fdconfig bugs (#3528)
* fix config

* fix parallel

* fix ips

* fix rl

* open code
2025-08-22 16:17:15 +08:00
YuanRisheng
c389a4013c Unify server-side and model-side Config(Part-5) (#3497)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* move config

* fix xpu

* fix

* fix vl

* fix vl

* fix unitest

* fix args

* add unitest

* fix test
2025-08-21 19:00:21 +08:00
Yzc216
466cbb5a99 [Feature] Models api (#3073)
* add v1/models interface related

* add model parameters

* default model verification

* unit test

* check model err_msg

* unit test

* type annotation

* model parameter in response

* modify document description

* modify document description

* unit test

* verification

* verification update

* model_name

* pre-commit

* update test case

* update test case

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/entrypoints/openai/test_serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/entrypoints/openai/serving_models.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-21 17:02:56 +08:00
kevin
67298cf4c0 add error traceback info (#3419)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add error traceback info

* update error msg

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-19 19:32:04 +08:00
chen
6735626014 fix request_output sampling_params (#3154) (#3464) 2025-08-19 13:52:50 +08:00
luukunn
3a7a20d191 [Feature] Pass through the chat_template_kwargs to the data processing module (#3421)
* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test
2025-08-19 10:50:01 +08:00
luukunn
9c129813f9 [Feature] add custom chat template (#3251)
* add custom chat_template

* add custom chat_template

* add unittest

* fix

* add docs

* fix comment

* add offline chat

* fix unit test

* fix unit test

* fix

* fix pre commit

* fix unit test

* add unit test

* add unit test

* add unit test

* fix pre_commit

* fix enable_thinking

* fix pre commit

* fix pre commit

* fix unit test

* add requirements
2025-08-18 16:34:08 +08:00
ming1753
396dba0d62 [Bug Fix] Fix V1 video bug (#3388)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-13 23:04:07 +08:00
luukunn
eda83ca672 add Tool Parser (#3272)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* 修改tool parser名称·

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-13 01:06:55 +08:00
ming1753
f5164215be [Bug Fix] fix vl V1 schedule bug (#3323)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Bug Fix] fix vl V1 schedule bug

* fix format
2025-08-12 11:31:39 +08:00
chenjian
b21272d9ff [Bug fix] fix block num setting in scheduler v1 for develop (#3303)
* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting

* fix max_block_num and max_num_batched_tokens setting
2025-08-12 10:38:51 +08:00
Zero Rains
b23af29d0b Launch expert_service before kv_cache initialization in worker_process (#3045)
* launch expert_service before kv_cache initialization

* add two signal make sure model loading and expert_service lauching finished

* fix the EP bug

* fix ep

* update launching way

* fix ep

* update

* roback ep

* pre-commit all files

---------

Co-authored-by: RAM <gstian5555@outlook.com>
Co-authored-by: Divano <dddivano@outlook.com>
2025-08-11 19:38:46 +08:00
chen
46c8491201 merge logprob into batch_output (#3266) 2025-08-11 10:03:00 +08:00
chenjian
c011cb8b16 [Bug Fix] Fix scheduler bug in develop (#3292)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Fix scheduler bug in develop

* Fix scheduler bug in develop

* Fix scheduler bug in develop
2025-08-10 13:55:38 +08:00
Zero Rains
ce1f353c70 Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
yzwu
fbdd6b0663 [Iluvatar GPU] Optimze attention and moe performance (#3234) 2025-08-08 10:51:24 +08:00
JYChen
9423c577fe [stop_seq] fix out-bound value for stop sequence (#3216)
* fix out-bound value for stop sequence

* catch error if there are out-of-bounds value

* check in offline mode

* add ut tests
2025-08-07 15:40:21 +08:00
lizexu123
afff4d37ea [Feature] support seed parameter (#3161)
* support seed

* fix

* add SamplingMetadata seed test

* The next_tokens values are inconsistent!

* add air and rejection seed test

* fix

* add SamplingParams seed test

* fix seed=0

* Default to defualt

* fix

* fix args_utils

* fix review

* fix review

* fix

* fix

* add xpu,gcu,iluvatar support seed

* fix
2025-08-06 15:20:47 +08:00
Jiang-Jia-Jun
55939f7942 Update engine.py 2025-08-05 16:10:36 +08:00
Sunny-bot1
72ef5a9c93 [FIX]fix bad_words when sending requests consecutively (#3197)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix bad_words

* fix log

* fix log
2025-08-04 05:59:41 -07:00
周周周
2bd8a50649 remove useless code (#3166) 2025-08-04 18:03:08 +08:00
ApplEOFDiscord
b71cbb466d [Feature] remove dependency on enable_mm and refine multimodal's code (#3014)
* remove dependency on enable_mm

* fix codestyle check error

* fix codestyle check error

* update docs

* resolve conflicts on model config

* fix unit test error

* fix code style check error

---------

Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
SunLei
dade19d7a4 [Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints

* Temporarily comment out text_offset due to incorrect logic

* Clean up temporary debug prints

* [Feature] support logprobs in offline mode via SamplingParams

* fix: serialize Logprob as dict before zmq send to fix msgpack error

* refactor: remove redundant methods to simplify codebase

* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization

* refactor: centralize param validation in engine_client to reduce duplication

* revert: rollback changes in offline_demo.py

* revert: rollback changes in offline_demo.py

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
kevin
22cab724e8 [Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache

* update code

* update code

* fix code bug

* add timeout time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
chenjian
32307283f1 Fix bug for offline inference in scheduler v1 (#3117) 2025-07-31 17:54:24 +08:00
chenjian
fe0e3f508b [BUG FIX] Fix bug when preempted request rescheduled (#3080)
* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
YuanRisheng
7dfdd157ac [BugFix]Fix ep size (#3092)
* fix ep

* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19 [Feature] support ep in mixed mode (#3001)
* [LLM] support ep

* Update worker_process.py

* Update expert_service.py

* Update worker_process.py

* format files
2025-07-30 20:43:39 +08:00
bukejiyu
db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
ming1753
5acde4eb43 [Feature] Multimodal Scheduler V1 (#3019)
* [Feature] Support multimodal scheduler v1

* remove debug log

* fix bug

* fix format

* modify code

* fix bug

* fix bug

* fix bug

* modify code
2025-07-30 16:05:55 +08:00
YuanRisheng
99a70fc722 unify parallel config (#3070) 2025-07-30 11:41:23 +08:00
Sunny-bot1
74aa31d15b [Feature] support bad_words (#3055)
* support bad_words

* support online infer bad_words

* update

* add CI test

* update

* update

* update

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
Zero Rains
b2f9a42d87 [Feature] Support repetition early stop (#3024)
* support repetition early stop and support user to set the parameter

* remove log

* fix codestyle

* add the early_stop_config to rollout_config

* update config and EarlyStopper class

* fix the bug for triton

* modify the stop method

* update description

* modify the usage for stop_flags

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00
JYChen
dafe02a7b9 [stop sequence] support stop sequence (#3025)
* stop seqs in multi-ends

* unittest for gpu stop op

* kernel tid==0
2025-07-29 14:17:37 +08:00
YuanRisheng
1a815b7a2a Fix Speculative Config bug (#3049)
* fix speculative bug

* fix rl
2025-07-29 10:50:48 +08:00
Yuan Xiaolan
b1d787a272 [fix] w4a8 model loading and hadamard config (#3013) 2025-07-28 18:17:59 +08:00
YuanRisheng
bddf403576 Unify server-side and model-side Config (Part2) (#3035)
* merge speculative and graph opt conifg

* add attr
2025-07-28 15:31:48 +08:00
YuanRisheng
6ccc10ad47 Unify server-side and model-side Config (Part1) (#3018)
* move cache config

* fix mtp
2025-07-28 10:51:52 +08:00
李泳桦
69996a40da [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00