luukunn
edf1ca07af
Feature/online/vs think 20250813 ( #3440 )
...
* add stream
* fix ernie_vl_reasoning_parsers
* fix bug
2025-08-15 18:33:58 +08:00
luukunn
bbd50c6717
add tool parser
2025-08-14 21:08:49 +08:00
luukunn
132a8ef425
Release/2.1 ( #3414 )
...
* Pre ce modified (#3335 ) (#3360 )
* Pre ce modified (#3335 )
* update
* update
* fix
* fix
* update
* update
* update
* fix
* update
* update
* update
* add ut fix pr(3367)
* [Bug Fix] Fix V1 video bug (#3387 )
* fix stopseq error info (#3342 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [BugFix] Fix default log level of paddleformers (#3377 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [Polish Code] Remove useless notes
* feat(log):add_request_and_response_log (#3392 )
* Optimize CI execution workflow. (#3371 ) (#3384 )
* fix
* [BugFix] fix control signal release failed (#3374 )
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing changes made to 03347626a6.
* [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* [BugFix] fix ErnieProcessor not set raw_prediction (#3401 )
* [Doc]Release fastdeploy-xpu 2.1.0 (#3407 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* update release note
* [Doc]Release fastdeploy-xpu 2.0.3 (#3408 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
* update release note
* update info
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: xiaolei373 <zley373@gmail.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
Co-authored-by: yinwei <yinwei_hust@163.com >
Co-authored-by: memoryCoderC <1137889088@qq.com >
2025-08-14 20:53:47 +08:00
Jiang-Jia-Jun
e11331927f
[Sync Code] Update vs branch ( #3403 )
...
* Pre ce modified (#3335 ) (#3360 )
* Pre ce modified (#3335 )
* update
* update
* fix
* fix
* update
* update
* update
* fix
* update
* update
* update
* add ut fix pr(3367)
* [Bug Fix] Fix V1 video bug (#3387 )
* fix stopseq error info (#3342 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [BugFix] Fix default log level of paddleformers (#3377 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
* [Polish Code] Remove useless notes
* feat(log):add_request_and_response_log (#3392 )
* Optimize CI execution workflow. (#3371 ) (#3384 )
* fix
* [BugFix] fix control signal release failed (#3374 )
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com >
Co-authored-by: JYChen <zoooo0820@qq.com >
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
Co-authored-by: xiaolei373 <zley373@gmail.com >
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com >
2025-08-14 17:14:45 +08:00
luukunn
81092c0fe3
add tool parser
2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9
Completion add raw_prediction/text_after_process ( #3362 )
2025-08-12 23:20:36 +08:00
memoryCoderC
fe2094609f
Release/2.1 ( #3361 )
...
* [BugFix] v1/completions add finish_reason
* update TestOpenAIServingCompletion for merge
2025-08-12 23:06:51 +08:00
gaoziyuan
b4bb54b56b
bugfix ( #3322 )
2025-08-12 16:16:37 +08:00
Jiang-Jia-Jun
eeec4bd15e
Remove useless code release/2.1 ( #3338 )
2025-08-12 11:32:50 +08:00
chenjian
25f51b0611
Fix block num in scheduler v1 for release 2.1 ( #3315 )
...
* fix bug for scheduler v0
* fix block num setting in scheduler v1 for release 2.1
* fix block num setting in scheduler v1 for release 2.1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:41:05 +08:00
ming1753
9b07f85f6d
[Bug Fix] fix vl V1 schedule bug ( #3284 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:40:45 +08:00
Jiang-Jia-Jun
ca4e4ab911
Revert "[BugFix] fix ep ( #3290 )" ( #3317 )
...
This reverts commit 86ff68be4b.
2025-08-11 16:17:58 +08:00
chenjian
c000cff744
fix scheduler bug in release2.1 ( #3295 )
2025-08-10 13:55:22 +08:00
lizexu123
86ff68be4b
[BugFix] fix ep ( #3290 )
...
* fix ep
* fix
2025-08-09 16:32:35 +08:00
yinwei
702c313ed1
revert pr ( #3286 )
2025-08-09 16:29:35 +08:00
ltd0924
6706ccb37e
[BugFix] fix too many open files problem ( #3275 )
2025-08-08 20:11:32 +08:00
JYChen
1b6f482c15
[Cherry-pick] fix stop seq ( #3263 )
...
* fix out-of-bounds value for stop sequence
* catch error if there are out-of-bounds values
* check in offline mode
2025-08-07 19:11:37 +08:00
sg263
5d3bf308f6
merge develop trace FD_START ( #3253 )
...
Co-authored-by: shige <shige@baidu.com >
2025-08-07 11:10:55 +08:00
Sunny-bot1
f672a34f95
[FIX 2.1]fix bad_words when sending requests consecutively ( #3199 )
...
* fix bad_words
* fix log
* fix log
2025-08-06 15:47:27 +08:00
lizexu123
bc0b92bba4
[BugFix] support real batch_size ( #3109 ) ( #3217 )
...
* support real bsz
* fix
* fix xpu_model_runner.py, gpu_model_runner.py, gcu_model_runner.py, iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-06 14:30:33 +08:00
SunLei
3dd8492601
[Bugfix] Fix uninitialized decoded_token and add corresponding unit test ( #3201 )
...
* Update test_base_chat.py (#3183 )
* [Bugfix] Fix uninitialized decoded_token and add corresponding unit test.
---------
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-05 10:55:22 +08:00
RAM
bd77a3a643
[Bug Fix] Fix bug of MLA Attention Backend ( #3178 )
...
* fix typo
* fix mla attention backend
2025-08-05 10:53:27 +08:00
yinwei
4367c09a5f
Fix out-of-memory issue during single-XPU deployment ( #3131 )
2025-08-04 16:02:43 +08:00
bukejiyu
8e789dcb67
fix load_pre_sharded_checkpoint ( #3152 ) ( #3169 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-04 15:44:10 +08:00
ltd0924
5f6fc7f7b9
Update cache_messager.py ( #3173 )
2025-08-04 15:09:17 +08:00
RAM
d4059cabf0
fix typo ( #3153 )
2025-08-01 22:34:59 +08:00
chen
c8dd5976ae
fix request_output sampling_params ( #3154 )
2025-08-01 22:34:33 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd ( #3034 )
...
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:17:27 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs ( #3103 )
...
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching ( #3061 )
...
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 ( #3117 )
2025-07-31 17:54:24 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
chenjian
fe0e3f508b
[BUG FIX] Fix bug when preempted request rescheduled ( #3080 )
...
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api ( #3096 )
...
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
Zhida Hu
3f8a41e68c
[*] fix the memory leak when modifying qp to rts fails ( #3051 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client ( #3058 )
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
bukejiyu
db698bda01
qwen loader ( #3057 )
2025-07-30 19:09:38 +08:00
zhink
d89b6dd43f
adapter qwen3 moe attr for init ( #3066 )
...
adapter qwen3 moe attr for init
2025-07-30 16:49:28 +08:00
bukejiyu
8e203666d9
w4a8 offline ( #3074 )
...
* w4a8 offline
* update
* update
* update
2025-07-30 16:33:30 +08:00
ming1753
5acde4eb43
[Feature] Multimodal Scheduler V1 ( #3019 )
...
* [Feature] Support multimodal scheduler v1
* remove debug log
* fix bug
* fix format
* modify code
* fix bug
* fix bug
* fix bug
* modify code
2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun
ffa0f4d99b
[Fix] Fix version function ( #3076 )
...
* [Fix] Fix version function
* Fix commit
* Fix commit
* fix code sync
* Update coverage_run.sh
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 16:05:24 +08:00
ltd0924
ecf2fd5b9a
[BugFix] vl encoder tokens dtype problem ( #3069 )
2025-07-30 15:20:53 +08:00
Yuan Xiaolan
35935da9e5
support W4A8 EPLB ( #3075 )
2025-07-30 14:34:12 +08:00
Yzc216
159767717d
[Feature] multi source download ( #3072 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
2025-07-30 14:10:13 +08:00
YuanRisheng
99a70fc722
unify parallel config ( #3070 )
2025-07-30 11:41:23 +08:00
Sunny-bot1
74aa31d15b
[Feature] support bad_words ( #3055 )
...
* support bad_words
* support online infer bad_words
* update
* add CI test
* update
* update
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-07-30 09:31:29 +08:00