memoryCoderC
fe2094609f
Release/2.1 ( #3361 )
...
* [BugFix] v1/completions add finish_reason
* update TestOpenAIServingCompletion for merge
2025-08-12 23:06:51 +08:00
gaoziyuan
b4bb54b56b
bugfix ( #3322 )
2025-08-12 16:16:37 +08:00
Jiang-Jia-Jun
eeec4bd15e
Remove useless code release/2.1 ( #3338 )
2025-08-12 11:32:50 +08:00
chenjian
d2592750f7
fix bug for scheduler v0 ( #3306 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:41:15 +08:00
chenjian
25f51b0611
Fix block num in scheduler v1 for release 2.1 ( #3315 )
...
* fix bug for scheduler v0
* fix block num setting in scheduler v1 for release 2.1
* fix block num setting in scheduler v1 for release 2.1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:41:05 +08:00
ming1753
9b07f85f6d
[Bug Fix] fix vl V1 schedule bug ( #3284 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: YUNSHEN XIE <1084314248@qq.com >
2025-08-12 00:40:45 +08:00
Sunny-bot1
2fe31c6f0f
[Docs]fix sampling docs 2.1 ( #3333 )
...
* [Docs]fix sampling docs (#3113 )
* fix sampling docs
* fix sampling docs
* update
* fix docs
2025-08-11 21:04:10 +08:00
YUNSHEN XIE
a33e557732
fix ci pypi index error ( #3327 )
2025-08-11 20:24:27 +08:00
kevin
054c790642
fix uvicorn multi worker error ( #3309 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-11 20:19:31 +08:00
Jiang-Jia-Jun
ca4e4ab911
Revert "[BugFix] fix ep ( #3290 )" ( #3317 )
...
This reverts commit 86ff68be4b .
2025-08-11 16:17:58 +08:00
chenjian
c000cff744
fix scheduler bug in release2.1 ( #3295 )
2025-08-10 13:55:22 +08:00
lizexu123
86ff68be4b
[BugFix] fix ep ( #3290 )
...
* fix ep
* fix
2025-08-09 16:32:35 +08:00
yinwei
702c313ed1
revert pr ( #3286 )
2025-08-09 16:29:35 +08:00
ltd0924
6706ccb37e
[BugFix] fix too many open files problem ( #3275 )
2025-08-08 20:11:32 +08:00
JYChen
1b6f482c15
[Cherry-pick] fix stop seq ( #3263 )
...
* fix out-of-bounds value for stop sequence
* catch error if there is an out-of-bounds value
* check in offline mode
2025-08-07 19:11:37 +08:00
sg263
5d3bf308f6
merge develop trace FD_START ( #3253 )
...
Co-authored-by: shige <shige@baidu.com >
2025-08-07 11:10:55 +08:00
Sunny-bot1
f672a34f95
[FIX 2.1]fix bad_words when sending requests consecutively ( #3199 )
...
* fix bad_words
* fix log
* fix log
2025-08-06 15:47:27 +08:00
lizexu123
bc0b92bba4
[BugFix] support real batch_size ( #3109 ) ( #3217 )
...
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-06 14:30:33 +08:00
SunLei
3dd8492601
[Bugfix] Fix uninitialized decoded_token and add corresponding unit test ( #3201 )
...
* Update test_base_chat.py (#3183 )
* [Bugfix] Fix uninitialized decoded_token and add corresponding unit test.
---------
Co-authored-by: Divano <dddivano@outlook.com >
2025-08-05 10:55:22 +08:00
RAM
bd77a3a643
[Bug Fix] Fix bug of MLA Attention Backend ( #3178 )
...
* fix typo
* fix mla attention backend
2025-08-05 10:53:27 +08:00
YUNSHEN XIE
9561603ed9
Apply CI fix from Develop ( #3151 )
...
* fix ci approve
* Describe PR diff coverage using JSON file (#3114 )
* Refactored ci pipeline
* update
* Describe PR diff coverage using JSON file
* remove pip cache setting from Approve
* fix
* update
* fix ci (#3141 )
* fix
2025-08-04 16:30:56 +08:00
plusNew001
e26313a355
Update Dockerfile.xpu ( #3147 )
2025-08-04 16:25:33 +08:00
yinwei
4367c09a5f
Fix out-of-memory issue during single-XPU deployment ( #3131 )
2025-08-04 16:02:43 +08:00
bukejiyu
8e789dcb67
fix load_pre_sharded_checkpoint ( #3152 ) ( #3169 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-04 15:44:10 +08:00
ltd0924
5f6fc7f7b9
Update cache_messager.py ( #3173 )
2025-08-04 15:09:17 +08:00
RAM
d4059cabf0
fix typo ( #3153 )
2025-08-01 22:34:59 +08:00
chen
c8dd5976ae
fix request_output sampling_params ( #3154 )
2025-08-01 22:34:33 +08:00
Jiang-Jia-Jun
4880c16be3
Update setup.py
2025-07-31 20:30:24 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd ( #3034 )
...
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 20:17:27 +08:00
Zhang Yulong
1a543bca29
Fix test_EB_Lite_serving.py ( #3119 )
...
* Fix test_EB_Lite_serving.py
* fix test_EB_Lite_serving.py
2025-07-31 20:15:25 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs ( #3103 )
...
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching ( #3061 )
...
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 ( #3117 )
2025-07-31 17:54:24 +08:00
YUNSHEN XIE
583eae2fd1
fix ci ( #3106 )
...
* fix ci
* disable test_non_streaming_chat_with_min_tokens
2025-07-31 17:25:08 +08:00
JYChen
1ef38b1563
[doc] best practice for eb45 text models ( #3002 )
...
* [doc] best practice for eb45 text models
* fix docs
2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun
4498058722
Update README.md
2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun
66304cf921
Update sampling.md
2025-07-31 15:02:57 +08:00
yinwei
5b9aec1f10
xpu release 2.0.3 ( #3105 )
2025-07-31 14:26:07 +08:00
YUNSHEN XIE
66c3835a46
add approve ci ( #3093 )
...
* add approve ci
* fix
* fix
2025-07-31 10:10:10 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
Jiang-Jia-Jun
998968f1e8
[Doc] Update parameters of serving
2025-07-30 22:35:01 +08:00
chenjian
fe0e3f508b
[BUG FIX] Fix bug when preempted request rescheduled ( #3080 )
...
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api ( #3096 )
...
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com >
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac
[BugFix]Fix ep size ( #3092 )
...
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode ( #3001 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
JYChen
bd29b2aaca
add stop_seqs doc ( #3090 )
2025-07-30 20:36:18 +08:00
Jiang-Jia-Jun
6ead7a3a49
Update setup.py
2025-07-30 20:21:41 +08:00
YUNSHEN XIE
e4ba9a0dde
debug use ( #3095 )
2025-07-30 20:18:36 +08:00