Yuanle Liu
fac2f64837
delete parallel_state.py (#3250)
2025-08-08 11:03:29 +08:00
yzwu
fbdd6b0663
[Iluvatar GPU] Optimize attention and moe performance (#3234)
2025-08-08 10:51:24 +08:00
bukejiyu
37569cca86
[feat] add fast_weights_iterator (#3258)
* add fast_weights_iterator
* update
* update
2025-08-07 22:36:46 +08:00
chenjian
5f0b30f6d0
support logprob in scheduler v1 (#3249)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-07 20:14:01 +08:00
Yzc216
6037dd5d9c
[fix] multi source download (#3259)
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
* add requirements
* error optimization
* fallback on connection failure
* fallback on connection failure
* fallback on connection failure
* unit test
* unit test
* unit test
* test
* test
* revise fallback handling
* Trigger CI
2025-08-07 19:30:39 +08:00
JYChen
9423c577fe
[stop_seq] fix out-bound value for stop sequence (#3216)
* fix out-bound value for stop sequence
* catch error if there are out-of-bounds value
* check in offline mode
* add ut tests
2025-08-07 15:40:21 +08:00
李泳桦
09cc4e2802
[fix] fix completion stream api output_tokens not in usage (#3247)
2025-08-07 10:36:00 +08:00
Yzc216
d9e3f88f9e
[Feature] multi source download (#3125)
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* modify model download path
* add requirements
* error optimization
* fallback on connection failure
* fallback on connection failure
* fallback on connection failure
* unit test
* unit test
* unit test
* test
* test
2025-08-07 00:40:27 +08:00
bukejiyu
9408e667a5
[bugfix] fix blockwisefp8 and all_reduce (#3243)
* fix
* update
* fix linear for prequant loader
2025-08-06 23:54:33 +08:00
yangjianfengo1
3a15e0c53e
[Fix Bug] Fix bug in fa3 support for centralized deployment (#3235)
* fix fa3 centralized-deployment bug
* add qknorm parameter
2025-08-06 16:24:27 +08:00
lizexu123
afff4d37ea
[Feature] support seed parameter (#3161)
* support seed
* fix
* add SamplingMetadata seed test
* The next_tokens values are inconsistent!
* add air and rejection seed test
* fix
* add SamplingParams seed test
* fix seed=0
* Default to default
* fix
* fix args_utils
* fix review
* fix review
* fix
* fix
* add xpu,gcu,iluvatar support seed
* fix
2025-08-06 15:20:47 +08:00
bukejiyu
20839abccf
qwen3_moe (#3084)
2025-08-06 14:45:27 +08:00
Zero Rains
36dc73470d
Fix the confusing enable_early_stop when only early_stop_config is set (#3214)
* fix the confused early_stop_config when only set early_stop_config
* pre-commit
* write a general method
2025-08-06 11:42:27 +08:00
sg263
841e831575
[Trace] add trace when fd start (#3174)
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
* fd start add trace
* fix pre-commit
* fix pre-commit
* change FD_JOB_ID
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: shige <shige@baidu.com>
2025-08-05 21:18:27 +08:00
Yuan Xiaolan
7ce00e597c
support qk norm (#3145)
2025-08-05 16:46:14 +08:00
RAM
4a10e29804
fix mla attention backend (#3176)
2025-08-05 16:43:15 +08:00
Yuan Xiaolan
af543b7f0f
revise get_moe_scores (#3164)
2025-08-05 16:43:07 +08:00
lizexu123
b01cfd6007
[BugFix] support real batch_size (#3109)
* support real bsz
* fix
* fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py
* add event_loop_ep
* fix
* Add comments
* fix
* support mtp real_batch_size
* fix
* self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer
* fix
* fix VL real_seq_lens_this_time
* fix
* fix mtp
* fix
* fix mtp
* fix xpu
* fix
2025-08-05 16:33:54 +08:00
Jiang-Jia-Jun
55939f7942
Update engine.py
2025-08-05 16:10:36 +08:00
RichardWooSJTU
1e9a8e8cef
fix lm head bias (#3185)
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:24 +08:00
RichardWooSJTU
f5c64a074c
[EP] Refactor DeepEP Engine Organization for Mixed Mode & Buffer Management Optimization (#3182)
* Add support for mixed-ep across multi nodes
* code refine
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-08-05 15:40:11 +08:00
lizhenyun01
fe540f6caa
[plugin] Custom model_runner/model support (#3186)
* support custom model&&model_runner
* fix merge
* add test && update doc
* fix codestyle
* fix unittest
* load model in rl
2025-08-04 18:52:39 -07:00
Sunny-bot1
72ef5a9c93
[FIX] fix bad_words when sending requests consecutively (#3197)
* fix bad_words
* fix log
* fix log
2025-08-04 05:59:41 -07:00
Yuan Xiaolan
1f8289e106
fix expertwise_scale (#3181)
2025-08-04 20:06:15 +08:00
SunLei
68bc1d12c0
[Bugfix] Fix uninitialized decoded_token and add corresponding unit test. (#3195)
2025-08-04 19:23:58 +08:00
Longzhi Wang
01d7586661
[Bug fix] Fix cudagraph when use ep. (#3130)
* fix cudagraph when use ep
* fix typo
* reduce full length to adapt large bsz such 128/256
2025-08-04 18:06:18 +08:00
周周周
2bd8a50649
remove useless code (#3166)
2025-08-04 18:03:08 +08:00
gaoziyuan
0443587a57
[Feature] support qwen3 name_mapping (#3179)
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
ltd0924
c9e6ce1518
Update cache_messager.py (#3172)
2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5
[Feature] add fd plugins && rm model_classes (#3123)
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905
fix load_pre_sharded_checkpoint (#3152)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-04 10:44:20 +08:00
ApplEOFDiscord
b71cbb466d
[Feature] remove dependency on enable_mm and refine multimodal's code (#3014)
* remove dependency on enable_mm
* fix codestyle check error
* fix codestyle check error
* update docs
* resolve conflicts on model config
* fix unit test error
* fix code style check error
---------
Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
yangjianfengo1
64d7a3194d
Support fa3 in centralized deployment (#3112)
2025-08-01 18:03:36 +08:00
Ryan
94264bbf60
[Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937)
* rm sth useless
* refactor model forward
* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765
Fix out-of-memory issue during single-XPU deployment (#3133)
2025-08-01 17:12:03 +08:00
chen
a2f5cc54f8
moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121)
2025-08-01 10:46:20 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd (#3034)
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs (#3103)
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 (#3117)
2025-07-31 17:54:24 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00
chenjian
fe0e3f508b
[BUG FIX] Fix bug when preempted request rescheduled (#3080)
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2
[Feature] Support include_stop_str_in_output in completion api (#3096)
* [Feature] Support include_stop_str_in_output in completion api
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
YuanRisheng
7dfdd157ac
[BugFix] Fix ep size (#3092)
* fix ep
* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19
[Feature] support ep in mixed mode (#3001)
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
2025-07-30 20:43:39 +08:00
Zhida Hu
3f8a41e68c
[*] fix the memory leak when modifying qp to rts fails (#3051)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
bukejiyu
db698bda01
qwen loader (#3057)
2025-07-30 19:09:38 +08:00