chenjian
c487b62ee0
[Bug fix] Fix memory allocation ( #3475 )
...
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Fix bug for memory allocation
2025-08-19 19:48:24 +08:00
chenjian
d2f6c3b998
[Bug fix] Fix bug for seq_len_encoder is 1 ( #3467 )
2025-08-19 15:21:32 +08:00
chenjian
aba94169dc
[Feature] Support batched tokens for EP ( #3415 )
...
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
* Support batched tokens for EP and fix bug
2025-08-18 11:43:36 +08:00
chenjian
3f86ae0007
fix cache messager bug when d restart ( #3386 )
2025-08-14 11:43:59 +08:00
chenjian
89177d881c
[Bug fix] Fix zmq core bug ( #3357 )
...
* [Bug fix] Fix zmq core bug due to concurrently used by threads
* Fix zmq core bug due to concurrently used by threads
2025-08-13 20:24:39 +08:00
chenjian
7573802a88
[Feature] Support mtp ep in fd ( #3340 )
...
* [Optimize] Add metrics for analysing perf
* Fix bug in mtp
2025-08-11 21:49:44 +08:00
chenjian
110f33a530
[Bug fix] Test td cache messager ( #3242 )
...
* support disable cache task in decode node
* fix bugs
* Update engine.py
* Update expert_service.py
* Update splitwise_connector.py
* Optimize log for debug
* Optimize log for debug
* fix bug
---------
Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-06 15:52:45 +08:00
chenjian
a4572a5e5d
fix bug for pd step signal ( #3230 )
2025-08-06 10:41:52 +08:00
chenjian
a9d231c900
Fix bug for concurrently visit zmq ( #3233 )
2025-08-06 10:41:10 +08:00
ltd0924
b20ffe3697
[Feature] optimize expert parallel ( #3196 )
...
* optimize
* Update expert_service.py
* Update worker_process.py
* optimize
2025-08-05 17:34:24 +08:00
ltd0924
dcf9c2daff
[Feature] Optimize prefix cache ( #3208 )
...
* [LLM] support ep
* Update worker_process.py
* Update expert_service.py
* Update worker_process.py
* format files
* optimize prefix cache
* optimize prefix cache
* optimize prefix cache
* pre commit format
* pre commit format
* pre commit format
* Update cache_messager.py
2025-08-05 17:13:11 +08:00
chenjian
9f9971844f
[Feature] Support ep pd with external module ( #3194 )
...
* Support external module
* Support external module
* Support external module
* Support external module
* refactor code to make it more clear
* refactor code to make it more clear
* refactor code to make it more clear
* refactor code to make it more clear
* fix according to review
* fix according to review
* fix according to review
* fix according to review
* fix according to review
* fix according to review
* fix bug
* fix bug
* fix bug
* merge
---------
Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0202.tjdm.baidu.com>
2025-08-04 20:32:41 +08:00
gaoziyuan
0443587a57
【Feature】support qwen3 name_mapping ( #3179 )
...
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
Zero Rains
17f51f0c92
[unittest] fix the bug in test_sampler ( #3157 )
2025-08-04 01:23:25 -07:00
YuanRisheng
79bbacc152
Fix approve shell scripts ( #3108 )
...
* fix approve
* fix
2025-08-04 15:51:33 +08:00
Divano
3bfb2eca92
Update test_base_chat.py ( #3183 )
2025-08-04 15:09:53 +08:00
ltd0924
c9e6ce1518
Update cache_messager.py ( #3172 )
2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5
【Feature】add fd plugins && rm model_classes ( #3123 )
...
* add fd plugins && rm model_classes
* fix reviews
* add docs
* fix
* fix unittest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905
fix load_pre_sharded_checkpoint ( #3152 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-04 10:44:20 +08:00
Divano
66d3bb89ad
Update __init__.py ( #3163 )
...
Upgrade test base class compatibility
2025-08-04 09:40:09 +08:00
AIbin
22fe695f1c
【Inference Optimize】Support automatic generation of marlin kernel ( #3149 )
...
* Support automatic generation of marlin kernel
2025-08-01 22:43:18 +08:00
ApplEOFDiscord
b71cbb466d
[Feature] remove dependency on enable_mm and refine multimodal's code ( #3014 )
...
* remove dependency on enable_mm
* fix codestyle check error
* fix codestyle check error
* update docs
* resolve conflicts on model config
* fix unit test error
* fix code style check error
---------
Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
plusNew001
243394044d
[XPU] Update XPU dockerfiles ( #3144 )
...
* [CI] add xpu ci case
* [CI]Update run_ci_xpu.sh
* [XPU]Update Dockerfile.xpu
* Update Dockerfile.xpu
2025-08-01 19:41:59 +08:00
Zhang Yulong
0eb32bb9c8
add cases ( #3155 )
2025-08-01 18:38:57 +08:00
yangjianfengo1
64d7a3194d
Support fa3 in centralized deployment ( #3112 )
2025-08-01 18:03:36 +08:00
YUNSHEN XIE
bdb83e007d
fix ci ( #3141 )
2025-08-01 17:42:26 +08:00
Divano
50db0d7ba9
add case ( #3150 )
...
* add test base class
* fix codestyle
* fix codestyle
* add base chat
2025-08-01 17:30:58 +08:00
Ryan
94264bbf60
[Code Simplification] Refactor Post-processing in VL Model Forward Method ( #2937 )
...
* rm sth useless
* refactor model forward
* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765
Fix out-of-memory issue during single-XPU deployment ( #3133 )
2025-08-01 17:12:03 +08:00
JYChen
c34088b0fd
fix stop seq unittest ( #3126 )
2025-08-01 16:50:05 +08:00
ming1753
fc5f43c6bc
[Docs] Optimal Deployment ( #2768 )
2025-08-01 11:56:27 +08:00
chen
a2f5cc54f8
moe preprocess op support 160 experts and fused_moe triton kernel name add K ( #3121 )
2025-08-01 10:46:20 +08:00
Divano
1d93565082
[CE] Add base test class for web server testing ( #3120 )
...
* add test base class
* fix codestyle
* fix codestyle
2025-07-31 23:28:50 +08:00
YUNSHEN XIE
e1011e92d9
disable test_cuda_graph.py ( #3124 )
2025-07-31 22:03:48 +08:00
plusNew001
8c63237cfa
[CI] add xpu ci case ( #3111 )
...
* [CI] add xpu ci case
* [CI]Update run_ci_xpu.sh
2025-07-31 22:03:34 +08:00
YUNSHEN XIE
ff6a109b4d
Describe PR diff coverage using JSON file ( #3114 )
...
* Refactored ci pipeline
* update
* Describe PR diff coverage using JSON file
* remove pip cache setting from Approve
* fix
* update
2025-07-31 21:59:20 +08:00
SunLei
dade19d7a4
[Feature] General support for logprobs ( #2974 )
...
* [Feature] support logprobs in chat/completions and completions endpoints
* Temporarily comment out text_offset due to incorrect logic
* Clean up temporary debug prints
* [Feature] support logprobs in offline mode via SamplingParams
* fix: serialize Logprob as dict before zmq send to fix msgpack error
* refactor: remove redundant methods to simplify codebase
* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization
* refactor: centralize param validation in engine_client to reduce duplication
* revert: rollback changes in offline_demo.py
* revert: rollback changes in offline_demo.py
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
* [bugfix] fix parameter validation for logprobs
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
chenjian
fe17410f9c
[BUG] Fix bug for pd in fd ( #3034 )
...
* Fix bug for pd in fd
* Fix bug for pd in fd
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Zhang Yulong
1a543bca29
Fix test_EB_Lite_serving.py ( #3119 )
...
* Fix test_EB_Lite_serving.py
* fix test_EB_Lite_serving.py
2025-07-31 20:15:25 +08:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs ( #3103 )
...
* add chat_template_kwargs and update params docs
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
kevin
22cab724e8
[Feature] block scheduler v1 support prefix caching ( #3061 )
...
* block scheduler v1 support prefix cache
* update code
* update code
* fix code bug
* add timeout time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
chenjian
32307283f1
Fix bug for offline inference in scheduler v1 ( #3117 )
2025-07-31 17:54:24 +08:00
YUNSHEN XIE
583eae2fd1
fix ci ( #3106 )
...
* fix ci
* disable test_non_streaming_chat_with_min_tokens
2025-07-31 17:25:08 +08:00
JYChen
1ef38b1563
[doc] best practice for eb45 text models ( #3002 )
...
* [doc] best practice for eb45 text models
* fix docs
2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun
4498058722
Update README.md
2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun
66304cf921
Update sampling.md
2025-07-31 15:02:57 +08:00
yinwei
5b9aec1f10
xpu release 2.0.3 ( #3105 )
2025-07-31 14:26:07 +08:00
YUNSHEN XIE
66c3835a46
add approve ci ( #3093 )
...
* add approve ci
* fix
* fix
2025-07-31 10:10:10 +08:00
RAM
d850660872
[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel ( #2989 )
...
* reset decoder_block_shape_q buffer
* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch
* update decode_max_tile_size
* fix pre-commit
* update block_multihead_attn_backend
* update flash attn backend
* update MLA Attention
* update XPU Attention
* update gcu,iluvatar model runner
* Update MTP
* fix MTP bug
2025-07-31 00:09:31 +08:00