YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe
[Fix] fix expert_parallel bug in decoder stage ( #2848 )
2025-07-16 11:08:18 +08:00
yulangz
17314ee126
[XPU] Update doc and add scripts for downloading dependencies ( #2845 )
* [XPU] update xvllm download
* update supported models
* fix xpu model runner in huge memory with small model
* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
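The capture-list bullets above imply a precedence rule: a user-supplied capture list is used verbatim, and max_num_seqs only feeds the default list when no explicit list is given. A minimal sketch of what such a YAML section could look like (the section and key names here are assumptions for illustration, not the project's actual schema):

```yaml
# Hypothetical keys -- illustrative only, not FastDeploy's real config schema.
graph_optimization_config:
  use_cudagraph: true
  # If set, this list is used as-is; max_num_seqs is NOT inserted.
  # If omitted, batch sizes up to max_num_seqs are captured by default.
  cudagraph_capture_sizes: [1, 2, 4, 8]
```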
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
AIbin
fd91da7b41
[Inference Optimize] Support wint2 triton kernel in triton_utils_v2 ( #2842 )
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5
[vl] Use top_k from config.json ( #2831 )
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed ( #2840 )
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
7f64d408a9
[MTP] support expert-parallel in mtp ( #2835 )
2025-07-14 14:28:50 +08:00
lddfym
ece88596ed
fix spelling error ( #2827 )
2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e
[vl]remove duplicated load logic ( #2744 )
2025-07-13 07:36:26 +08:00
zhenwenDang
d48c03413f
Feature/logprob bug fix ( #2817 )
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens
* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
gaoziyuan
e9e8443ea8
fix num_blocks_local when running a small model in TP2 mode ( #2792 )
2025-07-12 12:50:48 +08:00
gaoziyuan
749b2e9c89
support qwen3moe name_mapping ( #2820 )
2025-07-12 12:05:54 +08:00
Sunny-bot1
f6ad26fc08
fix topp default value ( #2814 )
2025-07-11 17:10:21 +08:00
zhink
c08561c13a
[Feature] support tensor-parallel-size>num_key_value_heads for qwen3 ( #2799 )
2025-07-11 15:09:43 +08:00
chen
2c3607407f
check ( #2811 )
2025-07-11 13:54:52 +08:00
lddfym
b5e4288704
Global scheduler supports configuring hot updates ( #2807 )
* Check if the controller port is available
* Global scheduler supports configuring hot updates
* add interface: /controller/scheduler
* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00
yinwei
e98937cbba
delete useless file ( #2772 )
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-11 11:46:04 +08:00
Sunny-bot1
240d6236bc
[Fix]fix top_k_top_p sampling ( #2801 )
* fix topk-topp
* update
* add base_non_truncated
2025-07-10 22:35:10 +08:00
littledgg
59071268b6
[Executor] Move forward_meta.py to fastdeploy/model_executor ( #2774 )
* Use PEP 563 in attention.py and fix conflict
* merge commit
* Change what was left out last time
2025-07-10 20:36:51 +08:00
lizexu123
8c660a0dfb
[BugFix] fix RMSNorm rms_norm_esp ( #2797 )
* fix rms
* add vl
* fix
* add vl
* fix
* fix
2025-07-10 20:02:24 +08:00
LiqinruiG
ce5adec877
[Doc] modify offline-inference docs ( #2800 )
* modify offline-inference docs
* [bug] remove tool_call_content
2025-07-10 19:41:12 +08:00
yulangz
830de5a925
[XPU] Supports TP4 deployment on 4,5,6,7 ( #2794 )
* Support running on cards 4,5,6,7 via XPU_VISIBLE_DEVICES
* Update the multi-card notes in the XPU docs
2025-07-10 16:48:08 +08:00
chen
d33105baeb
[Feature] Online Chat API Support Return logprobs ( #2777 )
* online chat support logprobs
* check xpu
* check vl_gpu_model_runner and xpu_model_runner
* get_worker() check platform
2025-07-10 16:33:40 +08:00
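The logprobs entry above refers to the standard per-step payload of an OpenAI-style chat API: a log-softmax over the logits, then the top-k token ids with their log-probabilities. A minimal self-contained sketch of that computation (the function name and shapes are illustrative, not the repository's implementation):

```python
import math

def top_logprobs(logits, k=2):
    """Return the k most likely token ids with their logprobs.

    Computes a numerically stable log-softmax over `logits`, then sorts.
    Hypothetical helper illustrating the technique named in the commit,
    not FastDeploy's actual code.
    """
    m = max(logits)
    # log-sum-exp with the max factored out for stability
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    logprobs = [x - lse for x in logits]
    order = sorted(range(len(logits)), key=lambda i: logprobs[i], reverse=True)
    return [(i, logprobs[i]) for i in order[:k]]
```

Exponentiating a returned logprob recovers the token's softmax probability, which is how clients typically display it.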
K11OntheBoat
24f934f1f9
[BugFix] Fix low prediction accuracy of deepseekv3 ( #2798 )
2025-07-10 16:16:44 +08:00
Sunny-bot1
1e2319cbef
Rename top_p_sampling to top_k_top_p_sampling ( #2791 )
2025-07-10 00:09:25 -07:00
Sunny-bot1
e45050cae3
[Feature] support top_k_top_p sampling ( #2753 )
* support top_k_top_p sampling
* fix
* add api param
* add api para
* fix
* fix
* fix
* fix
* fix
* fix
* fix
2025-07-09 20:58:58 -07:00
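The top_k_top_p feature above combines the two classic sampling filters: keep only the k highest logits, then keep the smallest high-probability prefix whose cumulative softmax mass reaches p, masking everything else. A minimal sketch of that filtering step (function and variable names are assumptions for illustration, not the repository's kernel):

```python
import math

def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Mask logits outside the top-k / top-p candidate set to -inf.

    Hypothetical reference implementation of the technique named in the
    commit; real serving stacks do this with fused GPU kernels.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]          # top-k: keep k highest logits
    # softmax over the surviving candidates (max subtracted for stability)
    m = max(logits[i] for i in order)
    exps = [math.exp(logits[i] - m) for i in order]
    total = sum(exps)
    kept, cum = set(), 0.0
    for idx, e in zip(order, exps):    # top-p: smallest prefix with mass >= p
        kept.add(idx)
        cum += e / total
        if cum >= top_p:
            break
    return [x if i in kept else float("-inf") for i, x in enumerate(logits)]
```

Sampling then draws from the softmax of the filtered logits; masked entries contribute zero probability.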
Ryan
b0f525955c
[SOT] Remove breakgraph in post processing && fix datatype ( #2780 )
2025-07-10 11:26:00 +08:00
Yuanle Liu
2ea267f624
assert prompt len > 0 ( #2773 )
2025-07-10 11:14:52 +08:00
0x3878f
1d8af7ab73
Add env variable for dy2st ( #2779 )
2025-07-10 11:06:06 +08:00
Jiang-Jia-Jun
a4fdb3970b
[BugFix] Fix vocab size error for ernie model ( #2785 )
...
* [BugFix] Fix vocab size error for ernie model
* [BugFix] Fix vocab size error for ernie model
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-10 01:05:51 +08:00
Jiang-Jia-Jun
2a86928657
[BugFix Revert] Fix vocab size error for ernie model
2025-07-09 22:14:54 +08:00
Jiang-Jia-Jun
b1c53fa779
[BugFix] Fix vocab size error for ernie model
2025-07-09 22:13:41 +08:00
lizexu123
da20cf681e
[Bug fix] Fixed the garbled text issues in Qwen3-8B ( #2783 )
2025-07-09 22:03:57 +08:00
chen
888780ffde
[Feature] block_wise_fp8 support triton_moe_backend ( #2767 )
2025-07-09 19:22:47 +08:00
RAM
e3768c5a83
[Executor] Fix bug of logger.debug ( #2778 )
2025-07-09 04:13:43 -07:00
lifulll
1f28bdf994
DCU adapter for ernie45t ( #2756 )
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
2025-07-09 18:56:27 +08:00
RAM
03a74995b8
Clear dead code and supplementary notes ( #2757 )
* 1.supplementary notes 2.delete dead code
* fix bug of forward meta
* Global modification of forward meta
* fix vl model_runner bug
2025-07-09 16:17:34 +08:00
zhink
b89180f1cd
[Feature] support custom all-reduce ( #2758 )
* [Feature] support custom all-reduce
* add vllm adapted
2025-07-09 16:00:27 +08:00
yulangz
be21ef5047
[XPU] Supports BF16 for ERNIE-4.5-21B-A3B and ERNIE-4.5-0.3B ( #2765 )
* fix no quant xpu moe
* change dir of xpu moe weight only
2025-07-09 15:57:51 +08:00
yulangz
0350831c2b
fix xpu offline demo garbled output ( #2763 )
2025-07-09 14:51:20 +08:00
RichardWooSJTU
fee544e808
fix ep prefill ( #2762 )
2025-07-09 14:03:05 +08:00
Ryan
c4718fd693
Enable SOT D2St in Multimodal Model ( #2735 )
2025-07-09 12:26:18 +08:00
GoldPancake
f7cad30a38
[Feature] Add speculative decoding simulation benchmark. ( #2751 )
* Add speculative decoding simulation benchmark
* Fix the name of the parameter
2025-07-09 12:08:43 +08:00
gaoziyuan
6b10c19482
[Feature] add fd commit/branch info when starting server ( #2752 )
* add_commit_config
* fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-09 11:52:22 +08:00
RichardWooSJTU
6610aa29d0
Revert "[Bug fix] fix attention rank init ( #2743 )" ( #2761 )
This reverts commit e8bbe7244b.
2025-07-09 10:38:12 +08:00