gaoziyuan
5224f6c434
support qwen3 name_mapping ( #3170 )
2025-08-04 16:37:40 +08:00
yinwei
bfef09dd73
update user email ( #3087 )
2025-07-31 14:25:31 +08:00
LokeZhou
1d46420c49
[cherry-pick][MM_PROCESS] add _extract_labels ( #2879 ) ( #2993 )
v2.0.3
2025-07-24 11:04:43 +08:00
ltd0924
fb0f284e67
[BugFix] fix prompt token ids type ( #2994 )
...
* Update serving_completion.py
* fix
* fix
2025-07-23 21:00:56 +08:00
Zero Rains
5d1788c7b5
polish code for prefill restrictions ( #2992 )
2025-07-23 05:12:01 -07:00
Zero Rains
abd238fc12
[Cherry-Pick][BugFix] Add prefill restrictions for chunked_prefill+VL ( #2984 )
2025-07-23 01:53:26 -07:00
Jiang-Jia-Jun
e5804b1d98
Revert "[LLM] fix multinode bugs ( #2945 )" ( #2971 )
...
This reverts commit b0f1e0eef4.
2025-07-22 21:23:48 +08:00
Sunny-bot1
8c43bc8176
[FIX 2.0.3] fix rejection sampling when top_p=0 using _SAMPLING_EPS ( #2966 )
...
* fix rejection sampling when topp=0
* fix
* fix
2025-07-22 05:53:04 -07:00
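The idea behind the top_p=0 fix above can be sketched as follows. This is a minimal illustration, not FastDeploy's actual sampler: when top_p is numerically zero, nucleus filtering would keep no probability mass, so values below a small epsilon fall back to greedy decoding. The `_SAMPLING_EPS` value and the function name are assumptions.

```python
# Minimal sketch: guard top-p (nucleus) filtering against top_p == 0.
_SAMPLING_EPS = 1e-5  # assumed threshold; the real constant may differ

def top_p_filter(probs, top_p):
    """Return the token indices kept by top-p filtering over `probs`."""
    if top_p < _SAMPLING_EPS:
        # Degenerate top_p: behave like greedy decoding (keep only argmax).
        return [max(range(len(probs)), key=probs.__getitem__)]
    # Sort indices by probability, descending, and keep the smallest
    # prefix whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept
```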
ltd0924
b0f1e0eef4
[LLM] fix multinode bugs ( #2945 )
...
* [LLM] fix multinode bugs
* [LLM] fix multinode bugs
* [LLM] fix multinode bugs
* [LLM] fix ci bugs
* fix ci bugs
* fix ci bugs
2025-07-22 20:23:37 +08:00
ming1753
69be77c8c0
[Feature] support prompt repetition_penalty ( #2954 )
...
* [Feature] support prompt repetition_penalty (#2806 )
* [Bug Fix] fix bug of prompt penalty (#2888 )
2025-07-22 19:42:33 +08:00
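The gist of extending repetition_penalty to prompt tokens can be sketched like this. A hypothetical illustration in the spirit of #2954, not the project's implementation: tokens already seen in the prompt or in the output so far are penalized in the logits.

```python
# Sketch: repetition penalty that also covers prompt tokens (names illustrative).
def apply_repetition_penalty(logits, prompt_ids, generated_ids, penalty):
    """Penalize logits of tokens appearing in the prompt or the output so far."""
    seen = set(prompt_ids) | set(generated_ids)
    out = list(logits)
    for t in seen:
        if out[t] > 0:
            out[t] /= penalty   # shrink positive logits
        else:
            out[t] *= penalty   # push negative logits further down
    return out
```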
gaoziyuan
535a15ab8f
[Fix]Fix vl when import fastdeploy and fix rl config rank bug ( #2953 )
...
* support vl ori_vocab_size
* support trainer_degree in name_mapping
* fix
* fix import error
* fix local rank
2025-07-22 04:40:27 -07:00
sg263
580460046f
merge 2.0.2 into 2.0.3 ( #2917 )
...
Co-authored-by: shige <shige@baidu.com>
2025-07-22 14:46:20 +08:00
Sunny-bot1
4dbc483713
[BugFix]Fix sample rejection ( #2908 ) ( #2949 )
...
* fix config
* fix rejection
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
2025-07-22 13:39:34 +08:00
gaoziyuan
4ead15822c
[Sync develop] support vl model name_mapping and ori_vocab_size ( #2915 )
...
* support vl ori_vocab_size
* support trainer_degree in name_mapping
* fix
2025-07-20 23:14:15 -07:00
Jiang-Jia-Jun
f941124402
[Feature] Support include_stop_str_in_output ( #2930 )
...
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-21 10:58:32 +08:00
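The behavior of the include_stop_str_in_output option (#2930) can be sketched as follows. This is an assumed illustration, not the server's actual post-processing: when generation hits a stop string, the flag decides whether the stop string itself stays in the returned text.

```python
# Sketch: truncating generated text at a stop string (names illustrative).
def finalize_output(text, stop_str, include_stop_str_in_output=False):
    """Truncate `text` at the first stop string; optionally keep the stop string."""
    idx = text.find(stop_str)
    if idx == -1:
        return text  # no stop string produced
    end = idx + len(stop_str) if include_stop_str_in_output else idx
    return text[:end]
```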
RAM
b89f083004
[Executor] Fix set capture sizes bug ( #2903 )
2025-07-18 10:58:05 +08:00
RAM
4d05ed596c
Update GraphOptimizationBackend docs ( #2899 )
2025-07-17 21:41:38 +08:00
ltd0924
bc1866af58
[LLM] delete fixed slot ( #2894 )
2025-07-17 20:44:55 +08:00
yulangz
fe237fe92b
[XPU][doc] pick xpu doc fix ( #2897 )
2025-07-17 20:01:40 +08:00
YUNSHEN XIE
3a480abcbb
enable CI workflow for pull requests targeting release/* branches ( #2886 )
...
* enable CI workflow for pull requests targeting release/* branches
* fix
2025-07-17 16:48:13 +08:00
Yuanle Liu
335609efb6
fix rollout_model and add rl ut ( #2882 )
2025-07-17 13:37:54 +08:00
Jiang-Jia-Jun
3464f75f98
Update setup.py
2025-07-16 23:45:05 +08:00
Jiang-Jia-Jun
09d0073fdc
[Sync Code] develop to release/2.0.3 ( #2873 )
...
* [LLM] support send batch data and aggregate data (#2860 )
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
* [LLM] Update Multinode Deployment (#2830 )
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:44:26 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
...
2025-07-16 20:10:57 +08:00
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f
Read FLAGS_max_partition_size from an environment variable ( #2854 )
2025-07-16 14:14:21 +08:00
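Moving FLAGS_max_partition_size from a compile-time constant to the environment can be sketched like this. The default value and function name here are illustrative assumptions, not the project's actual values.

```python
# Sketch: read a tuning flag from the environment with a fallback default.
import os

def get_max_partition_size(default=32768):
    """Read FLAGS_max_partition_size from the environment, else use `default`."""
    return int(os.environ.get("FLAGS_max_partition_size", default))
```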
xiaoxiaohehe001
0d0340392f
[Fix] Fix mm ep weight init. ( #2855 )
...
* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe
[Fix] fix expert_parallel bug in decoder stage ( #2848 )
2025-07-16 11:08:18 +08:00
yulangz
17314ee126
[XPU] Update doc and add scripts for downloading dependencies ( #2845 )
...
* [XPU] update xvllm download
* update supported models
* fix xpu model runner in huge memory with small model
* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
...
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support set graph optimization config from YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
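The padding-batch idea above can be sketched as follows. A minimal illustration under assumed names, not the executor's real code: CUDA graphs are captured for a fixed list of batch sizes, and an incoming batch is padded up to the smallest captured size that fits (falling back to eager execution when none does).

```python
# Sketch: map a runtime batch size onto a captured CUDA-graph batch size.
import bisect

def pick_capture_size(batch_size, capture_sizes):
    """Return the smallest captured batch size >= batch_size, or None if none fits."""
    sizes = sorted(capture_sizes)
    i = bisect.bisect_left(sizes, batch_size)
    return sizes[i] if i < len(sizes) else None
```

A batch of 3 against captured sizes [1, 2, 4, 8] would be padded to 4; a batch of 9 would run outside the graphs.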
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
...
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
zhenwenDang
5fc659b900
[Docs] add enable_logprob parameter description ( #2850 )
...
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-15 19:47:45 +08:00
ophilia-lee
33db137d0b
Add a YAML of default vLLM request parameters
2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334
Adapt to vLLM lacking arrival_time; make the vLLM model field required; add a case number no to RequestFuncInput/RequestFuncOutput/SampleRequest
2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun
1b712bba82
Update setup.py
2025-07-15 14:57:23 +08:00
AIbin
fd91da7b41
[Inference Optimize] Support wint2 triton kernel via triton_utils_v2 ( #2842 )
...
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5
[vl] Use top_k from config.json ( #2831 )
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed ( #2840 )
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
2e81792d64
[fix] fix 'force-reinstall all-depe-packages in build' ( #2837 )
2025-07-14 16:50:54 +08:00
AIbin
b7858c22d9
[Update Docs] update supported_models doc ( #2836 )
...
* update supported_models doc
2025-07-14 16:01:34 +08:00
GoldPancake
09bbac6de0
Add DeepGEMM pre-compile tools ( #2819 )
...
This tool compiles all possible kernels ahead of time from the model's config.json, avoiding the case where a request hits an uncompiled kernel and has to wait for JIT compilation.
2025-07-14 14:56:41 +08:00
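The pre-compile approach above can be sketched as follows. A hypothetical illustration of the idea behind #2819, not the actual tool: enumerate the GEMM shapes implied by a model's config.json so each kernel can be built once ahead of time. The config keys and the fixed batch-size list are assumptions.

```python
# Sketch: derive (m, n, k) GEMM shapes from a model config for pre-compilation.
import json

def enumerate_gemm_shapes(config_path, batch_sizes=(1, 8, 32)):
    """Enumerate GEMM shapes implied by a model's config.json (illustrative keys)."""
    with open(config_path) as f:
        cfg = json.load(f)
    hidden = cfg["hidden_size"]
    inter = cfg["intermediate_size"]
    shapes = []
    for m in batch_sizes:
        shapes.append((m, inter, hidden))  # up projection
        shapes.append((m, hidden, inter))  # down projection
    return shapes
```

Each shape would then be fed to the kernel builder once, so no request later pays a JIT cost.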
freeliuzc
7f64d408a9
[MTP] support expert-parallel in mtp ( #2835 )
2025-07-14 14:28:50 +08:00
lddfym
ece88596ed
fix spelling error ( #2827 )
2025-07-14 13:12:57 +08:00
bukejiyu
bad53c6b6e
[vl]remove duplicated load logic ( #2744 )
2025-07-13 07:36:26 +08:00