周周周
d306944f4f
remove cum_offsets from get_block_shape_and_split_kv_block ( #2913 )
...
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE
e81137e581
fix ci workflow ( #2896 )
2025-07-18 16:01:00 +08:00
RAM
cd52dc0f65
[Executor] Fix set capture sizes bug ( #2902 )
2025-07-18 15:12:19 +08:00
周周周
1339e56282
[XPU] Remove padding_offsets from get_padding_offset.cu ( #2911 )
2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3
[BugFix]Fix sample rejection ( #2908 )
...
* fix config
* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59
[Trace]fix opentelemetry can not work in uvicorn ( #2906 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM
bbe2c5c968
Update GraphOptimizationBackend docs ( #2898 )
2025-07-17 21:38:18 +08:00
ltd0924
4b14dca1d6
[LLM] delete fixed slots ( #2893 )
2025-07-17 19:19:54 +08:00
yulangz
c8c280c4d3
[XPU][Doc] fix typo ( #2892 )
2025-07-17 19:13:54 +08:00
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from atten use batch_id_per_token ( #2880 )
...
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
...
* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9
[Bug Fix] fix bug of prompt penalty ( #2888 )
2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f
[Fix] remove misleading variables ( #2841 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE
1cfba5ba3e
enable CI workflow for pull requests targeting release/* branches ( #2887 )
2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
b630031414
[LLM] fix several bugs ( #2878 )
2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b
[MM_PROCESS] add _extract_labels ( #2879 )
2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4
[Feature] support prompt repetition_penalty ( #2806 )
2025-07-17 12:05:52 +08:00
yulangz
7dfd2ea052
[XPU][doc] Update minimal fastdeploy required ( #2863 )
...
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
sg263
52aca233e8
[Trace] fix annotation when add opentelemetry ( #2869 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
...
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
...
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00
Yuanle Liu
63d6e7ce06
fix and refine vl ( #2866 )
...
* refine vl config
* delete attn_sep
* fix vl accuracy
2025-07-16 05:59:28 -07:00
周周周
aa76085d1f
[Attention] remove cum_offsets from atten, and use cu_seqlens_q ( #2870 )
...
[Attention] remove cum_offsets from atten, and use cu_seqlens_q (#2870 )
2025-07-16 20:10:57 +08:00
sg263
42b80182e0
[Trace] add opentelemetry ( #2852 )
...
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
Yuanle Liu
dda4a9f848
rl update ( #2861 )
2025-07-16 00:33:10 -07:00
yangjianfengo1
a83a3eea5f
Read FLAGS_max_partition_size from an environment variable ( #2854 )
2025-07-16 14:14:21 +08:00
xiaoxiaohehe001
0d0340392f
[Fix] Fix mm ep weight init. ( #2855 )
...
* fix_45t_mm
* Update load_weight_utils.py
* Update load_weight_utils.py
2025-07-16 12:02:39 +08:00
YuanRisheng
0253381fb9
fix config ( #2858 )
2025-07-16 11:40:10 +08:00
freeliuzc
2d1184aefe
[Fix] fix expert_parallel bug in decoder stage ( #2848 )
2025-07-16 11:08:18 +08:00
yulangz
17314ee126
[XPU] Update doc and add scripts for downloading dependencies ( #2845 )
...
* [XPU] update xvllm download
* update supported models
* fix xpu model runner in huge memory with small model
* update doc
2025-07-16 11:05:56 +08:00
YuanRisheng
101ad33332
[BugFix] Fix Configs ( #2849 )
...
* fix config
* fix config
2025-07-15 19:50:36 -07:00
RAM
0fad10b35a
[Executor] CUDA Graph support padding batch ( #2844 )
...
* cuda graph support padding batch
* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.
* Do not insert max_num_seqs when the user specifies a capture list
* Support setting the graph optimization config from a YAML file
* update cuda graph ci
* fix ci bug
* fix ci bug
2025-07-15 19:49:01 -07:00
Yuanle Liu
61b3997b85
refactor rl get_name_mappings_to_training ( #2847 )
...
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
2025-07-15 07:31:42 -07:00
Zero Rains
e7bcbbab52
Merge vl execution path into normal execution path ( #2829 )
...
* merge vl model into gpu_model runner
Change-Id: I9f4691a3d5f135e8d72b1d58abcd15ef3aa3f2a6
* fix chinese
Change-Id: Ic7405109b984c21e076fb3b01ff6feb571d0119a
* fix the parse parameter
Change-Id: I4cd62ee87c06220af580d91e347145d4394917fe
* fix the bug in online_inference
Change-Id: Idb111bb2114e83017c4050b2a68cf039c6d3c559
* polish code
Change-Id: I7d4194102c2f1b0743b74fbd5fc284eb8ef4d17c
2025-07-15 22:20:03 +08:00
zhenwenDang
5fc659b900
[Docs] add enable_logprob parameter description ( #2850 )
...
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
* add enable_logprob parameter description
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-15 19:47:45 +08:00
ophilia-lee
33db137d0b
Add a YAML file of default vLLM request parameters
2025-07-15 19:31:27 +08:00
lijingning
9d6a42b334
Adapt to vLLM having no arrival_time; adapt to vLLM requiring the model field; add a test case number field `no` to RequestFuncInput/RequestFuncOutput/SampleRequest
2025-07-15 19:31:27 +08:00
Jiang-Jia-Jun
1b712bba82
Update setup.py
2025-07-15 14:57:23 +08:00
AIbin
fd91da7b41
[Inference Optimize] Support wint2 triton kernel about triton_utils_v2 ( #2842 )
...
* update supported_models doc
2025-07-15 14:35:40 +08:00
bukejiyu
15c8c240b5
[vl] Use top_k from config.json ( #2831 )
2025-07-15 00:39:12 +08:00
freeliuzc
7cdd8d290d
[MTP] optimize mtp infer speed ( #2840 )
2025-07-14 19:50:22 +08:00
YuanRisheng
4c7b8bc458
Simplify the Config code ( #2770 )
...
* simplify the code
* fix vl
* delete config
* fix
* perfect code
* fix ci
* fix xpu
* fix xpu
* fix server
* resolve conflict
* fix mtp
* resolve conflict
* fix xpu
* fix xpu
* fix vl
* fix log
* fix qwen moe
* fix qwen moe
* fix qwen moe
2025-07-14 19:50:05 +08:00
freeliuzc
2e81792d64
[fix] fix 'force-reinstall all-depe-packages in build' ( #2837 )
2025-07-14 16:50:54 +08:00
AIbin
b7858c22d9
[Update Docs] update supported_models doc ( #2836 )
...
* update supported_models doc
2025-07-14 16:01:34 +08:00
GoldPancake
09bbac6de0
Add DeepGEMM pre-compile tools ( #2819 )
...
This tool lets you compile all possible kernels in advance from the model's config.json, so that an incoming request never hits an uncompiled kernel and triggers JIT compilation.
2025-07-14 14:56:41 +08:00
freeliuzc
7f64d408a9
[MTP] support expert-parallel in mtp ( #2835 )
2025-07-14 14:28:50 +08:00