K11OntheBoat
8020927f50
[BugFix] Rename attention params of deepseekv3 ( #2939 )
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request ( #2946 )
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3
custom all reduce support cuda graph ( #2938 )
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135
remove some code in ep.py ( #2947 )
2025-07-21 22:44:57 +08:00
李泳桦
8a619e9db5
[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body ( #2940 )
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body
* [fix] return_token_ids not working in curl request
* [test] improve some test cases of return_token_ids and prompt_token_ids
* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
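For context on the new request-body fields above, the sketch below shows how a client could set return_token_ids against an OpenAI-compatible FastDeploy endpoint. The URL, port, model name, and the exact response layout are assumptions; only the field name comes from the commit title.

```python
# Hypothetical request to a local FastDeploy OpenAI-compatible server.
# Endpoint, port, and model name are placeholders; `return_token_ids` is
# assumed to be a boolean body field, per the commit title above.
import requests

payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "return_token_ids": True,  # ask the server to echo token ids in the response
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # token-id fields, if supported, appear alongside the usual response body
```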
littledgg
2845bde964
[Executor] Avoid OOM when starting the service with Chunked Prefill + CudaGraph enabled ( #2936 )
* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph
* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu
2f74e93d7e
use dist.all_reduce(min) to sync num_blocks_local ( #2933 )
* pre-commit all files check
* reduce min num_blocks_local
* fix nranks=1
* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
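As background on the title above, the idea is that each rank computes its own KV-cache block budget and the group then agrees on the minimum via an all-reduce, which also explains the nranks=1 fix. A minimal sketch assuming Paddle's collective API and an already-initialized process group; variable names are illustrative.

```python
# Sketch: make every rank adopt the smallest per-rank block budget.
# Assumes paddle.distributed is initialized; names are illustrative, not FastDeploy's.
import paddle
import paddle.distributed as dist

def sync_num_blocks_local(num_blocks_local: int) -> int:
    if dist.get_world_size() <= 1:  # single-rank case: nothing to synchronize
        return num_blocks_local
    t = paddle.to_tensor([num_blocks_local], dtype="int64")
    dist.all_reduce(t, op=dist.ReduceOp.MIN)  # group-wide minimum
    return int(t.item())
```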
lizexu123
67990e0572
[Feature] support min_p_sampling ( #2872 )
* Fastdeploy support min_p
* add test_min_p
* fix
* min_p_sampling
* update
* delete vl_gpu_model_runner.py
* fix
* Align usage of min_p with vLLM
* fix
* modified unit test
* fix test_min_sampling
* pre-commit all files
* fix
* fix
* fix
* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
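For reference, min-p sampling keeps only tokens whose probability is at least min_p times the top token's probability and renormalizes the rest. A generic NumPy sketch of that rule, not FastDeploy's actual sampling kernel:

```python
# Generic min-p filtering sketch (NumPy): keep token i only if p_i >= min_p * max(p).
# Illustration only; not the FastDeploy implementation.
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float) -> np.ndarray:
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()  # renormalize over surviving tokens

logits = np.array([2.0, 1.5, 0.2, -1.0, -3.0])
probs = np.exp(logits) / np.exp(logits).sum()
print(min_p_filter(probs, min_p=0.1))
```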
gaoziyuan
95a214ae43
support trainer_degree in name_mapping ( #2935 )
2025-07-20 23:12:55 -07:00
YuanRisheng
bce2c6cd7c
rename test dir ( #2934 )
2025-07-21 14:05:45 +08:00
ltd0924
cc4cec0a74
Update engine_client.py ( #2931 )
2025-07-21 11:42:16 +08:00
liddk1121
17c5d3a241
[Iluvatar GPU] Add CI scripts ( #2876 )
2025-07-21 09:44:42 +08:00
周周周
8c5407d9e4
remove cum_offsets from ForwardMeta ( #2925 )
2025-07-19 23:57:27 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
ZhangYulongg
b8676d71a8
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
43976138de
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
e546e6b1b0
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
9c8292fb19
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
a5e95013b5
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
93481a5478
update ci cases
2025-07-18 21:44:07 +08:00
ZhangYulongg
eb77b1be6d
update ci cases
2025-07-18 21:44:07 +08:00
ming1753
5328daa333
[Bug Fix] fix ep config bug ( #2920 )
2025-07-18 19:12:56 +08:00
xiaoxiaohehe001
a42fc3f40b
[Feature] Support 45tVL EP FP8 Infer. ( #2909 )
* support_mm_ep_fp8
* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun
fbe3547c95
[Feature] Support include_stop_str_in_output in chat/completion ( #2910 )
* [Feature] Support include_stop_str_in_output in chat/completion
* Add ci test for include_stop_str_in_output
* Update version of openai
* Fix ci test
---------
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
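A sketch of how the new flag might be passed from the OpenAI Python client: non-standard fields go through extra_body. The base URL, API key, and model name are placeholders for a local FastDeploy server; the field name is taken from the commit title.

```python
# Sketch: request that the stop string be kept in the returned text.
# Placeholders: base_url, api_key, model; the extra_body field name comes from the commit title.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Reply with DONE when finished."}],
    stop=["DONE"],
    extra_body={"include_stop_str_in_output": True},  # keep "DONE" in the output text
)
print(resp.choices[0].message.content)
```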
gaoziyuan
6efad14b95
support vl ori_vacab_size ( #2900 )
2025-07-18 16:26:14 +08:00
周周周
d306944f4f
remove cum_offsets from get_block_shape_and_split_kv_block ( #2913 )
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove padding_offsets from get_padding_offset.cu
* remove cum_offsets from get_block_shape_and_split_kv_block
* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE
e81137e581
fix ci workflow ( #2896 )
2025-07-18 16:01:00 +08:00
RAM
cd52dc0f65
[Executor] Fix set capture sizes bug ( #2902 )
2025-07-18 15:12:19 +08:00
周周周
1339e56282
[XPU] Remove padding_offsets from get_padding_offset.cu ( #2911 )
2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3
[BugFix] Fix sample rejection ( #2908 )
* fix config
* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59
[Trace] Fix opentelemetry not working in uvicorn ( #2906 )
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
* fix opentelemetry can not work in uvicorn
* move conf to env
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM
bbe2c5c968
Update GraphOptimizationBackend docs ( #2898 )
2025-07-17 21:38:18 +08:00
ltd0924
4b14dca1d6
[LLM] delete fixed slots ( #2893 )
2025-07-17 19:19:54 +08:00
yulangz
c8c280c4d3
[XPU][Doc] fix typo ( #2892 )
2025-07-17 19:13:54 +08:00
周周周
ddb10ac509
[Inference, rename] remove padding_offsets from attention, use batch_id_per_token instead ( #2880 )
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a
[Feature][MTP] Support cacheKV transfer in per_chunk mode ( #2890 )
* support chunk_prefill both normal and speculative_decoding(mtp)
* optimize pd-disaggregation config
* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9
[Bug Fix] fix bug of prompt penalty ( #2888 )
2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f
[Fix] remove misleading variables ( #2841 )
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE
1cfba5ba3e
enable CI workflow for pull requests targeting release/* branches ( #2887 )
2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun
31cab9f87b
Update test_openai.py
2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c
Update test_openai.py
2025-07-17 16:07:07 +08:00
ltd0924
b630031414
[LLM] fix several bugs ( #2878 )
2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b
[MM_PROCESS] add _extract_labels ( #2879 )
2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b
Fix rollout_model init ( #2881 )
2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4
[Feature] support prompt repetition_penalty ( #2806 )
2025-07-17 12:05:52 +08:00
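For reference, the usual repetition-penalty rule (CTRL-style) rescales the logits of tokens that have already appeared; the title above suggests the prompt's token ids are now included among the penalized ones. A generic NumPy sketch, not FastDeploy's kernel:

```python
# Generic repetition-penalty sketch (NumPy): penalize logits of already-seen token ids,
# including prompt ids, as the commit title suggests. Illustration only.
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, seen_ids, penalty: float = 1.3) -> np.ndarray:
    out = logits.copy()
    for i in set(seen_ids):
        # CTRL-style rule: shrink positive logits, push negative logits further down.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

prompt_ids, generated_ids = [0, 3], [3, 1]
logits = np.array([2.0, -1.0, 0.5, 3.0])
print(apply_repetition_penalty(logits, prompt_ids + generated_ids))
```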
yulangz
7dfd2ea052
[XPU][doc] Update minimal FastDeploy version required ( #2863 )
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake
42d4001400
[Features] Add speculative metrics ( #2857 )
2025-07-17 11:08:55 +08:00
sg263
52aca233e8
[Trace] fix annotation when adding opentelemetry ( #2869 )
* add opentelemetry
* add opentelemetry
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* add opentelemetry on dequeue
* fix annotation
* fix annotation when add opentelemetry
* fix opentelemetry-instrumentation-fastapi
* fix opentelemetry-bootstrap
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b
[LLM] Update Multinode Deployment ( #2830 )
* [LLM] fix multinode bugs
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] update multinode deployment
* [LLM] fix ci bugs
* Update fastdeploy/engine/args_utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [LLM] update random port
* [LLM] update random port
* [LLM] fix ci bugs
* fix ci bugs
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c
[LLM] support send batch data and aggregate data ( #2860 )
* [LLM] support send batch data and aggregate data
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] fix ci bugs
* [LLM] update
2025-07-16 23:42:20 +08:00