Commit Graph

3005 Commits

Author SHA1 Message Date
K11OntheBoat
8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00
李泳桦
8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
littledgg
2845bde964 [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph (#2936)
* [Executor] Avoid OOM when start the service while Enable Chunked Prefill + CudaGraph

* Fix: Apply black formatting
2025-07-21 16:25:51 +08:00
Yuanle Liu
2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572 [Feature] support min_p_sampling (#2872)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
gaoziyuan
95a214ae43 support trainer_degree in name_mapping (#2935) 2025-07-20 23:12:55 -07:00
YuanRisheng
bce2c6cd7c rename test dir (#2934) 2025-07-21 14:05:45 +08:00
ltd0924
cc4cec0a74 Update engine_client.py (#2931) 2025-07-21 11:42:16 +08:00
liddk1121
17c5d3a241 [Iluvatar GPU] Add CI scripts (#2876) 2025-07-21 09:44:42 +08:00
周周周
8c5407d9e4 remove cum_offsets from ForwardMeta (#2925)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-19 23:57:27 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
ZhangYulongg
b8676d71a8 update ci cases
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-18 21:44:07 +08:00
ZhangYulongg
43976138de update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
e546e6b1b0 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
9c8292fb19 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
a5e95013b5 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
93481a5478 update ci cases 2025-07-18 21:44:07 +08:00
ZhangYulongg
eb77b1be6d update ci cases 2025-07-18 21:44:07 +08:00
ming1753
5328daa333 [Bug Fix] fix ep config bug (#2920) 2025-07-18 19:12:56 +08:00
xiaoxiaohehe001
a42fc3f40b [Feature] Support 45tVL EP FP8 Infer. (#2909)
* support_mm_ep_fp8

* support_mm_ep
2025-07-18 17:57:15 +08:00
Jiang-Jia-Jun
fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
gaoziyuan
6efad14b95 support vl ori_vacab_size (#2900) 2025-07-18 16:26:14 +08:00
周周周
d306944f4f remove cum_offsets from get_block_shape_and_split_kv_block (#2913)
* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove padding_offsets from get_padding_offset.cu

* remove cum_offsets from get_block_shape_and_split_kv_block

* remove cum_offsets from get_block_shape_and_split_kv_block
2025-07-18 16:13:32 +08:00
YUNSHEN XIE
e81137e581 fix ci workflow (#2896) 2025-07-18 16:01:00 +08:00
RAM
cd52dc0f65 [Executor] Fix set capture sizes bug (#2902) 2025-07-18 15:12:19 +08:00
周周周
1339e56282 [XPU] Remove padding_offsets from get_padding_offset.cu (#2911) 2025-07-18 14:16:44 +08:00
YuanRisheng
0eb5dc18d3 [BugFix]Fix sample rejection (#2908)
* fix config

* fix rejection
2025-07-18 13:44:30 +08:00
sg263
e679567d59 [Trace]fix opentelemetry can not work in uvicorn (#2906)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix pentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
RAM
bbe2c5c968 Update GraphOptimizationBackend docs (#2898) 2025-07-17 21:38:18 +08:00
ltd0924
4b14dca1d6 [LLM] delete fixed slots (#2893)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-17 19:19:54 +08:00
yulangz
c8c280c4d3 [XPU][Doc] fix typo (#2892) 2025-07-17 19:13:54 +08:00
周周周
ddb10ac509 [Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
freeliuzc
d49f8fb30a [Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)
* support chunk_prefill both normal and speculative_decoding(mtp)

* optimize pd-disaggregation config

* fix bug
2025-07-17 17:58:08 +08:00
ming1753
67180c1ff9 [Bug Fix] fix bug of prompt penalty (#2888) 2025-07-17 17:21:37 +08:00
Xintong Yu
273efba76f [Fix] remove misleading variables (#2841)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 16:49:14 +08:00
YUNSHEN XIE
1cfba5ba3e enable CI workflow for pull requests targeting release/* branches (#2887) 2025-07-17 16:48:03 +08:00
Jiang-Jia-Jun
31cab9f87b Update test_openai.py 2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c Update test_openai.py 2025-07-17 16:07:07 +08:00
ltd0924
b630031414 [LLM] fix serval bugs (#2878) 2025-07-17 14:21:05 +08:00
LokeZhou
f50c25178b [MM_PROCESS] add _extract_labels (#2879) 2025-07-17 14:20:01 +08:00
Yuanle Liu
dbb9e2506b Fix rollout_model init (#2881) 2025-07-16 22:36:21 -07:00
ming1753
1f15ca21e4 [Feature] support prompt repetition_penalty (#2806)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-17 12:05:52 +08:00
yulangz
7dfd2ea052 [XPU][doc] Update minimal fastdeploy required (#2863)
* [XPU][doc] update minimal fastdeploy required
2025-07-17 11:33:22 +08:00
GoldPancake
42d4001400 [Features] Add speculative metrics (#2857) 2025-07-17 11:08:55 +08:00
sg263
52aca233e8 [Trace] fix annotation when add opentelemetry (#2869)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix pentelemetry-bootstrap

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 10:29:16 +08:00
ltd0924
9c25dcca0b [LLM] Update Multinode Deployment (#2830)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] fix multinode bugs

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] fix ci bugs

* Update fastdeploy/engine/args_utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [LLM] update random port

* [LLM] update random port

* [LLM] fix ci bugs

* fix ci bugs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00