Commit Graph

713 Commits

Author SHA1 Message Date
chenjian
bc388b65c7 [Bug fix] Fix bug in logprob in release 2.0.4 (#3445)
* fix bug for scheduler v0

* Fix logprob in release/2.0.4
2025-08-16 21:13:10 +08:00
Jiang-Jia-Jun
71af0ca04a [BugFix] Fix default log level of paddleformers (#3378) 2025-08-15 18:30:00 +08:00
xiaolei373
f0519aec67 feat(log):add_request_and_response_log (#3391)
* feat(log):add_request_and_response_log

* [ci] Retrigger

* [ci] Retrigger
2025-08-14 19:12:42 +08:00
gaoziyuan
1f5983290c fix mapping (#3321) 2025-08-12 16:17:59 +08:00
chenjian
c6a133d573 [Bug fix] Fix block num in scheduler v1 for release2.0.4 (#3314)
* fix bug for scheduler v0

* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix block num setting in scheduler v1

* fix block num setting in scheduler v1
2025-08-11 23:55:45 +08:00
chenjian
4646aff25c fix bug for scheduler v0 (#3307) 2025-08-11 23:55:20 +08:00
chenjian
a84a98b107 fix scheduler bug due to async running (#3293) 2025-08-10 13:54:59 +08:00
chenjian
c208086f61 fix scheduler bug for bs=1 (#3288) 2025-08-09 12:22:12 +08:00
sg263
ce1d4944e7 merge develop trace FD_START (#3253) (#3260)
Co-authored-by: shige <shige@baidu.com>
2025-08-07 16:06:58 +08:00
chenjian
5439fb6336 [Cherry-pick] FIx bug for scheduler V1 (#3167)
* [BUG FIX] Fix bug when preempted request rescheduled (#3080)

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug when preempted request rescheduled

* Fix bug for offline inference in scheduler v1 (#3117)
2025-08-04 17:08:12 +08:00
gaoziyuan
a592d17615 support qwen3 name_mapping (#3180) 2025-08-04 16:37:34 +08:00
李泳桦
eca8fc7ca6 [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3077)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client

* [fix] delete ci test case for enable_thinking

* [fix] add reasoning_parser when server starts

* [doc] update docs related to metadata

* [fix] fix ci consistency test error with reasoning parser

* [fix] cancel enable_thinking default value
2025-07-30 19:25:39 +08:00
李泳桦
0463797fc2 [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3023)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:06 +08:00
xiaoxiaohehe001
2970b00dfa [Feature] Support_eplb (#2997)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Feature] support_eplb

* [Feature] support_eplb

* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856 [Model] Provide clearer error for missing KV cache quantization scales (#3007) 2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e [GCU] Update to develop (#2988) 2025-07-24 19:30:52 +08:00
Yzc216
980126b83a [Feature] multi source download (#3005)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862 [BugFix] fix multinode deployment (#2977) 2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a [LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954 [Feature] multi-source download (#2986)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit
2025-07-24 14:26:37 +08:00
xiaoxiaohehe001
2c0ff068e2 [Fix] fix mm ep empty run (#2999) 2025-07-24 14:15:55 +08:00
lizhenyun01
29c3292f02 support c4 attn && fix cache 2025-07-24 12:00:52 +08:00
lizexu123
832d25334a [Code Simplification] fix init_distributed_environment() (#2982) 2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8 update (#2978)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d [Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1

* Support FD block scheduler v1

* Support FD block scheduler v1

* Fix according to copilot review

* Fix according to review

* Remove is_dummy

* Fix bug when real_bsz=1

* Fix infer first token cost time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39 polish code for prefill restrictions (#2991) 2025-07-23 05:10:14 -07:00
chen
172e69fe17 FA3 fix bug (#2987) 2025-07-23 19:07:43 +08:00
zhink
1272c7ce98 Fix performance degradation bug of custom_all_reduce (#2981) 2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4 [BugFix] Add prefill restrictions for chunked_prefill+VL (#2983) 2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c fix mtp bug in pd-split mode (#2970)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-23 15:31:16 +08:00
lizexu123
9b22b8d2c3 delete max-len (#2959) 2025-07-23 15:11:39 +08:00
chen
ad202272ed 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577 support chunk_prefill in fa3 2025-07-23 12:19:20 +08:00
Ryan
95b5af24db [SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add sot warmup

* fix code style

* change batch_size list

* add param to config

* rm free_list settings && set sot_warmup_sizes

* finish debug with dynamic dims by type annotations

* add profile_run guard

* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1
7c5e34e72d [FIX]fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0

* fix
2025-07-22 05:53:37 -07:00
gaoziyuan
dbe6225b33 fix rl config local rank (#2957) 2025-07-22 04:39:54 -07:00
GoldPancake
9b84d51e25 [MTP Fix] Fix code and register cpp operators (#2965) 2025-07-22 19:36:24 +08:00
K11OntheBoat
93bb68aa71 [Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e [Feature][MTP]Support multi-step MTP (#2952) 2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60 [Fix]fix empty prompt_token_ids,update the parser's triggering condit… (#2891) 2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f [Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching but still have some bug with profile

Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397

* add the signal to make sure cache manager launched

* fix judge condition

* reomove useless control

* update control stream

* update

* fix xpu

* change the do_profile flag

* update

* add new threads to init cache_manager

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26 [SOT] Mark dynamic dims by type annotations (#2771)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [SOT] Mark dynamic dims by type annotations

* fix conflict of forward_meta

* mark more attn backend

* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS

* auto infer implicit 0 dim dynamic dim

* revert manual marked dims

* revert missing update

* auto infer can use unsafe code in warmup stage

* check -> type_match

* fix codestyle

* restore blank line

* empty commit

* add need_warmup nonlocal;

* add doc for resolver

* add missing type hints

* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
李泳桦
2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e native top_p_sampling (#2901) 2025-07-22 14:09:59 +08:00
gaoziyuan
0eedbdaee0 fix import error (#2944) 2025-07-22 14:06:01 +08:00
K11OntheBoat
8020927f50 [BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3 custom all reduce support cuda graph (#2938)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag

* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135 remove some code in ep.py (#2947) 2025-07-21 22:44:57 +08:00