chenjian
bc388b65c7
[Bug fix] Fix bug in logprob in release 2.0.4 (#3445)
* fix bug for scheduler v0
* Fix logprob in release/2.0.4
2025-08-16 21:13:10 +08:00
Jiang-Jia-Jun
71af0ca04a
[BugFix] Fix default log level of paddleformers (#3378)
2025-08-15 18:30:00 +08:00
xiaolei373
f0519aec67
feat(log):add_request_and_response_log (#3391)
* feat(log):add_request_and_response_log
* [ci] Retrigger
2025-08-14 19:12:42 +08:00
gaoziyuan
1f5983290c
fix mapping (#3321)
2025-08-12 16:17:59 +08:00
chenjian
c6a133d573
[Bug fix] Fix block num in scheduler v1 for release 2.0.4 (#3314)
* fix bug for scheduler v0
* fix block num setting in scheduler v1
2025-08-11 23:55:45 +08:00
chenjian
4646aff25c
fix bug for scheduler v0 (#3307)
2025-08-11 23:55:20 +08:00
chenjian
a84a98b107
fix scheduler bug due to async running (#3293)
2025-08-10 13:54:59 +08:00
chenjian
c208086f61
fix scheduler bug for bs=1 (#3288)
2025-08-09 12:22:12 +08:00
sg263
ce1d4944e7
merge develop trace FD_START (#3253) (#3260)
Co-authored-by: shige <shige@baidu.com>
2025-08-07 16:06:58 +08:00
chenjian
5439fb6336
[Cherry-pick] Fix bug for scheduler V1 (#3167)
* [BUG FIX] Fix bug when preempted request rescheduled (#3080)
* Fix bug when preempted request rescheduled
* Fix bug for offline inference in scheduler v1 (#3117)
2025-08-04 17:08:12 +08:00
gaoziyuan
a592d17615
support qwen3 name_mapping (#3180)
2025-08-04 16:37:34 +08:00
李泳桦
eca8fc7ca6
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3077)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [doc] update docs related to metadata
* [fix] fix ci consistency test error with reasoning parser
* [fix] cancel enable_thinking default value
2025-07-30 19:25:39 +08:00
李泳桦
0463797fc2
[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3023)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:06 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb (#2997)
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856
[Model] Provide clearer error for missing KV cache quantization scales (#3007)
2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e
[GCU] Update to develop (#2988)
2025-07-24 19:30:52 +08:00
Yzc216
980126b83a
[Feature] multi source download (#3005)
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment (#2977)
2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a
[LLM] update function name (#2985)
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954
[Feature] multi-source download (#2986)
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
xiaoxiaohehe001
2c0ff068e2
[Fix] fix mm ep empty run (#2999)
2025-07-24 14:15:55 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
lizexu123
832d25334a
[Code Simplification] fix init_distributed_environment() (#2982)
2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8
update (#2978)
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD (#2928)
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39
polish code for prefill restrictions (#2991)
2025-07-23 05:10:14 -07:00
chen
172e69fe17
FA3 fix bug (#2987)
2025-07-23 19:07:43 +08:00
zhink
1272c7ce98
Fix performance degradation bug of custom_all_reduce (#2981)
2025-07-23 17:45:44 +08:00
Zero Rains
850c9d98d4
[BugFix] Add prefill restrictions for chunked_prefill+VL (#2983)
2025-07-23 01:45:57 -07:00
freeliuzc
a39a67334c
fix mtp bug in pd-split mode (#2970)
2025-07-23 15:31:16 +08:00
lizexu123
9b22b8d2c3
delete max-len (#2959)
2025-07-23 15:11:39 +08:00
chen
ad202272ed
[Infer] Improve the performance of block_wise_fp8 in triton_moe_backend (#2942)
2025-07-23 13:02:50 +08:00
lizhenyun01
e51f018577
support chunk_prefill in fa3
2025-07-23 12:19:20 +08:00
Ryan
95b5af24db
[SOT] Add sot warmup (NVIDIA GPU Only) (#2929)
* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless
2025-07-22 21:36:14 +08:00
Sunny-bot1
7c5e34e72d
[FIX] fix rejection sampling when topp=0 using _SAMPLING_EPS (#2967)
* fix rejection sampling when topp=0
* fix
2025-07-22 05:53:37 -07:00
gaoziyuan
dbe6225b33
fix rl config local rank (#2957)
2025-07-22 04:39:54 -07:00
GoldPancake
9b84d51e25
[MTP Fix] Fix code and register cpp operators (#2965)
2025-07-22 19:36:24 +08:00
K11OntheBoat
93bb68aa71
[Feature] Marlin MoE backend supports DeepseekV3 (#2962)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 18:11:15 +08:00
GoldPancake
dc67c10a7e
[Feature][MTP] Support multi-step MTP (#2952)
2025-07-22 16:26:29 +08:00
luukunn
920e6b3f60
[Fix] fix empty prompt_token_ids, update the parser's triggering condit… (#2891)
2025-07-22 16:13:05 +08:00
Zero Rains
89a485b69f
[Feature] Support using prefix-caching + cudagraph for inference (#2924)
* fix the bug in cudagraph+prefix-caching, but some bugs with profile remain
Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397
* add the signal to make sure cache manager launched
* fix judge condition
* remove useless control
* update control stream
* update
* fix xpu
* change the do_profile flag
* update
* add new threads to init cache_manager
---------
Co-authored-by: RAM <gstian5555@outlook.com>
2025-07-22 00:59:45 -07:00
Nyakku Shigure
48e6a0ca26
[SOT] Mark dynamic dims by type annotations (#2771)
* [SOT] Mark dynamic dims by type annotations
* fix conflict of forward_meta
* mark more attn backend
* fix missing annotated and add env SOT_SPECIALIZED_DIM_NUMBERS
* auto infer implicit 0 dim dynamic dim
* revert manual marked dims
* revert missing update
* auto infer can use unsafe code in warmup stage
* check -> type_match
* fix codestyle
* restore blank line
* empty commit
* add need_warmup nonlocal;
* add doc for resolver
* add missing type hints
* unquote "ForwardMeta"
2025-07-22 00:23:52 -07:00
李泳桦
2a8a2c06de
[fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951)
2025-07-22 14:35:56 +08:00
lifulll
2c6a9e887e
native top_p_sampling (#2901)
2025-07-22 14:09:59 +08:00
gaoziyuan
0eedbdaee0
fix import error (#2944)
2025-07-22 14:06:01 +08:00
K11OntheBoat
8020927f50
[BugFix] Rename attention params of deepseekv3 (#2939)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-22 14:01:30 +08:00
Jiang-Jia-Jun
56102e91e1
[Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
zhink
0262ef7eb3
custom all reduce supports cuda graph (#2938)
* Support enabling cuda graph and custom all reduce at the same time, and fix the overwritten custom all reduce flag
* rename communication_op to communication
2025-07-21 22:52:03 +08:00
周周周
ff4569f135
remove some code in ep.py (#2947)
2025-07-21 22:44:57 +08:00