xiaoxiaohehe001
6c15945e4d
fix_fa3 ( #4429 )
2025-10-15 16:19:39 +08:00
xiaoxiaohehe001
9f1882d9a8
fa3_rope ( #4190 )
2025-09-21 22:04:59 +08:00
xiaoxiaohehe001
5223065d59
support mm mtp ( #4013 )
2025-09-09 13:55:45 +08:00
freeliuzc
c753f1fc9e
[Feature][MTP]Support new mtp ( #3656 )
...
* update multi-draft-token strategy
* fix format
* support hybrid mtp with ngram speculative decoding method
2025-08-27 19:38:26 +08:00
Yuan Xiaolan
62659a7a73
support w4afp8 offline quant ( #3438 )
2025-08-15 17:32:12 +08:00
xiaoxiaohehe001
4f17f9aa6e
add w4a8 online quant eplb
2025-08-15 12:54:08 +08:00
Yuan Xiaolan
2513cd929b
support w4afp8 EP inference ( #3382 )
2025-08-13 21:41:34 +08:00
xiaoxiaohehe001
4dbaa3d74c
Fix w4a8 scale load ( #3334 )
...
* fix_eplb
* fix eplb part3
* support_fp8_rope3d
* fix w4a8 scale
2025-08-11 21:02:42 +08:00
xiaoxiaohehe001
794ab9705f
Fix eplb part3 ( #3206 )
...
* fix_eplb
* fix eplb part3
2025-08-05 10:58:17 +08:00
xiaoxiaohehe001
869626b0f4
fix_eplb ( #3160 )
2025-08-03 01:50:07 +08:00
freeliuzc
9307f2619b
【Fix】【MTP】fix mtp bug ( #3140 )
2025-08-01 15:45:00 +08:00
carryyu
fbe03866d1
fix eplb part 1
2025-07-31 17:11:48 +08:00
Yuan Xiaolan
89ad20bea2
fix w4a8 scale ( #3115 )
2025-07-31 16:50:06 +08:00
Yuan Xiaolan
02398135a8
fix is_permuted ( #3100 )
2025-07-30 22:35:22 +08:00
Yuan Xiaolan
d65a0a6a2c
support W4A8 EPLB ( #3075 ) ( #3094 )
2025-07-30 19:46:42 +08:00
Yuan Xiaolan
3214fb5393
support model loading for w4a8 offline quant ( #3064 )
...
Support loading offline-quantized weights for W4A8 EP
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2
fix argument error in ep when pd ( #3060 )
2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a
Unify server-side and model-side Config (Part3) ( #3047 )
...
* merge model config
* fix arch
* fix rl
2025-07-29 17:07:44 +08:00
Longzhi Wang
907d561523
fix ep when paddle version mismatch ( #3056 )
2025-07-29 15:06:49 +08:00
JYChen
dafe02a7b9
[stop sequence] support stop sequence ( #3025 )
...
* stop seqs in multi-ends
* unittest for gpu stop op
* kernel tid==0
2025-07-29 14:17:37 +08:00
YuanRisheng
1a815b7a2a
Fix Speculative Config bug ( #3049 )
...
* fix speculative bug
* fix rl
2025-07-29 10:50:48 +08:00
yinwei
f2a528f9ae
[XPU] Support kvblock centralized management ( #3017 )
2025-07-29 10:40:55 +08:00
Yuan Xiaolan
b1d787a272
[fix] w4a8 model loading and hadamard config ( #3013 )
2025-07-28 18:17:59 +08:00
K11OntheBoat
83048bbe55
[Feature] Deepseekv3 supports cudagraph ( #3041 )
...
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-07-28 17:12:54 +08:00
AIbin
ec52d39e68
【Inference Optimize】Update wint2 weight n-dim reorder ( #3042 )
2025-07-28 16:31:56 +08:00
YuanRisheng
bddf403576
Unify server-side and model-side Config (Part2) ( #3035 )
...
* merge speculative and graph opt config
* add attr
2025-07-28 15:31:48 +08:00
chen
01485cd28b
MTP rejection_topp add topk input ( #3031 )
2025-07-28 13:58:45 +08:00
begin2023
dd877f38b1
[Perf] Remove unnecessary operations in non-cuda_graph ( #3010 )
...
* [Perf] Remove unnecessary operations in non-cuda_graph
* fix code logic
* use suggestion comment
* reduce function call
* reduce function call
* reduce function call
* reduce function call
2025-07-27 20:38:29 -07:00
Longzhi Wang
247010d298
fix argument error ( #3030 )
2025-07-28 11:03:29 +08:00
YuanRisheng
6ccc10ad47
Unify server-side and model-side Config (Part1) ( #3018 )
...
* move cache config
* fix mtp
2025-07-28 10:51:52 +08:00
李泳桦
69996a40da
[feat] add disable_chat_template in chat api as a substitute for previous raw_request ( #3020 )
...
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request
* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Longzhi Wang
0700c90caa
[Feat] support mixed ep ( #2969 )
...
* Support mixed ep
* fix comment
* fix comment
* update mixep
* fix conflict
* fix typo
* update
* fix typo
* fix code style
* fix conflict
2025-07-25 15:29:30 +08:00
chen
332154f504
[feature] Support FA2 ( #3009 )
2025-07-25 14:09:00 +08:00
EnflameGCU
8c167e130c
[GCU] Update post_process ( #3012 )
2025-07-25 11:03:03 +08:00
xiaoxiaohehe001
2970b00dfa
[Feature] Support_eplb ( #2997 )
...
* [Feature] support_eplb
* [Feature] support_eplb
* [Fix] fix mm ep
2025-07-24 20:22:45 +08:00
littledgg
f37d00e856
[Model] Provide clearer error for missing KV cache quantization scales ( #3007 )
2025-07-24 20:15:00 +08:00
EnflameGCU
c40df1802e
[GCU] Update to develop ( #2988 )
2025-07-24 19:30:52 +08:00
Yzc216
980126b83a
[Feature] multi source download ( #3005 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
2025-07-24 17:42:09 +08:00
Zero Rains
0fb37ab7e4
update flake8 version to support pre-commit in python3.12 ( #3000 )
...
* update flake8 version to support pre-commit in python3.12
* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862
[BugFix] fix multinode deployment ( #2977 )
2025-07-24 15:04:04 +08:00
ltd0924
3792345c3a
[LLM] update function name ( #2985 )
...
* [LLM] update function name
2025-07-24 15:03:40 +08:00
Yzc216
e14587a954
[Feature] multi-source download ( #2986 )
...
* multi-source download
* multi-source download
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
2025-07-24 14:26:37 +08:00
xiaoxiaohehe001
2c0ff068e2
[Fix] fix mm ep empty run ( #2999 )
2025-07-24 14:15:55 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
lizexu123
832d25334a
[Code Simplification] fix init_distributed_environment() ( #2982 )
2025-07-24 11:43:28 +08:00
bukejiyu
bfeb664ab8
update ( #2978 )
2025-07-24 00:16:42 +08:00
chenjian
85a78d695d
[Feature] Support block scheduler v1 for FD ( #2928 )
...
* Support FD block scheduler v1
* Support FD block scheduler v1
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-23 20:31:31 +08:00
Zero Rains
ca0f71bd39
polish code for prefill restrictions ( #2991 )
2025-07-23 05:10:14 -07:00
chen
172e69fe17
FA3 fix bug ( #2987 )
2025-07-23 19:07:43 +08:00
zhink
1272c7ce98
Fix performance degradation bug of custom_all_reduce ( #2981 )
2025-07-23 17:45:44 +08:00