Jundong Liu
|
147b2e5eb0
|
[BugFix] Fix zero workspace returned by CUB size query under CUDA Graph in MoE dispatch (#5087)
* fix bug in CubKeyValueSorter::run
* pre-commit and add comment
* pre-commit
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix precommit
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-11-20 20:00:29 +08:00 |
|
yangjianfengo1
|
ae7bee8122
|
【New Feature】W4afp8 supports per group quantization (#4987)
* w4afp8 supports per group
* code style
* fix transpose
* revert fast hadamard
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
|
2025-11-13 19:17:27 +08:00 |
|
YuBaoku
|
819b2dbbae
|
Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)
This reverts commit 93fcf7e4ec.
|
2025-11-06 17:48:28 +08:00 |
|
yangjianfengo1
|
93fcf7e4ec
|
【New Feature】W4afp8 supports per group quantization (#4272)
* w4afp8 supports per group
* code style
* precision alignment completed
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* update unit tests
* update unit tests
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 precision
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
|
2025-11-05 21:00:23 +08:00 |
|
Ayakouji
|
453487d5b0
|
[Feat] ernie4_5_vl_moe support CudaGraph (#3226)
* delete dynamic control flow for decode
* code-style
* fix scatter/gather typos and use the input stream instead of the default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* enable cudagraph in VL CI
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-09-10 13:11:57 +08:00 |
|
周周周
|
dbab579299
|
clean code (#4020)
|
2025-09-10 10:56:15 +08:00 |
|
co63oc
|
d6369b4d51
|
fix typos (#3684)
|
2025-09-01 17:50:17 +08:00 |
|
yangjianfengo1
|
e81046fdad
|
【New Feature】Centralized deployment supports w4afp8 (#3644)
* support tp w4afp8
* code style
|
2025-08-28 10:53:24 +08:00 |
|
Zero Rains
|
25698d56d1
|
polish code with new pre-commit rule (#2923)
|
2025-07-19 23:19:27 +08:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit the llm deployment code
|
2025-06-09 19:20:15 +08:00 |
|