Commit Graph

11 Commits

Author SHA1 Message Date
Jundong Liu
147b2e5eb0 [BugFix] Fix zero workspace returned by CUB size query under CUDA Graph in MoE dispatch (#5087)
* fix bug about CubKeyValueSorter::run

* pre-commit and add comment

* pre-commit

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix precommit

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-11-20 20:00:29 +08:00
yangjianfengo1
ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)
* w4afp8 支持per group

* code style

* fix transpose

* revert fast hardmard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 19:17:27 +08:00
YuBaoku
819b2dbbae Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)
This reverts commit 93fcf7e4ec.
2025-11-06 17:48:28 +08:00
yangjianfengo1
93fcf7e4ec 【New Feature】W4afp8 supports per group quantization (#4272)
* w4afp8 支持per group

* code style

* 精度完成

* revert append attn utils

* ffn1 动态量化

* ffn2 支持动态量化

* code style

* code style

* 修改单测

* 修改单测

* fix bug

* Implement conditional parameter creation for layers

Add parameter creation for up_gate_proj_in_scale when ep_size > 1.

* code style

* fix conflict

* code style

* code style

* 修复w4aint8 精度

* fix ci

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-11-05 21:00:23 +08:00
Ayakouji
453487d5b0 [Feat] ernie4_5_vl_moe support CudaGraph (#3226)
* delete dynamic control flow for decode

* coda-style

* fix scatter/gather typos and use input stream instead default stream

* support 0-Size Tensor

* update runner and model

* using static mem address as input

* fix mem leak

* refine code

* update mm_buffer

* fix typo

* fix buffersize

* fix unk token

* refine code

* refine

* support other arch

* open cudagraph in vlci

* fix

* update

* update

* update

* fix cmd

* update

---------

Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-09-10 13:11:57 +08:00
周周周
dbab579299 clean code (#4020) 2025-09-10 10:56:15 +08:00
co63oc
d6369b4d51 fix typos (#3684) 2025-09-01 17:50:17 +08:00
yangjianfengo1
e81046fdad 【New Feature】集中式支持w4afp8 (#3644)
* 支持tp w4afp8

* code style
2025-08-28 10:53:24 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun
684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00