Ayakouji
|
453487d5b0
|
[Feat] ernie4_5_vl_moe support CudaGraph (#3226)
* delete dynamic control flow for decode
* code-style
* fix scatter/gather typos and use input stream instead of default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* open cudagraph in vlci
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-09-10 13:11:57 +08:00 |
|
周周周
|
dbab579299
|
clean code (#4020)
|
2025-09-10 10:56:15 +08:00 |
|
co63oc
|
d6369b4d51
|
fix typos (#3684)
|
2025-09-01 17:50:17 +08:00 |
|
yangjianfengo1
|
e81046fdad
|
[New Feature] Centralized deployment supports w4afp8 (#3644)
* support tp w4afp8
* code style
|
2025-08-28 10:53:24 +08:00 |
|
Zero Rains
|
25698d56d1
|
polish code with new pre-commit rule (#2923)
|
2025-07-19 23:19:27 +08:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit of the LLM deployment code
|
2025-06-09 19:20:15 +08:00 |
|