YuBaoku
|
819b2dbbae
|
Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)
This reverts commit 93fcf7e4ec.
|
2025-11-06 17:48:28 +08:00 |
|
yangjianfengo1
|
93fcf7e4ec
|
【New Feature】W4afp8 supports per group quantization (#4272)
* w4afp8 supports per group
* code style
* accuracy work completed
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* update unit tests
* update unit tests
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 accuracy
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
|
2025-11-05 21:00:23 +08:00 |
|
Zhenghai Zhang
|
1712e1351b
|
【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation (#4592)
* autogen MoeFastHardamardImplWrapper template_instantiation
* fix codestyle
* fix codestyle
* add impl cu files
|
2025-10-30 10:28:36 +08:00 |
|
Haonan Luo
|
1b9f351d21
|
Support GPT-OSS-BF16 (#4240)
* [Feature] AppendAtten support sinks & HEAD_DIM=64
* fix bug
* fix bug
* fix bug
* fix bug
* [Feature] support gpt-oss
* fix bug
* add mask
* support-gpt-oss
* support-gpt-oss
* fix long seq
* support wint8
* support wint8
* support wint8
* update test
* change sliding windows init pos
---------
Co-authored-by: ming1753 <ideaminghp@163.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
|
2025-10-20 14:44:58 +08:00 |
|
Ayakouji
|
453487d5b0
|
[Feat] ernie4_5_vl_moe support CudaGraph (#3226)
* delete dynamic control flow for decode
* code-style
* fix scatter/gather typos and use input stream instead of default stream
* support 0-Size Tensor
* update runner and model
* using static mem address as input
* fix mem leak
* refine code
* update mm_buffer
* fix typo
* fix buffersize
* fix unk token
* refine code
* refine
* support other arch
* open cudagraph in vlci
* fix
* update
* update
* update
* fix cmd
* update
---------
Co-authored-by: aquagull <hongyuh@qq.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-09-10 13:11:57 +08:00 |
|
Yuan Xiaolan
|
2cf55168ca
|
load hadamard_block_size from config (#3797)
|
2025-09-05 17:07:58 +08:00 |
|
yangjianfengo1
|
e81046fdad
|
【New Feature】Centralized deployment supports w4afp8 (#3644)
* support tp w4afp8
* code style
|
2025-08-28 10:53:24 +08:00 |
|
Yuan Xiaolan
|
9205c88da1
|
support w4afp8 EP inference (#3044)
|
2025-08-25 11:27:45 +08:00 |
|
Yuan Xiaolan
|
7d87aaace8
|
optimize w4a8 decoding (#3050)
|
2025-07-28 22:20:13 +08:00 |
|
Yuanle Liu
|
61b3997b85
|
refactor rl get_name_mappings_to_training (#2847)
* refactor rl get_name_mappings_to_training
* fix tp>1
* change variable name(ffn1->up_gate_proj/ffn2->down_proj)
* change variable name(linear_weight->weight/linear_bias->bias)
* add rl names mapping for vl
* fix ernie 0.3B error
* fix develop code
* fix
|
2025-07-15 07:31:42 -07:00 |
|
Jiang-Jia-Jun
|
92c2cfa2e7
|
Sync v2.0 version of code to github repo
|
2025-06-29 23:29:37 +00:00 |
|
jiangjiajun
|
684703fd72
|
[LLM] First commit the llm deployment code
|
2025-06-09 19:20:15 +08:00 |
|