【New Feature】W4afp8 supports per group quantization (#4272)

* w4afp8 supports per-group quantization

* code style

* Accuracy verified

* revert append attn utils

* Dynamic quantization for ffn1

* ffn2 supports dynamic quantization

* code style

* code style

* Update unit tests

* Update unit tests

* fix bug

* Implement conditional parameter creation for layers

Add parameter creation for up_gate_proj_in_scale when ep_size > 1.

* code style

* fix conflict

* code style

* code style

* Fix w4aint8 accuracy

* fix ci

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
yangjianfengo1
2025-11-05 21:00:23 +08:00
committed by GitHub
parent fcd2f05dff
commit 93fcf7e4ec
26 changed files with 4367 additions and 1707 deletions


@@ -304,6 +304,7 @@ paddle::Tensor MoeExpertFFNFunc(
const paddle::Tensor& tokens_expert_prefix_sum,
const paddle::Tensor& up_gate_proj_weight,
const paddle::Tensor& down_proj_weight,
const paddle::optional<paddle::Tensor>& up_proj_in_scale,
const paddle::optional<paddle::Tensor>& up_gate_proj_bias,
const paddle::optional<paddle::Tensor>& up_gate_proj_scale,
const paddle::optional<paddle::Tensor>& down_proj_scale,