Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-12-24 13:28:13 +08:00)
【New Feature】W4afp8 supports per group quantization (#4272)
* w4afp8 supports per group
* code style
* accuracy verified
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* modify unit tests
* modify unit tests
* fix bug
* Implement conditional parameter creation for layers

  Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 accuracy
* fix ci

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
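For readers unfamiliar with the term in the commit title: per-group quantization generally means storing one scale per contiguous group of values instead of one scale per tensor or per channel. The following is a minimal sketch of that general idea only, assuming symmetric rounding to the signed 4-bit range [-7, 7]. It is not the kernel added by this commit, and `GroupQuantized`, `QuantizePerGroup`, and `DequantizeAt` are hypothetical names introduced purely for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container (not a FastDeploy type): quantized values plus
// one scale per group of `group_size` consecutive weights.
struct GroupQuantized {
    std::vector<int8_t> values;  // int4-range values, stored unpacked in int8
    std::vector<float> scales;   // scales[g] belongs to group g
};

// Hypothetical helper: symmetric per-group quantization to [-7, 7].
// Each group gets its own scale, chosen so the largest magnitude maps to 7.
GroupQuantized QuantizePerGroup(const std::vector<float>& weights,
                                std::size_t group_size) {
    if (group_size == 0) {
        throw std::invalid_argument("group_size must be positive");
    }
    GroupQuantized out;
    out.values.resize(weights.size());
    for (std::size_t begin = 0; begin < weights.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, weights.size());
        // Find the largest absolute value inside this group only.
        float max_abs = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            max_abs = std::max(max_abs, std::fabs(weights[i]));
        }
        // An all-zero group keeps scale 1 to avoid dividing by zero.
        const float scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;
        out.scales.push_back(scale);
        for (std::size_t i = begin; i < end; ++i) {
            float q = std::round(weights[i] / scale);
            q = std::max(-7.0f, std::min(7.0f, q));
            out.values[i] = static_cast<int8_t>(q);
        }
    }
    return out;
}

// Recover an approximate weight using the scale of the group it belongs to.
float DequantizeAt(const GroupQuantized& q, std::size_t index,
                   std::size_t group_size) {
    return static_cast<float>(q.values[index]) * q.scales[index / group_size];
}
```

With a group size of 4, a row whose first four weights are small and whose next four are large keeps full 4-bit resolution in both groups, because each group is scaled by its own maximum rather than by the maximum of the whole row.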
@@ -304,6 +304,7 @@ paddle::Tensor MoeExpertFFNFunc(
    const paddle::Tensor& tokens_expert_prefix_sum,
    const paddle::Tensor& up_gate_proj_weight,
    const paddle::Tensor& down_proj_weight,
    const paddle::optional<paddle::Tensor>& up_proj_in_scale,
    const paddle::optional<paddle::Tensor>& up_gate_proj_bias,
    const paddle::optional<paddle::Tensor>& up_gate_proj_scale,
    const paddle::optional<paddle::Tensor>& down_proj_scale,