FastDeploy/custom_ops/gpu_ops
yangjianfengo1 93fcf7e4ec [New Feature] W4afp8 supports per group quantization (#4272)
* w4afp8 supports per group

* code style

* accuracy work done

* revert append attn utils

* ffn1 dynamic quantization

* ffn2 supports dynamic quantization

* code style

* code style

* update unit tests

* update unit tests

* fix bug

* Implement conditional parameter creation for layers

Add parameter creation for up_gate_proj_in_scale when ep_size > 1.

* code style

* fix conflict

* code style

* code style

* fix w4aint8 accuracy

* fix ci

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-11-05 21:00:23 +08:00