FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

yangjianfengo1 ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)

* w4afp8 supports per group

* code style

* fix transpose

* revert fast hadamard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>

2025-11-13 19:17:27 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| kernel_traits.h | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| mainloop_fwd.h | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| utils.hpp | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| w4afp8_gemm_kernel.hpp | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| w4afp8_gemm.cu | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| w4afp8_gemm.h | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| weight_kernel.hpp | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |
| weight_scale_kernel.hpp | 【New Feature】W4afp8 supports per group quantization (#4987) | 2025-11-13 19:17:27 +08:00 |