FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

yangjianfengo1 ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)

* w4afp8 supports per group

* code style

* fix transpose

* revert fast hadamard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>

2025-11-13 19:17:27 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

fused_moe_backend_base.py

refactor pt loading (#4532)

2025-11-11 21:30:39 +08:00

fused_moe_cutlass_backend.py

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

fused_moe_deepgemm_backend.py

refactor pt loading (#4532)

2025-11-11 21:30:39 +08:00

fused_moe_marlin_backend.py

[noauxtc_kernel] remove useless code (#4643)

2025-10-30 18:59:04 +08:00

fused_moe_triton_backend.py

[BugFix] fix VL fp8 bug when moe token_num is 0 (#4928)

2025-11-12 21:19:36 +08:00

fused_moe_wint2_backend.py

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

moe.py

[Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972)

2025-11-12 14:13:40 +08:00

triton_moe_kernels.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00