FastDeploy/fastdeploy/model_executor/layers/moe at 75294bcfb1589e595e0a6effaff5b4a17d75fbbe

Mirror of https://github.com/PaddlePaddle/FastDeploy.git (last synced 2025-12-24 13:28:13 +08:00)

Last commit: yzwu 3707af7a4f [Iluvatar] add vl into ci and support v1 loader (#4774)

2025-11-11 10:50:17 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

[DeepEP] support P async_finish (#4899)

2025-11-10 18:24:02 +08:00

fused_moe_backend_base.py

[Bug Fix] fix bug for PD EP (#4823)

2025-11-10 15:33:29 +08:00

fused_moe_cutlass_backend.py

Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)

2025-11-06 17:48:28 +08:00

fused_moe_deepgemm_backend.py

[DeepEP] support P async_finish (#4899)

2025-11-10 18:24:02 +08:00

fused_moe_marlin_backend.py

[noauxtc_kernel] remove useless code (#4643)

2025-10-30 18:59:04 +08:00

fused_moe_triton_backend.py

[BugFix]Dev fix custom ar unstable result (#4437)

2025-10-17 11:47:16 +08:00

fused_moe_wint2_backend.py

Revert "【New Feature】W4afp8 supports per group quantization (#4272)" (#4854)

2025-11-06 17:48:28 +08:00

moe.py

[Iluvatar] add vl into ci and support v1 loader (#4774)

2025-11-11 10:50:17 +08:00

triton_moe_kernels.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00