FastDeploy/moe at 87179cb744e5fb35e2190b19142d249ff970bd92 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-29 19:12:30 +08:00

chen 7c1fd19f0f [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00

File                            Date                        Last commit
__init__.py                     2025-08-25 11:27:45 +08:00  support w4afp8 EP inference (#3044)
ep.py                           2025-09-23 18:41:33 +08:00  Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
fused_moe_backend_base.py       2025-09-24 14:12:05 +08:00  [BugFix] fix v1 loader moe bf16, and support dynamic_load_weight create quant param (#4229)
fused_moe_cutlass_backend.py    2025-09-23 18:41:33 +08:00  Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
fused_moe_deepgemm_backend.py   2025-09-23 18:41:33 +08:00  Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
fused_moe_marlin_backend.py     2025-09-23 18:41:33 +08:00  Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
fused_moe_triton_backend.py     2025-09-24 16:39:51 +08:00  [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)
fused_moe_wint2_backend.py      2025-08-28 10:53:24 +08:00  [New Feature] Centralized support for w4afp8 (#3644)
moe.py                          2025-09-24 12:27:50 +08:00  [Intel HPU] Support intel hpu platform (#4161)
triton_moe_kernels.py           2025-09-24 16:39:51 +08:00  [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)