FastDeploy/moe at 4325b737e763345f86e69463e66d19e5cc6e8512 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-08 10:00:29 +08:00

yzwu 504461b6b5 [Iluvatar GPU] Optimize attention performance and fix moe load ckpt error (#3651)

2025-09-22 21:13:59 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

[Feature] Support pd ep deployment with yiyan adapter (#4029)

2025-09-22 16:41:38 +08:00

fused_moe_backend_base.py

[FDConfig]Remove splitwise_role and engine_worker_queue_port in FDConfig (#4147)

2025-09-19 17:01:52 +08:00

fused_moe_cutlass_backend.py

[Iluvatar GPU] Optimize attention performance and fix moe load ckpt error (#3651)

2025-09-22 21:13:59 +08:00

fused_moe_deepgemm_backend.py

[NewFeture]add ep rollout model init and update/clear ep buffer (#4039)

2025-09-17 20:24:53 +08:00

fused_moe_marlin_backend.py

[NewFeture]add ep rollout model init and update/clear ep buffer (#4039)

2025-09-17 20:24:53 +08:00

fused_moe_triton_backend.py

[v1 loader]qwen Offline fp8 (#4036)

2025-09-15 13:44:11 +08:00

fused_moe_wint2_backend.py

[New Feature] Support w4afp8 in centralized deployment (#3644)

2025-08-28 10:53:24 +08:00

moe.py

[xpu] support ep (#4067)

2025-09-15 13:53:11 +08:00

triton_moe_kernels.py

Support 45t fp8 8 GPU (#3659)

2025-08-28 10:52:53 +08:00