FastDeploy/fastdeploy/model_executor/layers/moe at 4c76171b57f3f763b17c5cd8fa111bcfec8ae6cb - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

lzy 04b2c43806 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5316)

2025-12-02 13:03:55 +08:00

__init__.py

support w4afp8 EP inference (#3044)

2025-08-25 11:27:45 +08:00

ep.py

[Feature] Support noaux for eplb (#5143)

2025-11-21 14:10:32 +08:00

fused_moe_backend_base.py

refactor pt loading (#4532)

2025-11-11 21:30:39 +08:00

fused_moe_cutlass_backend.py

[Fix] Fix eplb bug and support fp8 load weight (#5178)

2025-11-24 15:31:37 +08:00

fused_moe_deepgemm_backend.py

[Fix] Fix eplb bug and support fp8 load weight (#5178)

2025-11-24 15:31:37 +08:00

fused_moe_marlin_backend.py

[noauxtc_kernel] remove useless code (#4643)

2025-10-30 18:59:04 +08:00

fused_moe_triton_backend.py

[RL]Resolve shape mismatch problems in RL-related modules (#5032)

2025-11-19 11:12:48 +08:00

fused_moe_wint2_backend.py

mv import (#5146)

2025-11-20 19:25:56 +08:00

moe.py

[Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5316)

2025-12-02 13:03:55 +08:00

triton_moe_kernels.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00