apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-12-24 13:28:13 +08:00
FastDeploy/fastdeploy/model_executor/layers/moe
At commit 690bcb8e5097c27284f9e22adfeb102df9ef8708
Latest commit 690bcb8e50 (lzy, 2025-12-03 13:33:15 +08:00): [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315)
File                           Last commit                                                                                   Date
__init__.py                    support w4afp8 EP inference (#3044)                                                           2025-08-25 11:27:45 +08:00
ep.py                          [Quantization] Support w4afp8 MoE dynamic quantization (#5282)                                2025-12-02 18:56:16 +08:00
fused_moe_backend_base.py      [Intel HPU] change MoE weights and scales from list to tensor and add… (#5289)                2025-11-28 19:17:05 +08:00
fused_moe_cutlass_backend.py   [Quantization] Support w4afp8 MoE dynamic quantization (#5282)                                2025-12-02 18:56:16 +08:00
fused_moe_deepgemm_backend.py  [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)                 2025-11-26 05:09:09 -08:00
fused_moe_marlin_backend.py    [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)                 2025-11-26 05:09:09 -08:00
fused_moe_triton_backend.py    [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)                 2025-11-26 05:09:09 -08:00
fused_moe_wint2_backend.py     [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)                 2025-11-26 05:09:09 -08:00
moe.py                         [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315)  2025-12-03 13:33:15 +08:00
triton_moe_kernels.py          [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)              2025-09-24 16:39:51 +08:00
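The modules listed above make up FastDeploy's Mixture-of-Experts (MoE) layer: a routing layer (moe.py), expert parallelism (ep.py), and fused backends built on CUTLASS, DeepGEMM, Marlin, Triton, and wint2 kernels. For orientation only, below is a minimal unfused NumPy sketch of the top-k expert routing that such fused backends optimize into single kernels; moe_forward, the toy linear experts, and all shapes are invented for this illustration and do not reflect FastDeploy's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, gate_w, experts, top_k=2):
    """Unfused reference of top-k MoE routing (illustration only).

    x:       [tokens, hidden] input activations
    gate_w:  [hidden, n_experts] router weights
    experts: list of per-expert callables (toy stand-ins for expert MLPs)
    """
    probs = softmax(x @ gate_w)                        # router scores per expert
    topk_idx = np.argsort(-probs, axis=-1)[:, :top_k]  # chosen experts per token
    topk_p = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_p /= topk_p.sum(axis=-1, keepdims=True)       # renormalize gate weights

    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slots = np.nonzero(topk_idx == e)        # tokens routed to expert e
        if rows.size:
            out[rows] += topk_p[rows, slots, None] * expert(x[rows])
    return out

# Toy usage: 4 experts, each a single random linear layer.
rng = np.random.default_rng(0)
tokens, hidden, n_experts = 5, 8, 4
experts = [(lambda w: lambda h: h @ w)(0.1 * rng.normal(size=(hidden, hidden)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=(tokens, hidden)),
                rng.normal(size=(hidden, n_experts)), experts)
print(y.shape)  # (5, 8)
```

The fused backends above replace the per-expert Python loop with batched grouped-GEMM kernels, which is where the listed quantization work (w4afp8, wfp8afp8, wint2) applies.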