FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-12-24 13:28:13 +08:00

Files

Sunny-bot1 3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)

* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>

2025-12-02 18:56:16 +08:00

__init__.py

…

ep.py

[Quantization] Support w4afp8 MoE dynamic quantization (#5282)

2025-12-02 18:56:16 +08:00

fused_moe_backend_base.py

[Intel HPU] change MoE weights and scales from list to tensor and add… (#5289)

2025-11-28 19:17:05 +08:00

fused_moe_cutlass_backend.py

[Quantization] Support w4afp8 MoE dynamic quantization (#5282)

2025-12-02 18:56:16 +08:00

fused_moe_deepgemm_backend.py

[Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)

2025-11-26 05:09:09 -08:00

fused_moe_marlin_backend.py

[Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)

2025-11-26 05:09:09 -08:00

fused_moe_triton_backend.py

[Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)

2025-11-26 05:09:09 -08:00

fused_moe_wint2_backend.py

[Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)

2025-11-26 05:09:09 -08:00

moe.py

[PD Disaggregation] Support PD deployment of DeepSeekv3 (#5251)

2025-12-02 14:11:50 +08:00

triton_moe_kernels.py

…