FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

Neil Zhu 4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)

* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>

2025-12-10 17:15:17 +08:00

air_top_p_sampling.cu            2025-12-09 01:44:02 -08:00
min_p_sampling_from_probs.cu     2025-07-20 23:17:59 -07:00
rejection_top_p_sampling.cu      2025-08-14 22:40:44 +08:00
sampling.cuh                     2025-12-10 17:15:17 +08:00
top_k_renorm_probs.cu            2025-07-09 20:58:58 -07:00
utils.cuh                        2025-12-10 17:15:17 +08:00