FastDeploy/layers at b1a5b756a3566ff65bc49c7ec195db0d36fdbd08 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-16 05:30:58 +08:00

Sunny-bot1 b1a5b756a3 [Optimize] Support WINT8 and group scale for Machete (#3905)

2025-09-15 12:01:34 +08:00

fix typos (#3840)

2025-09-12 11:04:38 +08:00

[Feature] refactor metax_gpu attention and moe and remove some useless code (#3688)

2025-09-12 14:40:25 +08:00

[Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051)

2025-09-11 20:08:09 +08:00

[Optimize] Support WINT8 and group scale for Machete (#3905)

2025-09-15 12:01:34 +08:00

fix typos (#3840)

2025-09-12 11:04:38 +08:00

__init__.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

activation.py

[Polish Code] Remove useless notes

2025-08-14 14:04:52 +08:00

embeddings.py

[Feature] ernie4_5_vl_moe support huggingface safetensor loading (#3750)

2025-09-03 02:58:59 -07:00

linear.py

[Inference Optimize] Update MergedReplicatedLinear for DSK qkv_a_proj_with_mqa. (#3673)

2025-09-04 21:16:05 -07:00

lm_head.py

fix typos (#3840)

2025-09-12 11:04:38 +08:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

rotary_embedding.py

[Feature] refactor metax_gpu attention and moe and remove some useless code (#3688)

2025-09-12 14:40:25 +08:00

utils.py

fix mem boom in ep (#3854)

2025-09-05 11:48:21 +08:00