FastDeploy/fastdeploy/model_executor/layers at 54b458fd980aa87c9fca849553eb78361b22eb24 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

co63oc ce998449e0 fix w8a8.py (#3733)

2025-09-03 10:57:26 +08:00

[Fix bug] Fix nblock at 256 for w4afp8, and add a mask parameter to fa3 append attn (#3771)

2025-09-02 19:17:01 +08:00

fix typos (#3684)

2025-09-01 17:50:17 +08:00

[v1loader] Reduce EB300B model loading time (#3700)

2025-09-02 19:13:57 +08:00

fix w8a8.py (#3733)

2025-09-03 10:57:26 +08:00

[Feature] mm and thinking model support structured output (#2749)

2025-09-02 16:21:09 +08:00

__init__.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

activation.py

[Polish Code] Remove useless notes

2025-08-14 14:04:52 +08:00

embeddings.py

Supports DP+TP+EP hybrid parallel deployment strategy (#3489)

2025-08-26 00:04:01 -07:00

linear.py

[v1loader] Reduce EB300B model loading time (#3700)

2025-09-02 19:13:57 +08:00

lm_head.py

[Precision] Support lm_head layer running in float32 (#3597)

2025-08-27 11:34:53 +08:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

rotary_embedding.py

[Model] support qwen2_5_vl (#3557)

2025-08-29 18:28:39 +08:00

utils.py

add dtype int32 (#3692)

2025-08-29 14:56:35 +08:00