FastDeploy/fastdeploy/model_executor/layers at bf03b6fceac8cd0f989edc6169a1ddefee437abb - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

yinwei bf03b6fcea fix vl bug (#4485)

2025-10-20 20:13:34 +08:00

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

fix vl bug (#4485)

2025-10-20 20:13:34 +08:00

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449)

2025-10-16 23:17:48 +08:00

[Feature] support mtp logprob (#4464)

2025-10-20 15:18:12 +08:00

__init__.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

activation.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

embeddings.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

linear.py

add qwen-2.5-7B-PRM/ernie-rm (#4319)

2025-10-20 15:31:03 +08:00

lm_head.py

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

pooler.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

rotary_embedding.py

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

utils.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00