FastDeploy

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

周周周 3729e910a6 remove dev sync in prefill (#4598)

2025-10-27 19:54:43 +08:00

attention

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

backends

[XPU] bind some OPs for VL model with pybind (#4522)

2025-10-27 10:50:08 +08:00

moe

remove dev sync in prefill (#4598)

2025-10-27 19:54:43 +08:00

pool

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

quantization

WINT4/WINT8 dense gemm default use Machete (#4451)

2025-10-23 17:57:59 +08:00

sample

1.fix the bug of draft model with ep 2.fix sampler bug (#4589)

2025-10-27 17:47:34 +08:00

__init__.py

…

activation.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

embeddings.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

linear.py

add qwen-2.5-7B-PRM/ernie-rm (#4319)

2025-10-20 15:31:03 +08:00

lm_head.py

[Feature] support qwen3-embedding model load (#4202)

2025-09-23 00:14:35 -07:00

mtp_linear.py

support tmp (#3675)

2025-08-28 19:42:32 +08:00

normalization.py

adaptive rms_norm's dtype (#3617)

2025-08-26 15:29:15 +08:00

pooler.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

rotary_embedding.py

[Feature] Support Paddle-OCR (#4396)

2025-10-24 23:34:30 +08:00

utils.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00