FastDeploy

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

zhupengyang 3a43dbf82d [XPU] merge apply_tp, ops support token_num = 0 (#4507)

2025-10-23 19:09:58 +08:00

attention

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

backends

[XPU] merge apply_tp, ops support token_num = 0 (#4507)

2025-10-23 19:09:58 +08:00

moe

【BugFix】fix ep buffer clear (#4450)

2025-10-21 10:56:00 +08:00

pool

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

quantization

WINT4/WINT8 dense gemm default use Machete (#4451)

2025-10-23 17:57:59 +08:00

sample

[Feature] support mtp logprob (#4464)

2025-10-20 15:18:12 +08:00

__init__.py

…

activation.py

[Intel HPU] Support intel hpu platform (#4161)

2025-09-24 12:27:50 +08:00

embeddings.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

linear.py

add qwen-2.5-7B-PRM/ernie-rm (#4319)

2025-10-20 15:31:03 +08:00

lm_head.py

…

mtp_linear.py

…

normalization.py

…

pooler.py

[Feature] support pooling model dummy_run (#4345)

2025-10-17 13:30:55 +08:00

rotary_embedding.py

Support GPT-OSS-BF16 (#4240)

2025-10-20 14:44:58 +08:00

utils.py

[OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238)

2025-09-24 16:39:51 +08:00