FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

yangjianfengo1 ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)

* w4afp8 supports per group

* code style

* fix transpose

* revert fast hadamard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>

2025-11-13 19:17:27 +08:00

cpu_ops

c++ code format (#4527)

2025-10-22 17:59:50 +08:00

gpu_ops

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

iluvatar_ops

c++ code format (#4527)

2025-10-22 17:59:50 +08:00

metax_ops

[Metax] adapt cutlass moe and fix mla attention (#4602)

2025-11-05 10:03:49 +08:00

third_party

[setup optimize] Support git submodule (#4033)

2025-09-11 17:41:16 +08:00

utils

【New Feature】W4afp8 supports per group quantization (#4987)

2025-11-13 19:17:27 +08:00

xpu_ops

[XPU] fix text_image_gather_scatter when image_token_num == token_num && text_token_num == 1 (#4882)

2025-11-12 17:13:22 +08:00

0001-DeepGEMM-95e81b3.patch

[feat] support fa3 backend for pd disaggregated (#2695)

2025-07-03 22:33:27 +08:00

MANIFEST.in

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

setup_ops_cpu.py

polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00

setup_ops.py

[Iluvatar] add vl into ci and support v1 loader (#4774)

2025-11-11 10:50:17 +08:00