FastDeploy/quantization at 7568b20098ae71fecf64af6bcef62c56b2b2a727 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 08:37:06 +08:00

周周周 17b414c2df MoE Default use triton's blockwise fp8 in TP Case (#3678)

2025-08-29 11:07:30 +08:00

File                Last commit                                                Date
..                  [Optimize]support machete weight only gemm (#3561)         2025-08-28 09:49:58 +08:00
__init__.py         Sync v2.0 version of code to github repo                   2025-06-29 23:29:37 +00:00
block_wise_fp8.py   MoE Default use triton's blockwise fp8 in TP Case (#3678)  2025-08-29 11:07:30 +08:00
kv_cache.py         support c4 attn && fix cache                               2025-07-24 12:00:52 +08:00
mix_quant.py        [v1 loader]support fp8 (#3593)                             2025-08-26 02:42:46 -07:00
quant_base.py       polish code with new pre-commit rule (#2923)               2025-07-19 23:19:27 +08:00
tensor_wise_fp8.py  [NewFeatures] support eplb (#3547)                         2025-08-26 16:19:30 +08:00
w4a8.py             fix is_permuted (#3098)                                    2025-07-31 19:58:05 +08:00
w4afp8.py           support w4afp8 EP inference (#3044)                        2025-08-25 11:27:45 +08:00
w8a8.py             qwen3_moe (#3084)                                          2025-08-06 14:45:27 +08:00
weight_only.py      [Optimize]support machete weight only gemm (#3561)         2025-08-28 09:49:58 +08:00
wfp8afp8.py         qwen3_moe (#3084)                                          2025-08-06 14:45:27 +08:00
wint2.py            polish code with new pre-commit rule (#2923)               2025-07-19 23:19:27 +08:00