FastDeploy/benchmarks/yaml/qwen3moe235b-32k-wint8-h800-tp4.yaml at 25698d56d1c7e33344a61dbcf1615347b3a1ea80

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00

7 lines

131 B

YAML

max_model_len: 32768
max_num_seqs: 25
gpu_memory_utilization: 0.9
kv_cache_ratio: 0.75
quantization: wint8
tensor_parallel_size: 4
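
The file name appears to encode several of the values above: `32k` matches `max_model_len: 32768`, `wint8` matches `quantization`, and `tp4` matches `tensor_parallel_size: 4`. The Python sketch below checks that correspondence. It is not part of FastDeploy. The helpers `parse_flat_yaml` and `check_against_filename` are hypothetical, the naming convention is inferred from this one file only, and the range check assumes that `gpu_memory_utilization` and `kv_cache_ratio` are fractions in (0, 1].

```python
import re

# Contents of the benchmark config, copied from the file above.
CONFIG_TEXT = """\
max_model_len: 32768
max_num_seqs: 25
gpu_memory_utilization: 0.9
kv_cache_ratio: 0.75
quantization: wint8
tensor_parallel_size: 4
"""

FILENAME = "qwen3moe235b-32k-wint8-h800-tp4.yaml"


def parse_flat_yaml(text):
    """Parse flat `key: value` lines (hypothetical helper, not a full YAML parser)."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Try int first, then float; fall back to the raw string.
        for cast in (int, float):
            try:
                config[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            config[key] = value
    return config


def check_against_filename(config, filename):
    """Return a list of mismatches between file-name tags and config values.

    Assumed convention (inferred, not documented): `<N>k` is max_model_len
    in units of 1024 tokens, `tp<N>` is tensor_parallel_size, and the
    quantization name appears as its own tag.
    """
    problems = []
    tags = filename.rsplit(".", 1)[0].split("-")
    for tag in tags:
        m = re.fullmatch(r"(\d+)k", tag)
        if m and int(m.group(1)) * 1024 != config.get("max_model_len"):
            problems.append(f"{tag} does not match max_model_len")
        m = re.fullmatch(r"tp(\d+)", tag)
        if m and int(m.group(1)) != config.get("tensor_parallel_size"):
            problems.append(f"{tag} does not match tensor_parallel_size")
    if config.get("quantization") not in tags:
        problems.append("quantization is not named in the file name")
    # Assumption: both ratio settings are fractions in (0, 1].
    for key in ("gpu_memory_utilization", "kv_cache_ratio"):
        value = config.get(key)
        if not isinstance(value, (int, float)) or not 0 < value <= 1:
            problems.append(f"{key} is outside (0, 1]")
    return problems


config = parse_flat_yaml(CONFIG_TEXT)
problems = check_against_filename(config, FILENAME)
print(config)
print(problems)
```

A real loader would more likely use a full YAML library; the hand-written parser here only keeps the sketch free of third-party dependencies.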