FastDeploy/benchmarks/yaml/GLM45-air-32k-wfp8afp8.yaml at develop - FastDeploy - 子说镜像小站

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

tianlef 8a964329f4 add glm benchmark yaml (#4289)

2025-09-26 14:23:29 +08:00


max_model_len: 32768
max_num_seqs: 128
tensor_parallel_size: 4
use_cudagraph: True
load_choices: "default_v1"
quantization: wfp8afp8
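
A quick way to sanity-check a config like this before launching a benchmark is to load it and verify the value types. The following is a minimal sketch, not FastDeploy's own loader: `parse_flat_yaml` is a hypothetical helper that only handles flat `key: value` lines like the ones above, and the type checks are assumptions about what each key should hold based on the values shown in this file.

```python
import ast

# The six settings from this file, inlined so the sketch is self-contained.
CONFIG_TEXT = """\
max_model_len: 32768
max_num_seqs: 128
tensor_parallel_size: 4
use_cudagraph: True
load_choices: "default_v1"
quantization: wfp8afp8
"""


def parse_flat_yaml(text):
    """Parse flat `key: value` lines (hypothetical helper, not a full YAML parser)."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            # Handles integers, True/False, and quoted strings.
            config[key.strip()] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare words such as wfp8afp8 are kept as plain strings.
            config[key.strip()] = value
    return config


config = parse_flat_yaml(CONFIG_TEXT)

# Assumed expectations, inferred only from the values in this file.
for key in ("max_model_len", "max_num_seqs", "tensor_parallel_size"):
    assert isinstance(config[key], int) and config[key] > 0, key
assert isinstance(config["use_cudagraph"], bool)
assert isinstance(config["load_choices"], str)
assert isinstance(config["quantization"], str)

for key, value in config.items():
    print(f"{key} = {value!r}")
```

For real use, a full YAML library would be the safer choice; the hand-rolled parser is used here only to keep the sketch free of third-party dependencies.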