FastDeploy/benchmarks/yaml/qwen3_30b-32k-wint4-h800-tp1-static.yaml at 672620cdfeac535ff9a6fac23ed0a60472f12a5b - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

RAM 0fad10b35a [Executor] CUDA Graph support padding batch (#2844 )

* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug

2025-07-15 19:49:01 -07:00

8 lines

152 B

YAML

Raw Blame History

 max_model_len: 32768
 max_num_seqs: 128
 kv_cache_ratio: 0.75
 tensor_parallel_size: 1
 quantization: wint4
 graph_optimization_config:
   graph_opt_level: 1