Files
FastDeploy/benchmarks/yaml/eb45t_21b-32k-bf16-h800-tp1-static.yaml
RAM 0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00

7 lines
132 B
YAML

max_model_len: 32768
max_num_seqs: 128
kv_cache_ratio: 0.75
tensor_parallel_size: 1
graph_optimization_config:
graph_opt_level: 1