Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-12-24 13:28:13 +08:00
[Executor] Default use CUDAGraph (#3594)
* add start intercept
* Adjustment GraphOptConfig
* pre-commit
* default use cudagraph
* set default value
* default use cuda graph
* pre-commit
* fix test case bug
* disable rl
* fix moba attention
* only support gpu
* Temporarily disable PD Disaggregation
* set max_num_seqs of test case as 1
* set max_num_seqs and temperature
* fix max_num_batched_tokens bug
* close cuda graph
* success run wint2
* profile run with max_num_batched_tokens
* 1.add c++ memchecker 2.success run wint2
* updatee a800 yaml
* update docs
* 1. delete check 2. fix plas attn test case
* default use use_unique_memory_pool
* add try-except for warmup
* ban mtp, mm, rl
* fix test case mock
* fix ci bug
* fix form_model_get_output_topp0 bug
* fix ci bug
* refine deepseek ci
* refine code
* Disable PD
* fix sot yaml
@@ -56,7 +56,9 @@ from typing import Callable, Optional
 
 # [N,2] -> every line is [config_name, enable_xxx_name]
 # Make sure enable_xxx equal to config.enable_xxx
-ARGS_CORRECTION_LIST = [["early_stop_config", "enable_early_stop"], ["graph_optimization_config", "use_cudagraph"]]
+ARGS_CORRECTION_LIST = [
+    ["early_stop_config", "enable_early_stop"],
+]
 
 FASTDEPLOY_SUBCMD_PARSER_EPILOG = (
     "Tip: Use `fastdeploy [serve|run-batch|bench <bench_type>] "
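The comments in the hunk above describe each entry as a `[config_name, enable_xxx_name]` pair whose top-level flag should equal the value inside the named config. The code that consumes `ARGS_CORRECTION_LIST` is not shown in this diff, so the following is only a minimal sketch of how such a list could be applied. The helper `apply_args_correction` and the plain-dict layout of `args` are assumptions made for illustration, not FastDeploy's actual implementation.

```python
from typing import Any

# Same shape as the list in the diff: [config_name, enable_xxx_name].
ARGS_CORRECTION_LIST = [
    ["early_stop_config", "enable_early_stop"],
]


def apply_args_correction(
    args: dict[str, Any], correction_list: list[list[str]]
) -> dict[str, Any]:
    """Hypothetical helper: make args[enable_xxx] equal args[config][enable_xxx].

    Assumes parsed arguments are a plain dict and each config is a nested dict.
    """
    corrected = dict(args)  # shallow copy so the caller's dict is not mutated
    for config_name, flag_name in correction_list:
        config = corrected.get(config_name)
        # Only override the top-level flag when the config sets it explicitly.
        if isinstance(config, dict) and flag_name in config:
            corrected[flag_name] = config[flag_name]
    return corrected


# Example input (values are made up for illustration).
args = {
    "enable_early_stop": False,
    "early_stop_config": {"enable_early_stop": True},
    "use_cudagraph": False,
    "graph_optimization_config": {"use_cudagraph": True},
}
corrected = apply_args_correction(args, ARGS_CORRECTION_LIST)
```

Under these assumptions, `enable_early_stop` is synchronized from `early_stop_config`, while `use_cudagraph` is left untouched because its pair is no longer in the list after this change.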