FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-11-01 20:32:52 +08:00

Files

RAM 775edcc09a [Executor] Default use CUDAGraph (#3594)

* add start intercept

* Adjust GraphOptConfig

* pre-commit

* default use cudagraph

* set default value

* default use cuda graph

* pre-commit

* fix test case bug

* disable rl

* fix moba attention

* only support gpu

* Temporarily disable PD Disaggregation

* set max_num_seqs of test case as 1

* set max_num_seqs and temperature

* fix max_num_batched_tokens bug

* close cuda graph

* successfully run wint2

* profile run with max_num_batched_tokens

* 1. add C++ memchecker 2. successfully run wint2

* update a800 yaml

* update docs

* 1. delete check 2. fix plas attn test case

* use use_unique_memory_pool by default

* add try-except for warmup

* ban mtp, mm, rl

* fix test case mock

* fix ci bug

* fix form_model_get_output_topp0 bug

* fix ci bug

* refine deepseek ci

* refine code

* Disable PD

* fix sot yaml

2025-10-21 14:25:45 +08:00

cpu_ops

fix typos (#3951)

2025-09-08 15:22:41 +08:00

gpu_ops

[Executor] Default use CUDAGraph (#3594)

2025-10-21 14:25:45 +08:00

iluvatar_ops

[Iluvatar GPU] Optimize attention performance and fix MoE checkpoint loading error (#3651)

2025-09-22 21:13:59 +08:00

metax_ops

[fix] adjust mctlass moe api (#4474)