FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

RAM 00d0da0c18 [Graph Optimization] Add the CUDAGraph usage switch for Draft Model (#4669)

* add draft model using cudagraph switch

* set default as false

* capture draft model in ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

2025-10-31 17:34:09 +08:00

test_DeepSeek_V3_5layers_serving.py

[Executor] Default use CUDAGraph (#3594)

2025-10-21 14:25:45 +08:00

test_EB_Lite_serving.py

[BugFix]Fix finish reason (#4543)

2025-10-23 14:04:43 +08:00

test_EB_VL_Lite_serving.py

[DataProcessor] add reasoning_tokens into usage info (#4520)

2025-10-25 16:57:58 +08:00

test_ernie_03b_pd.py

[CI] Relocate server test cases from ci_use directory to e2e (#4608)

2025-10-28 11:37:30 +08:00

test_ernie_21b_mtp.py

[Graph Optimization] Add the CUDAGraph usage switch for Draft Model (#4669)

2025-10-31 17:34:09 +08:00

test_fake_Glm45_AIR_serving.py

[BugFix]Fix wfp8afp8 triton moe group_topk renormalized=True (#4449)

2025-10-16 23:17:48 +08:00

test_Qwen2_5_VL_serving.py

[Model] Qwen2.5VL support --use-cudagraph and unit testing (#4087)

2025-09-24 19:45:01 +08:00

test_Qwen2_5_VL_torch_serving.py

[V1 loader] Qwen25 VL support v1 loader and torch style safetensors load (#4388)

2025-10-27 10:54:15 +08:00

test_Qwen2-7B-Instruct_serving.py

[metrics] Add serveral observability metrics (#3868)

2025-09-08 14:13:13 +08:00