FastDeploy/fastdeploy at 3d0aaa59232ec703dc1178767137327b71da5d91 - FastDeploy - 子说镜像小站

apps/FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 08:37:06 +08:00

Jundong Liu 3d0aaa5923 [Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

* Support prefill in Cudagraph

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5

* Solve problem about encoder_num_blocks_x_cpu

* Add early-exit mechanism for attention kernel

* fix test case about append-attention

* Update testcode, Add annotations to related tensors

* move get_input_length_list

* solve test_code

* Add annotations about early-exit for attention kernel

* Add annotations about early-exit for attention kernel2

* solve comment

* solve mtp

---------

Co-authored-by: RAM <gstian5555@outlook.com>

2025-09-08 13:12:24 +08:00

fix typos (#3684)

2025-09-01 17:50:17 +08:00

[Feature] Add AsyncTokenizerClient&ChatResponseProcessor with remote encode&decode support. (#3674)

2025-08-30 17:06:26 +08:00

Supports DP+TP+EP hybrid parallel deployment strategy (#3489)

2025-08-26 00:04:01 -07:00

[XPU]Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3897)

2025-09-08 10:34:46 +08:00

[Feature] add HTTP GET retry (#3838)

2025-09-08 10:11:14 +08:00

Update qwen_vl_processor.py (#3808)

2025-09-04 20:31:48 +08:00

inter_communicator

add input_processor plugin (#3657)

2025-08-28 22:53:57 +08:00

add error log to file (#3431)

2025-08-20 09:52:34 +08:00

[feat] add metrics for yiyan adapter (#3219) (#3614)

2025-08-30 23:20:58 +08:00

[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

2025-09-08 13:12:24 +08:00

[Model]support qwen2_5_vl (#3557)

2025-08-29 18:28:39 +08:00

[Feat] Support streaming transfer data using ZMQ (#3521)

2025-09-02 19:52:19 +08:00

[Feature] block sparse attention (#3668)

2025-08-29 19:46:30 +08:00

add reasoning parser plugin (#3811)

2025-09-03 18:31:27 +08:00

[Feature] add tool parser (#3483)

2025-08-21 17:25:44 +08:00

add dp config (#3822)

2025-09-04 11:46:48 +08:00

add error traceback info (#3419)

2025-08-19 19:32:04 +08:00

[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

2025-09-08 13:12:24 +08:00

[feat] add metrics for yiyan adapter (#3219) (#3614)

2025-08-30 23:20:58 +08:00

[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

2025-09-08 13:12:24 +08:00

__init__.py

[BugFix] Fix default log level of paddleformers (#3376)

2025-08-14 11:36:24 +08:00

config.py

[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

2025-09-08 13:12:24 +08:00

envs.py

cache feature (#3857)

2025-09-07 18:52:46 +08:00

import_ops.py

fix import error (#2944)

2025-07-22 14:06:01 +08:00

stop.sh

polish code with new pre-commit rule (#2923)

2025-07-19 23:19:27 +08:00

test.yaml

[Sync] Update to latest code (#2679)

2025-07-03 15:43:53 +08:00

utils.py

[Feature] Setting number of apiserver workers automatically (#3790)

2025-09-02 14:17:48 +08:00