apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2025-10-04 08:16:42 +08:00
Files
2e9e53ff7e8c52bf013fc67779db046d61dd8315
FastDeploy/fastdeploy/model_executor
YuanRisheng 2e9e53ff7e [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)
* remove max_num_batched_tokens in parallel config
* remove max_num_seqs
* update test case
* fix test
* fix
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-17 10:43:35 +08:00
graph_optimization       | [CUDAGraph] Support multi output buffers and merge some fixes from feature/exp_0908 (#4062) | 2025-09-15 16:21:30 +08:00
guided_decoding          | [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)             | 2025-09-17 10:43:35 +08:00
layers                   | [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)             | 2025-09-17 10:43:35 +08:00
model_loader             | cache feature (#3857)                                                                        | 2025-09-07 18:52:46 +08:00
models                   | [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)             | 2025-09-17 10:43:35 +08:00
ops                      | fix typos (#3840)                                                                            | 2025-09-12 11:04:38 +08:00
__init__.py              | polish code with new pre-commit rule (#2923)                                                 | 2025-07-19 23:19:27 +08:00
forward_meta.py          | 【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)                                  | 2025-09-11 10:46:09 +08:00
load_weight_utils.py     | [v1 loader]qwen Offline fp8 (#4036)                                                          | 2025-09-15 13:44:11 +08:00
pre_and_post_process.py  | support mtp in v1_scheduler mode (#3695)                                                     | 2025-09-04 17:39:59 +08:00
utils.py                 | fix bf16 and add comments (#4106)                                                            | 2025-09-15 17:23:07 +08:00