FastDeploy/fastdeploy/model_executor at 6efad14b95506c217f976858aa563f1dc8c297e3 - FastDeploy - 子说镜像小站

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

gaoziyuan 6efad14b95 support vl ori_vacab_size (#2900)

2025-07-18 16:26:14 +08:00

graph_optimization

[Executor] CUDA Graph support padding batch (#2844)

2025-07-15 19:49:01 -07:00

guided_decoding

support vl ori_vacab_size (#2900)

2025-07-18 16:26:14 +08:00

remove cum_offsets from get_block_shape_and_split_kv_block (#2913)

2025-07-18 16:13:32 +08:00

support vl ori_vacab_size (#2900)

2025-07-18 16:26:14 +08:00

refactor rl get_name_mappings_to_training (#2847)

2025-07-15 07:31:42 -07:00

__init__.py

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

forward_meta.py

[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)

2025-07-17 18:41:31 +08:00

load_weight_utils.py

[Feature][MTP] Support cacheKV transfer in per_chunk mode (#2890)

2025-07-17 17:58:08 +08:00

model_loader.py

[vl]remove duplicated load logic (#2744)

2025-07-13 07:36:26 +08:00

pre_and_post_process.py

[Inference, rename] remove padding_offsets from atten use batch_id_per_token (#2880)

2025-07-17 18:41:31 +08:00