FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Files

AIbin a7392a0ff9 【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)

* support MLA chunk_size auto search & cuda_graph

2025-09-11 10:46:09 +08:00

__init__.py

…

dcu_model_runner.py

…

dcu_worker.py

…

eplb.py

…

experts_manager.py

…

gcu_model_runner.py

[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459)

2025-09-08 13:12:24 +08:00

gcu_worker.py

…

gpu_model_runner.py

【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)

2025-09-11 10:46:09 +08:00

gpu_worker.py

[BugFix] Fix the abnormal memory usage caused by shape errors in the triton moe backend (#4026)

2025-09-09 20:05:54 -07:00

iluvatar_model_runner.py

…

iluvatar_worker.py

…

metax_model_runner.py

…

metax_worker.py

…

model_runner_base.py

…

output.py

…

utils.py

…

worker_base.py

…

worker_process.py

[V1 Loader] Ernie kv cache quant support v1 loader (#3899)

2025-09-09 05:25:08 -07:00

xpu_model_runner.py

[XPU]Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3897)

2025-09-08 10:34:46 +08:00

xpu_worker.py

…