FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 08:37:06 +08:00

Author	SHA1	Message	Date
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
yinwei	3a4db15765	Fix out-of-memory issue during single-XPU deployment (#3133)	2025-08-01 17:12:03 +08:00
Ryan	73cfe1fd37	[SOT] Extend SOT warmup support to new hardware (#3032) * add new hardware * add_sot_warmup4new_hardware * fix conflict * rm Optional	2025-07-29 22:45:20 +08:00
yinwei	f2a528f9ae	[XPU] Support kvblock centralized management (#3017)	2025-07-29 10:40:55 +08:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985) * [LLM] update function name	2025-07-24 15:03:40 +08:00
Zero Rains	89a485b69f	[Feature] Support using prefix-caching + cudagraph for inference (#2924) * fix the bug in cudagraph+prefix-caching but still have some bug with profile Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397 * add the signal to make sure cache manager launched * fix judge condition * remove useless control * update control stream * update * fix xpu * change the do_profile flag * update * add new threads to init cache_manager --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-07-22 00:59:45 -07:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923)	2025-07-19 23:19:27 +08:00
YuanRisheng	101ad33332	[BugFix] Fix Configs (#2849) * fix config * fix config	2025-07-15 19:50:36 -07:00
yulangz	0350831c2b	fix xpu offline demo garbled output (#2763)	2025-07-09 14:51:20 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00

11 Commits