FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
Divano	c1aa66df02	Revert "[Optim] Remove limitation of number of kvcache blocks (#5612 )" (#5702 ) This reverts commit `9da89a374b`.	2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun	9da89a374b	[Optim] Remove limitation of number of kvcache blocks (#5612 ) * [Optim] Remove limitation of number of kvcache blocks * Update fastdeploy/envs.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/worker/iluvatar_worker.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add docs * Update fastdeploy/worker/worker_process.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix ci case --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-23 11:18:29 +08:00
yzwu	4b661512ca	[Iluvatar GPU] Adapt VL model (#4313 )	2025-10-17 16:13:38 +08:00
YuanRisheng	0355235fb9	[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400 ) * delete some attr in parallel config * delete comment --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-16 20:00:37 +08:00
co63oc	d6369b4d51	fix typos (#3684 )	2025-09-01 17:50:17 +08:00
yzwu	ce9180241e	[Iluvatar GPU] Modify the names of some variables (#3273 )	2025-08-13 11:38:02 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109 ) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
Ryan	73cfe1fd37	[SOT] Extend SOT warmup support to new hardware (#3032 ) * add new hardware * add_sot_warmup4new_hardware * fix conflict * rm Optional	2025-07-29 22:45:20 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985 ) * [LLM] update function name	2025-07-24 15:03:40 +08:00
Zero Rains	89a485b69f	[Feature] Support using prefix-caching + cudagraph for inference (#2924 ) * fix the bug in cudagraph+prefix-caching but still have some bug with profile Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397 * add the signal to make sure cache manager launched * fix judge condition * reomove useless control * update control stream * update * fix xpu * change the do_profile flag * update * add new threads to init cache_manager --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-07-22 00:59:45 -07:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
liddk1121	1b54a2831e	Adapt for iluvatar gpu (#2684 )	2025-07-07 16:53:14 +08:00

14 Commits