FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-05 00:33:03 +08:00

Author	SHA1	Message	Date
zhupengyang	9409665713	[xpu] support ep (#4067 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-09-15 13:53:11 +08:00
yinwei	7833f2f6cb	[XPU]Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3897 ) * fix bug * fix bug * update * update * update	2025-09-08 10:34:46 +08:00
Ryan	45f81b34f0	add dtype int32 (#3692 ) Some checks failed CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-08-29 14:56:35 +08:00
李泳桦	b2afdf4fc6	[fix] qwen output inconsistency when top_p=0 (#3634 ) * [fix] qwen output inconsistency when top_p=0 * [fix] remove decode pre_id code	2025-08-27 17:16:23 +08:00
Sunny-bot1	c68c3c4b8b	[Feature] bad words support v1 scheduler and specifiy token ids (#3608 ) * support bad_words_token_ids * docs * fix test * fix * bad words support kvcache v1 and token ids * fix	2025-08-25 20:14:51 -07:00
qw86972190	c83381d650	revert pr (#3481 ) Co-authored-by: iosmers <yinwei_hust@163.com>	2025-08-21 14:19:50 +08:00
chen	f0f00a6025	[OPs] Universal optimization and Fix early_stop cuda 700 (#3375 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * delete nonzero * delete setup_ops_base.py * check if * check gcp infer_seed.cpu() * fix repetition_early_stopper_kernel cuda 700	2025-08-14 22:40:44 +08:00
Yuanle Liu	9571c458f0	enhance eos_tokens (#3274 ) * enhance eos_tokens * update * update	2025-08-11 14:47:52 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
lizexu123	b01cfd6007	[BugFix] support real batch_size (#3109 ) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-05 16:33:54 +08:00
Sunny-bot1	72ef5a9c93	[FIX]fix bad_words when sending requests consecutively (#3197 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * fix bad_words * fix log * fix log	2025-08-04 05:59:41 -07:00
bukejiyu	db698bda01	qwen loader (#3057 )	2025-07-30 19:09:38 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055 ) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00
Ryan	73cfe1fd37	[SOT] Extend SOT warmup support to new hardware (#3032 ) * add new hardware * add_sot_warmup4new_hardware * fix conflict * rm Optional	2025-07-29 22:45:20 +08:00
yinwei	f2a528f9ae	[XPU] Support kvblock centralized management (#3017 )	2025-07-29 10:40:55 +08:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018 ) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
ltd0924	3792345c3a	[LLM] update function name (#2985 ) * [LLM] update function name	2025-07-24 15:03:40 +08:00
lizhenyun01	29c3292f02	support c4 attn && fix cache	2025-07-24 12:00:52 +08:00
lizexu123	9b22b8d2c3	delete max-len (#2959 )	2025-07-23 15:11:39 +08:00
lizexu123	67990e0572	[Feature] support min_p_sampling (#2872 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * Fastdeploy support min_p * add test_min_p * fix * min_p_sampling * update * delete vl_gpu_model_runner.py * fix * Align usage of min_p with vLLM * fix * modified unit test * fix test_min_sampling * pre-commit all files * fix * fix * fix * fix xpu_model_runner.py	2025-07-20 23:17:59 -07:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
周周周	1339e56282	[XPU] Remove padding_offsets from get_padding_offset.cu (#2911 )	2025-07-18 14:16:44 +08:00
yulangz	17314ee126	[XPU] Update doc and add scripts for downloading dependencies (#2845 ) * [XPU] update xvllm download * update supported models * fix xpu model runner in huge memory with small model * update doc	2025-07-16 11:05:56 +08:00
YuanRisheng	101ad33332	[BugFix] Fix Configs (#2849 ) * fix config * fix config	2025-07-15 19:50:36 -07:00
YuanRisheng	4c7b8bc458	Simplify the Config code (#2770 ) * simplify the code * fix vl * delete config * fix * perfect code * fix ci * fix xpu * fix xpu * fix server * resolve conflict * fix mtp * resolve conflict * fix xpu * fix xpu * fix vl * fix log * fix qwen moe * fix qwen moe * fix qwen moe	2025-07-14 19:50:05 +08:00
Sunny-bot1	f6ad26fc08	fix topp default value (#2814 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-07-11 17:10:21 +08:00
Sunny-bot1	240d6236bc	[Fix]fix top_k_top_p sampling (#2801 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * fix topk-topp * update * add base_non_truncated	2025-07-10 22:35:10 +08:00
littledgg	59071268b6	[Executor] Move forward_meta.py to fastdeploy/model_executor (#2774 ) * Use PEP 563 in attention.py and fix conflict * merge commit * Change what was left out last time	2025-07-10 20:36:51 +08:00
chen	d33105baeb	[Feature] Online Chat API Support Return logprobs (#2777 ) * online chat support logprobs * check xpu * check vl_gpu_model_runner and xpu_model_runner * get_worker() check platform	2025-07-10 16:33:40 +08:00
Sunny-bot1	e45050cae3	[Feature] support top_k_top_p sampling (#2753 ) * support top_k_top_p sampling * fix * add api param * add api para * fix * fix * fix * fix * fix * fix * fix	2025-07-09 20:58:58 -07:00
RAM	03a74995b8	Clear dead code And supplementary notes (#2757 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * 1.supplementary notes 2.delete dead code * fix bug of forward meta * Global modification of forward meta * fix vl model_runner bug	2025-07-09 16:17:34 +08:00
yulangz	0350831c2b	fix xpu offline demo garbled output (#2763 )	2025-07-09 14:51:20 +08:00
Yuanle Liu	240bdac2a4	[feat] support fa3 backend for pd disaggregated (#2695 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * support fa3 backend run in pd disaggregated * delete use_fast_ffn	2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun	05c670e593	[Sync] Update to latest code (#2679 ) * [Sync] Update to latest code * Add new code files * Add new code files * update code * Try to fix build.sh * Try to fix build.sh * Update code * Update requirements.txt * Update code --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00

35 Commits