FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-18 22:44:39 +08:00

Author	SHA1	Message	Date
Sunny-bot1	c68c3c4b8b	[Feature] bad words support v1 scheduler and specifiy token ids (#3608 ) * support bad_words_token_ids * docs * fix test * fix * bad words support kvcache v1 and token ids * fix	2025-08-25 20:14:51 -07:00
lizexu123	c43a4bec00	[Features] support hugging face qwen3 dense and qwen2 model (#3574 ) * support qwen2 and qwen3 hugging face * fix moe * defualt_v1 loader * hugging_face_format deprecated * modify hugging_face_foramt to model_format * model_format auto * fix environemt * fix bug * fix qwen3-0.6 bug * model_format is str * fix	2025-08-26 10:54:53 +08:00
chen	9cab3f47ff	[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552 ) * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs * delete some code * code check * code check and add doc * fix tokenizer.decoder(-1), return 'Invalid Token' * add ci for temp_scaled and top_p logprobs * check test * check seq len time shape * logprob clip inf --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com>	2025-08-25 14:11:49 +08:00
zhink	df7c31012b	Modified to support custom all reduce by default (#3538 )	2025-08-22 16:59:05 +08:00
YuanRisheng	5b66462f0e	Fix fdconfig bugs (#3528 ) * fix config * fix parallel * fix ips * fix rl * open code	2025-08-22 16:17:15 +08:00
YuanRisheng	c389a4013c	Unify server-side and model-side Config(Part-5) (#3497 ) * move config * fix xpu * fix * fix vl * fix vl * fix unitest * fix args * add unitest * fix test	2025-08-21 19:00:21 +08:00
Yzc216	466cbb5a99	[Feature] Models api (#3073 ) * add v1/models interface related * add model parameters * default model verification * unit test * check model err_msg * unit test * type annotation * model parameter in response * modify document description * modify document description * unit test * verification * verification update * model_name * pre-commit * update test case * update test case * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/entrypoints/openai/test_serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/entrypoints/openai/serving_models.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-08-21 17:02:56 +08:00
kevin	67298cf4c0	add error traceback info (#3419 ) * add error traceback info * update error msg * update code --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-19 19:32:04 +08:00
chen	6735626014	fix request_output sampling_params (#3154 ) (#3464 )	2025-08-19 13:52:50 +08:00
luukunn	3a7a20d191	[Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421 ) * fix chat_template_args * fix args * add offline * add offline * fix * fix * fix default enable_thinking value * fix default enable_thinking value * modify condition * Revert "modify condition" This reverts commit `26430bdeb1`. * fix unit test	2025-08-19 10:50:01 +08:00
luukunn	9c129813f9	[Feature] add custom chat template (#3251 ) * add custom chat_template * add custom chat_template * add unittest * fix * add docs * fix comment * add offline chat * fix unit test * fix unit test * fix * fix pre commit * fix unit test * add unit test * add unit test * add unit test * fix pre_commit * fix enable_thinking * fix pre commit * fix pre commit * fix unit test * add requirements	2025-08-18 16:34:08 +08:00
ming1753	396dba0d62	[Bug Fix] Fix V1 video bug (#3388 )	2025-08-13 23:04:07 +08:00
luukunn	eda83ca672	add Tool Parser (#3272 ) * add tool-parser * add tool-parser * add tool parser * add tool parser * fix * add offline * add offline * fix * parsers:tool&reasoning * modify tool parser name * update * fix reasoning-parser * add requirements * fix finish reason * fix * fix reasoning-parser * fix * fix * fix * fix * fix --------- Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>	2025-08-13 01:06:55 +08:00
ming1753	f5164215be	[Bug Fix] fix vl V1 schedule bug (#3323 ) * [Bug Fix] fix vl V1 schedule bug * fix format	2025-08-12 11:31:39 +08:00
chenjian	b21272d9ff	[Bug fix] fix block num setting in scheduler v1 for develop (#3303 ) * fix block num setting in scheduler v1 * fix block num setting in scheduler v1 * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting * fix max_block_num and max_num_batched_tokens setting	2025-08-12 10:38:51 +08:00
Zero Rains	b23af29d0b	Launch expert_service before kv_cache initialization in worker_process (#3045 ) * launch expert_service before kv_cache initialization * add two signal make sure model loading and expert_service lauching finished * fix the EP bug * fix ep * update launching way * fix ep * update * roback ep * pre-commit all files --------- Co-authored-by: RAM <gstian5555@outlook.com> Co-authored-by: Divano <dddivano@outlook.com>	2025-08-11 19:38:46 +08:00
chen	46c8491201	merge logprob into batch_output (#3266 )	2025-08-11 10:03:00 +08:00
chenjian	c011cb8b16	[Bug Fix] Fix scheduler bug in develop (#3292 ) * Fix scheduler bug in develop * Fix scheduler bug in develop * Fix scheduler bug in develop	2025-08-10 13:55:38 +08:00
Zero Rains	ce1f353c70	Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148 ) * w4a8 bug * fix w4a8 bug * remove code * modify the triton backend * fix ep * fix the bug with tensor_wise_fp8 in triton backend * fix the RL * fix bug by merge * fix the bug in w4a8 * fix the tensor_wise_fp8 bug * fix RL	2025-08-08 15:55:47 +08:00
yzwu	fbdd6b0663	[Iluvatar GPU] Optimze attention and moe performance (#3234 )	2025-08-08 10:51:24 +08:00
JYChen	9423c577fe	[stop_seq] fix out-bound value for stop sequence (#3216 ) * fix out-bound value for stop sequence * catch error if there are out-of-bounds value * check in offline mode * add ut tests	2025-08-07 15:40:21 +08:00
lizexu123	afff4d37ea	[Feature] support seed parameter (#3161 ) * support seed * fix * add SamplingMetadata seed test * The next_tokens values are inconsistent! * add air and rejection seed test * fix * add SamplingParams seed test * fix seed=0 * Default to defualt * fix * fix args_utils * fix review * fix review * fix * fix * add xpu,gcu,iluvatar support seed * fix	2025-08-06 15:20:47 +08:00
Jiang-Jia-Jun	55939f7942	Update engine.py	2025-08-05 16:10:36 +08:00
Sunny-bot1	72ef5a9c93	[FIX]fix bad_words when sending requests consecutively (#3197 ) * fix bad_words * fix log * fix log	2025-08-04 05:59:41 -07:00
周周周	2bd8a50649	remove useless code (#3166 )	2025-08-04 18:03:08 +08:00
ApplEOFDiscord	b71cbb466d	[Feature] remove dependency on enable_mm and refine multimodal's code (#3014 ) * remove dependency on enable_mm * fix codestyle check error * fix codestyle check error * update docs * resolve conflicts on model config * fix unit test error * fix code style check error --------- Co-authored-by: shige <1021937542@qq.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-01 20:01:18 +08:00
SunLei	dade19d7a4	[Feature] General support for logprobs (#2974 ) * [Feature] support logprobs in chat/completions and completions endpoints * Temporarily comment out text_offset due to incorrect logic * Clean up temporary debug prints * [Feature] support logprobs in offline mode via SamplingParams * fix: serialize Logprob as dict before zmq send to fix msgpack error * refactor: remove redundant methods to simplify codebase * Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization * refactor: centralize param validation in engine_client to reduce duplication * revert: rollback changes in offline_demo.py * revert: rollback changes in offline_demo.py * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:25:56 +08:00
kevin	22cab724e8	[Feature] block scheduler v1 support prefix caching (#3061 ) * block scheduler v1 support prefix cache * update code * update code * fix code bug * add timeout time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:29:19 +08:00
chenjian	32307283f1	Fix bug for offline inference in scheduler v1 (#3117 )	2025-07-31 17:54:24 +08:00
chenjian	fe0e3f508b	[BUG FIX] Fix bug when preempted request rescheduled (#3080 ) * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled	2025-07-30 22:25:47 +08:00
YuanRisheng	7dfdd157ac	[BugFix]Fix ep size (#3092 ) * fix ep * fix num_layer	2025-07-30 21:03:12 +08:00
ltd0924	d17886de19	[Feature] support ep in mixed mode (#3001 ) * [LLM] support ep * Update worker_process.py * Update expert_service.py * Update worker_process.py * format files	2025-07-30 20:43:39 +08:00
bukejiyu	db698bda01	qwen loader (#3057 )	2025-07-30 19:09:38 +08:00
ming1753	5acde4eb43	[Feature] Multimodal Scheduler V1 (#3019 ) * [Feature] Support multimodal scheduler v1 * remove debug log * fix bug * fix format * modify code * fix bug * fix bug * fix bug * modify code	2025-07-30 16:05:55 +08:00
YuanRisheng	99a70fc722	unify parallel config (#3070 )	2025-07-30 11:41:23 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055 ) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00
Zero Rains	b2f9a42d87	[Feature] Support repetition early stop (#3024 ) * support repetition early stop and support user to set the parameter * remove log * fix codestyle * add the early_stop_config to rollout_config * update config and EarlyStopper class * fix the bug for triton * modify the stop method * update description * modify the usage for stop_flags --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-29 22:42:54 +08:00
YuanRisheng	502ee92a0a	Unify server-side and model-side Config (Part3) (#3047 ) * merge model config * fix arch * fix rl	2025-07-29 17:07:44 +08:00
JYChen	dafe02a7b9	[stop sequence] support stop sequence (#3025 ) * stop seqs in multi-ends * unittest for gpu stop op * kernel tid==0	2025-07-29 14:17:37 +08:00
YuanRisheng	1a815b7a2a	Fix Speculative Config bug (#3049 ) * fix speculative bug * fix rl	2025-07-29 10:50:48 +08:00
Yuan Xiaolan	b1d787a272	[fix] w4a8 model loading and hadamard config (#3013 )	2025-07-28 18:17:59 +08:00
YuanRisheng	bddf403576	Unify server-side and model-side Config (Part2) (#3035 ) * merge speculative and graph opt conifg * add attr	2025-07-28 15:31:48 +08:00
YuanRisheng	6ccc10ad47	Unify server-side and model-side Config (Part1) (#3018 ) * move cache config * fix mtp	2025-07-28 10:51:52 +08:00
李泳桦	69996a40da	[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020 ) * [feat] add disable_chat_template in chat api as a substitute for previous raw_request * [fix] pre-commit code check	2025-07-25 20:57:32 +08:00
Zero Rains	0fb37ab7e4	update flake8 version to support pre-commit in python3.12 (#3000 ) * update flake8 version to support pre-commit in python3.12 * polish code	2025-07-24 01:43:31 -07:00
ltd0924	f935d6f862	[BugFix] fix multinode deployment (#2977 )	2025-07-24 15:04:04 +08:00
Yzc216	e14587a954	[Feature] multi-source download (#2986 ) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit	2025-07-24 14:26:37 +08:00
chenjian	85a78d695d	[Feature] Support block scheduler v1 for FD (#2928 ) * Support FD block scheduler v1 * Support FD block scheduler v1 * Support FD block scheduler v1 * Fix according to copilot review * Fix according to review * Remove is_dummy * Fix bug when real_bsz=1 * Fix infer first token cost time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-23 20:31:31 +08:00
Ryan	95b5af24db	[SOT] Add sot warmup (NVIDIA GPU Only) (#2929 ) * add sot warmup * fix code style * change batch_size list * add param to config * rm free_list settings && set sot_warmup_sizes * finish debug with dynamic dims by type annotations * add profile_run guard * rm sth useless	2025-07-22 21:36:14 +08:00
Zero Rains	89a485b69f	[Feature] Support using prefix-caching + cudagraph for inference (#2924 ) * fix the bug in cudagraph+prefix-caching but still have some bug with profile Change-Id: Ibf2ba3f2e3b08641d03f4b1391d7c862c3efa397 * add the signal to make sure cache manager launched * fix judge condition * reomove useless control * update control stream * update * fix xpu * change the do_profile flag * update * add new threads to init cache_manager --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-07-22 00:59:45 -07:00


84 Commits