FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-06 00:57:33 +08:00

Author	SHA1	Message	Date
chen	1e19833ba5	[CP] CP Lm head fp32 and temp_logprob to release/2.1 (#3766 ) * [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552) * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs * delete some code * code check * code check and add doc * fix tokenizer.decoder(-1), return 'Invalid Token' * add ci for temp_scaled and top_p logprobs * check test * check seq len time shape * logprob clip inf --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com> * [Precision] Support lm_head layer running in float32 (#3597) * support lm_head fp32 bf16 fp16 * support lm_head fp32 bf16 fp16 * add doc and check code * lm_head_fp32 specify lm_head as fp32 * code check * check doc * code check --------- Co-authored-by: sunlei1024 <sunlei5788@gmail.com>	2025-09-01 19:56:54 +08:00
luukunn	4a9c04a746	[Feature] add tool parser (#3518 ) * [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421) * fix chat_template_args * fix args * add offline * add offline * fix * fix * fix default enable_thinking value * fix default enable_thinking value * modify condition * Revert "modify condition" This reverts commit `26430bdeb1`. * fix unit test * add Tool Parser (#3272) * add tool-parser * add tool-parser * add tool parser * add tool parser * fix * add offline * add offline * fix * parsers:tool&reasoning * 修改tool parser名称· * update * fix reasoning-parser * add requirements * fix finish reason * fix * fix reasoning-parser * fix * fix * fix * fix * fix --------- Co-authored-by: zhuzixuan <zhuzixuan@baidu.com> * [Feature] add tool parser (#3483) * add tool parser * add x1 enable_thinking * restart ci * fix vl reasoning parser * modify call style * modify call style * add offline enablethinking * fix completion * fix * fix unit test * fix unit test * fix unit test * fix vl reasoning parser * fix vl reasoning parser * fix unit test --------- Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>	2025-08-22 11:14:35 +08:00
李泳桦	1b399b91c0	[fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3511 ) * [fix] setting disable_chat_template while passing prompt_token_ids led to response error * [fix] code syntax * [test] add test case for this bug * [test] add test case for empty message list * [test] fix test case for empty message list	2025-08-21 17:33:10 +08:00
memoryCoderC	8bf48dfab8	[Feature] add prompt_tokens and completion_tokens (#3505 ) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-08-21 14:10:06 +08:00
Jiang-Jia-Jun	28918702c2	Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1" This reverts commit `02596fc537`, reversing changes made to `03347626a6`.	2025-08-14 17:20:29 +08:00
luukunn	81092c0fe3	add tool parser	2025-08-13 16:06:22 +08:00
memoryCoderC	37b76158f9	Completion add raw_prediction/text_after_process (#3362 )	2025-08-12 23:20:36 +08:00
SunLei	dade19d7a4	[Feature] General support for logprobs (#2974 ) * [Feature] support logprobs in chat/completions and completions endpoints * Temporarily comment out text_offset due to incorrect logic * Clean up temporary debug prints * [Feature] support logprobs in offline mode via SamplingParams * fix: serialize Logprob as dict before zmq send to fix msgpack error * refactor: remove redundant methods to simplify codebase * Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization * refactor: centralize param validation in engine_client to reduce duplication * revert: rollback changes in offline_demo.py * revert: rollback changes in offline_demo.py * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:25:56 +08:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103 ) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun	0616c208d2	[Feature] Support include_stop_str_in_output in completion api (#3096 ) * [Feature] Support include_stop_str_in_output in completion api * Fix ci test --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-30 22:18:48 +08:00
李泳桦	b242150f94	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058 ) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [fix] fix ci consistency test error with reasoning parser * [doc] update docs related to metadata * [fix] cancel enable_thinking default value	2025-07-30 19:25:20 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055 ) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00
李泳桦	69996a40da	[feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020 ) * [feat] add disable_chat_template in chat api as a substitute for previous raw_request * [fix] pre-commit code check	2025-07-25 20:57:32 +08:00
Zero Rains	0fb37ab7e4	update flake8 version to support pre-commit in python3.12 (#3000 ) * update flake8 version to support pre-commit in python3.12 * polish code	2025-07-24 01:43:31 -07:00
李泳桦	2a8a2c06de	[fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951 )	2025-07-22 14:35:56 +08:00
Jiang-Jia-Jun	56102e91e1	[Polish] Return error message of raw_request (#2946 ) Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-22 10:21:32 +08:00
李泳桦	8a619e9db5	[Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940 ) * [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body * [fix] return_token_ids not working in curl request * [test] improve some test cases of return_token_ids and prompt_token_ids * [fix] the server responds ok even if request.messages is an empty list	2025-07-21 19:31:14 +08:00
lizexu123	67990e0572	[Feature] support min_p_sampling (#2872 ) Some checks failed Deploy GitHub Pages / deploy (push) Has been cancelled Details * Fastdeploy support min_p * add test_min_p * fix * min_p_sampling * update * delete vl_gpu_model_runner.py * fix * Align usage of min_p with vLLM * fix * modified unit test * fix test_min_sampling * pre-commit all files * fix * fix * fix * fix xpu_model_runner.py	2025-07-20 23:17:59 -07:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
lddfym	b5e4288704	Global scheduler supports configuring hot updates (#2807 ) * Check if the controller port is available * Global scheduler supports configuring hot updates * add interface: /controller/scheduler * add interface: /controller/scheduler	2025-07-11 13:38:07 +08:00
chen	d33105baeb	[Feature] Online Chat API Support Return logprobs (#2777 ) * online chat support logprobs * check xpu * check vl_gpu_model_runner and xpu_model_runner * get_worker() check platform	2025-07-10 16:33:40 +08:00
Sunny-bot1	e45050cae3	[Feature] support top_k_top_p sampling (#2753 ) * support top_k_top_p sampling * fix * add api param * add api para * fix * fix * fix * fix * fix * fix * fix	2025-07-09 20:58:58 -07:00
ltd0924	87e638498c	[RL] update reschedule finish reason (#2709 )	2025-07-04 13:47:36 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00
jiangjiajun	684703fd72	[LLM] First commit the llm deployment code	2025-06-09 19:20:15 +08:00

25 Commits