Commit Graph

19 Commits

Author SHA1 Message Date
kxz2002
24b85b752b [Cherry-Pick] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668) (#4737)
* [Feature] add a new reasoning parser (#4571)

* add new reasoning_parser initial commit

* add parser file content

* add register

* ernie_test_reasoning_parser

* support <tool_call> token and add tool_parser

* add and fix unit tests

* modify reasoning_parser

* modify reasoning parser and tool parser

* modify unit tests

* modify reasoning_parser and tool_parser

* modify unit tests

* fix tool_parser

* modify the logic of reasoning_parser and tool_parser

* add and modify unit tests

* standardize code style

* simplify reasoning_parser and tool_parser

* modify unit test

* [BugFix] Fix finish reason in _create_chat_completion_choice (#4582)

* fix n_param in _create_chat_completion_choice

* fix unit test

* fix final_res

* modify unit tests

* [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686)

* fix enable_thinking

* recover ernie4_5_vl_processor

* [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668)

* parser register name unify

* change ernie_x1 to ernie-x1

* change ernie4_5_vl to ernie-45-vl

* fix unit test
2025-10-31 23:27:21 +08:00
RAM
86d5006a57 [Graph Optimization][Speculative Decoding] Update yaml and fix typo (#4612) 2025-10-28 11:43:26 +08:00
ophilia-lee
70aa7423f8 Adapt benchmark tool to the SGLang framework (#4607)
* Adapt benchmark tool to the SGLang framework

* Adapt benchmark tool to the SGLang framework

* Adapt benchmark tool to the SGLang framework
2025-10-27 18:52:56 +08:00
tianlef
2676a918f0 [Doc]fix deepseek ce (#4560) 2025-10-23 14:09:11 +08:00
tianlef
153f15db39 [Doc]add deepseek wint4 ce (#4517) 2025-10-21 16:41:51 +08:00
RAM
775edcc09a [Executor] Default use CUDAGraph (#3594)
* add start intercept

* Adjustment GraphOptConfig

* pre-commit

* default use cudagraph

* set default value

* default use cuda graph

* pre-commit

* fix test case bug

* disable rl

* fix moba attention

* only support gpu

* Temporarily disable PD Disaggregation

* set max_num_seqs of test case as 1

* set max_num_seqs and temperature

* fix max_num_batched_tokens bug

* close cuda graph

* success run wint2

* profile run with max_num_batched_tokens

* 1.add c++ memchecker 2.success run wint2

* update a800 yaml

* update docs

* 1. delete check 2. fix plas attn test case

* default use use_unique_memory_pool

* add try-except for warmup

* ban mtp, mm, rl

* fix test case mock

* fix ci bug

* fix form_model_get_output_topp0 bug

* fix ci bug

* refine deepseek ci

* refine code

* Disable PD

* fix sot yaml
2025-10-21 14:25:45 +08:00
tianlef
14eb8b4f8b add x1 a3b quantization (#4397) 2025-10-14 15:04:06 +08:00
tianlef
8a964329f4 add glm benchmark yaml (#4289) 2025-09-26 14:23:29 +08:00
tianlef
e79a1a7938 x1_a3b config (#4135) 2025-09-16 19:44:46 +08:00
xiegegege
d682c97dd3 [benchmark]add lite-vl and x1 yaml (#4130) 2025-09-16 16:38:36 +08:00
tianlef
83bf1fd5aa [Doc]add plas attention config (#4128) 2025-09-16 15:55:12 +08:00
tianlef
0bc7d076fc [CE] add x1 w4a8c8 benchmark config (#3607)
* [CE] add x1 w4a8c8 benchmark config

* [CE] add x1 w4a8c8 benchmark config

* [CE] add x1 w4a8c8 benchmark config
2025-08-26 11:27:32 +08:00
Zhang Yulong
9ff2dfb162 Create eb45-8k-fp8-tp1-dp8_ep.yaml (#3485)
Hybrid-architecture EP parallel yaml
2025-08-20 14:33:54 +08:00
xiegegege
e3a843f2c5 [benchmark] add quantization for benchmark yaml (#2995) 2025-07-24 13:26:34 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
RAM
0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
ophilia-lee
33db137d0b Add default vLLM request parameters yaml 2025-07-15 19:31:27 +08:00
Divano
be5cabaf80 add quick benchmark (#2703)
Test scripts do not need to go through CI
2025-07-04 09:32:36 +08:00
Jiang-Jia-Jun
92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00