FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
xiegegege	b7e1e6c953	[CE]change yaml name	2025-12-04 19:14:11 +08:00
tianlef	04d35ace5e	[CE]add wint4 ep (#5355 )	2025-12-03 15:17:47 +08:00
Zhang Yulong	5b49142988	update (#5298 )	2025-11-28 18:29:16 +08:00
xiegegege	eae34a416c	[benchmark]add qwen3-235b pd+ep yaml (#5225 )	2025-11-25 19:53:30 +08:00
tianlef	de43577a7c	[Docs] add ebvlthinking yaml (#5120 )	2025-11-19 15:27:46 +08:00
Zhang Yulong	83532e1d01	[Benchmark] Enhance benchmark output logging (#4682 ) * Enhance benchmark output logging Add print statements to display the number of discarded outputs before and after filtering. * Update benchmark_serving.py	2025-11-06 16:53:31 +08:00
Juncai	08ca0f6aea	[Feature] [PD] add simple router and refine splitwise deployment (#4709 ) * add simple router and refine splitwise deployment * fix	2025-11-06 14:56:02 +08:00
zhang-prog	4c2ad15258	add paddleocr_vl benchmark (#4833 ) * add paddleocr_vl benchmark * fix * fix * fix * fix	2025-11-05 19:37:45 +08:00
ophilia-lee	412097c1b8	benchmark tool supports specifying response_format for constrained decoding scenarios (#4718 )	2025-10-31 12:26:24 +08:00
Ryan	28de91b50f	[Graph Optimization] SOT+CUDAGraph support ERNIE4.5T VL 28B / 424B (#4645 ) * 45TVL support sot+CUDAGraph * mv unitest from ce_deploy 2 e2e * add test_EB_VL_Lite_sot_serving * rm useless line * add openai_client * fix unitest && reduce computing resources	2025-10-31 11:38:43 +08:00
kxz2002	a2870ed4a9	[Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668 ) * parser register name unify * change ernie_x1 to ernie-x1 * change ernie4_5_vl to ernie-45-vl * fix unit test	2025-10-31 10:45:27 +08:00
xjkmfa	19df1aec2b	[Docs] add Qwen25vl yaml (#4662 ) * Add ci case for min token and max token * 【CI case】include total_tokens in the last packet of completion interface stream output * 【CE】add qwen25-vl * 【CE】add qwen25-vl --------- Co-authored-by: xujing43 <xujing43@baidu.com>	2025-10-29 17:39:40 +08:00
RAM	86d5006a57	[Graph Optimization][Speculative Decoding] Update yaml and fix typo (#4612 )	2025-10-28 11:43:26 +08:00
ophilia-lee	70aa7423f8	Adapt benchmark tool to the SGLang framework (#4607 ) * Adapt benchmark tool to the SGLang framework * Adapt benchmark tool to the SGLang framework * Adapt benchmark tool to the SGLang framework	2025-10-27 18:52:56 +08:00
tianlef	2676a918f0	[Doc]fix deepseek ce (#4560 )	2025-10-23 14:09:11 +08:00
tianlef	153f15db39	[Doc]add deepseek wint4 ce (#4517 )	2025-10-21 16:41:51 +08:00
RAM	775edcc09a	[Executor] Default use CUDAGraph (#3594 ) * add start intercept * Adjustment GraphOptConfig * pre-commit * default use cudagraph * set default value * default use cuda graph * pre-commit * fix test case bug * disable rl * fix moba attention * only support gpu * Temporarily disable PD Disaggregation * set max_num_seqs of test case as 1 * set max_num_seqs and temperature * fix max_num_batched_tokens bug * close cuda graph * success run wint2 * profile run with max_num_batched_tokens * 1.add c++ memchecker 2.success run wint2 * updatee a800 yaml * update docs * 1. delete check 2. fix plas attn test case * default use use_unique_memory_pool * add try-except for warmup * ban mtp, mm, rl * fix test case mock * fix ci bug * fix form_model_get_output_topp0 bug * fix ci bug * refine deepseek ci * refine code * Disable PD * fix sot yaml	2025-10-21 14:25:45 +08:00
Zhang Yulong	10e85daf15	update benchmark scripts (#4497 )	2025-10-20 17:03:10 +08:00
Zhang Yulong	8f77adc381	Add data dictionary for API response processing (#4454 ) Initialize data dictionary for response handling.	2025-10-16 17:23:11 +08:00
Zhang Yulong	98f8c3703a	Add filtering for failed requests in benchmark outputs (#4448 ) Filter out requests with end_timestamp == 0.0	2025-10-16 14:57:47 +08:00
Zhang Yulong	9dc3968c13	[benchmark] Fix benchmark duration calculation logic (#4446 ) * Fix benchmark duration calculation logic Calculate benchmark duration using filtered outputs. * Fix benchmark duration calculation using benchmark_outputs	2025-10-16 14:36:29 +08:00
Zhang Yulong	7f94f063ff	Update benchmark_serving.py (#4438 ) Discarded requests are still saved for result analysis	2025-10-15 20:36:19 +08:00
Zhang Yulong	c4f866c457	update benchmark tools (#4416 )	2025-10-15 11:15:25 +08:00
tianlef	14eb8b4f8b	add x1 a3b quantization (#4397 )	2025-10-14 15:04:06 +08:00
tianlef	8a964329f4	add glm benchmark yaml (#4289 )	2025-09-26 14:23:29 +08:00
Zhang Yulong	5532e8a323	[FD CLI] Add bench cli (#4160 ) * add bench cli * Update test_main.py	2025-09-22 20:37:30 +08:00
co63oc	c4830ef24c	fix typos (#4176 ) * fix typos * fix	2025-09-22 14:27:17 +08:00
tianlef	e79a1a7938	x1_a3b config (#4135 )	2025-09-16 19:44:46 +08:00
xiegegege	d682c97dd3	[benchmark]add lite-vl and x1 yaml (#4130 )	2025-09-16 16:38:36 +08:00
tianlef	83bf1fd5aa	[Doc]add plas attention config (#4128 )	2025-09-16 15:55:12 +08:00
tianlef	0bc7d076fc	[CE]add x1 w4a8c8 benchamrk config (#3607 ) * [CE]add x1 w4a8c8 benchamrk config * [CE]add x1 w4a8c8 benchamrk config * [CE]add x1 w4a8c8 benchamrk config	2025-08-26 11:27:32 +08:00
Zhang Yulong	9ff2dfb162	Create eb45-8k-fp8-tp1-dp8_ep.yaml (#3485 ) Hybrid-architecture EP parallel yaml	2025-08-20 14:33:54 +08:00
yinwei	776fb03250	add error info (#3040 )	2025-07-28 15:10:28 +08:00
Zhang Yulong	5151bc92c8	Update benchmark tools (#3004 ) * update benchmark tools * update benchmark tools	2025-07-24 15:19:23 +08:00
xiegegege	e3a843f2c5	[benchmark] add quantization for benchmark yaml (#2995 )	2025-07-24 13:26:34 +08:00
Zero Rains	25698d56d1	polish code with new pre-commit rule (#2923 )	2025-07-19 23:19:27 +08:00
RAM	0fad10b35a	[Executor] CUDA Graph support padding batch (#2844 ) * cuda graph support padding batch * Integrate the startup parameters for the graph optimization backend and provide support for user - defined capture sizes. * Do not insert max_num_seqs when the user specifies a capture list * Support set graph optimization config from YAML file * update cuda graph ci * fix ci bug * fix ci bug	2025-07-15 19:49:01 -07:00
ophilia-lee	33db137d0b	Add yaml of default request parameters for vLLM	2025-07-15 19:31:27 +08:00
lijingning	9d6a42b334	Adapt to vLLM having no arrival_time; adapt to vLLM requiring model; add case number field no to RequestFuncInput/RequestFuncOutput/SampleRequest	2025-07-15 19:31:27 +08:00
GoldPancake	f7cad30a38	[Feature] Add speculative decoding simulation benchmark. (#2751 ) * Add speculative decoding simulation benchmark * Fix the name of the parameter	2025-07-09 12:08:43 +08:00
Divano	050d9658a5	Update requirements.txt	2025-07-04 09:53:03 +08:00
Divano	be5cabaf80	add quick benchmark (#2703 ) Test scripts do not need to go through CI	2025-07-04 09:32:36 +08:00
Zhang Yulong	264ddfdf8a	Update README.md	2025-06-30 10:28:15 +08:00
Jiang-Jia-Jun	92c2cfa2e7	Sync v2.0 version of code to github repo	2025-06-29 23:29:37 +00:00

44 Commits