FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Author	SHA1	Message	Date
Divano	c1aa66df02	Revert "[Optim] Remove limitation of number of kvcache blocks (#5612 )" (#5702 ) This reverts commit `9da89a374b`.	2025-12-23 15:41:33 +08:00
Jiang-Jia-Jun	9da89a374b	[Optim] Remove limitation of number of kvcache blocks (#5612 ) * [Optim] Remove limitation of number of kvcache blocks * Update fastdeploy/envs.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/worker/iluvatar_worker.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add docs * Update fastdeploy/worker/worker_process.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix ci case --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-23 11:18:29 +08:00
YuBaoku	fe55baae47	[CI] Fix unit_test error of unstable execution (#5660 ) * [CI] Fix unit_test error of unstable execution	2025-12-19 22:59:53 +08:00
MingkunZhang	46d83be065	[Metax] update ci test (#5652 )	2025-12-19 17:25:47 +08:00
yzwu	ac013803f3	[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555 )	2025-12-18 02:14:25 -08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
YuBaoku	5d2b16e6f3	[CI] Remove test_metrics.py due to incompatible forced merge (#5578 ) * [CI] Remove test_metrics.py due to incompatible forced merge	2025-12-16 14:04:46 +08:00
YuBaoku	63fff8df70	[CI] Adapt vl_model baseline changes due to Paddle update (#5576 )	2025-12-16 11:42:31 +08:00
MingkunZhang	f32e331ef5	[Metax] add ci yaml (#5520 ) Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-12-12 13:35:38 +08:00
luukunn	fbc9bce1e9	[Feature]Optimization of Thinking Pattern Framework (#4302 ) * add model status in vl * add x1 parser * add model_status * fix parser * fix parser * fix parser * fix parser * Revert "fix parser" This reverts commit `300f446d8a`. * fix parser * fix * fix * fix * fix * fix parser * fix unit test * fix unit test * add unit test * fix * fix * add unit test * fix unit test * add unit test * add unit test * fix unit test * fix unit test * fix bug * fix unit test * x1 tool parser * fix unit test * fix unit test * fix unit test * fix n * fix unit test * add unit test * add unit test * remove pring	2025-12-10 16:17:06 +08:00
Echo-Nie	1b1bfab341	[CI] Add unittest (#5328 ) * add test_worker_eplb * remove tesnsor_wise_fp8 * add copyright	2025-12-09 19:19:42 +08:00
lizexu123	95eab9f9ee	[Feature] support stop_token_ids (#5399 ) * support stop_token_ids * fix * delete chinese * support both * delete print	2025-12-09 17:49:12 +08:00
YuBaoku	dfeabee123	[CI] Allow occasional distributed worker exit_code (#5341 )	2025-12-03 10:56:59 +08:00
YuBaoku	3e2c13d8c5	[CI] Disable queue state assertion temporarily (#5329 )	2025-12-02 18:57:29 +08:00
Jiaxin Sui	b0113cb0fc	[XPU][CI] Change XPU CI Base Value (#5318 ) * Add '小度' keyword to assertion in run_w4a8.py * Add keywords to assertion in run_ep_online.py * Add keywords to assertion in run_w4a8.py * Update run_45T.py * Update run_ep_online.py * Refactor assertion for response content keywords * Update run_w4a8.py * Update run_w4a8.py	2025-12-01 21:02:09 +08:00
Jiaxin Sui	b467e9dadc	[XPU][CI]Change W4A8 Case Base Value (#5309 )	2025-12-01 15:25:33 +08:00
ddchenhao66	fc88eebc32	[CI][XPU] add pd disaggregation (#5179 ) * [CI][XPU] add pd disaggregation * Clarify comments and install iproute2 Updated comments to clarify script purpose and added installation of iproute2. --------- Co-authored-by: ddchenhao66 <dhaochen163.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-11-28 10:44:27 +08:00
YuBaoku	6a6bf4ea24	[CI] Fix test streaming with stop str (#5275 ) * [CI] add output for last_token in test_streaming_with_stop_str * [CI] Adapt empty last_token check	2025-11-27 20:51:39 +08:00
Jiaxin Sui	5ff93d4998	[XPU][CI] change VL model to 28B-VL-thinking (#5169 ) * Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function	2025-11-24 16:50:18 +08:00
YuBaoku	98f1ab46a9	[CI] add output for last_token in test_streaming_with_stop_str (#5170 )	2025-11-24 10:49:17 +08:00
chenjian	3ea1b44a58	[Optimization] Improve perf for fd response token with internal adapter (#4992 ) * [Optimize] Improve perf for fd response token with internal adapter * fix * fix bug * fix ci * fix ci * fix ci * fix ci	2025-11-21 19:02:03 +08:00
Zhang Yulong	be9541a97b	[CI] add metrics case (#5115 ) * add case * add case	2025-11-19 11:50:12 +08:00
FocusLuo	c2c1942db9	[INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596 ) * [INTEL HPU] added hpu ci work flow support Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] added run ci hpu test scripts Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] enabled HPU ernie test case Signed-off-by: Luo, Focus <focus.luo@intel.com> * [INTEL HPU] updated Intel Gaudi Readme with Warmup disable cmdline Signed-off-by: Luo, Focus <focus.luo@intel.com> * Modify paddlepaddle installation command Updated paddlepaddle installation command to use a specific index URL. * Update run_ci_hpu.sh * Rename json directory to nlohmann_json Rename extracted json directory to nlohmann_json. * Update ci_hpu.yml * Set pip global index URL to Tsinghua mirror * Update CI workflow to use self-hosted runner and paths * Update Docker image in CI workflow * Modify HPU installation URLs in run_ci_hpu.sh Updated the installation URL for paddle_intel_hpu and added paddlenlp_ops installation. * Fix paddle_intel_hpu installation URL Corrected the URL for paddle_intel_hpu wheel installation. --------- Signed-off-by: Luo, Focus <focus.luo@intel.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-17 19:24:41 +08:00
plusNew001	7f94d77e08	[XPU][CI] fix ci case bug (#5084 ) * Ignore markdown and text files in CI workflow * Change GPU_ID to XPU_ID in run_ci_xpu.sh * Change GPU_ID to XPU_ID in test configuration * Change GPU_ID to XPU_ID for service port calculation * Change GPU_ID to XPU_ID for device identification * Change GPU_ID to XPU_ID in test_ep function * Update run_w4a8.py * Redirect stop_processes output to kill.log Redirect output of stop_processes to kill.log to capture logs. * Log server output for failed test cases Added logging of server.log for failed tests. * Add '-s' option to pytest commands in run_ci_xpu.sh * Refactor assertion to validate multiple keywords Updated assertion to check for multiple keywords in response. * Fix assertany to assert any in run_45vl.py	2025-11-17 16:01:27 +08:00
plusNew001	0e819cd596	[CI][XPU] Optimize CI logs and variable names (#5025 ) * Ignore markdown and text files in CI workflow * Change GPU_ID to XPU_ID in run_ci_xpu.sh * Change GPU_ID to XPU_ID in test configuration * Change GPU_ID to XPU_ID for service port calculation * Change GPU_ID to XPU_ID for device identification * Change GPU_ID to XPU_ID in test_ep function * Update run_w4a8.py * Redirect stop_processes output to kill.log Redirect output of stop_processes to kill.log to capture logs. * Log server output for failed test cases Added logging of server.log for failed tests. * Add '-s' option to pytest commands in run_ci_xpu.sh	2025-11-14 19:35:35 +08:00
zccjjj	88da9d9788	[XPU] [CI] Change CI ep test from offline to online (#4885 ) * change CI ep test from offline to online * add ep all2all ci's changes, from offline to online * change env var in ep-all2all ci test * add expected response for ep8tp8 all2all * Adapt to CI refactoring and support dual-concurrent code execution * Adapt to CI refactoring and support dual-concurrent, second * Explicitly specify the #port * change the startup method of all2all * Modify the command of all2all * Update assertion to check multiple keywords * Update assertion to check multiple keywords * Update run_w4a8.py * Update run_w4a8.py --------- Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 16:15:45 +08:00
yzwu	76e60e98f8	[Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972 )	2025-11-12 14:13:40 +08:00
yzwu	3707af7a4f	[Iluvatar] add vl into ci and support v1 loader (#4774 )	2025-11-11 10:50:17 +08:00
Yuanle Liu	3dc0ffa46d	[TSP] Support qwen3 moe tsp + cudagraph (#4871 ) * support qwen3_moe tsp mode * fix * fix * update * update * update * fix * support external_rmsnorm * update * fix	2025-11-10 23:37:51 +08:00
plusNew001	3665c283b5	[XPU] [CI]Change CI to multi-concurrency (#4866 ) * Refactor GPU ID logic in CI workflow Updated GPU ID assignment logic and removed unused port calculations. * Refactor GPU device and port configuration * Update engine_worker_queue_port calculation logic * Refactor XPU_VISIBLE_DEVICES export logic * Adjust service port based on GPU ID * Adjust service HTTP port based on GPU ID * Adjust service_http_port based on GPU_ID * Add import for os module in run_45T.py * Update run_45vl.py * Import os module in run_w4a8.py Added import for os module to use environment variables. * Remove duplicate import of os module * Remove duplicate import of os module * Update run_45T.py * Update run_w4a8.py * fix bug * fix bug * Update run_w4a8.py * Fix directory change command in run_ci_xpu.sh	2025-11-10 21:09:48 +08:00
plusNew001	0a3bc84f71	[XPU][CI]Update test assertion and base response value (#4907 )	2025-11-10 11:44:54 +08:00
plusNew001	fa098383f6	[XPU][CI] Ci bug fix (#4889 ) * Refactor test_45t by commenting out responses Comment out base response variables and update assertion. * Update run_w4a8.py * Fix assertion syntax in run_45T.py	2025-11-07 17:50:11 +08:00
YuBaoku	fa28745f19	[CI] Update ERNIE-4.5-VL baseline to adapt to MoE changes (#4867 )	2025-11-06 22:02:10 +08:00
YuBaoku	a139f8f3cb	[CI] Optimize port cleanup logic (#4860 )	2025-11-06 19:13:48 +08:00
plusNew001	fc8bef2c95	[XPU][CI]Change ci vl model to 28 b (#4764 ) * Update XPU_VISIBLE_DEVICES and model parameters * Update base response and adjust max tokens * Implement process cleanup in CI workflow Add process cleanup commands to prevent port conflicts * Remove process cleanup commands from CI workflow Removed old process cleanup commands to prevent port conflicts.	2025-11-06 14:12:23 +08:00
zhupengyang	2fd254e5b7	support ep+tp at op layer (#4688 )	2025-11-05 11:15:57 +08:00
YuBaoku	722110a952	[CI] Refactor CE wheel upload for multiple target paths (#4790 ) * [CI] Refactor CE wheel upload for multiple target paths * [CI] fix test_streaming_with_stop_str error	2025-11-04 18:56:38 +08:00
plusNew001	9887025926	Update run_w4a8.py (#4783 )	2025-11-03 21:41:00 +08:00
yinwei	377f3bf5f2	[XPU] add v1 support for bf16 (#4744 ) * support v1 loader * update code style * update code	2025-11-03 14:13:17 +08:00
YuBaoku	acef624049	[CI] Fix rollout_model test logic (#4730 )	2025-10-31 22:25:24 +08:00
plusNew001	ea866e4b34	[XPU] [CI] Add Vl case (#4649 ) * Enhance CI script with health checks and logging Updated the CI script to include health checks and logging for the VL model testing process. * Add test for OpenAI chat completions * Refactor chat completion user message structure * Fix variable name for exit code in CI script * Update text prompt to Chinese for artifact question * Update service port and response assertions in tests * Refactor assertion for response content comparison * Update run_45vl.py * Change service HTTP port from 8123 to 8188	2025-10-31 10:38:09 +08:00
李泳桦	a012e3608b	[Feature] support logits processors (#4515 ) * [feat] provide an interface for logits processors and a builtin LogitBiasLogitsProcessor * [chore] fix code style * [fix] add unit test & fix existing bugs * [feat] add engine/worker arg --logits-processors * [fix] redefine user args as logits_processors_args and fix some bugs * [fix] fix test_sampler * Update fastdeploy/model_executor/logits_processor/builtin.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/model_executor/logits_processor/__init__.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update tests/model_executor/test_logits_processor.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [fix] fix typo * Update fastdeploy/engine/sampling_params.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [fix] fix bracelet * [chore] redefine logits processor interface: pass the entire share_inputs into LP, do not copy share_inputs and logits * [doc] add docs * [fix] fix logit bias processor not applied when decoding is too fast & add docs and tests * [fix] fix redundant code * [feat] skip apply() if no bias is specified --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-29 00:08:53 +08:00
YuBaoku	e1ac90d787	[CI] Revert test_rollout_model directory change (#4626 )	2025-10-28 20:14:00 +08:00
yyssys	cd6d1f633c	[XPU]add xpu ci w4a8 case (#4501 )	2025-10-28 19:02:29 +08:00
YuBaoku	b2c6c41447	[CI] Relocate server test cases from ci_use directory to e2e (#4608 )	2025-10-28 11:37:30 +08:00
yyssys	822dea8d5f	[XPU]Moe uses a new operator (#4585 ) * [XPU]Moe uses a new operator * [XPU]Moe uses a new operator * update response	2025-10-24 23:01:46 +08:00
Sunny-bot1	4ffe41a747	WINT4/WINT8 dense gemm default use Machete (#4451 )	2025-10-23 17:57:59 +08:00
Yuanle Liu	8e02a509c3	[CI] stable test_rollout_model.py (#4536 ) * stable test_rollout_model.py * update baseline * update baseline	2025-10-22 01:59:44 -07:00
yzwu	dc7facaa7f	[Iluvatar GPU] fix ci error caused by rebuild_padding param and cuda graph (#4504 )	2025-10-21 21:41:41 +08:00
plusNew001	2bd3fb6315	[XPU]add xpu ci ep case (#4432 ) * add xpu ci case * Add xDeepEP download and build steps Download and build xDeepEP before running tests. * Fix formatting and add missing sleep command * Update Docker image version in CI workflow * Modify run_ci_xpu.sh for log cleanup and error handling Clean up log files before running tests and output worker log on failure. * Enhance test_ep.py with process management and assertions Refactor test function to include process cleanup and assertions. * Replace test_fastdeploy_llm with test_fd_ep * Fix conditional statement in run_ci_xpu.sh * Update test_ep.py for string handling and formatting Fix string encoding issues and improve readability. * Rename test_ep.py to run_ep.py * Change test script from test_ep.py to run_ep.py	2025-10-21 19:19:40 +08:00

84 Commits