FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-07 17:41:52 +08:00

Author	SHA1	Message	Date
Zhang Yulong	33ff0bfe38	Update disaggregated.md (#3495 ) Fix documentation errors	2025-08-20 19:39:18 +08:00
YUNSHEN XIE	e197894977	add e2e cases (#3476 ) * add e2e cases * fix	2025-08-20 18:50:14 +08:00
Zhang Yulong	9ff2dfb162	Create eb45-8k-fp8-tp1-dp8_ep.yaml (#3485 ) YAML for hybrid-architecture EP parallelism	2025-08-20 14:33:54 +08:00
YuBaoku	33d369586b	[CI] remove useless case (#3482 )	2025-08-20 14:20:30 +08:00
xiaolei373	5d131485d8	add error log to file (#3431 ) * feat(log):add_request_and_response_log * feat[log]:add error log to file	2025-08-20 09:52:34 +08:00
YUNSHEN XIE	3a6058e445	Add stable ci (#3460 ) * add stable ci * fix * update * fix * rename tests dir;fix stable ci bug * add timeout limit * update	2025-08-20 08:57:17 +08:00
kevin	67298cf4c0	add error traceback info (#3419 ) * add error traceback info * update error msg * update code --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-19 19:32:04 +08:00
yangjianfengo1	b047681c5d	【New Feature】Support Fp8 group Gemm 2:4 sparsity (#3463 ) * Support 2:4 sparsity * code style * Add stmatrix macro definition check * code style	2025-08-19 02:54:47 -07:00
ltd0924	d587fb257f	[CI] add test generation demo (#3270 ) * Create test_generation.py * update * update * format * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update test_generation.py * Update setup.py * Delete test/plugins/test_model_runner_register.py --------- Co-authored-by: YUNSHEN XIE <1084314248@qq.com>	2025-08-19 17:12:40 +08:00
Zero Rains	fef447e350	[V1 Loader] Support MOE parameters create and load for DeepGemm and marlin backend (#3447 ) * support deepgemm backend * support marlin backend * remove print * fix process_prequanted_weights	2025-08-19 14:15:53 +08:00
chen	6735626014	fix request_output sampling_params (#3154 ) (#3464 )	2025-08-19 13:52:50 +08:00
ltd0924	bca8905b40	[BugFix] fix control signal release failed (#3390 ) * [BugFix] fix control signal release failed * [BugFix] fix control signal release failed * update * update * update	2025-08-19 13:51:38 +08:00
Zero Rains	8b12c80f90	[FixBug] compute early stopping with real batch size (#3418 ) * [FixBug] compute early stopping with real batch size * update * fix test_sampler	2025-08-18 22:09:21 -07:00
luukunn	3a7a20d191	[Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421 ) * fix chat_template_args * fix args * add offline * add offline * fix * fix * fix default enable_thinking value * fix default enable_thinking value * modify condition * Revert "modify condition" This reverts commit `26430bdeb1`. * fix unit test	2025-08-19 10:50:01 +08:00
lizexu123	a053ab889b	[BugFix] fix num_running_requests in cuda_graph (#3457 ) * fix cuda_grpah * add note --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-08-19 10:47:22 +08:00
AIbin	beec24fd89	【Inference Optimize】DeepSeek-v3 model inference performance optimization (#3455 ) * DSK_OPT_01 * update FA3	2025-08-19 10:42:42 +08:00
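zhuzixuan	c95b3395e9	【BugFix】Support echo in the completion interface (#3245 ) * wenxin-tools-511: fix v1/completion failing to echo. * Support echo for multiple prompts * Support streaming echo with multiple prompts * Add unit tests for echo support in the completion interface * pre-commit * Remove redundant test files * Fix the unit test method for completion interface echo support * Add unit test files * Add unit tests * unittest * Add unit tests * Fix unit tests * Remove unnecessary assert. * Resubmit * Update test method * ut * Unit test to verify whether the approach is correct * Unit test to verify whether the approach is correct * Unit test to verify whether the approach is correct 3 * Optimize unit test code, narrowing the test scope in a targeted way. * Optimize unit test code 2, narrowing the test scope in a targeted way. * Optimize unit test code 3, narrowing the test scope in a targeted way. * support 'echo' in chat/completion. * update * update * update * update * update * update * Add unit tests for tokenid * update * Fix index error * Fix index error	2025-08-19 10:41:51 +08:00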
lizexu123	32b39620bc	[Code Simplification] remove cum_offsets (#3410 )	2025-08-18 20:21:25 +08:00
YUNSHEN XIE	2cf96ddd68	add publish workflow (#3063 ) * add publish job * update * update	2025-08-18 16:42:36 +08:00
luukunn	9c129813f9	[Feature] add custom chat template (#3251 ) * add custom chat_template * add custom chat_template * add unittest * fix * add docs * fix comment * add offline chat * fix unit test * fix unit test * fix * fix pre commit * fix unit test * add unit test * add unit test * add unit test * fix pre_commit * fix enable_thinking * fix pre commit * fix pre commit * fix unit test * add requirements	2025-08-18 16:34:08 +08:00
Jundong Liu	70ee910cd5	[Excutor] Change cudagraph hashkey from batch size to num_tokens (#3454 )	2025-08-18 16:16:48 +08:00
Jundong Liu	ea4a3b479c	[Excutor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404 ) * Fix insufficient buffer allocation; add a tool to print forward metadata * fix mistake * Make CPU tensor in CPUPlace * Add test about forward_meta_str and Add unitest_requirement --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-08-18 16:14:09 +08:00
chen	5585cf7aa5	fix mtp_rej_topp input (#3450 )	2025-08-18 16:12:42 +08:00
Divano	246cd7b3a5	Perf (#3453 ) * add repitation early stop cases * add repitation early stop cases * add stress tool	2025-08-18 15:37:46 +08:00
gaoziyuan	6fdd83da10	fix some bug (#3434 )	2025-08-18 14:39:13 +08:00
freeliuzc	a12d0bc549	[Feature][MTP]update multi-draft-token strategy (#3369 ) * update multi-draft-token strategy * fix format --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-08-18 13:59:56 +08:00
Zhang Yulong	3ee6053e5d	Add ci case (#3355 ) * add ci cases * debug debug H20 baseline * Update run_pre_ce.sh * Update test_EB_Lite_serving.py * Update test_EB_VL_Lite_serving.py * Update test_EB_Lite_serving_mtp.py * Update test_Qwen3-MoE_serving.py * Update test_Qwen2-7B-Instruct_serving.py * Update run_pre_ce.sh	2025-08-18 11:35:56 +08:00
chen	e88f5552db	fix cpu __ini__.py (#3448 )	2025-08-17 12:38:54 +08:00
RAM	33c0197ebe	[Docs] Update mkdocs.yml (#3444 ) * Updata docs of graph opt backend * update best_practices * update mkdocs.yaml * [Docs]Update mkdocs.yml	2025-08-15 21:57:40 +08:00
RAM	154308102e	[Docs]Updata docs of graph opt backend (#3442 ) * Updata docs of graph opt backend * update best_practices	2025-08-15 21:30:32 +08:00
yongqiangma	5703d7aa0f	update installation readme (#3429 )	2025-08-15 19:09:41 +08:00
yangjianfengo1	615930bc05	Update README (#3426 ) * Modify README * code style * code style	2025-08-15 18:46:28 +08:00
JYChen	6f11171478	fix some docs error (#3439 )	2025-08-15 18:45:27 +08:00
yinwei	354575b6d1	[Docs]Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95 (#3428 ) * XPU Update 2.1 Release Documentation * code style check * Modify the gpu-memory-utilization of the 128K 8-card Wint4 model to 0.95	2025-08-15 18:34:37 +08:00
YUNSHEN XIE	cc8ee50f27	add accuracy check ci (#3389 ) * add accuracy ci * fix * fix * update * rename ci jobs	2025-08-15 15:17:43 +08:00
GoldPancake	4bd6a9fa7d	[Bugs] Fix DeepGEMM pre-compile tools. (#3351 ) Fix some miss cache problems. Add README.md.	2025-08-15 14:37:49 +08:00
ming1753	d4e3a20300	[Docs] Release 2.1 docs and fix some description (#3424 )	2025-08-15 14:27:19 +08:00
yinwei	fbb6dcb9e4	[Docs]XPU Update 2.1 Release Documentation (#3423 ) * XPU Update 2.1 Release Documentation * code style check	2025-08-15 14:07:47 +08:00
JYChen	562e01c979	update docs (#3420 )	2025-08-15 13:00:08 +08:00
Jiang-Jia-Jun	cca96ab1e4	Update Dockerfile.gpu	2025-08-15 12:29:20 +08:00
Jiang-Jia-Jun	7132fa9ec2	Update dockerfile	2025-08-15 12:28:08 +08:00
Sunny-bot1	6c1f3ff897	topk_gating_softmax support bias (#3405 )	2025-08-15 11:57:45 +08:00
ltd0924	5a84324798	[Doc] Add multinode deployment documents (#3417 ) * Create multi-node_deployment.md * Create multi-node_deployment.md * Update mkdocs.yml	2025-08-15 10:37:04 +08:00
chen	f0f00a6025	[OPs] Universal optimization and Fix early_stop cuda 700 (#3375 ) * delete nonzero * delete setup_ops_base.py * check if * check gcp infer_seed.cpu() * fix repetition_early_stopper_kernel cuda 700	2025-08-14 22:40:44 +08:00
YuanRisheng	09c979f3dd	[V1 Loader] Support Ernie text（moe and dense） (#3110 ) * new loader support 0.3B * fix weight * support parallel load * support parallel load * fix slice * support moe * delete code * perfect code * perfect code	2025-08-14 20:25:28 +08:00
xjkmfa	ab60292f89	【CI】 evil case (#3359 ) * Add ci case for min token and max token * 【CI case】include total_tokens in the last packet of completion interface stream output * Edge-case checks and adversarial testing * Edge-case checks and adversarial testing * Edge-case checks and adversarial testing * Edge-case checks and adversarial testing --------- Co-authored-by: xujing43 <xujing43@baidu.com>	2025-08-14 20:00:47 +08:00
freeliuzc	cacc52bf21	modify readme (#3409 )	2025-08-14 19:47:36 +08:00
Sunny-bot1	79d8ae4c38	[UT Fix] Fix bad_words test (#3385 ) * fix bad_words test * add streaming * fix * fix	2025-08-14 03:55:02 -07:00
lzy	1e06b9fa6d	make append_attn supports mask_offset (#3138 ) * make append_attn supports mask_offset * add unittest	2025-08-14 03:40:55 -07:00
memoryCoderC	6031f9a5f5	[BugFix] fix ErnieProcessor not set raw_prediction (#3400 )	2025-08-14 18:07:49 +08:00

3104 Commits