FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-30 03:22:05 +08:00

Author	SHA1	Message	Date
chenjian	c487b62ee0	[Bug fix] Fix memory allocation (#3475 ) * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug * Fix bug for memory allocation	2025-08-19 19:48:24 +08:00
chenjian	d2f6c3b998	[Bug fix] Fix bug for seq_len_encoder is 1 (#3467 )	2025-08-19 15:21:32 +08:00
chenjian	aba94169dc	[Feature] Support batched tokens for EP (#3415 ) * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug * Support batched tokens for EP and fix bug	2025-08-18 11:43:36 +08:00
chenjian	3f86ae0007	fix cache messager bug when d restart (#3386 )	2025-08-14 11:43:59 +08:00
chenjian	89177d881c	[Bug fix] Fix zmq core bug (#3357 ) * [Bug fix] Fix zmq core bug due to concurrently used by threads * Fix zmq core bug due to concurrently used by threads	2025-08-13 20:24:39 +08:00
chenjian	7573802a88	[Feature] Support mtp ep in fd (#3340 ) * [Optimize] Add metrics for analysing perf * Fix bug in mtp	2025-08-11 21:49:44 +08:00
chenjian	110f33a530	[Bug fix] Test td cache messager (#3242 ) * support disable cache task in decode node * fix busg * Update engine.py * Update expert_service.py * Update splitwise_connector.py * Optimize log for debug * Optimize log for debug * fix bug --------- Co-authored-by: ltd0924 <ltd0924@sina.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>	2025-08-06 15:52:45 +08:00
chenjian	a4572a5e5d	fix bug for pd step signal (#3230 )	2025-08-06 10:41:52 +08:00
chenjian	a9d231c900	Fix bug for concurrently visit zmq (#3233 )	2025-08-06 10:41:10 +08:00
ltd0924	b20ffe3697	[Feature] optimize expert parallel (#3196 ) * optimize * Update expert_service.py * Update worker_process.py * optimize	2025-08-05 17:34:24 +08:00
ltd0924	dcf9c2daff	[Feature] Optimize prefix cache (#3208 ) * [LLM] support ep * Update worker_process.py * Update expert_service.py * Update worker_process.py * format files * optimize prefix cache * optimize prefix cache * optimize prefix cache * pre commit format * pre commit format * pre commit format * Update cache_messager.py	2025-08-05 17:13:11 +08:00
chenjian	9f9971844f	[Feature] Support ep pd with external module (#3194 ) * Support external module * Support external module * Support external module * Support external module * refactor code to make it more clear * refactor code to make it more clear * refactor code to make it more clear * refactor code to make it more clear * fix according to review * fix according to review * fix according to review * fix according to review * fix according to review * fix according to review * fix bug * fix bug * fix bug * merge --------- Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0202.tjdm.baidu.com>	2025-08-04 20:32:41 +08:00
gaoziyuan	0443587a57	【Feature】support qwen3 name_mapping (#3179 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci * support qwen3 name_mapping	2025-08-04 01:34:07 -07:00
Zero Rains	17f51f0c92	[unitest] fix the bug in test_sampler (#3157 )	2025-08-04 01:23:25 -07:00
YuanRisheng	79bbacc152	Fix approve shell scripts (#3108 ) * fix approve * fix	2025-08-04 15:51:33 +08:00
Divano	3bfb2eca92	Update test_base_chat.py (#3183 )	2025-08-04 15:09:53 +08:00
ltd0924	c9e6ce1518	Update cache_messager.py (#3172 )	2025-08-04 14:32:34 +08:00
gaoziyuan	4021d66ea5	【Feature】add fd plugins && rm model_classes (#3123 ) * add fd plugins && rm model_classed * fix reviews * add docs * fix * fix unitest ci	2025-08-03 19:53:20 -07:00
bukejiyu	1582814905	fix load_pre_sharded_checkpoint (#3152 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-04 10:44:20 +08:00
Divano	66d3bb89ad	Update __init__.py (#3163 ) Improve test base class compatibility	2025-08-04 09:40:09 +08:00
AIbin	22fe695f1c	【Inference Optimize】Support automatic generation of marlin kernel (#3149 ) * Support automatic generation of marlin kernel	2025-08-01 22:43:18 +08:00
ApplEOFDiscord	b71cbb466d	[Feature] remove dependency on enable_mm and refine multimodal's code (#3014 ) * remove dependency on enable_mm * fix codestyle check error * fix codestyle check error * update docs * resolve conflicts on model config * fix unit test error * fix code style check error --------- Co-authored-by: shige <1021937542@qq.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-01 20:01:18 +08:00
plusNew001	243394044d	[XPU]Updata XPU dockerfiles (#3144 ) * [CI] add xpu ci case * [CI]Update run_ci_xpu.sh * [XPU]Update Dockerfile.xpu * Update Dockerfile.xpu	2025-08-01 19:41:59 +08:00
Zhang Yulong	0eb32bb9c8	add cases (#3155 )	2025-08-01 18:38:57 +08:00
yangjianfengo1	64d7a3194d	Support fa3 in centralized deployment (#3112 )	2025-08-01 18:03:36 +08:00
YUNSHEN XIE	bdb83e007d	fix ci (#3141 )	2025-08-01 17:42:26 +08:00
Divano	50db0d7ba9	add case (#3150 ) * add test base class * fix codestyle * fix codestyle * add base chat	2025-08-01 17:30:58 +08:00
Ryan	94264bbf60	[Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937 ) * rm sth useless * refactor model forward * mv bool index to kernel	2025-08-01 17:28:07 +08:00
yinwei	3a4db15765	Fix out-of-memory issue during single-XPU deployment (#3133 )	2025-08-01 17:12:03 +08:00
JYChen	c34088b0fd	fix stop seq unittest (#3126 )	2025-08-01 16:50:05 +08:00
ming1753	fc5f43c6bc	[Docs] Optimal Deployment (#2768 )	2025-08-01 11:56:27 +08:00
chen	a2f5cc54f8	moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121 )	2025-08-01 10:46:20 +08:00
Divano	1d93565082	[CE] Add base test class for web server testing (#3120 ) * add test base class * fix codestyle * fix codestyle	2025-07-31 23:28:50 +08:00
YUNSHEN XIE	e1011e92d9	disable test_cuda_graph.py (#3124 )	2025-07-31 22:03:48 +08:00
plusNew001	8c63237cfa	[CI] add xpu ci case (#3111 ) * [CI] add xpu ci case * [CI]Update run_ci_xpu.sh	2025-07-31 22:03:34 +08:00
YUNSHEN XIE	ff6a109b4d	Describe PR diff coverage using JSON file (#3114 ) * Refactored ci pipeline * update * Describe PR diff coverage using JSON file * remove pip cache setting from Approve * fix * update	2025-07-31 21:59:20 +08:00
SunLei	dade19d7a4	[Feature] General support for logprobs (#2974 ) * [Feature] support logprobs in chat/completions and completions endpoints * Temporarily comment out text_offset due to incorrect logic * Clean up temporary debug prints * [Feature] support logprobs in offline mode via SamplingParams * fix: serialize Logprob as dict before zmq send to fix msgpack error * refactor: remove redundant methods to simplify codebase * Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization * refactor: centralize param validation in engine_client to reduce duplication * revert: rollback changes in offline_demo.py * revert: rollback changes in offline_demo.py * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:25:56 +08:00
chenjian	fe17410f9c	[BUG] Fix bug for pd in fd (#3034 ) * Fix bug for pd in fd * Fix bug for pd in fd --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:17:27 +08:00
Zhang Yulong	1a543bca29	Fix test_EB_Lite_serving.py (#3119 ) * Fix test_EB_Lite_serving.py * fix test_EB_Lite_serving.py	2025-07-31 20:15:25 +08:00
Yuan Xiaolan	5f56d289a7	fix is_permuted (#3098 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:58:05 +08:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103 ) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
kevin	22cab724e8	[Feature] block scheduler v1 support prefix caching (#3061 ) * block scheduler v1 support prefix cache * update code * update code * fix code bug * add timeout time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:29:19 +08:00
chenjian	32307283f1	Fix bug for offline inference in scheduler v1 (#3117 )	2025-07-31 17:54:24 +08:00
YUNSHEN XIE	583eae2fd1	fix ci (#3106 ) * fix ci * disable test_non_streaming_chat_with_min_tokens	2025-07-31 17:25:08 +08:00
JYChen	1ef38b1563	[doc] best practice for eb45 text models (#3002 ) * [doc] best practice for eb45 text models * fix docs	2025-07-31 17:21:55 +08:00
Jiang-Jia-Jun	4498058722	Update README.md	2025-07-31 15:33:12 +08:00
Jiang-Jia-Jun	66304cf921	Update sampling.md	2025-07-31 15:02:57 +08:00
yinwei	5b9aec1f10	xpu release 2.0.3 (#3105 )	2025-07-31 14:26:07 +08:00
YUNSHEN XIE	66c3835a46	add approve ci (#3093 ) * add approve ci * fix * fix	2025-07-31 10:10:10 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989 ) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flas attn backend * update MLA Attention * update XPU Attention * update gcu,iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00

2908 Commits