Commit Graph

766 Commits

Author SHA1 Message Date
chenjian
c487b62ee0 [Bug fix] Fix memory allocation (#3475)
* Support batched tokens for EP

* Support batched tokens for EP and fix bug

* Fix bug for memory allocation
2025-08-19 19:48:24 +08:00
chenjian
d2f6c3b998 [Bug fix] Fix bug when seq_len_encoder is 1 (#3467) 2025-08-19 15:21:32 +08:00
chenjian
aba94169dc [Feature] Support batched tokens for EP (#3415)
* Support batched tokens for EP

* Support batched tokens for EP and fix bug
2025-08-18 11:43:36 +08:00
chenjian
3f86ae0007 fix cache messager bug when decode node restarts (#3386) 2025-08-14 11:43:59 +08:00
chenjian
89177d881c [Bug fix] Fix zmq core bug (#3357)
* [Bug fix] Fix zmq core bug caused by concurrent use across threads

* Fix zmq core bug caused by concurrent use across threads
2025-08-13 20:24:39 +08:00
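The zmq core fix above concerns a socket shared across threads; ZMQ sockets are not thread-safe, and the usual remedy is to serialize access with a lock (or give each thread its own socket). A minimal sketch of the locking pattern, with a plain list standing in for the socket (names are illustrative, not the repo's code):

```python
import threading

class LockedSender:
    """Serialize sends on a transport that is not thread-safe (e.g. a zmq socket)."""

    def __init__(self, transport):
        self._transport = transport
        self._lock = threading.Lock()  # only one thread may send at a time

    def send(self, msg):
        with self._lock:
            self._transport.append(msg)  # stand-in for socket.send(msg)

sent = []
sender = LockedSender(sent)
threads = [threading.Thread(target=sender.send, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```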
chenjian
7573802a88 [Feature] Support mtp ep in fd (#3340)
* [Optimize] Add metrics for analysing perf

* Fix bug in mtp
2025-08-11 21:49:44 +08:00
chenjian
110f33a530 [Bug fix] Test td cache messager (#3242)
* support disable cache task in decode node

* fix bugs

* Update engine.py

* Update expert_service.py

* Update splitwise_connector.py

* Optimize log for debug

* fix bug

---------

Co-authored-by: ltd0924 <ltd0924@sina.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
2025-08-06 15:52:45 +08:00
chenjian
a4572a5e5d fix bug for pd step signal (#3230) 2025-08-06 10:41:52 +08:00
chenjian
a9d231c900 Fix bug for concurrent access to zmq (#3233) 2025-08-06 10:41:10 +08:00
ltd0924
b20ffe3697 [Feature] optimize expert parallel (#3196)
* optimize

* Update expert_service.py

* Update worker_process.py

* optimize
2025-08-05 17:34:24 +08:00
ltd0924
dcf9c2daff [Feature] Optimize prefix cache (#3208)
* [LLM] support ep

* Update worker_process.py

* Update expert_service.py

* Update worker_process.py

* format files

* optimize prefix cache

* pre commit format

* Update cache_messager.py
2025-08-05 17:13:11 +08:00
chenjian
9f9971844f [Feature] Support ep pd with external module (#3194)
* Support external module

* refactor code to make it more clear

* fix according to review

* fix bug

* merge

---------

Co-authored-by: root <root@tjdm-inf-sci-k8s-hzz2-h12ni8-0202.tjdm.baidu.com>
2025-08-04 20:32:41 +08:00
gaoziyuan
0443587a57 [Feature] support qwen3 name_mapping (#3179)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci

* support qwen3 name_mapping
2025-08-04 01:34:07 -07:00
ltd0924
c9e6ce1518 Update cache_messager.py (#3172) 2025-08-04 14:32:34 +08:00
gaoziyuan
4021d66ea5 [Feature] add fd plugins && rm model_classes (#3123)
* add fd plugins && rm model_classes

* fix reviews

* add docs

* fix

* fix unittest ci
2025-08-03 19:53:20 -07:00
bukejiyu
1582814905 fix load_pre_sharded_checkpoint (#3152)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-04 10:44:20 +08:00
ApplEOFDiscord
b71cbb466d [Feature] remove dependency on enable_mm and refine multimodal's code (#3014)
* remove dependency on enable_mm

* fix codestyle check error

* update docs

* resolve conflicts on model config

* fix unit test error

* fix code style check error

---------

Co-authored-by: shige <1021937542@qq.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-01 20:01:18 +08:00
yangjianfengo1
64d7a3194d Support FA3 in centralized deployment (#3112) 2025-08-01 18:03:36 +08:00
Ryan
94264bbf60 [Code Simplification] Refactor Post-processing in VL Model Forward Method (#2937)
* remove unused code

* refactor model forward

* mv bool index to kernel
2025-08-01 17:28:07 +08:00
yinwei
3a4db15765 Fix out-of-memory issue during single-XPU deployment (#3133) 2025-08-01 17:12:03 +08:00
chen
a2f5cc54f8 moe preprocess op support 160 experts and fused_moe triton kernel name add K (#3121) 2025-08-01 10:46:20 +08:00
SunLei
dade19d7a4 [Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints

* Temporarily comment out text_offset due to incorrect logic

* Clean up temporary debug prints

* [Feature] support logprobs in offline mode via SamplingParams

* fix: serialize Logprob as dict before zmq send to fix msgpack error

* refactor: remove redundant methods to simplify codebase

* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization

* refactor: centralize param validation in engine_client to reduce duplication

* revert: rollback changes in offline_demo.py

* [bugfix] fix parameter validation for logprobs

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
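One bullet in the logprobs PR above notes that Logprob objects had to be serialized as dicts before being sent over zmq, since msgpack encodes plain dicts but not arbitrary classes. A sketch of that pattern using dataclasses (the actual Logprob fields in the repo may differ):

```python
from dataclasses import dataclass, asdict

@dataclass
class Logprob:
    logprob: float
    rank: int
    decoded_token: str

def to_wire(obj: Logprob) -> dict:
    # wire formats like msgpack handle plain dicts, not custom classes
    return asdict(obj)

payload = to_wire(Logprob(logprob=-0.25, rank=1, decoded_token="hello"))
```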
chenjian
fe17410f9c [BUG] Fix bug for pd in fd (#3034)
* Fix bug for pd in fd

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:17:27 +08:00
Yuan Xiaolan
5f56d289a7 fix is_permuted (#3098)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:58:05 +08:00
LiqinruiG
25005fee30 [Doc] add chat_template_kwargs and update params docs (#3103)
* add chat_template_kwargs and update params docs

* update enable_thinking

* pre-commit

* update test case

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
kevin
22cab724e8 [Feature] block scheduler v1 support prefix caching (#3061)
* block scheduler v1 support prefix cache

* update code

* fix code bug

* add timeout time

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:29:19 +08:00
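Prefix caching in a block scheduler, as in the PR above, typically keys fixed-size token blocks by a chained hash of the prefix, so a hash match implies the entire prefix matches and its KV-cache blocks can be reused. A simplified sketch of the scheme (assumed structure, not the repo's implementation):

```python
import hashlib

BLOCK_SIZE = 4

def block_hashes(token_ids):
    """Hash each full block together with the previous hash, chaining the prefix."""
    hashes, prev = [], b""
    for i in range(0, len(token_ids) - len(token_ids) % BLOCK_SIZE, BLOCK_SIZE):
        block = token_ids[i:i + BLOCK_SIZE]
        prev = hashlib.sha256(prev + str(block).encode("utf-8")).digest()
        hashes.append(prev)
    return hashes

cache = {}  # block hash -> physical block id

def match_prefix(token_ids):
    """Return how many leading blocks were already cached; cache the rest."""
    hits = 0
    for h in block_hashes(token_ids):
        if h in cache:
            hits += 1
        else:
            cache[h] = len(cache)  # allocate a new block id
    return hits

first = match_prefix([1, 2, 3, 4, 5, 6, 7, 8])   # cold cache
second = match_prefix([1, 2, 3, 4, 5, 6, 9, 9])  # shares the first block
```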
chenjian
32307283f1 Fix bug for offline inference in scheduler v1 (#3117) 2025-07-31 17:54:24 +08:00
RAM
d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flash attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
chenjian
fe0e3f508b [BUG FIX] Fix bug when preempted request rescheduled (#3080)
* Fix bug when preempted request rescheduled
2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun
0616c208d2 [Feature] Support include_stop_str_in_output in completion api (#3096)
* [Feature] Support include_stop_str_in_output in completion api

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
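The include_stop_str_in_output flag added above controls whether the matched stop string is kept in the returned text. A sketch of the trimming logic (hypothetical helper, not the repo's code):

```python
def finalize_text(text: str, stop: str, include_stop_str_in_output: bool) -> str:
    """Cut generation at the stop string, optionally keeping the stop string itself."""
    idx = text.find(stop)
    if idx == -1:
        return text  # stop string never matched
    end = idx + len(stop) if include_stop_str_in_output else idx
    return text[:end]

kept = finalize_text("Hello world.END extra", "END", True)
trimmed = finalize_text("Hello world.END extra", "END", False)
```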
YuanRisheng
7dfdd157ac [BugFix]Fix ep size (#3092)
* fix ep

* fix num_layer
2025-07-30 21:03:12 +08:00
ltd0924
d17886de19 [Feature] support ep in mixed mode (#3001)
* [LLM] support ep

* Update worker_process.py

* Update expert_service.py

* Update worker_process.py

* format files
2025-07-30 20:43:39 +08:00
Zhida Hu
3f8a41e68c [*] fix the memory leak when modifying QP to RTS fails (#3051)
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-30 19:49:07 +08:00
李泳桦
b242150f94 [feat] extra parameters are now passed directly via the http payload, or via extra_body when using the openai client (#3058)
* [feat] extra parameters are now passed directly via the http payload, or via extra_body when using the openai client

* [fix] delete ci test case for enable_thinking

* [fix] add reasoning_parser when server starts

* [fix] fix ci consistency test error with reasoning parser

* [doc] update docs related to metadata

* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
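With the change above, engine-specific parameters ride in the request body itself; the OpenAI client only forwards unknown fields if they are placed under extra_body, which the server then flattens into the top-level payload. A sketch of that server-side merge (hypothetical field names):

```python
def flatten_request(payload: dict) -> dict:
    """Merge extra_body fields into the top-level payload, as sent by an OpenAI client."""
    merged = dict(payload)
    merged.update(merged.pop("extra_body", {}))
    return merged

req = flatten_request({"model": "m", "extra_body": {"enable_thinking": True}})
```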
bukejiyu
db698bda01 qwen loader (#3057) 2025-07-30 19:09:38 +08:00
zhink
d89b6dd43f adapt qwen3 moe attr for init (#3066)
2025-07-30 16:49:28 +08:00
bukejiyu
8e203666d9 w4a8 offline (#3074)
* w4a8 offline

* update
2025-07-30 16:33:30 +08:00
ming1753
5acde4eb43 [Feature] Multimodal Scheduler V1 (#3019)
* [Feature] Support multimodal scheduler v1

* remove debug log

* fix bug

* fix format

* modify code

* fix bug

* modify code
2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun
ffa0f4d99b [Fix] Fix version function (#3076)
* [Fix] Fix version function

* Fix commit

* fix code sync

* Update coverage_run.sh

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 16:05:24 +08:00
ltd0924
ecf2fd5b9a [BugFix] vl encoder tokens dtype problem (#3069) 2025-07-30 15:20:53 +08:00
Yuan Xiaolan
35935da9e5 support W4A8 EPLB (#3075) 2025-07-30 14:34:12 +08:00
Yzc216
159767717d [Feature] multi source download (#3072)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit

* Change default download

* change requirements.txt

* modify English Documentation

* documentation

* modify model download path
2025-07-30 14:10:13 +08:00
YuanRisheng
99a70fc722 unify parallel config (#3070) 2025-07-30 11:41:23 +08:00
Sunny-bot1
74aa31d15b [Feature] support bad_words (#3055)
* support bad_words

* support online infer bad_words

* update

* add CI test

* update

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
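bad_words support, as added above, is usually implemented by masking the logits of banned token ids before sampling so those tokens can never be chosen. A minimal sketch with plain Python floats (illustrative; real implementations operate on tensors):

```python
import math

def apply_bad_words(logits, bad_token_ids):
    """Set banned tokens' logits to -inf so they are unsampleable."""
    out = list(logits)
    for tid in bad_token_ids:
        out[tid] = -math.inf
    return out

masked = apply_bad_words([0.1, 2.3, -0.5, 1.7], bad_token_ids=[1, 3])
best = max(range(len(masked)), key=masked.__getitem__)  # greedy pick ignores banned ids
```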
zhuzixuan
ad7bb52a28 Fix error when max_tokens=1 is passed (#3068)
* Fix error when max_tokens=1 is passed
2025-07-29 23:49:28 +08:00
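The max_tokens=1 error above was a parameter-validation edge case; a typical guard checks that max_tokens is a positive integer and caps it against the remaining context window. A sketch of such validation (hypothetical checks, not the repo's exact logic):

```python
def validate_max_tokens(max_tokens, context_len=8192, prompt_len=0):
    """Reject non-positive values; cap against the remaining context window."""
    if not isinstance(max_tokens, int) or max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return min(max_tokens, context_len - prompt_len)

ok = validate_max_tokens(1, context_len=8192, prompt_len=100)  # 1 is a legal value
```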
Ryan
73cfe1fd37 [SOT] Extend SOT warmup support to new hardware (#3032)
* add new hardware

* add_sot_warmup4new_hardware

* fix conflict

* rm Optional
2025-07-29 22:45:20 +08:00
Zero Rains
b2f9a42d87 [Feature] Support repetition early stop (#3024)
* support repetition early stop and support user to set the parameter

* remove log

* fix codestyle

* add the early_stop_config to rollout_config

* update config and EarlyStopper class

* fix the bug for triton

* modify the stop method

* update description

* modify the usage for stop_flags

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-29 22:42:54 +08:00
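Repetition early stop, as introduced above, terminates generation when the model keeps emitting the same output. A toy stopper that fires after the same token id repeats a configurable number of times in a row (the PR's actual EarlyStopper criteria and config may differ):

```python
class RepetitionStopper:
    """Signal stop once one token id is generated `window` times consecutively."""

    def __init__(self, window=3):
        self.window = window
        self.last = None
        self.count = 0

    def step(self, token_id) -> bool:
        if token_id == self.last:
            self.count += 1
        else:
            self.last, self.count = token_id, 1
        return self.count >= self.window

stopper = RepetitionStopper(window=3)
flags = [stopper.step(t) for t in [5, 7, 7, 7, 8]]  # fires on the third 7
```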
Yuan Xiaolan
3214fb5393 support model loading for w4a8 offline quant (#3064)
Support loading offline-quantized weights for W4A8 EP
2025-07-29 21:54:37 +08:00
Longzhi Wang
be0a0f2bb2 fix argument error in ep when pd (#3060) 2025-07-29 17:17:24 +08:00
YuanRisheng
502ee92a0a Unify server-side and model-side Config (Part3) (#3047)
* merge model config

* fix arch

* fix rl
2025-07-29 17:07:44 +08:00