FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-10-10 11:00:19 +08:00

Author	SHA1	Message	Date
luukunn	edf1ca07af	Feature/online/vs think 20250813 (#3440) * add stream * fix ernie_vl_reasoning_parsers * fix bug	2025-08-15 18:33:58 +08:00
luukunn	bbd50c6717	add tool parser	2025-08-14 21:08:49 +08:00
luukunn	132a8ef425	Release/2.1 (#3414) * Pre ce modified (#3335) (#3360) * Pre ce modified (#3335) * update * update * fix * fix * update * update * update * fix * update * update * update * add ut fix pr(3367) * [Bug Fix] Fix V1 video bug (#3387) * fix stopseq error info (#3342) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> * [BugFix] Fix default log level of paddleformers (#3377) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> * [Polish Code] Remove useless notes * feat(log):add_request_and_response_log (#3392) * Optimize CI execution workflow. (#3371) (#3384) * fix * [BugFix] fix control signal release failed (#3374) * [BugFix] * [BugFix] * [BugFix] * [BugFix] * fix * fix --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1" This reverts commit `02596fc537`, reversing changes made to `03347626a6`. * [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393) * fix v1 schedule oom bug * fix v1 schedule oom bug * [BugFix] fix ErnieProcessor not set raw_prediction (#3401) * [Doc]Release fastdeploy-xpu 2.1.0 (#3407) * fix v1 schedule oom bug * fix v1 schedule oom bug * update release note * [Doc]Release fastdeploy-xpu 2.0.3 (#3408) * fix v1 schedule oom bug * fix v1 schedule oom bug * update release note * update info --------- Co-authored-by: YUNSHEN XIE <1084314248@qq.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: xiaolei373 <zley373@gmail.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: yinwei <yinwei_hust@163.com> Co-authored-by: memoryCoderC <1137889088@qq.com>	2025-08-14 20:53:47 +08:00
Jiang-Jia-Jun	e11331927f	[Sync Code] Update vs branch (#3403) * Pre ce modified (#3335) (#3360) * Pre ce modified (#3335) * update * update * fix * fix * update * update * update * fix * update * update * update * add ut fix pr(3367) * [Bug Fix] Fix V1 video bug (#3387) * fix stopseq error info (#3342) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> * [BugFix] Fix default log level of paddleformers (#3377) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> * [Polish Code] Remove useless notes * feat(log):add_request_and_response_log (#3392) * Optimize CI execution workflow. (#3371) (#3384) * fix * [BugFix] fix control signal release failed (#3374) * [BugFix] * [BugFix] * [BugFix] * [BugFix] * fix * fix --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: YUNSHEN XIE <1084314248@qq.com> Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com> Co-authored-by: JYChen <zoooo0820@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com> Co-authored-by: xiaolei373 <zley373@gmail.com> Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>	2025-08-14 17:14:45 +08:00
luukunn	81092c0fe3	add tool parser	2025-08-13 16:06:22 +08:00
memoryCoderC	37b76158f9	Completion add raw_prediction/text_after_process (#3362)	2025-08-12 23:20:36 +08:00
memoryCoderC	fe2094609f	Release/2.1 (#3361) * [BugFix] v1/completions add finish_reason * update TestOpenAIServingCompletion for merge	2025-08-12 23:06:51 +08:00
gaoziyuan	b4bb54b56b	bugfix (#3322)	2025-08-12 16:16:37 +08:00
Jiang-Jia-Jun	eeec4bd15e	Remove useless code release/2.1 (#3338)	2025-08-12 11:32:50 +08:00
chenjian	25f51b0611	Fix block num in schduelr v1 for release 2.1 (#3315) * fix bug for scheduler v0 * fix block num setting in scheduler v1 for release 2.1 * fix block num setting in scheduler v1 for release 2.1 --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YUNSHEN XIE <1084314248@qq.com>	2025-08-12 00:41:05 +08:00
ming1753	9b07f85f6d	[Bug Fix] fix vl V1 schedule bug (#3284) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YUNSHEN XIE <1084314248@qq.com>	2025-08-12 00:40:45 +08:00
Jiang-Jia-Jun	ca4e4ab911	Revert "[BugFix] fix ep (#3290)" (#3317) This reverts commit `86ff68be4b`.	2025-08-11 16:17:58 +08:00
chenjian	c000cff744	fix scheduler bug in release2.1 (#3295)	2025-08-10 13:55:22 +08:00
lizexu123	86ff68be4b	[BugFix] fix ep (#3290) * fix ep * fix	2025-08-09 16:32:35 +08:00
yinwei	702c313ed1	revert pr (#3286)	2025-08-09 16:29:35 +08:00
ltd0924	6706ccb37e	[BugFix] fix too many open files problem (#3275)	2025-08-08 20:11:32 +08:00
JYChen	1b6f482c15	[Cherry-pick] fix stop seq (#3263) * fix out-bound value for stop sequence * catch error if there are out-of-bounds value * check in offline mode	2025-08-07 19:11:37 +08:00
sg263	5d3bf308f6	merge develop trace FD_START (#3253) Co-authored-by: shige <shige@baidu.com>	2025-08-07 11:10:55 +08:00
Sunny-bot1	f672a34f95	[FIX 2.1]fix bad_words when sending requests consecutively (#3199) * fix bad_words * fix log * fix log	2025-08-06 15:47:27 +08:00
lizexu123	bc0b92bba4	[BugFix] support real batch_size (#3109) (#3217) * support real bsz * fix * fix xpu_model_runner.py,gpu_model_runner.py,gcu_model_runner.py,iluvatar_model_runner.py * add event_loop_ep * fix * Add comments * fix * support mtp real_batch_size * fix * self.tmp_seq_lens_this_time->self.seq_lens_this_time_buffer * fix * fix VL real_seq_lens_this_time * fix * fix mtp * fix * fix mtp * fix xpu * fix	2025-08-06 14:30:33 +08:00
SunLei	3dd8492601	[Bugfix] Fix uninitialized decoded_token and add corresponding unit test (#3201) * Update test_base_chat.py (#3183) * [Bugfix] Fix uninitialized decoded_token and add corresponding unit test. --------- Co-authored-by: Divano <dddivano@outlook.com>	2025-08-05 10:55:22 +08:00
RAM	bd77a3a643	[Bug Fix] Fix bug of MLA Attention Backend (#3178) * fix typo * fix mla attention backend	2025-08-05 10:53:27 +08:00
yinwei	4367c09a5f	Fix out-of-memory issue during single-XPU deployment (#3131)	2025-08-04 16:02:43 +08:00
bukejiyu	8e789dcb67	fix load_pre_sharded_checkpoint (#3152) (#3169) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-08-04 15:44:10 +08:00
ltd0924	5f6fc7f7b9	Update cache_messager.py (#3173)	2025-08-04 15:09:17 +08:00
RAM	d4059cabf0	fix typo (#3153)	2025-08-01 22:34:59 +08:00
chen	c8dd5976ae	fix request_output sampling_params (#3154)	2025-08-01 22:34:33 +08:00
SunLei	dade19d7a4	[Feature] General support for logprobs (#2974) * [Feature] support logprobs in chat/completions and completions endpoints * Temporarily comment out text_offset due to incorrect logic * Clean up temporary debug prints * [Feature] support logprobs in offline mode via SamplingParams * fix: serialize Logprob as dict before zmq send to fix msgpack error * refactor: remove redundant methods to simplify codebase * Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization * refactor: centralize param validation in engine_client to reduce duplication * revert: rollback changes in offline_demo.py * revert: rollback changes in offline_demo.py * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs * [bugfix] fix parameter validation for logprobs --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:25:56 +08:00
chenjian	fe17410f9c	[BUG] Fix bug for pd in fd (#3034) * Fix bug for pd in fd * Fix bug for pd in fd --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 20:17:27 +08:00
Yuan Xiaolan	5f56d289a7	fix is_permuted (#3098) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:58:05 +08:00
LiqinruiG	25005fee30	[Doc] add chat_template_kwagrs and update params docs (#3103) * add chat_template_kwagrs and update params docs * add chat_template_kwagrs and update params docs * update enable_thinking * pre-commit * update test case --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:44:06 +08:00
kevin	22cab724e8	[Feature] block scheduler v1 support prefix caching (#3061) * block scheduler v1 support prefix cache * update code * update code * fix code bug * add timeout time --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-31 19:29:19 +08:00
chenjian	32307283f1	Fix bug for offline inference in scheduler v1 (#3117)	2025-07-31 17:54:24 +08:00
RAM	d850660872	[Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989) * reset decoder_block_shape_q buffer * refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch * update decode_max_tile_size * fix pre-commit * update block_multihead_attn_backend * update flas attn backend * update MLA Attention * update XPU Attention * update gcu,iluvatar model runner * Update MTP * fix MTP bug	2025-07-31 00:09:31 +08:00
chenjian	fe0e3f508b	[BUG FIX] Fix bug when preempted request rescheduled (#3080) * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled * Fix bug when preempted request rescheduled	2025-07-30 22:25:47 +08:00
Jiang-Jia-Jun	0616c208d2	[Feature] Support include_stop_str_in_output in completion api (#3096) * [Feature] Support include_stop_str_in_output in completion api * Fix ci test --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-30 22:18:48 +08:00
YuanRisheng	7dfdd157ac	[BugFix]Fix ep size (#3092) * fix ep * fix num_layer	2025-07-30 21:03:12 +08:00
ltd0924	d17886de19	[Feature] support ep in mixed mode (#3001) * [LLM] support ep * Update worker_process.py * Update expert_service.py * Update worker_process.py * format files	2025-07-30 20:43:39 +08:00
Zhida Hu	3f8a41e68c	[*] fix the memory leak when modify qp to rts failed (#3051) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-07-30 19:49:07 +08:00
李泳桦	b242150f94	[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058) * [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client * [fix] delete ci test case for enable_thinking * [fix] add reasoning_parser when server starts * [fix] fix ci consistency test error with reasoning parser * [doc] update docs related to metadata * [fix] cancel enable_thinking default value	2025-07-30 19:25:20 +08:00
bukejiyu	db698bda01	qwen loader (#3057)	2025-07-30 19:09:38 +08:00
zhink	d89b6dd43f	adapter qwen3 moe attr for init (#3066) adapter qwen3 moe attr for init	2025-07-30 16:49:28 +08:00
bukejiyu	8e203666d9	w4a8 offline (#3074) * w4a8 offline * update * update * update	2025-07-30 16:33:30 +08:00
ming1753	5acde4eb43	[Feature] Multimodal Scheduler V1 (#3019) * [Feature] Support multimodal scheduler v1 * remove debug log * fix bug * fix format * modify code * fix bug * fix bug * fix bug * modify code	2025-07-30 16:05:55 +08:00
Jiang-Jia-Jun	ffa0f4d99b	[Fix] Fix version function (#3076) * [Fix] Fix version function * Fix commit * Fix commit * fix code sync * Update coverage_run.sh --------- Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>	2025-07-30 16:05:24 +08:00
ltd0924	ecf2fd5b9a	[BugFix] vl encoder tokens dtype problem (#3069)	2025-07-30 15:20:53 +08:00
Yuan Xiaolan	35935da9e5	support W4A8 EPLB (#3075)	2025-07-30 14:34:12 +08:00
Yzc216	159767717d	[Feature] multi source download (#3072) * multi-source download * multi-source download * huggingface download revision * requirement * style * add revision arg * test * pre-commit * Change default download * change requirements.txt * modify English Documentation * documentation * modify model download path	2025-07-30 14:10:13 +08:00
YuanRisheng	99a70fc722	unify parallel config (#3070)	2025-07-30 11:41:23 +08:00
Sunny-bot1	74aa31d15b	[Feature] support bad_words (#3055) * support bad_words * support online infer bad_words * update * add CI test * update * update * update --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-07-30 09:31:29 +08:00

772 Commits