Commit Graph

816 Commits

Author SHA1 Message Date
qwes5s5
2ee91d7a96 [metrics] Add several observability metrics (#3868) (#4011)
* [metrics] Add several observability metrics (#3868)

* Add several observability metrics

* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, queued request count, and other information

* adjust some metrics and md files

* trigger ci

* adjust ci file

* trigger ci

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* version adjust

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-10 10:59:57 +08:00
Zero Rains
187ccb0f04 get org_vocab_size from args (#3982) 2025-09-09 15:08:28 +08:00
chenjian
98b3647aad [Fix] fix prefix cache in release21 (#3922)
* fix prefix cache in release21

* fix

* Fix when prompt ids is numpy
2025-09-08 11:33:59 +08:00
chenjian
ffec66097c [optimize] Optimize prefix caching in v1 release/2.1 (#3823)
* [optimize] Optimize prefix caching in v1

* [optimize] Optimize prefix caching in v1

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-04 19:25:02 +08:00
chen
c2f5c99b1e check (#3866) 2025-09-03 20:46:13 +08:00
ltd0924
cc5430e4c2 [BugFix] [CP] fix max streaming tokens invalid (#3798)
* Update serving_chat.py

* Update serving_completion.py
2025-09-02 21:03:36 +08:00
chen
1e19833ba5 [CP] CP Lm head fp32 and temp_logprob to release/2.1 (#3766)
* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)

* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing

* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs

* delete some code

* code check

* code check and add doc

* fix tokenizer.decoder(-1), return 'Invalid Token'

* add ci for temp_scaled and top_p logprobs

* check test

* check seq len time shape

* logprob clip inf

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>

* [Precision] Support lm_head layer running in float32 (#3597)

* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc

* code check

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-09-01 19:56:54 +08:00
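The temp_scaled_logprobs behavior this commit introduces can be sketched generically (a minimal illustration of temperature-scaled log-softmax, not FastDeploy's actual implementation; function name and signature are illustrative):

```python
import math

def temp_scaled_logprobs(logits, temperature=1.0):
    """Log-softmax over temperature-scaled logits.

    Divide the logits by the temperature before normalizing, so higher
    temperatures flatten the distribution and lower ones sharpen it.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

# The returned logprobs exponentiate to a proper probability distribution.
probs = [math.exp(lp) for lp in temp_scaled_logprobs([2.0, 1.0, 0.1], 0.7)]
assert abs(sum(probs) - 1.0) < 1e-9
```

At very high temperatures the distribution approaches uniform, which is a quick sanity check for an implementation like this.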
chenjian
c49c43d51c [Bug fix] Fix perf in mixed deployment with yiyan adapter (#3703)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-01 14:06:09 +08:00
chenjian
a424ab907f [Bug fix] Fix prefix cache in v1 (#3710)
* [Bug fix] Fix prefix cache in V1

* add comment
2025-09-01 10:14:25 +08:00
chenjian
10a95f8ed5 [Fix] Do not drop results when they are consumed slowly (#3704)
* [Fix] Do not drop results when they are consumed slowly

* set default FD_ZMQ_SNDHWM to 64k
2025-09-01 10:14:04 +08:00
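The FD_ZMQ_SNDHWM default mentioned above could be read along these lines (a hypothetical sketch: the environment-variable name comes from the commit message, but the reading logic and function name are illustrative, not FastDeploy's code):

```python
import os

def get_zmq_sndhwm(default: int = 64 * 1024) -> int:
    """Read the ZMQ send high-water mark from the environment.

    The high-water mark caps how many outbound messages a ZMQ socket
    buffers; a larger default (64k here) gives a slow consumer such as
    an API server more headroom before messages are dropped or sends
    block.
    """
    return int(os.environ.get("FD_ZMQ_SNDHWM", default))

os.environ.pop("FD_ZMQ_SNDHWM", None)  # ensure unset for the demo
assert get_zmq_sndhwm() == 64 * 1024
```

With pyzmq, such a value would typically be applied via `socket.setsockopt(zmq.SNDHWM, value)` before binding.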
RAM
b9af800edd [Optimize] Increase zmq buffer size to prevent the apiserver from consuming too slowly (#3723) (#3728)
Co-authored-by: chenjian <1435317881@qq.com>
2025-08-30 15:58:18 +08:00
Zero Rains
64cf769bee fix the bug when num_key_value_heads < tensor_parallel_size (#3722) 2025-08-30 12:40:29 +08:00
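The num_key_value_heads < tensor_parallel_size case fixed above arises in grouped-query attention when there are fewer KV heads than tensor-parallel ranks. A common resolution, sketched here with illustrative names (the actual patch's details may differ), is to replicate KV heads so every rank holds one:

```python
def kv_heads_per_rank(num_kv_heads: int, tp_size: int) -> int:
    """How many KV heads each tensor-parallel rank holds.

    When num_kv_heads >= tp_size, the heads are sharded evenly.
    When num_kv_heads < tp_size, they cannot be split, so each KV head
    is replicated across tp_size // num_kv_heads ranks and every rank
    ends up with exactly one head.
    """
    if num_kv_heads >= tp_size:
        assert num_kv_heads % tp_size == 0, "heads must divide evenly"
        return num_kv_heads // tp_size
    assert tp_size % num_kv_heads == 0, "tp_size must be a multiple of num_kv_heads"
    return 1  # each rank holds one (replicated) head

assert kv_heads_per_rank(num_kv_heads=8, tp_size=2) == 4  # normal sharding
assert kv_heads_per_rank(num_kv_heads=2, tp_size=8) == 1  # replication case
```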
Jiang-Jia-Jun
3364af767b Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHE…" (#3719)
This reverts commit 578b8c5da2.
2025-08-29 19:55:50 +08:00
lizexu123
578b8c5da2 [BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. (#3670)
* merge 2.1

* fix

* pre-commit

* fix
2025-08-29 19:53:44 +08:00
ltd0924
8517e04956 [bugfix] PR3663 parameter is 0 (#3679)
* Update engine.py

* Update engine_client.py

* Update engine.py

* Update engine.py
2025-08-29 11:46:42 +08:00
李泳桦
aad9d3564e [feat] add metrics for yiyan adapter (#3615)
* [feat] add metrics for yiyan adapter (#3219)

* [feat] add metrics for yiyan adapter

* [fix] fix metrics num_requests_waiting and num_requests_running

* [fix] fix metrics gpu_cache_usage_perc

* [refactor] change where requests_number increases

* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly

* [chore] delete useless code

* [fix] fix error
2025-08-28 21:16:58 +08:00
Jiang-Jia-Jun
6039cdc2c5 Revert "[BugFix] fix parameter is 0 (#3663)" (#3681)
This reverts commit 6a90cfd144.
2025-08-28 15:55:55 +08:00
李泳桦
6545994c58 [fix] qwen output inconsistency when top_p=0 (#3634) (#3662)
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
2025-08-28 09:54:17 +08:00
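The top_p=0 inconsistency fixed above comes from how nucleus sampling handles the degenerate case: with top_p=0 the filter should keep only the single most probable token, making the output deterministic (greedy). A generic sketch of that behavior, not FastDeploy's kernel:

```python
def top_p_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches top_p, always retaining at least the top-1 token.

    With top_p=0 the loop stops after the first (highest-probability)
    token, so sampling degenerates to greedy decoding.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:  # top_p=0 breaks immediately after the argmax
            break
    return kept

assert top_p_filter([0.1, 0.6, 0.3], top_p=0.0) == [1]      # greedy: argmax only
assert top_p_filter([0.1, 0.6, 0.3], top_p=0.7) == [1, 2]   # nucleus of two tokens
```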
ltd0924
6a90cfd144 [BugFix] fix parameter is 0 (#3663)
* Update engine.py

* Update engine_client.py
2025-08-28 09:52:17 +08:00
zhuzixuan
80db7fce05 [Bugfix] Fix significant performance degradation of the 0.3B model on branch 2.1 (#3624)
* Restore the async method.
[BugFix] Support echo in the completion API (#3245)

* wenxin-tools-511: fix the issue where v1/completion could not echo the prompt.

* Support echo for multiple prompts

* Support streaming echo for multiple prompts

* Add unit tests for echo support in the completion API

* pre-commit

* Remove redundant test files

* Fix the unit test for completion API echo support

* Add unit test files

* Add unit tests

* unittest

* Add unit tests

* Fix unit tests

* Remove unnecessary asserts.

* Resubmit

* Update test methods

* ut

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct (3)

* Optimize unit test code to narrow the test scope.

* Optimize unit test code to narrow the test scope (2).

* Optimize unit test code to narrow the test scope (3).

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* Add unit tests for token ids

* update

* Fix index errors

* Fix index errors

* [Bugfix] Significant performance degradation of 0.3B model on branch 2.1
2025-08-27 15:29:01 +08:00
ltd0924
96aed92e4a [BugFix] ep mixed mode offline exit failed (#3623) 2025-08-26 20:12:44 +08:00
SunLei
d8444e22ca fix: replace list * n initialization with list comprehension to avoid shared references (#3620) 2025-08-26 17:53:09 +08:00
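The shared-reference pitfall this commit fixes is a standard Python gotcha and can be shown in a few lines (a generic illustration, not the FastDeploy code itself):

```python
# `[[]] * n` repeats the SAME list object n times, so mutating one
# element mutates all of them.
shared = [[]] * 3
shared[0].append("x")
assert shared == [["x"], ["x"], ["x"]]  # all three entries changed

# A list comprehension builds n independent lists instead.
independent = [[] for _ in range(3)]
independent[0].append("x")
assert independent == [["x"], [], []]  # only the first entry changed
```

Sequence repetition makes shallow copies, so it is only safe with immutable elements (e.g. `[0] * n`).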
李泳桦
df27a488b1 [fix] fix ZmqIpcClient.close() error (#3600) 2025-08-26 10:16:41 +08:00
李泳桦
b1f8f1aa07 [fix] fix completion stream api output_tokens not in usage (#3588) 2025-08-25 18:31:57 +08:00
zhuzixuan
4e369c7fa7 [BugFix] Support echo in the completion API (#3477)
* update
[BugFix] Support echo in the completion API (#3245)

* wenxin-tools-511: fix the issue where v1/completion could not echo the prompt.

* Support echo for multiple prompts

* Support streaming echo for multiple prompts

* Add unit tests for echo support in the completion API

* pre-commit

* Remove redundant test files

* Fix the unit test for completion API echo support

* Add unit test files

* Add unit tests

* unittest

* Add unit tests

* Fix unit tests

* Remove unnecessary asserts.

* Resubmit

* Update test methods

* ut

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct (3)

* Optimize unit test code to narrow the test scope.

* Optimize unit test code to narrow the test scope (2).

* Optimize unit test code to narrow the test scope (3).

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* Add unit tests for token ids

* update

* Fix index errors

* Fix index errors

* Resolve conflicts

* Resolve conflicts

* Resolve conflicts

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:48 +08:00
Zero Rains
f8d3255520 [Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process (#3558)
* launch expert_service before kv_cache initialization

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:34 +08:00
chenjian
e8af92aab7 [Feature] Support mixed deployment with yiyan adapter (#3533)
* [Feature] Support mixed deployment with yiyan adapter

* [Feature] Support mixed deployment with yiyan adapter

* fix merge

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 09:56:47 +08:00
K11OntheBoat
8b9f167ccc Avoid tokenizer bug for XPU CI (#3563)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-08-23 00:09:56 +08:00
K11OntheBoat
93d999b830 [Feature] Support limit thinking len for text models (#3527)
* support limit thinking len

* remove default think_end_id

* remove reasoning_max_tokens

* update think_end_id for ernie

* update think_end_id for ernie.

---------

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: luukunn <981429396@qq.com>
2025-08-22 14:48:15 +08:00
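The "limit thinking len" feature above can be sketched at the token level (a hypothetical illustration built from the names in the commit messages; `think_end_id` and the token budget appear there, but this function and its behavior are assumptions, not FastDeploy's implementation):

```python
def truncate_thinking(token_ids, think_end_id, max_think_tokens):
    """Cap the length of a model's 'thinking' span.

    If the model has not emitted think_end_id within the token budget,
    cut the span at the budget and append think_end_id so decoding
    moves on to the visible answer.
    """
    if think_end_id in token_ids[:max_think_tokens]:
        return token_ids  # thinking already ended within the budget
    return token_ids[:max_think_tokens] + [think_end_id]

# Thinking ended naturally: the sequence is unchanged.
assert truncate_thinking([5, 6, 99, 7], think_end_id=99, max_think_tokens=8) == [5, 6, 99, 7]
# Budget exceeded: the span is cut and the end-of-thinking token forced.
assert truncate_thinking([5, 6, 7, 8], think_end_id=99, max_think_tokens=2) == [5, 6, 99]
```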
ltd0924
4d6fb96cd6 [BugFix] Api server bugs (#3530)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-08-22 14:01:14 +08:00
ltd0924
c18975366e [BUGFIX] fix ep mixed bug (#3513)
* Update expert_service.py

* Update engine.py

* Update engine.py

* Update engine.py

* Update expert_service.py

* Update engine.py
2025-08-22 11:35:50 +08:00
luukunn
4a9c04a746 [Feature] add tool parser (#3518)
* [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421)

* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test

* add Tool Parser (#3272)

* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* Rename the tool parser

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>

* [Feature] add tool parser (#3483)

* add tool parser

* add x1 enable_thinking

* restart ci

* fix vl reasoning parser

* modify call style

* modify call style

* add offline enablethinking

* fix completion

* fix

* fix unit test

* fix unit test

* fix unit test

* fix vl reasoning parser

* fix vl reasoning parser

* fix unit test

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-22 11:14:35 +08:00
李泳桦
1b399b91c0 [fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3511)
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error

* [fix] code syntax

* [test] add test case for this bug

* [test] add test case for empty message list

* [test] fix test case for empty message list
2025-08-21 17:33:10 +08:00
memoryCoderC
8bf48dfab8 [Feature] add prompt_tokens and completion_tokens (#3505)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-21 14:10:06 +08:00
lizexu123
fcdc5c2c54 fix num_seqs (#3396) 2025-08-21 14:03:11 +08:00
luukunn
d07338f932 [Feature] Pass through the chat_template_kwargs to the data processing module (#3421) (#3469)
* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test
2025-08-19 17:40:12 +08:00
gaoziyuan
3ffbc98179 fix dynamic_weight config bug (#3432) 2025-08-18 14:36:53 +08:00
chenjian
edd13aad66 support logprob in v1 for release/2.1 (#3446) 2025-08-17 08:16:00 +08:00
memoryCoderC
ad8ea68906 [BugFix] fix ErnieProcessor not set raw_prediction (#3401) 2025-08-14 19:10:07 +08:00
yinwei
101605869c [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393)
* fix v1 schedule oom bug

* fix v1 schedule oom bug
2025-08-14 17:41:40 +08:00
Jiang-Jia-Jun
28918702c2 Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing
changes made to 03347626a6.
2025-08-14 17:20:29 +08:00
Jiang-Jia-Jun
02596fc537 Merge branch 'feature/online/vs_think_20250813' into release/2.1 2025-08-14 17:13:36 +08:00
ltd0924
03347626a6 [BugFix] fix control signal release failed (#3374)
* [BugFix]

* [BugFix]

* [BugFix]

* [BugFix]

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-14 17:01:25 +08:00
xiaolei373
d1d321bafd feat(log):add_request_and_response_log (#3392) 2025-08-14 14:50:48 +08:00
Jiang-Jia-Jun
dc5d3ff5a0 [Polish Code] Remove useless notes 2025-08-14 14:05:29 +08:00
Jiang-Jia-Jun
f0a707e06f [BugFix] Fix default log level of paddleformers (#3377)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-14 11:36:13 +08:00
JYChen
4870919682 fix stopseq error info (#3342)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-14 10:45:05 +08:00
ming1753
a375378cc1 [Bug Fix] Fix V1 video bug (#3387) 2025-08-14 09:49:22 +08:00
luukunn
81092c0fe3 add tool parser 2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9 Completion add raw_prediction/text_after_process (#3362) 2025-08-12 23:20:36 +08:00