Commit Graph

816 Commits

Author SHA1 Message Date
qwes5s5
2ee91d7a96 [metrics] Add several observability metrics (#3868) (#4011)
* [metrics] Add several observability metrics (#3868)

* Add several observability metrics

* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, queued request count, and other information

* adjust some metrics and md files

* trigger ci

* adjust ci file

* trigger ci

* trigger ci

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* version adjust

---------

Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-10 10:59:57 +08:00
Zero Rains
187ccb0f04 get org_vocab_size from args (#3982) 2025-09-09 15:08:28 +08:00
chenjian
98b3647aad [Fix] fix prefix cache in release21 (#3922)
* fix prefix cache in release21

* fix

* Fix when prompt ids is numpy
2025-09-08 11:33:59 +08:00
chenjian
ffec66097c [optimize] Optimize prefix caching in v1 release/2.1 (#3823)
* [optimize] Optimize prefix caching in v1

* [optimize] Optimize prefix caching in v1

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-04 19:25:02 +08:00
chen
c2f5c99b1e check (#3866) 2025-09-03 20:46:13 +08:00
ltd0924
cc5430e4c2 [BugFix] [CP] fix max streaming tokens invalid (#3798)
* Update serving_chat.py

* Update serving_completion.py
2025-09-02 21:03:36 +08:00
chen
1e19833ba5 [CP] CP Lm head fp32 and temp_logprob to release/2.1 (#3766)
* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)

* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing

* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs

* delete some code

* code check

* code check and add doc

* fix tokenizer.decoder(-1), return 'Invalid Token'

* add ci for temp_scaled and top_p logprobs

* check test

* check seq len time shape

* logprob clip inf

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>

* [Precision] Support lm_head layer running in float32 (#3597)

* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc

* code check

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-09-01 19:56:54 +08:00
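The temp_scaled_logprobs behavior this commit introduces can be sketched generically (a minimal illustration of temperature-scaled log-softmax, not FastDeploy's actual implementation; function name and signature are illustrative):

```python
import math

def temp_scaled_logprobs(logits, temperature=1.0):
    """Log-softmax over temperature-scaled logits.

    Divide the logits by the temperature before normalizing, so higher
    temperatures flatten the distribution and lower ones sharpen it.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

# The returned logprobs exponentiate to a proper probability distribution.
probs = [math.exp(lp) for lp in temp_scaled_logprobs([2.0, 1.0, 0.1], 0.7)]
assert abs(sum(probs) - 1.0) < 1e-9
```

At very high temperatures the distribution approaches uniform, which is a quick sanity check for an implementation like this.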
chenjian
c49c43d51c [Bug fix] Fix perf in mixed deployment with yiyan adapter (#3703)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-09-01 14:06:09 +08:00
chenjian
a424ab907f [Bug fix] Fix prefix cache in v1 (#3710)
* [Bug fix] Fix prefix cache in V1

* add comment
2025-09-01 10:14:25 +08:00
chenjian
10a95f8ed5 [Fix] Do not drop results when they are consumed slowly (#3704)
* [Fix] Do not drop results when they are consumed slowly

* set default FD_ZMQ_SNDHWM to 64k
2025-09-01 10:14:04 +08:00
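The FD_ZMQ_SNDHWM default mentioned above could be read along these lines (a hypothetical sketch: the environment-variable name comes from the commit message, but the reading logic and function name are illustrative, not FastDeploy's code):

```python
import os

def get_zmq_sndhwm(default: int = 64 * 1024) -> int:
    """Read the ZMQ send high-water mark from the environment.

    The high-water mark caps how many outbound messages a ZMQ socket
    buffers; a larger default (64k here) gives a slow consumer such as
    an API server more headroom before messages are dropped or sends
    block.
    """
    return int(os.environ.get("FD_ZMQ_SNDHWM", default))

os.environ.pop("FD_ZMQ_SNDHWM", None)  # ensure unset for the demo
assert get_zmq_sndhwm() == 64 * 1024
```

With pyzmq, such a value would typically be applied via `socket.setsockopt(zmq.SNDHWM, value)` before binding.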
RAM
b9af800edd [Optimize] Increase zmq buffer size to prevent the apiserver from consuming too slowly (#3723) (#3728)
Co-authored-by: chenjian <1435317881@qq.com>
2025-08-30 15:58:18 +08:00
Zero Rains
64cf769bee fix the bug when num_key_value_heads < tensor_parallel_size (#3722) 2025-08-30 12:40:29 +08:00
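The num_key_value_heads < tensor_parallel_size case fixed above arises in grouped-query attention when there are fewer KV heads than tensor-parallel ranks. A common resolution, sketched here with illustrative names (the actual patch's details may differ), is to replicate KV heads so every rank holds one:

```python
def kv_heads_per_rank(num_kv_heads: int, tp_size: int) -> int:
    """How many KV heads each tensor-parallel rank holds.

    When num_kv_heads >= tp_size, the heads are sharded evenly.
    When num_kv_heads < tp_size, they cannot be split, so each KV head
    is replicated across tp_size // num_kv_heads ranks and every rank
    ends up with exactly one head.
    """
    if num_kv_heads >= tp_size:
        assert num_kv_heads % tp_size == 0, "heads must divide evenly"
        return num_kv_heads // tp_size
    assert tp_size % num_kv_heads == 0, "tp_size must be a multiple of num_kv_heads"
    return 1  # each rank holds one (replicated) head

assert kv_heads_per_rank(num_kv_heads=8, tp_size=2) == 4  # normal sharding
assert kv_heads_per_rank(num_kv_heads=2, tp_size=8) == 1  # replication case
```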
Jiang-Jia-Jun
3364af767b Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHE…" (#3719)
This reverts commit 578b8c5da2.
2025-08-29 19:55:50 +08:00
lizexu123
578b8c5da2 [BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. (#3670)
* merge 2.1

* fix

* pre-commit

* fix
2025-08-29 19:53:44 +08:00
ltd0924
8517e04956 [bugfix] PR3663 parameter is 0 (#3679)
* Update engine.py

* Update engine_client.py

* Update engine.py

* Update engine.py
2025-08-29 11:46:42 +08:00
李泳桦
aad9d3564e [feat] add metrics for yiyan adapter (#3615)
* [feat] add metrics for yiyan adapter (#3219)

* [feat] add metrics for yiyan adapter

* [fix] fix metrics num_requests_waiting and num_requests_running

* [fix] fix metrics gpu_cache_usage_perc

* [refactor] change where requests_number increases

* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly

* [chore] delete useless code

* [fix] fix error
2025-08-28 21:16:58 +08:00
Jiang-Jia-Jun
6039cdc2c5 Revert "[BugFix] fix parameter is 0 (#3663)" (#3681)
This reverts commit 6a90cfd144.
2025-08-28 15:55:55 +08:00
李泳桦
6545994c58 [fix] qwen output inconsistency when top_p=0 (#3634) (#3662)
* [fix] qwen output inconsistency when top_p=0

* [fix] remove decode pre_id code
2025-08-28 09:54:17 +08:00
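The top_p=0 inconsistency fixed above comes from how nucleus sampling handles the degenerate case: with top_p=0 the filter should keep only the single most probable token, making the output deterministic (greedy). A generic sketch of that behavior, not FastDeploy's kernel:

```python
def top_p_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches top_p, always retaining at least the top-1 token.

    With top_p=0 the loop stops after the first (highest-probability)
    token, so sampling degenerates to greedy decoding.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:  # top_p=0 breaks immediately after the argmax
            break
    return kept

assert top_p_filter([0.1, 0.6, 0.3], top_p=0.0) == [1]      # greedy: argmax only
assert top_p_filter([0.1, 0.6, 0.3], top_p=0.7) == [1, 2]   # nucleus of two tokens
```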
ltd0924
6a90cfd144 [BugFix] fix parameter is 0 (#3663)
* Update engine.py

* Update engine_client.py
2025-08-28 09:52:17 +08:00
zhuzixuan
80db7fce05 [Bugfix] Fix significant performance degradation of the 0.3B model on branch 2.1 (#3624)
* Restore the async method.
[BugFix] Support echo in the completion API (#3245)

* wenxin-tools-511: fix the issue where v1/completion could not echo the prompt.

* Support echo for multiple prompts

* Support streaming echo for multiple prompts

* Add unit tests for echo support in the completion API

* pre-commit

* Remove redundant test files

* Fix the unit test for completion API echo support

* Add unit test files

* Add unit tests

* unittest

* Add unit tests

* Fix unit tests

* Remove unnecessary asserts.

* Resubmit

* Update test methods

* ut

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct (3)

* Optimize unit test code to narrow the test scope.

* Optimize unit test code to narrow the test scope (2).

* Optimize unit test code to narrow the test scope (3).

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* Add unit tests for token ids

* update

* Fix index errors

* Fix index errors

* [Bugfix] Significant performance degradation of 0.3B model on branch 2.1
2025-08-27 15:29:01 +08:00
ltd0924
96aed92e4a [BugFix] ep mixed mode offline exit failed (#3623) 2025-08-26 20:12:44 +08:00
SunLei
d8444e22ca fix: replace list * n initialization with list comprehension to avoid shared references (#3620) 2025-08-26 17:53:09 +08:00
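The shared-reference pitfall this commit fixes is a standard Python gotcha and can be shown in a few lines (a generic illustration, not the FastDeploy code itself):

```python
# `[[]] * n` repeats the SAME list object n times, so mutating one
# element mutates all of them.
shared = [[]] * 3
shared[0].append("x")
assert shared == [["x"], ["x"], ["x"]]  # all three entries changed

# A list comprehension builds n independent lists instead.
independent = [[] for _ in range(3)]
independent[0].append("x")
assert independent == [["x"], [], []]  # only the first entry changed
```

Sequence repetition makes shallow copies, so it is only safe with immutable elements (e.g. `[0] * n`).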
李泳桦
df27a488b1 [fix] fix ZmqIpcClient.close() error (#3600) 2025-08-26 10:16:41 +08:00
李泳桦
b1f8f1aa07 [fix] fix completion stream api output_tokens not in usage (#3588) 2025-08-25 18:31:57 +08:00
zhuzixuan
4e369c7fa7 [BugFix] Support echo in the completion API (#3477)
* update
[BugFix] Support echo in the completion API (#3245)

* wenxin-tools-511: fix the issue where v1/completion could not echo the prompt.

* Support echo for multiple prompts

* Support streaming echo for multiple prompts

* Add unit tests for echo support in the completion API

* pre-commit

* Remove redundant test files

* Fix the unit test for completion API echo support

* Add unit test files

* Add unit tests

* unittest

* Add unit tests

* Fix unit tests

* Remove unnecessary asserts.

* Resubmit

* Update test methods

* ut

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct

* Verify whether the unit-test approach is correct (3)

* Optimize unit test code to narrow the test scope.

* Optimize unit test code to narrow the test scope (2).

* Optimize unit test code to narrow the test scope (3).

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* Add unit tests for token ids

* update

* Fix index errors

* Fix index errors

* Resolve conflicts

* Resolve conflicts

* Resolve conflicts

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:48 +08:00
Zero Rains
f8d3255520 [Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process (#3558)
* launch expert_service before kv_cache initialization

* update code

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:34 +08:00
chenjian
e8af92aab7 [Feature] Support mixed deployment with yiyan adapter (#3533)
* [Feature] Support mixed deployment with yiyan adapter

* [Feature] Support mixed deployment with yiyan adapter

* fix merge

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 09:56:47 +08:00
K11OntheBoat
8b9f167ccc Avoid tokenizer bug for XPU CI (#3563)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-08-23 00:09:56 +08:00
K11OntheBoat
93d999b830 [Feature] Support limit thinking len for text models (#3527)
* support limit thinking len

* remove default think_end_id

* remove reasoning_max_tokens

* update think_end_id for ernie

* update think_end_id for ernie.

---------

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
Co-authored-by: luukunn <981429396@qq.com>
2025-08-22 14:48:15 +08:00
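The "limit thinking len" feature above can be sketched at the token level (a hypothetical illustration built from the names in the commit messages; `think_end_id` and the token budget appear there, but this function and its behavior are assumptions, not FastDeploy's implementation):

```python
def truncate_thinking(token_ids, think_end_id, max_think_tokens):
    """Cap the length of a model's 'thinking' span.

    If the model has not emitted think_end_id within the token budget,
    cut the span at the budget and append think_end_id so decoding
    moves on to the visible answer.
    """
    if think_end_id in token_ids[:max_think_tokens]:
        return token_ids  # thinking already ended within the budget
    return token_ids[:max_think_tokens] + [think_end_id]

# Thinking ended naturally: the sequence is unchanged.
assert truncate_thinking([5, 6, 99, 7], think_end_id=99, max_think_tokens=8) == [5, 6, 99, 7]
# Budget exceeded: the span is cut and the end-of-thinking token forced.
assert truncate_thinking([5, 6, 7, 8], think_end_id=99, max_think_tokens=2) == [5, 6, 99]
```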
ltd0924
4d6fb96cd6 [BugFix] Api server bugs (#3530)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-08-22 14:01:14 +08:00
ltd0924
c18975366e [BUGFIX] fix ep mixed bug (#3513)
* Update expert_service.py

* Update engine.py

* Update engine.py

* Update engine.py

* Update expert_service.py

* Update engine.py
2025-08-22 11:35:50 +08:00
luukunn
4a9c04a746 [Feature] add tool parser (#3518)
* [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421)

* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test

* add Tool Parser (#3272)

* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* Rename the tool parser

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>

* [Feature] add tool parser (#3483)

* add tool parser

* add x1 enable_thinking

* restart ci

* fix vl reasoning parser

* modify call style

* modify call style

* add offline enablethinking

* fix completion

* fix

* fix unit test

* fix unit test

* fix unit test

* fix vl reasoning parser

* fix vl reasoning parser

* fix unit test

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-22 11:14:35 +08:00
李泳桦
1b399b91c0 [fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3511)
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error

* [fix] code syntax

* [test] add test case for this bug

* [test] add test case for empty message list

* [test] fix test case for empty message list
2025-08-21 17:33:10 +08:00
memoryCoderC
8bf48dfab8 [Feature] add prompt_tokens and completion_tokens (#3505)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-21 14:10:06 +08:00
lizexu123
fcdc5c2c54 fix num_seqs (#3396) 2025-08-21 14:03:11 +08:00
luukunn
d07338f932 [Feature] Pass through the chat_template_kwargs to the data processing module (#3421) (#3469)
* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test
2025-08-19 17:40:12 +08:00
gaoziyuan
3ffbc98179 fix dynamic_weight config bug (#3432) 2025-08-18 14:36:53 +08:00
chenjian
edd13aad66 support logprob in v1 for release/2.1 (#3446) 2025-08-17 08:16:00 +08:00
memoryCoderC
ad8ea68906 [BugFix] fix ErnieProcessor not set raw_prediction (#3401) 2025-08-14 19:10:07 +08:00
yinwei
101605869c [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393)
* fix v1 schedule oom bug

* fix v1 schedule oom bug
2025-08-14 17:41:40 +08:00
Jiang-Jia-Jun
28918702c2 Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing
changes made to 03347626a6.
2025-08-14 17:20:29 +08:00
Jiang-Jia-Jun
02596fc537 Merge branch 'feature/online/vs_think_20250813' into release/2.1 2025-08-14 17:13:36 +08:00
ltd0924
03347626a6 [BugFix] fix control signal release failed (#3374)
* [BugFix]

* [BugFix]

* [BugFix]

* [BugFix]

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-14 17:01:25 +08:00
xiaolei373
d1d321bafd feat(log):add_request_and_response_log (#3392) 2025-08-14 14:50:48 +08:00
Jiang-Jia-Jun
dc5d3ff5a0 [Polish Code] Remove useless notes 2025-08-14 14:05:29 +08:00
Jiang-Jia-Jun
f0a707e06f [BugFix] Fix default log level of paddleformers (#3377)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-14 11:36:13 +08:00
JYChen
4870919682 fix stopseq error info (#3342)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-14 10:45:05 +08:00
ming1753
a375378cc1 [Bug Fix] Fix V1 video bug (#3387) 2025-08-14 09:49:22 +08:00
luukunn
81092c0fe3 add tool parser 2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9 Completion add raw_prediction/text_after_process (#3362) 2025-08-12 23:20:36 +08:00