qwes5s5
2ee91d7a96
[metrics] Add several observability metrics ( #3868 ) ( #4011 )
* [metrics] Add several observability metrics (#3868 )
* Add several observability metrics
* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and other information
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
* version adjust
---------
Co-authored-by: K11OntheBoat <your_email@example.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-10 10:59:57 +08:00
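The metrics commits above and below reference gauges such as num_requests_running, num_requests_waiting, and gpu_cache_usage_perc. A minimal sketch of how such a snapshot could be computed; the field names follow the commit messages, but the class and its layout are illustrative, not FastDeploy's actual metrics registry:

```python
from dataclasses import dataclass


@dataclass
class SchedulerMetricsSnapshot:
    # Metric names taken from the commit messages in this log;
    # the real exporter wiring is assumed, not reproduced.
    num_requests_running: int = 0
    num_requests_waiting: int = 0
    available_gpu_block_num: int = 0
    total_gpu_block_num: int = 1

    def gpu_cache_usage_perc(self) -> float:
        # Fraction of KV-cache blocks currently occupied.
        used = self.total_gpu_block_num - self.available_gpu_block_num
        return used / self.total_gpu_block_num
```

Such a snapshot would typically be scraped periodically and exported as Prometheus-style gauges.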
Zero Rains
187ccb0f04
get org_vocab_size from args ( #3982 )
2025-09-09 15:08:28 +08:00
chenjian
98b3647aad
[Fix] fix prefix cache in release21 ( #3922 )
* fix prefix cache in release21
* fix
* Fix when prompt ids is numpy
2025-09-08 11:33:59 +08:00
chenjian
ffec66097c
[optimize] Optimize prefix caching in v1 release/2.1 ( #3823 )
* [optimize] Optimize prefix caching in v1
* [optimize] Optimize prefix caching in v1
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-09-04 19:25:02 +08:00
chen
c2f5c99b1e
check ( #3866 )
2025-09-03 20:46:13 +08:00
ltd0924
cc5430e4c2
[BugFix] [CP] fix max streaming tokens invalid ( #3798 )
* Update serving_chat.py
* Update serving_completion.py
2025-09-02 21:03:36 +08:00
chen
1e19833ba5
[CP] CP Lm head fp32 and temp_logprob to release/2.1 ( #3766 )
* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552 )
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
* [Precision] Support lm_head layer running in float32 (#3597 )
* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
* code check
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com >
2025-09-01 19:56:54 +08:00
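The (#3552) entry above adds temp_scaled_logprobs for logprob post-processing. A rough pure-Python illustration of what temperature-scaled logprobs generally means (the shipped implementation works on logits tensors inside the engine; this function is only a sketch):

```python
import math


def temp_scaled_logprobs(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically
    # stable log-softmax; higher temperature flattens the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]
```

top_p_normalized_logprobs would additionally renormalize over only the nucleus (top-p) token set.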
chenjian
c49c43d51c
[Bug fix] Fix perf in mixed deployment with yiyan adapter ( #3703 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-09-01 14:06:09 +08:00
chenjian
a424ab907f
[Bug fix] Fix prefix cache in v1 ( #3710 )
* [Bug fix] Fix prefix cache in V1
* add comment
2025-09-01 10:14:25 +08:00
chenjian
10a95f8ed5
[Fix] Do not drop results when they are consumed slowly ( #3704 )
* [Fix] Do not drop results when they are consumed slowly
* set default FD_ZMQ_SNDHWM to 64k
2025-09-01 10:14:04 +08:00
RAM
b9af800edd
[Optimize] Increase zmq buffer size to prevent the apiserver from consuming too slowly ( #3723 ) ( #3728 )
Co-authored-by: chenjian <1435317881@qq.com >
2025-08-30 15:58:18 +08:00
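Both zmq commits above raise the send high-water mark (FD_ZMQ_SNDHWM, defaulting to 64k per the bullet above) so pending results are buffered rather than lost when the API server consumes slowly. A minimal pyzmq sketch of the idea; the helper name and endpoint are illustrative, not FastDeploy's actual code:

```python
import zmq


def open_push_socket(endpoint: str, sndhwm: int = 64 * 1024):
    # A larger SNDHWM lets the producer queue more pending messages
    # before blocking or dropping when the consumer lags behind.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    sock.setsockopt(zmq.SNDHWM, sndhwm)
    sock.bind(endpoint)
    return sock
```

The trade-off is memory: a higher high-water mark smooths bursts but lets a stalled consumer accumulate more buffered messages.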
Zero Rains
64cf769bee
fix the bug when num_key_value_heads < tensor_parallel_size ( #3722 )
2025-08-30 12:40:29 +08:00
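The commit title above does not show the fix itself, but the usual handling when num_key_value_heads < tensor_parallel_size (grouped-query attention with many ranks) is to replicate KV heads so every rank holds at least one full head. A hedged sketch of that sharding rule, not the exact patch:

```python
def kv_heads_per_rank(num_key_value_heads: int, tensor_parallel_size: int) -> int:
    # With enough KV heads, shard them evenly across ranks;
    # otherwise replicate so each rank still gets one full head.
    if num_key_value_heads >= tensor_parallel_size:
        assert num_key_value_heads % tensor_parallel_size == 0
        return num_key_value_heads // tensor_parallel_size
    assert tensor_parallel_size % num_key_value_heads == 0
    return 1
```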
Jiang-Jia-Jun
3364af767b
Revert "[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHE…" ( #3719 )
This reverts commit 578b8c5da2 .
2025-08-29 19:55:50 +08:00
lizexu123
578b8c5da2
[BugFix] Modify the bug in Qwen2 when enabling ENABLE_V1_KVCACHE_SCHEDULER. ( #3670 )
* merge 2.1
* fix
* pre-commit
* fix
2025-08-29 19:53:44 +08:00
ltd0924
8517e04956
[bugfix]PR3663 parameter is 0 ( #3679 )
* Update engine.py
* Update engine_client.py
* Update engine.py
* Update engine.py
2025-08-29 11:46:42 +08:00
李泳桦
aad9d3564e
[feat] add metrics for yiyan adapter ( #3615 )
* [feat] add metrics for yiyan adapter (#3219 )
* [feat] add metrics for yiyan adapter
* [fix] fix metrics num_requests_waiting and num_requests_running
* [fix] fix metrics gpu_cache_usage_perc
* [refactor] change where requests_number increases
* [chore] rename xxx_block_num as xxx_gpu_block_num, and update their values accordingly
* [chore] delete useless code
* [fix] fix error
2025-08-28 21:16:58 +08:00
Jiang-Jia-Jun
6039cdc2c5
Revert "[BugFix] fix parameter is 0 ( #3663 )" ( #3681 )
This reverts commit 6a90cfd144 .
2025-08-28 15:55:55 +08:00
李泳桦
6545994c58
[fix] qwen output inconsistency when top_p=0 ( #3634 ) ( #3662 )
* [fix] qwen output inconsistency when top_p=0
* [fix] remove decode pre_id code
2025-08-28 09:54:17 +08:00
ltd0924
6a90cfd144
[BugFix] fix parameter is 0 ( #3663 )
* Update engine.py
* Update engine_client.py
2025-08-28 09:52:17 +08:00
zhuzixuan
80db7fce05
[Bugfix] Fix the significant performance degradation of the 0.3B model on branch 2.1 ( #3624 )
* Restore the async methods.
[BugFix] Support echo in the completion API (#3245 )
* wenxin-tools-511: fix the issue that v1/completion could not echo the prompt.
* Support echo for multiple prompts
* Support streaming echo with multiple prompts
* Add unit tests for echo support in the completion API
* pre-commit
* Remove redundant test files
* Fix the unit test method for completion echo support
* Add unit test files
* Add unit tests
* unittest
* Add unit tests
* Fix unit tests
* Remove unnecessary asserts.
* Resubmit
* Update test methods
* ut
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct 3
* Optimize unit test code to narrow the test scope.
* Optimize unit test code 2 to further narrow the test scope.
* Optimize unit test code 3 to further narrow the test scope.
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* Add unit tests for token ids
* update
* Fix index error
* Fix index error
* [Bugfix] Significant performance degradation of 0.3B model on branch 2.1
2025-08-27 15:29:01 +08:00
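Several entries in this log add `echo` support to v1/completions: with echo enabled, the response text is the prompt followed by the generation, as in the OpenAI-style `echo` parameter. A trivial sketch of the behavior (the helper name is illustrative):

```python
def build_completion_text(prompt: str, generated: str, echo: bool = False) -> str:
    # echo=True prepends the prompt to the generated text, which is
    # what "echo" means for a completion endpoint; streaming variants
    # emit the prompt chunk first, then the generated chunks.
    return prompt + generated if echo else generated
```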
ltd0924
96aed92e4a
[BugFix] ep mixed mode offline exit failed ( #3623 )
2025-08-26 20:12:44 +08:00
SunLei
d8444e22ca
fix: replace list * n initialization with list comprehension to avoid shared references ( #3620 )
2025-08-26 17:53:09 +08:00
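The fix above replaces `[[]] * n`-style initialization, where every slot aliases the same inner list, with a comprehension that builds independent lists:

```python
# `*` copies the reference, so all three "rows" are one list object.
rows_shared = [[]] * 3
rows_shared[0].append(1)  # mutates every row

# A comprehension evaluates [] once per iteration: independent lists.
rows_independent = [[] for _ in range(3)]
rows_independent[0].append(1)  # mutates only row 0
```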
李泳桦
df27a488b1
[fix] fix ZmqIpcClient.close() error ( #3600 )
2025-08-26 10:16:41 +08:00
李泳桦
b1f8f1aa07
[fix] fix completion stream api output_tokens not in usage ( #3588 )
2025-08-25 18:31:57 +08:00
zhuzixuan
4e369c7fa7
[BugFix] Support echo in the completion API ( #3477 )
* update
[BugFix] Support echo in the completion API (#3245 )
* wenxin-tools-511: fix the issue that v1/completion could not echo the prompt.
* Support echo for multiple prompts
* Support streaming echo with multiple prompts
* Add unit tests for echo support in the completion API
* pre-commit
* Remove redundant test files
* Fix the unit test method for completion echo support
* Add unit test files
* Add unit tests
* unittest
* Add unit tests
* Fix unit tests
* Remove unnecessary asserts.
* Resubmit
* Update test methods
* ut
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct
* Unit test to verify the approach is correct 3
* Optimize unit test code to narrow the test scope.
* Optimize unit test code 2 to further narrow the test scope.
* Optimize unit test code 3 to further narrow the test scope.
* support 'echo' in chat/completion.
* update
* update
* update
* update
* update
* update
* Add unit tests for token ids
* update
* Fix index error
* Fix index error
* Resolve conflicts
* Resolve conflicts
* Resolve conflicts
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-23 13:08:48 +08:00
Zero Rains
f8d3255520
[Cherry-Pick] Launch expert_service before kv_cache initialization in worker_process ( #3558 )
* launch expert_service before kv_cache initialization
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-23 13:08:34 +08:00
chenjian
e8af92aab7
[Feature] Support mixed deployment with yiyan adapter ( #3533 )
* [Feature] Support mixed deployment with yiyan adapter
* [Feature] Support mixed deployment with yiyan adapter
* fix merge
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-23 09:56:47 +08:00
K11OntheBoat
8b9f167ccc
Avoid tokenizer bug for XPU CI ( #3563 )
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
2025-08-23 00:09:56 +08:00
K11OntheBoat
93d999b830
[Feature] Support limit thinking len for text models ( #3527 )
* support limit thinking len
* remove default think_end_id
* remove reasoning_max_tokens
* update think_end_id for ernie
* update think_end_id for ernie.
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com ”>
Co-authored-by: luukunn <981429396@qq.com >
2025-08-22 14:48:15 +08:00
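The commit above limits thinking length via a think_end_id. The message does not show the mechanism, but a common approach is to force the think-end token once the reasoning budget is exhausted, so generation moves on to the answer. A hedged sketch with illustrative names; the real logic lives in FastDeploy's sampling/processing code:

```python
def apply_thinking_limit(token_ids, think_end_id, max_think_tokens):
    # If the model has already closed its thinking section, do nothing;
    # otherwise, once the budget is reached, append the end-of-thinking
    # token to force a transition out of the reasoning phase.
    if think_end_id in token_ids:
        return token_ids
    if len(token_ids) >= max_think_tokens:
        return token_ids + [think_end_id]
    return token_ids
```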
ltd0924
4d6fb96cd6
[BugFix] Api server bugs ( #3530 )
* Update serving_chat.py
* Update serving_completion.py
* Update serving_completion.py
2025-08-22 14:01:14 +08:00
ltd0924
c18975366e
[BUGFIX] fix ep mixed bug ( #3513 )
* Update expert_service.py
* Update engine.py
* Update engine.py
* Update engine.py
* Update expert_service.py
* Update engine.py
2025-08-22 11:35:50 +08:00
luukunn
4a9c04a746
[Feature] add tool parser ( #3518 )
* [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421 )
* fix chat_template_args
* fix args
* add offline
* add offline
* fix
* fix
* fix default enable_thinking value
* fix default enable_thinking value
* modify condition
* Revert "modify condition"
This reverts commit 26430bdeb1 .
* fix unit test
* add Tool Parser (#3272 )
* add tool-parser
* add tool-parser
* add tool parser
* add tool parser
* fix
* add offline
* add offline
* fix
* parsers:tool&reasoning
* Rename the tool parser
* update
* fix reasoning-parser
* add requirements
* fix finish reason
* fix
* fix reasoning-parser
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com >
* [Feature] add tool parser (#3483 )
* add tool parser
* add x1 enable_thinking
* restart ci
* fix vl reasoning parser
* modify call style
* modify call style
* add offline enablethinking
* fix completion
* fix
* fix unit test
* fix unit test
* fix unit test
* fix vl reasoning parser
* fix vl reasoning parser
* fix unit test
---------
Co-authored-by: zhuzixuan <zhuzixuan@baidu.com >
2025-08-22 11:14:35 +08:00
李泳桦
1b399b91c0
[fix] setting disable_chat_template while passing prompt_token_ids led to response error ( #3511 )
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error
* [fix] code syntax
* [test] add test case for this bug
* [test] add test case for empty message list
* [test] fix test case for empty message list
2025-08-21 17:33:10 +08:00
memoryCoderC
8bf48dfab8
[Feature] add prompt_tokens and completion_tokens ( #3505 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-08-21 14:10:06 +08:00
lizexu123
fcdc5c2c54
fix num_seqs ( #3396 )
2025-08-21 14:03:11 +08:00
luukunn
d07338f932
[Feature] Pass through the chat_template_kwargs to the data processing module ( #3421 ) ( #3469 )
* fix chat_template_args
* fix args
* add offline
* add offline
* fix
* fix
* fix default enable_thinking value
* fix default enable_thinking value
* modify condition
* Revert "modify condition"
This reverts commit 26430bdeb1 .
* fix unit test
2025-08-19 17:40:12 +08:00
gaoziyuan
3ffbc98179
fix dynamic_weight config bug ( #3432 )
2025-08-18 14:36:53 +08:00
chenjian
edd13aad66
support logprob in v1 for release/2.1 ( #3446 )
2025-08-17 08:16:00 +08:00
memoryCoderC
ad8ea68906
[BugFix] fix ErnieProcessor not set raw_prediction ( #3401 )
2025-08-14 19:10:07 +08:00
yinwei
101605869c
[XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER ( #3393 )
* fix v1 schedule oom bug
* fix v1 schedule oom bug
2025-08-14 17:41:40 +08:00
Jiang-Jia-Jun
28918702c2
Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537 , reversing
changes made to 03347626a6 .
2025-08-14 17:20:29 +08:00
Jiang-Jia-Jun
02596fc537
Merge branch 'feature/online/vs_think_20250813' into release/2.1
2025-08-14 17:13:36 +08:00
ltd0924
03347626a6
[BugFix] fix control signal release failed ( #3374 )
* [BugFix]
* [BugFix]
* [BugFix]
* [BugFix]
* fix
* fix
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-08-14 17:01:25 +08:00
xiaolei373
d1d321bafd
feat(log):add_request_and_response_log ( #3392 )
2025-08-14 14:50:48 +08:00
Jiang-Jia-Jun
dc5d3ff5a0
[Polish Code] Remove useless notes
2025-08-14 14:05:29 +08:00
Jiang-Jia-Jun
f0a707e06f
[BugFix] Fix default log level of paddleformers ( #3377 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-08-14 11:36:13 +08:00
JYChen
4870919682
fix stopseq error info ( #3342 )
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-08-14 10:45:05 +08:00
ming1753
a375378cc1
[Bug Fix] Fix V1 video bug ( #3387 )
2025-08-14 09:49:22 +08:00
luukunn
81092c0fe3
add tool parser
2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9
Completion add raw_prediction/text_after_process ( #3362 )
2025-08-12 23:20:36 +08:00