Commit Graph

59 Commits

Author SHA1 Message Date
ltd0924
6a90cfd144 [BugFix] fix parameter is 0 (#3663)
* Update engine.py

* Update engine_client.py
2025-08-28 09:52:17 +08:00
zhuzixuan
80db7fce05 【Bugfix】修复2.1分支上0.3B模型性能大幅下降 (#3624)
* 恢复异步方法。
【BugFix】completion接口echo回显支持 (#3245)

* wenxin-tools-511,修复v1/completion无法回显的问题。

* 支持多prompt的回显

* 支持多prompt情况下的流式回显

* 补充了 completion 接口支持 echo 的单元测试

* pre-commit

* 移除了多余的test文件

* 修复了completion接口echo支持的单测方法

* 补充了单元测试文件

* 补充单测

* unittest

* 补充单测

* 修复单测

* 删除不必要的assert.

* 重新提交

* 更新测试方法

* ut

* 验证是否是正确思路单测

* 验证是否是正确思路单测

* 验证是否是正确思路单测3

* 优化单测代码,有针对性地缩小单测范围。

* 优化单测代码2,有针对性地缩小单测范围。

* 优化单测代码3,有针对性地缩小单测范围。

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* 补充了关于tokenid的单元测试

* update

* 修正index错误

* 修正index错误

* [Bugfix] Significant performance degradation of 0.3B model on branch 2.1
2025-08-27 15:29:01 +08:00
SunLei
d8444e22ca fix: replace list * n initialization with list comprehension to avoid shared references (#3620) 2025-08-26 17:53:09 +08:00
李泳桦
b1f8f1aa07 [fix] fix completion stream api output_tokens not in usage (#3588) 2025-08-25 18:31:57 +08:00
zhuzixuan
4e369c7fa7 【BugFix】completion接口echo回显支持 (#3477)
* update
【BugFix】completion接口echo回显支持 (#3245)

* wenxin-tools-511,修复v1/completion无法回显的问题。

* 支持多prompt的回显

* 支持多prompt情况下的流式回显

* 补充了 completion 接口支持 echo 的单元测试

* pre-commit

* 移除了多余的test文件

* 修复了completion接口echo支持的单测方法

* 补充了单元测试文件

* 补充单测

* unittest

* 补充单测

* 修复单测

* 删除不必要的assert.

* 重新提交

* 更新测试方法

* ut

* 验证是否是正确思路单测

* 验证是否是正确思路单测

* 验证是否是正确思路单测3

* 优化单测代码,有针对性地缩小单测范围。

* 优化单测代码2,有针对性地缩小单测范围。

* 优化单测代码3,有针对性地缩小单测范围。

* support 'echo' in chat/completion.

* update

* update

* update

* update

* update

* update

* 补充了关于tokenid的单元测试

* update

* 修正index错误

* 修正index错误

* 解决冲突

* 解决冲突

* 解决冲突

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 13:08:48 +08:00
chenjian
e8af92aab7 [Feature] Support mixed deployment with yiyan adapter (#3533)
* [Feature] Support mixed deployment with yiyan adapter

* [Feature] Support mixed deployment with yiyan adapter

* fix merge

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-23 09:56:47 +08:00
K11OntheBoat
93d999b830 [Feature] Support limit thinking len for text models (#3527)
* support limit thinking len

* remove default think_end_id

* remove reasoning_max_tokens

* update think_end_id for ernie

* update think_end_id for ernie.

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: luukunn <981429396@qq.com>
2025-08-22 14:48:15 +08:00
ltd0924
4d6fb96cd6 [BugFix] Api server bugs (#3530)
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
2025-08-22 14:01:14 +08:00
luukunn
4a9c04a746 [Feature] add tool parser (#3518)
* [Feature] Pass through the `chat_template_kwargs` to the data processing module (#3421)

* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test

* add Tool Parser (#3272)

* add tool-parser

* add tool-parser

* add tool parser

* add tool parser

* fix

* add offline

* add offline

* fix

* parsers:tool&reasoning

* 修改tool parser名称·

* update

* fix reasoning-parser

* add requirements

* fix finish reason

* fix

* fix reasoning-parser

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>

* [Feature] add tool parser (#3483)

* add tool parser

* add x1 enable_thinking

* restart ci

* fix vl reasoning parser

* modify call style

* modify call style

* add offline enablethinking

* fix completion

* fix

* fix unit test

* fix unit test

* fix unit test

* fix vl reasoning parser

* fix vl reasoning parser

* fix unit test

---------

Co-authored-by: zhuzixuan <zhuzixuan@baidu.com>
2025-08-22 11:14:35 +08:00
李泳桦
1b399b91c0 [fix] setting disable_chat_template while passing prompt_token_ids led to response error (#3511)
* [fix] setting disable_chat_template while passing prompt_token_ids led to response error

* [fix] code syntax

* [test] add test case for this bug

* [test] add test case for empty message list

* [test] fix test case for empty message list
2025-08-21 17:33:10 +08:00
memoryCoderC
8bf48dfab8 [Feature] add prompt_tokens and completion_tokens (#3505)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-21 14:10:06 +08:00
luukunn
d07338f932 [Feature] Pass through the chat_template_kwargs to the data processing module (#3421) (#3469)
* fix chat_template_args

* fix args

* add offline

* add offline

* fix

* fix

* fix default enable_thinking value

* fix default enable_thinking value

* modify condition

* Revert "modify condition"

This reverts commit 26430bdeb1.

* fix unit test
2025-08-19 17:40:12 +08:00
Jiang-Jia-Jun
28918702c2 Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"
This reverts commit 02596fc537, reversing
changes made to 03347626a6.
2025-08-14 17:20:29 +08:00
Jiang-Jia-Jun
02596fc537 Merge branch 'feature/online/vs_think_20250813' into release/2.1 2025-08-14 17:13:36 +08:00
ltd0924
03347626a6 [BugFix] fix control signal release failed (#3374)
* [BugFix]

* [BugFix]

* [BugFix]

* [BugFix]

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-08-14 17:01:25 +08:00
xiaolei373
d1d321bafd feat(log):add_request_and_response_log (#3392) 2025-08-14 14:50:48 +08:00
JYChen
4870919682 fix stopseq error info (#3342)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-08-14 10:45:05 +08:00
luukunn
81092c0fe3 add tool parser 2025-08-13 16:06:22 +08:00
memoryCoderC
37b76158f9 Completion add raw_prediction/text_after_process (#3362) 2025-08-12 23:20:36 +08:00
memoryCoderC
fe2094609f Release/2.1 (#3361)
* [BugFix] v1/completions add finish_reason

* update TestOpenAIServingCompletion for merge
2025-08-12 23:06:51 +08:00
ltd0924
6706ccb37e [BugFix] fix too many open files problem (#3275) 2025-08-08 20:11:32 +08:00
JYChen
1b6f482c15 [Cherry-pick] fix stop seq (#3263)
* fix out-bound value for stop sequence

* catch error if there are out-of-bounds value

* check in offline mode
2025-08-07 19:11:37 +08:00
sg263
5d3bf308f6 merge develop trace FD_START (#3253)
Co-authored-by: shige <shige@baidu.com>
2025-08-07 11:10:55 +08:00
SunLei
3dd8492601 [Bugfix] Fix uninitialized decoded_token and add corresponding unit test (#3201)
* Update test_base_chat.py (#3183)

* [Bugfix] Fix uninitialized decoded_token and add corresponding unit test.

---------

Co-authored-by: Divano <dddivano@outlook.com>
2025-08-05 10:55:22 +08:00
SunLei
dade19d7a4 [Feature] General support for logprobs (#2974)
* [Feature] support logprobs in chat/completions and completions endpoints

* Temporarily comment out text_offset due to incorrect logic

* Clean up temporary debug prints

* [Feature] support logprobs in offline mode via SamplingParams

* fix: serialize Logprob as dict before zmq send to fix msgpack error

* refactor: remove redundant methods to simplify codebase

* Fix missing fields in CompletionOutput.to_dict affecting msgpack serialization

* refactor: centralize param validation in engine_client to reduce duplication

* revert: rollback changes in offline_demo.py

* revert: rollback changes in offline_demo.py

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

* [bugfix] fix parameter validation for logprobs

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 20:25:56 +08:00
LiqinruiG
25005fee30 [Doc] add chat_template_kwagrs and update params docs (#3103)
* add chat_template_kwagrs and update params docs

* add chat_template_kwagrs and update params docs

* update enable_thinking

* pre-commit

* update test case

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
0616c208d2 [Feature] Support include_stop_str_in_output in completion api (#3096)
* [Feature] Support include_stop_str_in_output in completion api

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-30 22:18:48 +08:00
李泳桦
b242150f94 [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058)
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client

* [fix] delete ci test case for enable_thinking

* [fix] add reasoning_parser when server starts

* [fix] fix ci consistency test error with reasoning parser

* [doc] update docs related to metadata

* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Sunny-bot1
74aa31d15b [Feature] support bad_words (#3055)
* support bad_words

* support online infer bad_words

* update

* add CI test

* update

* update

* update

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-07-30 09:31:29 +08:00
李泳桦
69996a40da [feat] add disable_chat_template in chat api as a substitute for previous raw_request (#3020)
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request

* [fix] pre-commit code check
2025-07-25 20:57:32 +08:00
Zero Rains
0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
ltd0924
f935d6f862 [BugFix] fix multinode deployment (#2977) 2025-07-24 15:04:04 +08:00
Yzc216
e14587a954 [Feature] multi-source download (#2986)
* multi-source download

* multi-source download

* huggingface download revision

* requirement

* style

* add revision arg

* test

* pre-commit
2025-07-24 14:26:37 +08:00
李泳桦
2a8a2c06de [fix] non-streaming api now returns full output ids if return_token_ids is enabled (#2951) 2025-07-22 14:35:56 +08:00
Jiang-Jia-Jun
56102e91e1 [Polish] Return error message of raw_request (#2946)
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-22 10:21:32 +08:00
李泳桦
8a619e9db5 [Feature] Add return_token_ids, prompt_token_ids, and delete training, raw_request in request body (#2940)
* [feat] add return_token_ids, prompt_token_ids, delete raw_request in request body

* [fix] return_token_ids not working in curl request

* [test] improve some test cases of return_token_ids and prompt_token_ids

* [fix] the server responds ok even if request.messages is an empty list
2025-07-21 19:31:14 +08:00
Yuanle Liu
2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
lizexu123
67990e0572 [Feature] support min_p_sampling (#2872)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* Fastdeploy support min_p

* add test_min_p

* fix

* min_p_sampling

* update

* delete vl_gpu_model_runner.py

* fix

* Align usage of min_p with vLLM

* fix

* modified unit test

* fix test_min_sampling

* pre-commit all files

* fix

* fix

* fix

* fix xpu_model_runner.py
2025-07-20 23:17:59 -07:00
ltd0924
cc4cec0a74 Update engine_client.py (#2931) 2025-07-21 11:42:16 +08:00
Zero Rains
25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
fbe3547c95 [Feature] Support include_stop_str_in_output in chat/completion (#2910)
* [Feature] Support include_stop_str_in_output in chat/completion

* Add ci test for include_stop_str_in_output

* Update version of openai

* Fix ci test

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-18 16:59:18 +08:00
sg263
e679567d59 [Trace]fix opentelemetry can not work in uvicorn (#2906)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* fix annotation

* fix annotation when add opentelemetry

* fix opentelemetry-instrumentation-fastapi

* fix pentelemetry-bootstrap

* fix opentelemetry can not work in uvicorn

* move conf to env

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-17 23:16:45 +08:00
Jiang-Jia-Jun
31cab9f87b Update test_openai.py 2025-07-17 16:07:31 +08:00
Jiang-Jia-Jun
d3dfa1446c Update test_openai.py 2025-07-17 16:07:07 +08:00
ltd0924
9c25dcca0b [LLM] Update Multinode Deployment (#2830)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* [LLM] fix multinode bugs

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] update multinode deployment

* [LLM] fix ci bugs

* Update fastdeploy/engine/args_utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [LLM] update random port

* [LLM] update random port

* [LLM] fix ci bugs

* fix ci bugs

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-16 23:42:54 +08:00
ltd0924
d245d1ca6c [LLM] support send batch data and aggregate data (#2860)
* [LLM] support send batch data and aggregate data

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] fix ci bugs

* [LLM] update
2025-07-16 23:42:20 +08:00
sg263
42b80182e0 [Trace] add opentelemetry (#2852)
* add opentelemetry

* add opentelemetry

* add opentelemetry on dequeue

* add opentelemetry on dequeue

* add opentelemetry on dequeue

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-16 15:33:25 +08:00
lddfym
ece88596ed fix spelling error (#2827) 2025-07-14 13:12:57 +08:00
zhenwenDang
d48c03413f Feature/logprob bug fix (#2817)
Some checks failed
Deploy GitHub Pages / deploy (push) Has been cancelled
* fix: handle missing logprobs at step 0 and incorrect finish reason with max_completion_tokens

* Prevent response_logprobs.logprob_token_ids[0] from going out of bounds
2025-07-12 16:48:51 +08:00
lddfym
b5e4288704 Global scheduler supports configuring hot updates (#2807)
* Check if the controller port is available

* Global scheduler supports configuring hot updates

* add interface: /controller/scheduler

* add interface: /controller/scheduler
2025-07-11 13:38:07 +08:00