Release/2.1 (#3414)

* Pre ce modified (#3335) (#3360)

* Pre ce modified (#3335)

* update

* update

* fix

* fix

* update

* update

* update

* fix

* update

* update

* update

* add ut fix pr(3367)

* [Bug Fix] Fix V1 video bug (#3387)

* fix stopseq error info (#3342)

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

* [BugFix] Fix default log level of paddleformers (#3377)

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

* [Polish Code] Remove useless notes

* feat(log):add_request_and_response_log (#3392)

* Optimize CI execution workflow. (#3371) (#3384)

* fix

* [BugFix] fix control signal release failed (#3374)

* [BugFix]

* [BugFix]

* [BugFix]

* [BugFix]

* fix

* fix

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Revert "Merge branch 'feature/online/vs_think_20250813' into release/2.1"

This reverts commit 02596fc537, reversing
changes made to 03347626a6.

* [XPU] Fixed the issue of performance degradation caused by enabling ENABLE_V1_KVCACHE_SCHEDULER (#3393)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* [BugFix] fix ErnieProcessor not set raw_prediction (#3401)

* [Doc]Release fastdeploy-xpu 2.1.0 (#3407)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* update release note

* [Doc]Release fastdeploy-xpu 2.0.3  (#3408)

* fix v1 schedule oom bug

* fix v1 schedule oom bug

* update release note

* update info

---------

Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: ming1753 <61511741+ming1753@users.noreply.github.com>
Co-authored-by: JYChen <zoooo0820@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: xiaolei373 <zley373@gmail.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: yinwei <yinwei_hust@163.com>
Co-authored-by: memoryCoderC <1137889088@qq.com>
This commit is contained in:
luukunn
2025-08-14 20:53:47 +08:00
committed by GitHub
parent e11331927f
commit 132a8ef425
30 changed files with 132 additions and 1068 deletions

View File

@@ -141,7 +141,6 @@ class OpenAIServingChat:
previous_num_tokens = 0
num_prompt_tokens = 0
num_choices = 1
tool_called = False
max_streaming_response_tokens = (
request.max_streaming_response_tokens
if request.max_streaming_response_tokens is not None
@@ -246,28 +245,20 @@ class OpenAIServingChat:
output = res["outputs"]
delta_text = output["text"]
output_top_logprobs = output["top_logprobs"]
previous_num_tokens += len(output["token_ids"])
logprobs_res: Optional[LogProbs] = None
if request.logprobs and output_top_logprobs is not None:
logprobs_res = self._create_chat_logprobs(
output_top_logprobs, request.logprobs, request.top_logprobs
)
if self.engine_client.data_processor.tool_parser_obj and not res["finished"]:
tool_delta_message = output["tool_delta_message"]
if tool_delta_message is None:
continue
delta_message = tool_delta_message
delta_message.reasoning_content = output.get("reasoning_content")
if delta_message.tool_calls:
tool_called = True
else:
delta_message = DeltaMessage(
content=delta_text,
reasoning_content=output.get("reasoning_content"),
prompt_token_ids=None,
completion_token_ids=None,
tool_calls=None,
)
previous_num_tokens += len(output["token_ids"])
delta_message = DeltaMessage(
content=delta_text,
reasoning_content=output.get("reasoning_content"),
prompt_token_ids=None,
completion_token_ids=None,
tool_calls=output.get("tool_call_content", []),
)
choice = ChatCompletionResponseStreamChoice(
index=0,
@@ -285,7 +276,10 @@ class OpenAIServingChat:
max_tokens = request.max_completion_tokens or request.max_tokens
if has_no_token_limit or previous_num_tokens != max_tokens:
choice.finish_reason = "stop"
if tool_called:
if (
self.engine_client.reasoning_parser == "ernie_x1"
and output.get("finish_reason", "") == "tool_calls"
):
choice.finish_reason = "tool_calls"
else:
choice.finish_reason = "length"
@@ -425,7 +419,7 @@ class OpenAIServingChat:
role="assistant",
content=output["text"],
reasoning_content=output.get("reasoning_content"),
tool_calls=output.get("tool_call"),
tool_calls=output.get("tool_call_content"),
prompt_token_ids=prompt_token_ids if request.return_token_ids else None,
completion_token_ids=completion_token_ids if request.return_token_ids else None,
text_after_process=text_after_process if request.return_token_ids else None,
@@ -445,7 +439,7 @@ class OpenAIServingChat:
max_tokens = request.max_completion_tokens or request.max_tokens
if has_no_token_limit or previous_num_tokens != max_tokens:
choice.finish_reason = "stop"
if output.get("tool_call"):
if self.engine_client.reasoning_parser == "ernie_x1" and output.get("finish_reason", "") == "tool_calls":
choice.finish_reason = "tool_calls"
else:
choice.finish_reason = "length"