[fix]update apply_chat_template (#4137)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled

* update apply_chat_template

* fix unittest

* fix unittest

* fix

* fix

* fix unit test

* fix

* fix unit test

* add unit test
This commit is contained in:
luukunn
2025-09-24 18:56:32 +08:00
committed by GitHub
parent 7c1fd19f0f
commit 18f4977aec
10 changed files with 146 additions and 109 deletions

View File

@@ -208,7 +208,6 @@ class DataProcessor(BaseDataProcessor):
str: error message
"""
data_processor_logger.info(f"Start processing request: {request}")
request.chat_template = kwargs.get("chat_template")
request = self._apply_default_parameters(request)
if request.get("eos_token_ids") is None or len(request.eos_token_ids) == 0:
request.eos_token_ids = self.eos_token_ids
@@ -242,7 +241,7 @@ class DataProcessor(BaseDataProcessor):
if self.tokenizer.chat_template is None:
raise ValueError("This model does not support chat_template.")
task = request.to_dict()
chat_template_kwargs = kwargs.get("chat_template_kwargs")
chat_template_kwargs = kwargs.get("chat_template_kwargs", {})
if chat_template_kwargs:
if isinstance(chat_template_kwargs, dict):
for k, v in chat_template_kwargs.items():
@@ -251,7 +250,7 @@ class DataProcessor(BaseDataProcessor):
else:
raise ValueError("Invalid input: chat_template_kwargs must be a dict")
task.setdefault("enable_thinking", True)
request.prompt_token_ids = self.messages2ids(task)
request.prompt_token_ids = self.messages2ids(task, **chat_template_kwargs)
else:
raise ValueError(f"The request should have `input_ids`, `text` or `messages`: {request}.")
@@ -316,7 +315,7 @@ class DataProcessor(BaseDataProcessor):
elif request.get("messages"):
if self.tokenizer.chat_template is None:
raise ValueError("This model does not support chat_template.")
chat_template_kwargs = request.get("chat_template_kwargs")
chat_template_kwargs = request.get("chat_template_kwargs", {})
if chat_template_kwargs:
if isinstance(chat_template_kwargs, dict):
for k, v in chat_template_kwargs.items():
@@ -325,7 +324,7 @@ class DataProcessor(BaseDataProcessor):
else:
raise ValueError("Invalid input: chat_template_kwargs must be a dict")
request.setdefault("enable_thinking", True)
request["prompt_token_ids"] = self.messages2ids(request)
request["prompt_token_ids"] = self.messages2ids(request, **chat_template_kwargs)
else:
raise ValueError(f"Request must contain 'prompt_token_ids', 'prompt', or 'messages': {request}")
@@ -530,7 +529,7 @@ class DataProcessor(BaseDataProcessor):
return tokens["input_ids"][0]
def messages2ids(self, request):
def messages2ids(self, request, **kwargs):
"""
Convert multi-turn messages into ID sequences.
@@ -547,7 +546,7 @@ class DataProcessor(BaseDataProcessor):
split_special_tokens=False,
add_special_tokens=False,
return_tensors="pd",
chat_template=request.get("chat_template", None),
**kwargs,
)
request["text_after_process"] = spliced_message
req_id = None