[Feature] Support return logprob of generated tokens (#2784)

* online chat support logprobs
* check xpu
* check vl_gpu_model_runner
* only cuda support logprob
* get_worker() check platform

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-10-06 09:07:10 +08:00 · 2025-07-10 15:47:42 +08:00
parent 39d2a1de46
commit 823a47e64a
21 changed files with 592 additions and 105 deletions
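
Review note: the `text_processor.py` hunk in this commit adds `process_logprob_response`, which passes token ids straight to `self.tokenizer.decode` and returns the decoded text. The sketch below reproduces only that behavior. `ToyTokenizer` and `ToyProcessor` are hypothetical stand-ins, not FastDeploy classes. Decoding one id at a time to label each logprob entry is an assumption about how a caller might use the method, since the call site is not part of this hunk.

```python
class ToyTokenizer:
    """Hypothetical stand-in for the real tokenizer: maps ids to text pieces."""

    def __init__(self, vocab):
        self.vocab = vocab

    def decode(self, token_ids, **kwargs):
        # Accept a single id or a sequence of ids; extra kwargs are ignored here.
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        return "".join(self.vocab[i] for i in token_ids)


class ToyProcessor:
    """Mirrors only the method added in this commit."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def process_logprob_response(self, token_ids, **kwargs):
        # Same body as the diff: delegate to the tokenizer and return the text.
        full_text = self.tokenizer.decode(token_ids, **kwargs)
        return full_text


processor = ToyProcessor(ToyTokenizer({0: "Hello", 1: ",", 2: " world"}))

# Decode a whole sequence at once.
full_text = processor.process_logprob_response([0, 1, 2])

# Assumed usage: decode each candidate id separately to label logprob entries.
per_token = [processor.process_logprob_response([t]) for t in (0, 1, 2)]
```

Because the method forwards `**kwargs` unchanged, any decode options a caller passes reach the tokenizer as-is, so the caller needs to know which options the underlying tokenizer accepts.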
--- a/fastdeploy/input/text_processor.py
+++ b/fastdeploy/input/text_processor.py
@@ -309,6 +309,10 @@ class DataProcessor(BaseDataProcessor):
        data_processor_logger.info(f"Processed request {request}")
        return request

+    def process_logprob_response(self, token_ids, **kwargs):
+        full_text = self.tokenizer.decode(token_ids, **kwargs)
+        return full_text
+
    def process_response(self, response_dict, **kwargs):
        """
        Preprocess the response