[CP] CP Lm head fp32 and temp_logprob to release/2.1 (#3766)

* [Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)

  * [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
  * infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
  * delete some code
  * code check
  * code check and add doc
  * fix tokenizer.decoder(-1), return 'Invalid Token'
  * add ci for temp_scaled and top_p logprobs
  * check test
  * check seq len time shape
  * logprob clip inf

  ---------

  Co-authored-by: sunlei1024 <sunlei5788@gmail.com>

* [Precision] Support lm_head layer running in float32 (#3597)

  * support lm_head fp32 bf16 fp16
  * support lm_head fp32 bf16 fp16
  * add doc and check code
  * lm_head_fp32 specify lm_head as fp32
  * code check
  * check doc
  * code check

  ---------

  Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-10-07 17:41:52 +08:00 · 2025-09-01 19:56:54 +08:00
parent 4da603daec
commit 1e19833ba5
22 changed files with 188 additions and 54 deletions
--- a/fastdeploy/model_executor/models/utils.py
+++ b/fastdeploy/model_executor/models/utils.py
@@ -23,6 +23,7 @@ import os
 import random
 import re
 import struct
+from contextlib import contextmanager
 from functools import partial
 from typing import Any, NamedTuple, Optional, Union

@@ -533,3 +534,15 @@ def parser_quant_type(quant_type):
            quant_type_list.append(default_type)

        return quant_type_list[0], quant_type_list[1], quant_type_list[2]
+
+
+@contextmanager
+def temporary_dtype(dtype: str):
+    """Temporarily set Paddle default dtype"""
+    orig_dtype = paddle.get_default_dtype()
+    try:
+        if dtype is not None and dtype == "float32":
+            paddle.set_default_dtype(dtype)
+        yield
+    finally:
+        paddle.set_default_dtype(orig_dtype)
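
The `temporary_dtype` helper added above follows the usual save/override/restore pattern for a process-wide setting. The sketch below illustrates how it behaves, under stated assumptions: `FakePaddle` is a hypothetical stand-in for the two Paddle calls the helper relies on (`get_default_dtype` and `set_default_dtype`), used so the example runs without Paddle installed, and the starting default of `"bfloat16"` is an assumption, not something the diff states. `dtype_inside` and `dtype_after_error` are illustrative helpers, not part of the commit.

```python
from contextlib import contextmanager


class FakePaddle:
    """Hypothetical stand-in for paddle's default-dtype getter and setter."""

    def __init__(self):
        # Assumed starting default for this illustration only.
        self._default_dtype = "bfloat16"

    def get_default_dtype(self):
        return self._default_dtype

    def set_default_dtype(self, dtype):
        self._default_dtype = dtype


paddle = FakePaddle()


@contextmanager
def temporary_dtype(dtype):
    """Same logic as the helper in the diff, bound to the stand-in above."""
    orig_dtype = paddle.get_default_dtype()
    try:
        # Only "float32" triggers an override; any other value is a no-op.
        if dtype is not None and dtype == "float32":
            paddle.set_default_dtype(dtype)
        yield
    finally:
        # Runs on normal exit and on exceptions, so the default is restored.
        paddle.set_default_dtype(orig_dtype)


def dtype_inside(dtype):
    """Return the default dtype observed inside the context."""
    with temporary_dtype(dtype):
        return paddle.get_default_dtype()


def dtype_after_error(dtype):
    """Return the default dtype after the context body raises."""
    try:
        with temporary_dtype(dtype):
            raise RuntimeError("simulated failure while building a layer")
    except RuntimeError:
        pass
    return paddle.get_default_dtype()


print(dtype_inside("float32"))
print(dtype_inside("float16"))
print(dtype_after_error("float32"))
```

Because the override applies only when `dtype == "float32"`, passing `None` or another dtype string leaves the default untouched. With real Paddle, parameters created inside the `with` block would presumably pick up the temporary default, which is a plausible way for the lm_head layer to be built in float32 while the rest of the model keeps its original dtype; the diff shown here does not include that call site.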