[Precision] Support lm_head layer running in float32 (#3597)

* support lm_head fp32 bf16 fp16
* support lm_head fp32 bf16 fp16
* add doc and check code
* lm_head_fp32 specify lm_head as fp32
* code check
* check doc
2025-12-24 13:28:13 +08:00 · 2025-08-27 11:34:53 +08:00
parent ad319a87cc
commit ce9c0917c5
15 changed files with 99 additions and 60 deletions
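
Reviewer note: the sketch below illustrates why one might want the lm_head projection to run in float32. It is not FastDeploy code. It is a NumPy stand-in in which `lm_head_logits`, the tensor shapes, and the random data are all hypothetical, chosen only to show how computing the final vocabulary projection in half precision changes the logits compared with float32.

```python
import numpy as np


def lm_head_logits(hidden, weight, dtype):
    """Project hidden states to vocabulary logits, computing in `dtype`.

    Hypothetical helper for illustration; not part of FastDeploy.
    """
    h = hidden.astype(dtype)
    w = weight.astype(dtype)
    return h @ w.T


rng = np.random.default_rng(0)
hidden = rng.standard_normal((2, 64)).astype(np.float32)     # (batch, hidden_size)
weight = rng.standard_normal((1000, 64)).astype(np.float32)  # (vocab_size, hidden_size)

logits_fp32 = lm_head_logits(hidden, weight, np.float32)
logits_fp16 = lm_head_logits(hidden, weight, np.float16)

# Largest element-wise gap between the two precisions.
max_abs_diff = float(np.max(np.abs(logits_fp32 - logits_fp16.astype(np.float32))))
print(f"max |fp32 - fp16| logit difference: {max_abs_diff:.6f}")
```

Small differences like this can matter when two candidate tokens have nearly equal logits, which is presumably the motivation for offering a float32 option for this one layer while the rest of the model stays in lower precision.
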
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -51,6 +51,7 @@ When using FastDeploy to deploy models (including offline inference and service
 | ```chat_template``` | `str` | Specify the template used for model concatenation, It supports both string input and file path input. The default value is None. If not specified, the model's default template will be used. |
 | ```tool_call_parser``` | `str` | Specify the function call parser to be used for extracting function call content from the model's output. |
 | ```tool_parser_plugin``` | `str` | Specify the file path of the tool parser to be registered, so as to register parsers that are not in the code repository. The code format within these parsers must adhere to the format used in the code repository. |
+| ```lm_head_fp32```       | `bool`      | Specify the dtype of the lm_head layer as FP32. |

 ## 1. Relationship between KVCache allocation, ```num_gpu_blocks_override``` and ```block_size```?