mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-12-24 13:28:13 +08:00
update doc (#4675)
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
@@ -198,6 +198,8 @@ For ``LLM`` configuration, refer to [Parameter Documentation](parameters.md).
* finished(bool): Completion status
* metrics(fastdeploy.engine.request.RequestMetrics): Performance metrics
* num_cached_tokens(int): Cached token count (only valid when ``enable_prefix_caching`` is enabled)
* num_input_image_tokens(int): Number of input image tokens.
* num_input_video_tokens(int): Number of input video tokens.
* error_code(int): Error code
* error_msg(str): Error message
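The fields above describe the per-request output object returned by offline inference. A minimal sketch of how a caller might consume them, using a hypothetical local stand-in (`RequestOutputStub` mirrors only the documented fields; the real object is produced by FastDeploy's `LLM` API):

```python
from dataclasses import dataclass

# Hypothetical stand-in mirroring the documented output fields; the real
# object comes from FastDeploy's LLM.generate()/LLM.chat() results.
@dataclass
class RequestOutputStub:
    finished: bool = False
    num_cached_tokens: int = 0
    num_input_image_tokens: int = 0
    num_input_video_tokens: int = 0
    error_code: int = 0
    error_msg: str = ""

def summarize(output: RequestOutputStub) -> str:
    """Render a one-line status string from the documented fields."""
    if output.error_code != 0:
        return f"error {output.error_code}: {output.error_msg}"
    if not output.finished:
        return "still generating"
    return (f"done; cached={output.num_cached_tokens}, "
            f"image_tokens={output.num_input_image_tokens}, "
            f"video_tokens={output.num_input_video_tokens}")

print(summarize(RequestOutputStub(finished=True, num_cached_tokens=128,
                                  num_input_image_tokens=576)))
```

Note that `num_cached_tokens` only carries a meaningful value when `enable_prefix_caching` is turned on, per the parameter documentation.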
@@ -238,6 +238,19 @@ ChatMessage:
    completion_token_ids: Optional[List[int]] = None
    prompt_tokens: Optional[str] = None
    completion_tokens: Optional[str] = None

UsageInfo:
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0
    prompt_tokens_details: Optional[PromptTokenUsageInfo] = None
    completion_tokens_details: Optional[CompletionTokenUsageInfo] = None

PromptTokenUsageInfo:
    cached_tokens: Optional[int] = None
    image_tokens: Optional[int] = None
    video_tokens: Optional[int] = None

CompletionTokenUsageInfo:
    reasoning_tokens: Optional[int] = None
    image_tokens: Optional[int] = None

ToolCall:
    id: str = None
    type: Literal["function"] = "function"
@@ -414,6 +427,19 @@ CompletionResponseChoice:
    reasoning_content: Optional[str] = None
    finish_reason: Optional[Literal["stop", "length", "tool_calls"]]
    tool_calls: Optional[List[DeltaToolCall | ToolCall]] = None

UsageInfo:
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0
    prompt_tokens_details: Optional[PromptTokenUsageInfo] = None
    completion_tokens_details: Optional[CompletionTokenUsageInfo] = None

PromptTokenUsageInfo:
    cached_tokens: Optional[int] = None
    image_tokens: Optional[int] = None
    video_tokens: Optional[int] = None

CompletionTokenUsageInfo:
    reasoning_tokens: Optional[int] = None
    image_tokens: Optional[int] = None

ToolCall:
    id: str = None
    type: Literal["function"] = "function"
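On the wire, these usage fields arrive as nested JSON in the OpenAI-compatible response body. A hedged sketch of reading them defensively (the payload below is illustrative, not captured from a real server, and the detail keys may be absent or null):

```python
# Illustrative /v1/chat/completions response body (not real server output).
response = {
    "usage": {
        "prompt_tokens": 600,
        "completion_tokens": 42,
        "total_tokens": 642,
        "prompt_tokens_details": {"cached_tokens": 128,
                                  "image_tokens": 576,
                                  "video_tokens": None},
        "completion_tokens_details": {"reasoning_tokens": 30,
                                      "image_tokens": None},
    }
}

usage = response["usage"]
# The details object and each of its fields are optional, so fall back to 0.
details = usage.get("prompt_tokens_details") or {}
cached = details.get("cached_tokens") or 0
# Fraction of the prompt that was served from the prefix cache.
cache_hit = cached / usage["prompt_tokens"] if usage["prompt_tokens"] else 0.0
print(f"cache hit rate: {cache_hit:.1%}")
```

Guarding with `or {}` / `or 0` keeps the same code working against servers that omit the detail fields entirely.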
@@ -198,6 +198,8 @@ for output in outputs:
* finished(bool): Whether inference for the current query has finished
* metrics(fastdeploy.engine.request.RequestMetrics): Inference timing metrics
* num_cached_tokens(int): Number of cached tokens (only valid when ``enable_prefix_caching`` is enabled)
* num_input_image_tokens(int): Number of input image tokens
* num_input_video_tokens(int): Number of input video tokens
* error_code(int): Error code
* error_msg(str): Error message
@@ -237,6 +237,19 @@ ChatMessage:
    completion_token_ids: Optional[List[int]] = None
    prompt_tokens: Optional[str] = None
    completion_tokens: Optional[str] = None

UsageInfo:
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0
    prompt_tokens_details: Optional[PromptTokenUsageInfo] = None
    completion_tokens_details: Optional[CompletionTokenUsageInfo] = None

PromptTokenUsageInfo:
    cached_tokens: Optional[int] = None
    image_tokens: Optional[int] = None
    video_tokens: Optional[int] = None

CompletionTokenUsageInfo:
    reasoning_tokens: Optional[int] = None
    image_tokens: Optional[int] = None

ToolCall:
    id: str = None
    type: Literal["function"] = "function"
@@ -410,6 +423,19 @@ CompletionResponseChoice:
    reasoning_content: Optional[str] = None
    finish_reason: Optional[Literal["stop", "length", "tool_calls"]]
    tool_calls: Optional[List[DeltaToolCall | ToolCall]] = None

UsageInfo:
    prompt_tokens: int = 0
    total_tokens: int = 0
    completion_tokens: Optional[int] = 0
    prompt_tokens_details: Optional[PromptTokenUsageInfo] = None
    completion_tokens_details: Optional[CompletionTokenUsageInfo] = None

PromptTokenUsageInfo:
    cached_tokens: Optional[int] = None
    image_tokens: Optional[int] = None
    video_tokens: Optional[int] = None

CompletionTokenUsageInfo:
    reasoning_tokens: Optional[int] = None
    image_tokens: Optional[int] = None

ToolCall:
    id: str = None
    type: Literal["function"] = "function"