[Docs]fix sampling docs 2.1 (#3333)

* [Docs]fix sampling docs (#3113)
* fix sampling docs
* update
* fix docs
@@ -98,7 +98,7 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
     {"role": "user", "content": "How old are you"}
   ],
   "top_p": 0.8,
-  "top_k": 50
+  "top_k": 20
 }'
 ```
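For reference, the same corrected request expressed in Python; a minimal sketch using `requests`, assuming the FastDeploy OpenAI-compatible server from the curl example above is listening on 0.0.0.0:9222:

```python
import requests

# Same payload as the corrected curl example: top_k is now 20.
payload = {
    "messages": [
        {"role": "user", "content": "How old are you"}
    ],
    "top_p": 0.8,
    "top_k": 20,
}

# Assumes a FastDeploy server on port 9222, as in the docs above.
resp = requests.post("http://0.0.0.0:9222/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```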
@@ -117,7 +117,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    top_k=50
+    extra_body={"top_k": 20, "min_p":0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
@@ -159,8 +159,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    top_k=20,
-    min_p=0.1
+    extra_body={"top_k": 20, "min_p":0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
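The two hunks above converge on one pattern: standard OpenAI parameters such as `top_p` stay as keyword arguments, while FastDeploy-specific samplers (`top_k`, `min_p`) move into `extra_body`. A minimal sketch of the resulting call, assuming the same server and a placeholder model name:

```python
from openai import OpenAI

# Placeholder base_url and api_key; adjust to your deployment.
client = OpenAI(base_url="http://0.0.0.0:9222/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",  # placeholder model name
    messages=[
        {"role": "user", "content": "How old are you"}
    ],
    stream=True,
    top_p=0.8,
    # Non-standard sampling knobs go through extra_body, per the fix above.
    extra_body={"top_k": 20, "min_p": 0.1},
)
for chunk in response:
    if chunk.choices[0].delta:
        print(chunk.choices[0].delta.content or "", end="")
```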
@@ -183,6 +183,7 @@ For ```LLM``` configuration, refer to [Parameter Documentation](parameters.md).
+* min_p(float): Minimum probability relative to the maximum probability for a token to be considered (>0 filters low-probability tokens to improve quality)
 * max_tokens(int): Maximum generated tokens (input + output)
 * min_tokens(int): Minimum forced generation length
 * bad_words(list[str]): Prohibited words

 ### 2.5 fastdeploy.engine.request.RequestOutput

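The parameter list above maps onto the offline API as well; a minimal sketch, assuming the `from fastdeploy import LLM, SamplingParams` entry point referenced by these docs (the model path and values are placeholders):

```python
from fastdeploy import LLM, SamplingParams

# Each field below corresponds to a bullet in the list above.
sampling_params = SamplingParams(
    top_p=0.8,
    top_k=20,
    min_p=0.1,          # filter tokens far below the top probability
    max_tokens=256,     # cap on total tokens (input + output)
    min_tokens=1,       # force at least this many generated tokens
    bad_words=["foo"],  # illustrative prohibited word
)

llm = LLM(model="./path/to/model")  # placeholder model path
outputs = llm.generate(["How old are you"], sampling_params)
```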
@@ -98,7 +98,7 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
     {"role": "user", "content": "How old are you"}
   ],
   "top_p": 0.8,
-  "top_k": 50
+  "top_k": 20
 }'
 ```
@@ -118,7 +118,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    extra_body={"top_k": 50}
+    extra_body={"top_k": 20}
 )
 for chunk in response:
     if chunk.choices[0].delta:
@@ -161,8 +161,7 @@ response = client.chat.completions.create(
     ],
     stream=True,
     top_p=0.8,
-    extra_body={"top_k": 20},
-    min_p=0.1
+    extra_body={"top_k": 20, "min_p": 0.1}
 )
 for chunk in response:
     if chunk.choices[0].delta:
@@ -183,6 +183,7 @@ for output in outputs:
+* min_p(float): Minimum probability threshold for a token to be selected, as a ratio of the highest-probability token (setting it >0 filters out low-probability tokens and improves generation quality)
 * max_tokens(int): Maximum number of tokens the model may generate (covering both input and output)
 * min_tokens(int): Minimum number of tokens the model must generate, preventing premature termination
 * bad_words(list[str]): List of words the model is forbidden to generate, keeping unwanted terms out of the output

 ### 2.5 fastdeploy.engine.request.RequestOutput

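The `for output in outputs:` context in the last hunk refers to iterating the `RequestOutput` objects returned by offline generation; a hypothetical sketch continuing the offline example above, with field names (`prompt`, `outputs.text`) assumed from FastDeploy's offline-inference examples rather than confirmed by this diff:

```python
# `outputs` comes from llm.generate(...) in the sketch above.
for output in outputs:
    prompt = output.prompt       # assumed field: the input prompt
    text = output.outputs.text   # assumed field: the generated completion
    print(f"{prompt!r} -> {text!r}")
```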