[Docs] fix PaddleOCR-VL docs bug (#4702)
@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --max-model-len 16384 \
     --max-num-batched-tokens 16384 \
     --gpu-memory-utilization 0.8 \
-    --max-num-seqs 128 \
+    --max-num-seqs 128
 ```
 **Example 2:** Deploying a 16K Context Service on a Single A100 GPU
 ```shell
@@ -36,7 +36,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --max-model-len 16384 \
     --max-num-batched-tokens 16384 \
     --gpu-memory-utilization 0.8 \
-    --max-num-seqs 256 \
+    --max-num-seqs 256
 ```

 Each example is a set of configurations that runs stably while also delivering relatively good performance. If you have further requirements for precision or performance, please continue reading below.
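For context, the bug being fixed is the line-continuation backslash after the final `--max-num-seqs` flag: a trailing `\` on the last line of the command makes the shell consume whatever follows (for example a pasted closing code fence) as another argument. Below is a sketch of what the full corrected command looks like for Example 1; only the four flags shown in the hunks above are confirmed by this commit, and the `--model` and `--port` values are illustrative placeholders since the diff only shows the tail of the command.

```shell
# Sketch of the full corrected launch command (Example 1, 128 sequences).
# Only the last four flags are confirmed by the diff above; --model and
# --port are illustrative placeholders.
python -m fastdeploy.entrypoints.openai.api_server \
    --model PaddlePaddle/PaddleOCR-VL \
    --port 8180 \
    --max-model-len 16384 \
    --max-num-batched-tokens 16384 \
    --gpu-memory-utilization 0.8 \
    --max-num-seqs 128
# Note: no backslash after the last flag -- a trailing "\" would make the
# shell treat the next pasted line as part of this command.
```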
The same fix, applied to the Chinese-language version of the docs (Chinese prose shown translated):

@@ -24,7 +24,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --max-model-len 16384 \
     --max-num-batched-tokens 16384 \
     --gpu-memory-utilization 0.8 \
-    --max-num-seqs 128 \
+    --max-num-seqs 128
 ```

 **Example 2:** Deploying a 16K context service on a single A100 GPU
@@ -37,7 +37,7 @@ python -m fastdeploy.entrypoints.openai.api_server \
     --max-model-len 16384 \
     --max-num-batched-tokens 16384 \
     --gpu-memory-utilization 0.8 \
-    --max-num-seqs 256 \
+    --max-num-seqs 256
 ```

 The example is a set of configurations that runs stably, while also achieving relatively good performance.
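Once the server is up, a quick way to verify the corrected command is to hit the OpenAI-compatible endpoint implied by the entrypoint name (`fastdeploy.entrypoints.openai.api_server`). A minimal smoke-test sketch, assuming the placeholder port and model name from the launch sketch above and the standard `/v1/chat/completions` route:

```shell
# Hypothetical smoke test; the port and model name match the placeholders
# used in the launch sketch above, not values confirmed by this commit.
curl -s http://localhost:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "PaddlePaddle/PaddleOCR-VL",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```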