# Benchmark
FastDeploy extends the [vLLM benchmark](https://github.com/vllm-project/vllm/blob/main/benchmarks/) scripts with additional statistics, so they can report more detailed performance metrics for FastDeploy.
## Test Dataset
The dataset below is derived from the open-source ShareGPT dataset (source data: [HuggingFace Datasets](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)).

| Dataset | Description |
| :----------------------------------------------------------- | :--------- |
| https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json | Open-source dataset |
## Running the Benchmark
```bash
cd FastDeploy/benchmarks
python -m pip install -r requirements.txt
# Download the test dataset listed above into the current directory
wget https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json
# Start the service (it runs in the foreground; keep this terminal open)
python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
  --port 8188 \
  --tensor-parallel-size 1 \
  --max-model-len 8192
# In a second terminal, from FastDeploy/benchmarks, benchmark the service
python benchmark_serving.py \
  --backend openai-chat \
  --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 8188 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 1 \
  --max-concurrency 1 \
  --save-result
```
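`--percentile-metrics` selects which latency metrics are reported, and `--metric-percentiles` selects which percentiles are reported for them. The minimal sketch below shows how the standard vLLM-style client metrics are commonly derived from per-request token arrival timestamps. It covers TTFT (time to first token), TPOT (time per output token, excluding the first), ITL (inter-token latency) and E2EL (end-to-end latency). It is an illustration under stated assumptions, not the code in `benchmark_serving.py`:

- The `RequestTrace` record and all function names are hypothetical.
- E2EL is approximated by the arrival time of the last token.
- The FastDeploy-specific `s_`-prefixed metrics are not modeled.
- Percentiles use linear interpolation, which is NumPy's default method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request record (times in seconds).

    start: when the request was sent.
    token_times: arrival time of each streamed output token (at least one).
    """
    start: float
    token_times: list[float]


def percentile(values: list[float], p: float) -> float:
    """p-th percentile using linear interpolation (NumPy's default method)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sample")
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def request_metrics(r: RequestTrace) -> dict:
    ttft = r.token_times[0] - r.start   # time to first token
    e2el = r.token_times[-1] - r.start  # end-to-end latency (assumed: last token arrival)
    # gaps between consecutive streamed tokens
    itl = [b - a for a, b in zip(r.token_times, r.token_times[1:])]
    n = len(r.token_times)
    # time per output token after the first; undefined for single-token outputs
    tpot = (e2el - ttft) / (n - 1) if n > 1 else None
    return {"ttft": ttft, "tpot": tpot, "e2el": e2el, "itl": itl}


def summarize(traces: list[RequestTrace], percentiles=(80, 95, 99, 99.9)) -> dict:
    per_req = [request_metrics(t) for t in traces]
    samples = {
        "ttft": [m["ttft"] for m in per_req],
        "tpot": [m["tpot"] for m in per_req if m["tpot"] is not None],
        "e2el": [m["e2el"] for m in per_req],
        # ITL pools every gap from every request into a single sample
        "itl": [gap for m in per_req for gap in m["itl"]],
    }
    # skip metrics with no samples (e.g. TPOT when every output had one token)
    return {name: {p: percentile(v, p) for p in percentiles}
            for name, v in samples.items() if v}


if __name__ == "__main__":
    traces = [RequestTrace(0.0, [0.5, 0.6, 0.7]),
              RequestTrace(1.0, [1.2, 1.4, 1.6, 1.8])]
    print(summarize(traces, percentiles=(50, 99)))
```

In this sketch, TTFT, TPOT and E2EL contribute one sample per request, while ITL contributes one sample per token gap. Their percentile tails can therefore differ noticeably for the same run.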