# Benchmark

FastDeploy builds on the [vLLM benchmark](https://github.com/vllm-project/vllm/blob/main/benchmarks/) scripts and adds extra statistics, so it can be used to benchmark FastDeploy with more detailed performance metrics.

## Test Dataset

The following dataset is derived from an open-source dataset (the source data comes from [HuggingFace Datasets](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)).

| Dataset                                                                                          | Description         |
| :----------------------------------------------------------------------------------------------- | :------------------ |
| https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json   | Open-source dataset |
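
The benchmark command below reads the dataset from a local path, so the file can be fetched into the working directory first. A minimal sketch, assuming `wget` is available:

```
# Download the benchmark dataset into FastDeploy/benchmarks
# (assumes wget is installed; curl -O works the same way)
wget https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json
```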
## How to Benchmark

Start the service first (it keeps running in the foreground), then run the load test from another terminal.

```
cd FastDeploy/benchmarks
python -m pip install -r requirements.txt

# Start the service
python -m fastdeploy.entrypoints.openai.api_server \
       --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
       --port 8188 \
       --tensor-parallel-size 1 \
       --max-model-len 8192

# Load-test the service
python benchmark_serving.py \
  --backend openai-chat \
  --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 8188 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 1 \
  --max-concurrency 1 \
  --save-result
```
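
Before launching the load test, it can help to confirm the service is up with a single request. A minimal smoke-test sketch using `curl`, assuming the server accepts the standard OpenAI chat-completions request format (host, port, and model name match the launch command above):

```
# One-off request against the running service to verify it responds
curl -s http://0.0.0.0:8188/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "baidu/ERNIE-4.5-0.3B-Base-Paddle",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

The example above sends a single prompt with a single concurrent request (`--num-prompts 1`, `--max-concurrency 1`); increase both values to measure throughput under realistic load.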
