mirror of
				https://github.com/PaddlePaddle/FastDeploy.git
				synced 2025-10-31 11:56:44 +08:00 
			
		
		
		
	
		
			
				
	
	
	
		
			1.4 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			1.4 KiB
		
	
	
	
	
	
	
	
Benchmark
FastDeploy extends the vLLM benchmark script with additional metrics, enabling more detailed performance benchmarking for FastDeploy.
Benchmark Dataset
The following dataset is sourced from open-source data (original data from HuggingFace Datasets):
| Dataset | Description | 
|---|---|
| https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json | Open-source dataset | 
How to Run
cd FastDeploy/benchmarks
python -m pip install -r requirements.txt
# Start service
python -m fastdeploy.entrypoints.openai.api_server \
       --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
       --port 8188 \
       --tensor-parallel-size 1 \
       --max-model-len 8192
# Run benchmark
python benchmark_serving.py \
  --backend openai-chat \
  --model baidu/ERNIE-4.5-0.3B-Base-Paddle \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 8188 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 1 \
  --max-concurrency 1 \
  --save-result
