Files
FastDeploy/benchmarks
kxz2002 24b85b752b
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
[Cherry-Pick] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668) (#4737)
* [Feature] add a new reasoning parser (#4571)

* add new reasoning_parser initial commit

* add parser file content

* add register

* ernie_test_reasoning_parser

* support <tool_call> token and add tool_parser

* add and fix unit tests

* modify reasoning_parser

* modify reasoning parser and tool parser

* modify unit tests

* modify reasoning_parser and tool_parser

* modify unit tests

* fix tool_parser

* modify the logic of reasoning_parser and tool_parser

* add and modify unit tests

* standardize code style

* simplify reasoning_parser and tool_parser

* modify unit test

* [BugFix] Fix finish reason in _create_chat_completion_choice  (#4582)

* fix n_param _create_chat_completion_choicel

* fix unit test

* fix final_res

* modify unit tests

* [BugFix] fix offline llm chat "enable_thinking" is always "False" (#4686)

* fix enable_thinking

* recover ernie4_5_vl_processor

* [Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668)

* parser register name unify

* change ernie_x1 to ernie-x1

* change ernie4_5_vl to ernie-45-vl

* fix unit test
2025-10-31 23:27:21 +08:00
..
2025-09-22 14:27:17 +08:00
2025-07-24 15:19:23 +08:00
2025-07-04 09:53:03 +08:00

FastDeploy服务化性能压测工具

数据集:

wget下载到本地用于性能测试

Dataset Data Path
开源数据集 2k条 https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json

使用方式:

# 安装依赖
python -m pip install -r requirements.txt
参数说明
--backend openai-chat压测使用的后端接口指定为"openai-chat"使用chat/completion接口
--model EB45T模型名任意取名影响最后保存的结果文件名 EB45T \
--endpoint /v1/chat/completionsendpoint用于组url
--host 0.0.0.0服务ip地址用于组url
--port 9812服务HTTP端口用于组url
--dataset-name EBChat指定数据集类指定为"EBChat"可读取转存的FD格式数据集
--dataset-path ./eb45t_spv4_dataserver_1w_waigua_fd压测数据集路径
--hyperparameter-path EB45T.yaml(可选)超参文件请求时会更新进payload中默认不带任何超参
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len性能结果中展示的指标集合
--metric-percentiles 80,95,99,99.9,99.95,99.99:性能结果中展示的性能指标分位值
--num-prompts 1总计发送多少条请求
--max-concurrency 1压测并发数
--save-result开启结果保存结果文件会存入json默认False不保存
--debug开启debug模式逐条打印payload和output内容默认False
--shuffle是否打乱数据集默认False不打乱
--seed打乱数据集时的随机种子默认0
/v1/chat/completions接口压测单条数据调试
python benchmark_serving.py \
  --backend openai-chat \
  --model EB45T \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 9812 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 1 \
  --max-concurrency 1 \
  --save-result
/v1/chat/completions接口完整100并发 2000条压测
# 保存infer_log.txt
python benchmark_serving.py \
  --backend openai-chat \
  --model EB45T \
  --endpoint /v1/chat/completions \
  --host 0.0.0.0 \
  --port 9812 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 2000 \
  --max-concurrency 100 \
  --save-result > infer_log.txt 2>&1 &
/v1/completions接口压测

修改endpoint为/v1/completionsbackend为openai会对/v1/completions接口进行压测

# 保存infer_log.txt
python benchmark_serving.py \
  --backend openai \
  --model EB45T \
  --endpoint /v1/completions \
  --host 0.0.0.0 \
  --port 9812 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
  --hyperparameter-path yaml/request_yaml/eb45t-32k.yaml \
  --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
  --metric-percentiles 80,95,99,99.9,99.95,99.99 \
  --num-prompts 2000 \
  --max-concurrency 100 \
  --save-result > infer_log.txt 2>&1 &

投机解码性能测试工具

使用方式:

python benchmarks/benchmark_mtp.py \
  --host 127.0.0.1 --port 8000 \
  --max-concurrency 16 32 64 96 --num-prompts 256 \
  --acceptance-rate 0.8 --draft-token-steps 1 2 3 \
  --s_itl-base-model 15.88 22.84 16.47 16.93 \
  --dataset-name EBChat \
  --dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json

参数说明

--host服务ip地址用于组url
--port服务HTTP端口用于组url
--max-concurrency测试并发数
--num-prompts总计发送多少条请求
--acceptance-rate投机解码的模拟接受率
--draft-token-steps投机解码的步数
--s_itl-base-model主模型的解码延迟可由上述的性能压测工具获得与batch-size一一对应
--dataset-name指定数据集类指定为"EBChat"可读取转存的FD格式数据集
--dataset-path测试数据集路径