FastDeploy Serving Performance Benchmark Tool
Dataset:
Download it locally with wget for performance testing (see the example after the table below).
| Dataset | Data Path |
|---|---|
| Open-source dataset (2k samples) | https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json |
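For example, the dataset above can be fetched with wget:

```bash
# Download the open-source benchmark dataset (FD format) to the local machine
wget https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json
```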
Usage:
# Install dependencies
python -m pip install -r requirements.txt
Parameter descriptions
- --backend openai-chat: backend interface used for the benchmark; set to "openai-chat" to benchmark the chat/completions endpoint
- --model EB45T: model name; any string may be used, and it only affects the name of the saved result file
- --endpoint /v1/chat/completions: endpoint, used to build the request URL
- --host 0.0.0.0: server IP address, used to build the request URL
- --port 9812: server HTTP port, used to build the request URL
- --dataset-name EBChat: dataset class; set to "EBChat" to read datasets saved in FD format
- --dataset-path ./eb45t_spv4_dataserver_1w_waigua_fd: path to the benchmark dataset
- --hyperparameter-path EB45T.yaml: (optional) hyperparameter file; its contents are merged into every request payload; by default no hyperparameters are sent (see the sketch after this list)
- --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len: set of metrics reported in the benchmark results
- --metric-percentiles 80,95,99,99.9,99.95,99.99: percentiles reported for each metric
- --num-prompts 1: total number of requests to send
- --max-concurrency 1: benchmark concurrency
- --save-result: save the results to a JSON file; off (False) by default
- --debug: enable debug mode, printing each request payload and output; False by default
- --shuffle: whether to shuffle the dataset; False (no shuffling) by default
- --seed: random seed used when shuffling the dataset; 0 by default
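As a minimal sketch of how --hyperparameter-path works: the file is a YAML whose key/value pairs are merged into every request payload. The specific keys below (temperature, top_p, max_tokens) are illustrative assumptions, not the actual contents of the repository's YAML files:

```bash
# Hypothetical hyperparameter file; the keys/values shown are assumptions for illustration.
# Every key in this YAML is merged into each request payload sent by the benchmark.
cat > EB45T.yaml <<'EOF'
temperature: 0.8
top_p: 0.8
max_tokens: 200
EOF
```

Pass it with --hyperparameter-path EB45T.yaml; omit the flag to send requests without any extra hyperparameters.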
Single-request debugging against the /v1/chat/completions endpoint
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 1 \
--max-concurrency 1 \
--save-result
Full benchmark against the /v1/chat/completions endpoint: 100 concurrent requests, 2000 prompts
# Save output to infer_log.txt
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &
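Because the command above runs in the background and redirects stdout and stderr to infer_log.txt, progress can be followed with:

```bash
tail -f infer_log.txt
```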
Benchmarking the /v1/completions endpoint
Set --endpoint to /v1/completions and --backend to openai to benchmark the /v1/completions endpoint.
# Save output to infer_log.txt
python benchmark_serving.py \
--backend openai \
--model EB45T \
--endpoint /v1/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &
Speculative Decoding Performance Benchmark Tool
Usage:
python benchmarks/benchmark_mtp.py \
--host 127.0.0.1 --port 8000 \
--max-concurrency 16 32 64 96 --num-prompts 256 \
--acceptance-rate 0.8 --draft-token-steps 1 2 3 \
--s_itl-base-model 15.88 22.84 16.47 16.93 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json
Parameter descriptions
- --host: server IP address, used to build the request URL
- --port: server HTTP port, used to build the request URL
- --max-concurrency: concurrency levels to test
- --num-prompts: total number of requests to send
- --acceptance-rate: simulated acceptance rate for speculative decoding
- --draft-token-steps: number of draft-token steps for speculative decoding
- --s_itl-base-model: decode latency of the base model, which can be obtained with the benchmark tool above; the values correspond one-to-one with the batch sizes (one value per --max-concurrency level)
- --dataset-name: dataset class; set to "EBChat" to read datasets saved in FD format
- --dataset-path: path to the test dataset
Testing with random plain-text inputs of specified input/output lengths
Relevant parameters:
- --dataset-name: dataset class; set to "random" to construct random plain-text inputs
- --random-input-len: random input length, measured in English words; default 200
- --random-output-len: random output length; default 1024
- --random-range-ratio: ratio controlling the variation of input/output lengths; lengths are drawn from [length * (1 - range_ratio), length * (1 + range_ratio)]; default 0.1. For example, with the default input length of 200 and ratio 0.1, input lengths fall within [180, 220].
Usage:
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name random \
--random-input-len 200 \
--random-output-len 1024 \
--random-range-ratio 0.1 \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &