FastDeploy Serving Performance Benchmark Tool
Dataset:
Download it locally with wget for performance testing (see the example after the table below).
| Dataset | Data Path |
|---|---|
| Open-source dataset (2k samples) | https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json |
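For example, the dataset above can be fetched with wget:

```bash
# Download the open-source benchmark dataset (FD format) to the local machine
wget https://fastdeploy.bj.bcebos.com/eb_query/filtered_sharedgpt_2000_input_1136_output_200_fd.json
```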
Usage:
# Install dependencies
python -m pip install -r requirements.txt
Parameter descriptions
- --backend openai-chat: backend interface used for the benchmark; set to "openai-chat" to benchmark the chat/completions endpoint
- --model EB45T: model name; any string may be used, and it only affects the name of the saved result file
- --endpoint /v1/chat/completions: endpoint, used to build the request URL
- --host 0.0.0.0: server IP address, used to build the request URL
- --port 9812: server HTTP port, used to build the request URL
- --dataset-name EBChat: dataset class; set to "EBChat" to read datasets saved in FD format
- --dataset-path ./eb45t_spv4_dataserver_1w_waigua_fd: path to the benchmark dataset
- --hyperparameter-path EB45T.yaml: (optional) hyperparameter file; its contents are merged into every request payload; by default no hyperparameters are sent (see the sketch after this list)
- --percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len: set of metrics reported in the benchmark results
- --metric-percentiles 80,95,99,99.9,99.95,99.99: percentiles reported for each metric
- --num-prompts 1: total number of requests to send
- --max-concurrency 1: benchmark concurrency
- --save-result: save the results to a JSON file; off (False) by default
- --debug: enable debug mode, printing each request payload and output; False by default
- --shuffle: whether to shuffle the dataset; False (no shuffling) by default
- --seed: random seed used when shuffling the dataset; 0 by default
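As a minimal sketch of how --hyperparameter-path works: the file is a YAML whose key/value pairs are merged into every request payload. The specific keys below (temperature, top_p, max_tokens) are illustrative assumptions, not the actual contents of the repository's YAML files:

```bash
# Hypothetical hyperparameter file; the keys/values shown are assumptions for illustration.
# Every key in this YAML is merged into each request payload sent by the benchmark.
cat > EB45T.yaml <<'EOF'
temperature: 0.8
top_p: 0.8
max_tokens: 200
EOF
```

Pass it with --hyperparameter-path EB45T.yaml; omit the flag to send requests without any extra hyperparameters.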
Single-request debugging against the /v1/chat/completions endpoint
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 1 \
--max-concurrency 1 \
--save-result
Full benchmark against the /v1/chat/completions endpoint: 100 concurrent requests, 2000 prompts
# Save output to infer_log.txt
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &
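Because the command above runs in the background and redirects stdout and stderr to infer_log.txt, progress can be followed with:

```bash
tail -f infer_log.txt
```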
Benchmarking the /v1/completions endpoint
Set --endpoint to /v1/completions and --backend to openai to benchmark the /v1/completions endpoint.
# Save output to infer_log.txt
python benchmark_serving.py \
--backend openai \
--model EB45T \
--endpoint /v1/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json \
--hyperparameter-path yaml/request_yaml/eb45-32k.yaml \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &
Speculative Decoding Performance Benchmark Tool
Usage:
python benchmarks/benchmark_mtp.py \
--host 127.0.0.1 --port 8000 \
--max-concurrency 16 32 64 96 --num-prompts 256 \
--acceptance-rate 0.8 --draft-token-steps 1 2 3 \
--s_itl-base-model 15.88 22.84 16.47 16.93 \
--dataset-name EBChat \
--dataset-path ./filtered_sharedgpt_2000_input_1136_output_200_fd.json
Parameter descriptions
- --host: server IP address, used to build the request URL
- --port: server HTTP port, used to build the request URL
- --max-concurrency: concurrency levels to test
- --num-prompts: total number of requests to send
- --acceptance-rate: simulated acceptance rate for speculative decoding
- --draft-token-steps: number of draft-token steps for speculative decoding
- --s_itl-base-model: decode latency of the base model, which can be obtained with the benchmark tool above; the values correspond one-to-one with the batch sizes (one value per --max-concurrency level)
- --dataset-name: dataset class; set to "EBChat" to read datasets saved in FD format
- --dataset-path: path to the test dataset
Testing with random plain-text inputs of specified input/output lengths
Relevant parameters:
- --dataset-name: dataset class; set to "random" to construct random plain-text inputs
- --random-input-len: random input length, measured in English words; default 200
- --random-output-len: random output length; default 1024
- --random-range-ratio: ratio controlling the variation of input/output lengths; lengths are drawn from [length * (1 - range_ratio), length * (1 + range_ratio)]; default 0.1. For example, with the default input length of 200 and ratio 0.1, input lengths fall within [180, 220].
Usage:
python benchmark_serving.py \
--backend openai-chat \
--model EB45T \
--endpoint /v1/chat/completions \
--host 0.0.0.0 \
--port 9812 \
--dataset-name random \
--random-input-len 200 \
--random-output-len 1024 \
--random-range-ratio 0.1 \
--percentile-metrics ttft,tpot,itl,e2el,s_ttft,s_itl,s_e2el,s_decode,input_len,s_input_len,output_len \
--metric-percentiles 80,95,99,99.9,99.95,99.99 \
--num-prompts 2000 \
--max-concurrency 100 \
--save-result > infer_log.txt 2>&1 &