From 10c6bded85f246a01b18a204816f218c720cc523 Mon Sep 17 00:00:00 2001
From: kevincheng2
Date: Thu, 29 Aug 2024 19:45:32 +0800
Subject: [PATCH] update README

---
 llm/README.md | 64 +--------------------------------------------------
 1 file changed, 1 insertion(+), 63 deletions(-)

diff --git a/llm/README.md b/llm/README.md
index 9d7ed90dc..6475ae2f9 100644
--- a/llm/README.md
+++ b/llm/README.md
@@ -32,69 +32,7 @@ Note:
 1. Make sure shm-size >= 5; otherwise the service may fail to start.
 
-For more on using FastDeploy, see the [serving deployment workflow](https://console.cloud.baidu-int.com/devops/icode/repos/baidu/fastdeploy/serving/blob/opensource/docs/FastDeploy_usage_tutorial.md)
-
-# Benchmark tests
-
-We benchmarked FastDeploy with the `Llama-3-8B-Instruct` model at several precisions. The results are shown in the table below:
-
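-| Framework | Precision | QPS | tokens/s | Full-sentence latency |
-| --- | --- | --- | --- | --- |
-| FastDeploy | FP16/BF16 | 16.21 | 3171.09 | 7.15 |
-| FastDeploy | WINT8 | 14.84 | 2906.27 | 7.95 |
-| FastDeploy | W8A8C8-INT8 | 20.60 | 4031.75 | 5.61 |
-| vLLM | FP16/BF16 | 9.07 | 1766.11 | 13.32 |
-| vLLM | WINT8 | 8.23 | 1602.96 | 14.85 |
-| vLLM | W8A8C8-INT8 | 9.41 | 1831.81 | 12.76 |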
-
-- Test environment:
-  - GPU: NVIDIA A100-SXM4-80GB
-  - CUDA version: 11.6
-  - cuDNN version: 8.4.0
-  - Batch size: 128
-  - Request concurrency: 128
-  - vLLM version: v0.5.3
-  - TRT-LLM version: v0.11.0
-  - Dataset: [ShareGPT_V3_unfiltered_cleaned_split.json](https://huggingface.co/datasets/learnanything/sharegpt_v3_unfiltered_cleaned_split/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json)
+For more on using FastDeploy, see the [serving deployment workflow](https://github.com/PaddlePaddle/FastDeploy/blob/develop/llm/docs/FastDeploy_usage_tutorial.md)
 
 # License