mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-12-24 13:28:13 +08:00
dcu adapter ernie45t (#2756)
Co-authored-by: lifu <lifu@sugon.com>
Co-authored-by: yongqiangma <xing.wo@163.com>
81
docs/zh/get_started/installation/hygon_dcu.md
Normal file
@@ -0,0 +1,81 @@
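# Running ERNIE-4.5-300B-A47B & ERNIE-4.5-21B-A3B on Hygon K100AI with FastDeploy
This release is only a demo of large-model inference with K100AI + FastDeploy. Running the latest ERNIE 4.5 models may still hit problems; fixes and performance optimizations are planned to give customers a more stable version.
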
## Prepare the machine
First, prepare a machine with the following configuration:
- OS: Linux
- Python: 3.10
- Memory: 2 TB
- Disk: 4 TB
- DCU model: K100AI
- DCU driver version: ≥ 6.3.8-V1.9.2

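The requirements above can be checked before going further. The following is a minimal sketch rather than part of FastDeploy: the helper names `total_memory_bytes` and `check_requirements` are hypothetical, "2T" and "4T" are assumed to mean decimal terabytes, and the disk figure is assumed to refer to the total size of the filesystem holding the working directory. The DCU model and driver version are not checked here.

```python
import os
import shutil
import sys

TB = 10**12  # assumption: "2T" / "4T" are read as decimal terabytes


def total_memory_bytes():
    """Return total physical RAM in bytes via sysconf (Linux), or None if unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def check_requirements(python_version, memory_bytes, disk_bytes):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) != (3, 10):
        problems.append(f"Python 3.10 expected, found {python_version[0]}.{python_version[1]}")
    if memory_bytes is None or memory_bytes < 2 * TB:
        problems.append(f"at least 2 TB of memory expected, found {memory_bytes} bytes")
    if disk_bytes < 4 * TB:
        problems.append(f"at least 4 TB of disk expected, found {disk_bytes} bytes")
    return problems


if __name__ == "__main__":
    # Assumption: the current directory lives on the disk that will hold the models.
    disk_total = shutil.disk_usage(".").total
    issues = check_requirements(sys.version_info, total_memory_bytes(), disk_total)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Python, memory and disk checks passed.")
```
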
## 1. Install with Docker (recommended)

```bash
# Create a working directory; it is mounted into the container as /home below.
mkdir Work
cd Work
docker pull image.sourcefind.cn:5000/dcu/admin/base/custom:fastdeploy2.0.0-kylinv10-dtk25.04-py3.10

# Start an interactive container, passing /dev/kfd and /dev/dri through to it.
docker run -it \
    --network=host \
    --name=ernie45t \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --ipc=host \
    --shm-size=16G \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -u root \
    --ulimit stack=-1:-1 \
    --ulimit memlock=-1:-1 \
    -v "$(pwd)":/home \
    -v /opt/hyhal:/opt/hyhal:ro \
    image.sourcefind.cn:5000/dcu/admin/base/custom:fastdeploy2.0.0-kylinv10-dtk25.04-py3.10 /bin/bash
```

## 2. Start the service
```bash
export FD_ATTENTION_BACKEND="BLOCK_ATTN"
python -m fastdeploy.entrypoints.openai.api_server \
    --model "/models/ERNIE-45-Turbo/ERNIE-4.5-300B-A47B-Paddle/" \
    --port 8188 \
    --tensor-parallel-size 8 \
    --quantization=wint8 \
    --gpu-memory-utilization=0.8
```
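
Loading a model of this size can take a while, so it may help to wait until the port is reachable before sending requests. The sketch below uses only the Python standard library; the helper name `wait_for_port` and its default timeout are assumptions, not part of FastDeploy. An open TCP port is only a necessary sign of readiness: the server may accept connections before the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=2.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if `timeout`
    seconds pass without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8188)` would block until the service started above is listening on port 8188, assuming it runs on the same machine.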

#### Request the service

You can send requests to the service using the OpenAI protocol, either with curl or with Python.

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "user", "content": "Where is the capital of China?"}
        ]
    }'
```

```python
import openai

# Address of the service started in the previous step.
ip = "0.0.0.0"
service_http_port = "8188"
client = openai.Client(base_url=f"http://{ip}:{service_http_port}/v1", api_key="EMPTY_API_KEY")

# Send a single non-streaming chat completion request.
response = client.chat.completions.create(
    model="default",
    messages=[
        {"role": "user", "content": "Eliza's rate per hour for the first 40 hours she works each week is $10. She also receives an overtime pay of 1.2 times her regular hourly rate. If Eliza worked for 45 hours this week, how much are her earnings for this week?"},
    ],
    temperature=1,
    max_tokens=1024,
    stream=False,
)
print(response)
```
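
If the `openai` package is not installed, the same request can be sent with the Python standard library alone. This is a minimal sketch: the helper names `build_chat_request`, `extract_reply` and `ask` are hypothetical, the host is assumed to be `127.0.0.1` (service running on the same machine), and the response is assumed to follow the usual OpenAI chat completion layout (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, host="127.0.0.1", port=8188):
    """Build the same POST request as the curl example, without sending it."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt, timeout=300):
    """Send the prompt to the running service and return only the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the service running, `print(ask("Where is the capital of China?"))` would print just the reply text instead of the full response object.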