Sync v2.0 version of code to github repo

This commit is contained in:
Jiang-Jia-Jun
2025-06-29 23:29:37 +00:00
parent d151496038
commit 92c2cfa2e7
597 changed files with 78776 additions and 22905 deletions


@@ -0,0 +1,128 @@
# Running the ERNIE-4.5-21B-A3B Model with FastDeploy on the Enflame S60
The Enflame S60 ([about Enflame](https://www.enflame-tech.com/)) is a new-generation AI inference accelerator card for large-scale data center deployment. It serves large language models, search/advertising/recommendation models, and traditional models; offers broad model coverage; and is easy to use, migrate, and deploy. It applies to mainstream inference scenarios such as image and text generation, search and recommendation, and text, image, and speech recognition.
FastDeploy has been deeply adapted and optimized for the ernie-4_5-21b-a3b-bf16-paddle model on the Enflame S60, unifying the GCU inference entry point with the GPU one, so inference workloads can be migrated without any code changes.
## 🚀 Quick Start 🚀
### 0. Machine preparation
Before getting started, you need a machine equipped with Enflame S60 accelerator cards that meets the following requirements:
| Chip | Driver version | TopsRider version |
| :---: | :---: | :---: |
| Enflame S60 | 1.5.0.5 | 3.4.623 |
**Note: To verify that your machine has Enflame S60 cards installed, run the following command on the host and check for output:**
```bash
lspci | grep S60
# For example, lspci | grep S60 produces output such as:
08:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
09:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
0e:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
11:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
32:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
38:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
3b:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
3c:00.0 Processing accelerators: Shanghai Enflame Technology Co. Ltd S60 [Enflame] (rev 01)
```
### 1. Environment preparation (this takes about 5~10 min)
1. Pull the image
```bash
# Note: this image is a Paddle development environment only; it does not include a pre-built PaddlePaddle package
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.4.623-ubuntu20-x86_64-gcc84
```
2. Start a container with a command like the following
```bash
docker run --name paddle-gcu-llm -v /home:/home -v /work:/work --network=host --ipc=host -it --privileged ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-gcu:topsrider3.4.623-ubuntu20-x86_64-gcc84 /bin/bash
```
3. Obtain and install the driver<br/>
**The full software package ships inside the docker image; copy it to a directory outside the container, e.g. ```/home/workspace/deps/```**
```bash
mkdir -p /home/workspace/deps/ && cp /root/TopsRider_i3x_*/TopsRider_i3x_*_deb_amd64.run /home/workspace/deps/
```
4. Install the driver<br/>
**This step must be run on the host**
```bash
cd /home/workspace/deps/
bash TopsRider_i3x_*_deb_amd64.run --driver --no-auto-load -y
```
After the driver is installed, **re-enter the docker container** with commands like:
```bash
docker start paddle-gcu-llm
docker exec -it paddle-gcu-llm bash
```
5. Install PaddlePaddle<br/>
```bash
# The PaddlePaddle deep learning framework provides the underlying compute capabilities
python -m pip install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
6. Install PaddleCustomDevice<br/>
```bash
# PaddleCustomDevice is PaddlePaddle's custom-hardware backend; it provides the GCU operator implementations
python -m pip install paddle-custom-gcu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/gcu/
# To build and install from source, see https://github.com/PaddlePaddle/PaddleCustomDevice/blob/develop/backends/gcu/README_cn.md
```
7. Install FastDeploy and its dependencies<br/>
```bash
python -m pip install fastdeploy -i https://www.paddlepaddle.org.cn/packages/stable/gcu/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
apt install python3.10-distutils
```
### 2. Data preparation (this takes about 2~5 min)
Run inference on GSM8K with the trained model:
```bash
mkdir -p /home/workspace/benchmark/ && cd /home/workspace/benchmark/
wget https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl
```
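Each line of the downloaded `test.jsonl` is a JSON object with `question` and `answer` fields, and the ground-truth number follows a `#### ` marker inside `answer`. A minimal parsing sketch (the sample record below is hypothetical and abbreviated):

```python
import json

# A hypothetical line in the GSM8K test.jsonl format; real records carry
# a full reasoning chain in "answer" before the "#### " marker.
sample_line = '{"question": "Janet has 3 apples and buys 2 more. How many now?", "answer": "3 + 2 = 5\\n#### 5"}'

def final_answer(record_line: str) -> str:
    """Extract the ground-truth answer after the '#### ' marker."""
    record = json.loads(record_line)
    return record["answer"].split("#### ")[-1].strip()

print(final_answer(sample_line))  # prints: 5
```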
Prepare the model weights and place them in a local directory, e.g. ```/work/models/ernie-4_5-21b-a3b-bf16-paddle/```
### 3. Inference (this takes about 2~5 min)
Start the inference service with the following command:
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model "/work/models/ernie-4_5-21b-a3b-bf16-paddle/" \
--port 8188 \
--metrics-port 8200 \
--tensor-parallel-size 4 \
--max-model-len 8192 \
--num-gpu-blocks-override 1024
```
Send a request to the model service with:
```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "The largest ocean is"}
]
}'
```
After a successful run, the generated inference result can be inspected, for example:
```json
{"id":"chatcmpl-5cd96f3b-eff3-4dc0-8aa2-8b5d7b7b86f2","object":"chat.completion","created":1751167862,"model":"default","choices":[{"index":0,"message":{"role":"assistant","content":"3. **Pacific Ocean**: The Pacific Ocean is the largest and deepest of the world's oceans. It covers an area of approximately 181,344,000 square kilometers, which is more than 30% of the Earth's surface. It is located between the Americas to the west and east, and Asia and Australia to the north and south. The Pacific Ocean is known for its vastness, diverse marine life, and numerous islands.\n\nIn summary, the largest ocean in the world is the Pacific Ocean.","reasoning_content":null,"tool_calls":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"total_tokens":127,"completion_tokens":116,"prompt_tokens_details":{"cached_tokens":0}}}
```
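To post-process such a response, the generated text sits under `choices[0].message.content`. A minimal sketch using the field layout shown above (the `raw` payload here is abbreviated by hand):

```python
import json

# Abbreviated response following the field layout shown above.
raw = ('{"object": "chat.completion", "choices": [{"index": 0, '
       '"message": {"role": "assistant", "content": "the Pacific Ocean"}, '
       '"finish_reason": "stop"}], "usage": {"completion_tokens": 4}}')

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
tokens = resp["usage"]["completion_tokens"]
print(f"{answer} ({tokens} tokens)")  # prints: the Pacific Ocean (4 tokens)
```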
### 4. Accuracy test (this takes about 60~180 min)
Place the accuracy script ```bench_gsm8k.py``` under ```/home/workspace/benchmark/``` and adjust the sampling parameters, e.g.:
```python
data = {
"messages": [
{
"role": "user",
"content": prompt,
}
],
"temperature": 0.6,
"max_tokens": 2047,
"top_p": 0.95,
"do_sample": True,
}
```
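For reference, this is roughly how such a request body could be assembled per question; `build_payload` is a hypothetical helper for illustration, not a function from `bench_gsm8k.py`:

```python
import json

# Hypothetical helper mirroring the sampling parameters above; an accuracy
# script would POST this body to the /v1/chat/completions endpoint.
def build_payload(prompt: str) -> str:
    data = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": 2047,
        "top_p": 0.95,
        "do_sample": True,
    }
    return json.dumps(data)

body = build_payload("What is 2 + 3?")
print(body)
```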
Run the accuracy test with:
```bash
cd /home/workspace/benchmark/
python -u bench_gsm8k.py --port 8188 --num-questions 1319 --num-shots 5 --parallel 2
```
After a successful run, the accuracy results are written to ```result.jsonl``` in the current directory, for example (partial dataset, for illustration only):
```json
{"task": "gsm8k", "backend": "paddlepaddle", "num_gpus": 1, "latency": 365.548, "accuracy": 0.967, "num_requests": 30, "other": {"num_questions": 30, "parallel": 2}}
```
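Each benchmark invocation appends one JSON object per line; when combining several runs, a request-weighted average is one way to aggregate. A sketch over hypothetical lines in the format shown above:

```python
import json

# Hypothetical result.jsonl lines following the format shown above.
lines = [
    '{"task": "gsm8k", "accuracy": 0.967, "num_requests": 30, "latency": 365.548}',
    '{"task": "gsm8k", "accuracy": 0.950, "num_requests": 10, "latency": 121.2}',
]

records = [json.loads(line) for line in lines]
total = sum(r["num_requests"] for r in records)
# Weight each run's accuracy by the number of requests it served.
overall = sum(r["accuracy"] * r["num_requests"] for r in records) / total
print(f"overall accuracy over {total} requests: {overall:.4f}")
```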


@@ -0,0 +1,8 @@
# FastDeploy Installation Guide
FastDeploy currently supports installation on the following hardware platforms:
- [NVIDIA GPU Installation](nvidia_gpu.md)
- [Kunlunxin XPU Installation](kunlunxin_xpu.md)
- [Enflame S60 GCU Installation](Enflame_gcu.md)
- [Iluvatar GPU Installation](iluvatar_gpu.md)


@@ -0,0 +1,102 @@
# Running ERNIE-4.5-300B-A47B-BF16 on Iluvatar Machines
The current release is a demonstration of large-model inference on Iluvatar chips with FastDeploy; running the latest ERNIE 4.5 model may still have issues. Fixes and performance optimizations will follow in a more stable release.
## Prepare the machine
First, you need a machine with the following configuration:
| CPU | Memory | Iluvatar cards | Disk |
|-----|------|-----|-----|
| x86 | 1TB | 8x BI150 | 1TB |
Currently the full model must be loaded into host memory, which requires more than 600GB of host memory; this will be optimized in a later release.
## Image
Pull from the official registry:
```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
```
## Prepare the container
1. Start the container
```bash
docker run -itd --name paddle_infer -v /usr/src:/usr/src -v /lib/modules:/lib/modules -v /dev:/dev -v /home/paddle:/home/paddle --privileged --cap-add=ALL --pid=host ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-ixuca:latest
docker exec -it paddle_infer bash
```
/home/paddle is the directory holding the model files, wheel packages, and scripts
2. Install the wheel packages
```bash
pip3 install paddlepaddle==3.1.0a0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
pip3 install paddle-iluvatar-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/ixuca/
pip3 install fastdeploy -i https://www.paddlepaddle.org.cn/packages/stable/ixuca/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip3 install aistudio-sdk==0.2.6
```
## Prepare the inference demo script
Demo script path: /home/paddle/scripts
The script contents are as follows.
`run_demo.sh`:
```bash
#!/bin/bash
export PADDLE_XCCL_BACKEND=iluvatar_gpu
export USE_WORKER_V1=1
export INFERENCE_MSG_QUEUE_ID=232132
export LD_PRELOAD=/usr/local/corex/lib64/libcuda.so.1
export FD_DEBUG=1
python3 run_demo.py
```
`run_demo.py`:
```python
from fastdeploy import LLM, SamplingParams
prompts = [
"Hello, my name is",
]
# Sampling parameters
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)
# Load the model; replace `/home/paddle/ernie-4_5-300b-a47b-bf16-paddle` with the path of your downloaded ERNIE model
llm = LLM(model="/home/paddle/ernie-4_5-300b-a47b-bf16-paddle", tensor_parallel_size=16, max_model_len=8192)
# Run batched inference; the LLM object queues requests internally and schedules them dynamically based on available resources
outputs = llm.generate(prompts, sampling_params)
# Print results
for output in outputs:
prompt = output.prompt
generated_text = output.outputs.text
print(prompt, generated_text)
```
## Run the demo
Execute
```bash
./run_demo.sh
```
Logs like the following are printed. Loading the model takes about 470s and the demo run about 90s.
```
/usr/local/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:715: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
/usr/local/lib/python3.10/site-packages/_distutils_hack/__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
[2025-06-27 16:35:10,856] [ INFO] - Loading configuration file /home/paddle/ernie-45t/generation_config.json
/usr/local/lib/python3.10/site-packages/paddlenlp/generation/configuration_utils.py:250: UserWarning: using greedy search strategy. However, `temperature` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/usr/local/lib/python3.10/site-packages/paddlenlp/generation/configuration_utils.py:255: UserWarning: using greedy search strategy. However, `top_p` is set to `0.8` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
INFO 2025-06-27 16:35:12,205 2717757 engine.py[line:134] Waitting worker processes ready...
Loading Weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:05<00:00, 18.13it/s]
Loading Layers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 199.50it/s]
[2025-06-27 16:35:24,030] [ WARNING] - import EventHandle and deep_ep Failed!
[2025-06-27 16:35:24,032] [ WARNING] - import EventHandle and deep_ep Failed!
INFO 2025-06-27 16:43:02,392 2717757 engine.py[line:700] Stop profile, num_gpu_blocks: 1820
INFO 2025-06-27 16:43:02,393 2717757 engine.py[line:175] Worker processes are launched with 471.5467264652252 seconds.
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:29<00:00, 89.98s/it, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Hello, my name is Hello! It's nice to meet you. I'm here to help with questions, have conversations, or assist with whatever you need. What would you like to talk about today? 😊
```


@@ -0,0 +1,226 @@
# Kunlunxin XPU
## Requirements
- OS: Linux
- Python: 3.10
- XPU model: P800
- XPU driver version: ≥ 5.0.21.10
- XPU firmware version: ≥ 1.31
Verified platforms:
- CPU: INTEL(R) XEON(R) PLATINUM 8563C
- Memory: 2T
- Disk: 4T
- OS: CentOS release 7.6 (Final)
- Python: 3.10
- XPU model: P800 (OAM edition)
- XPU driver version: 5.0.21.10
- XPU firmware version: 1.31
**Note:** Only OAM-edition P800 servers with INTEL or Hygon CPUs have been verified so far; other CPUs and PCIe-edition P800 servers have not been verified yet.
## 1. Install with Docker (recommended)
```bash
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.0.0
```
## 2. Install with pip
### Install PaddlePaddle
```bash
python -m pip install paddlepaddle-xpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/
```
Alternatively, you can install the latest nightly build of PaddlePaddle (not recommended):
```bash
python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/
```
### Install FastDeploy (**note: do not install from the PyPI index**)
```bash
python -m pip install fastdeploy-xpu==2.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/
```
Alternatively, you can install the latest nightly build of FastDeploy (not recommended):
```bash
python -m pip install --pre fastdeploy-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/
```
## 3. Build and install from source
### Install PaddlePaddle
```bash
python -m pip install paddlepaddle-xpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/xpu-p800/
```
Alternatively, you can install the latest nightly build of PaddlePaddle (not recommended):
```bash
python -m pip install --pre paddlepaddle-xpu -i https://www.paddlepaddle.org.cn/packages/nightly/xpu-p800/
```
### Download the Kunlunxin XTDK compiler toolchain and the XVLLM prebuilt operator library, and set their paths
```bash
# XTDK
wget https://klx-sdk-release-public.su.bcebos.com/xtdk_15fusion/dev/3.2.40.1/xtdk-llvm15-ubuntu2004_x86_64.tar.gz
tar -xvf xtdk-llvm15-ubuntu2004_x86_64.tar.gz && mv xtdk-llvm15-ubuntu2004_x86_64 xtdk
export CLANG_PATH=$(pwd)/xtdk
# XVLLM
wget https://klx-sdk-release-public.su.bcebos.com/xinfer/daily/eb/20250624/output.tar.gz
tar -xvf output.tar.gz && mv output xvllm
export XVLLM_PATH=$(pwd)/xvllm
```
Alternatively, you can download the latest XTDK and XVLLM (not recommended):
```bash
XTDK: https://klx-sdk-release-public.su.bcebos.com/xtdk_15fusion/dev/latest/xtdk-llvm15-ubuntu2004_x86_64.tar.gz
XVLLM: https://klx-sdk-release-public.su.bcebos.com/xinfer/daily/eb/latest/output.tar.gz
```
### Download the FastDeploy source, switch to a stable branch or tag, then build and install
```bash
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
git checkout <tag or branch>
bash build.sh
```
The build artifacts are placed in the ```FastDeploy/dist``` directory.
## Verify the installation
```python
import paddle
from paddle.jit.marker import unified
paddle.utils.run_check()
from fastdeploy.model_executor.ops.xpu import block_attn
```
If all of the above steps succeed, FastDeploy is installed correctly.
## Quick start
Deployment on the P800 has so far been verified only for the following models:
- ERNIE-4.5-300B-A47B-Paddle, 32K context, WINT4, 8 cards
- ERNIE-4.5-300B-A47B-Paddle, 128K context, WINT4, 8 cards
### Offline inference
After installing FastDeploy, you can run offline text generation on user-provided inputs with code like the following.
```python
from fastdeploy import LLM, SamplingParams
prompts = [
"Where is the capital of China?",
]
# Sampling parameters
sampling_params = SamplingParams(top_p=0.95)
# Load the model
llm = LLM(model="baidu/ERNIE-4.5-300B-A47B-Paddle", tensor_parallel_size=8, max_model_len=8192, quantization='wint4')
# Run batched inference; the LLM object queues requests internally and schedules them dynamically based on available resources
outputs = llm.generate(prompts, sampling_params)
# Print results
for output in outputs:
prompt = output.prompt
generated_text = output.outputs.text
print(f"Prompt: {prompt}")
print(f"Generated text: {generated_text}")
```
See [Parameters](../../parameters.md) for more options.
### OpenAI-compatible server
You can also deploy an OpenAI-API-compatible server with FastDeploy using the commands below.
#### Start the service
**ERNIE-4.5-300B-A47B-Paddle, 32K context, WINT4, 8 cards (recommended)**
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model baidu/ERNIE-4.5-300B-A47B-Paddle \
--port 8188 \
--tensor-parallel-size 8 \
--max-model-len 32768 \
--max-num-seqs 64 \
--quantization "wint4" \
--gpu-memory-utilization 0.9
```
**ERNIE-4.5-300B-A47B-Paddle, 128K context, WINT4, 8 cards**
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model baidu/ERNIE-4.5-300B-A47B-Paddle \
--port 8188 \
--tensor-parallel-size 8 \
--max-model-len 131072 \
--max-num-seqs 64 \
--quantization "wint4" \
--gpu-memory-utilization 0.9
```
See [Parameters](../../parameters.md) for more options.
#### Send requests
You can query the service over the OpenAI protocol via either curl or Python.
```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Where is the capital of China?"}
]
}'
```
```python
import openai
host = "0.0.0.0"
port = "8188"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
response = client.completions.create(
model="null",
prompt="Where is the capital of China?",
stream=True,
)
for chunk in response:
print(chunk.choices[0].text, end='')
print('\n')
response = client.chat.completions.create(
model="null",
messages=[
{"role": "system", "content": "I'm a helpful AI assistant."},
{"role": "user", "content": "Where is the capital of China?"},
],
stream=True,
)
for chunk in response:
if chunk.choices[0].delta:
print(chunk.choices[0].delta.content, end='')
print('\n')
```
For more details on the OpenAI protocol, see the [OpenAI Chat Completion API](https://platform.openai.com/docs/api-reference/chat/create); for differences from the OpenAI protocol, see [Serving deployment](../../serving/README.md).


@@ -0,0 +1,87 @@
# NVIDIA CUDA GPU Installation
Prerequisites:
- GPU driver >= 535
- CUDA >= 12.3
- cuDNN >= 9.5
- Python >= 3.10
- Linux x86_64
FastDeploy can then be installed in any of the following four ways.
## 1. Prebuilt Docker image (recommended)
``` shell
docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.0.0
```
## 2. Prebuilt pip install
First install paddlepaddle-gpu; for detailed instructions see [PaddlePaddle installation](https://www.paddlepaddle.org.cn/en/install/quick?docurl=/documentation/docs/en/develop/install/pip/linux-pip_en.html)
``` shell
python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
Then install fastdeploy. **Note: do not install from the PyPI index**; use one of the following commands instead.
If your GPU has the SM80/90 architecture (A100/H100, etc.):
``` shell
# Install the stable fastdeploy release
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Or install the latest nightly build of fastdeploy
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```
If your GPU has the SM86/89 architecture (4090/L20/L40, etc.):
``` shell
# Install the stable fastdeploy release
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Or install the latest nightly build of fastdeploy
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-86_89/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
```
## 3. Build the Docker image yourself
> Note: ```dockerfiles/Dockerfile.gpu``` builds for SM 80/90 by default. To support other architectures, edit the ```bash build.sh 1 python false [80,90]``` line in the Dockerfile; compiling for more than 2 architectures is not recommended.
``` shell
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu .
```
## 4. Build the wheel from source
First install paddlepaddle-gpu; for detailed instructions see [PaddlePaddle installation](https://www.paddlepaddle.org.cn/)
``` shell
python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
```
Then clone the source code, build, and install:
``` shell
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
# Argument 1: whether to build a wheel package (1 = build wheel, 0 = compile only)
# Argument 2: path to the Python interpreter
# Argument 3: whether to compile CPU inference operators
# Argument 4: GPU architectures to compile for
bash build.sh 1 python false [80,90]
```
The build artifacts are placed in the ```FastDeploy/dist``` directory.
## Environment check
After installing FastDeploy, verify that the environment works with the following Python code:
``` python
import paddle
from paddle.jit.marker import unified
# Check GPU availability
paddle.utils.run_check()
# Check that the FastDeploy custom operators compiled successfully
from fastdeploy.model_executor.ops.gpu import beam_search_softmax
```
If the code above runs successfully, the environment is ready to use.