Files
FastDeploy/examples/text/ernie-3.0/serving/README.md
chenjian 595ca69251 [Doc] Add doc for vdl serving (#1110)
* add doc for vdl serving

* add doc for vdl serving

* add doc for vdl serving

* fix link

* fix link

* fix gif size

* fix gif size

* add english version

* fix links

* fix links

* update format

* update docs

* update docs

* update docs

* update docs

* update docs

* update docs

---------

Co-authored-by: heliqi <1101791222@qq.com>
2023-01-30 19:22:59 +08:00

9.5 KiB
Raw Blame History

English | 简体中文

Example of ERNIE 3.0 Serving Deployment

Before serving deployment, you need to confirm

Prepare Models

Download the news classification model and the sequence labeling model of ERNIE 3.0 (if you have trained models, skip this step):

# Download and decompress the news classification model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/tnews_pruned_infer_model.zip
unzip tnews_pruned_infer_model.zip

# Move the download model to the model repository directory of classification tasks.
mv tnews_pruned_infer_model/float32.pdmodel models/ernie_seqcls_model/1/model.pdmodel
mv tnews_pruned_infer_model/float32.pdiparams models/ernie_seqcls_model/1/model.pdiparams

# Download and decompress the sequence labelling model
wget https://paddlenlp.bj.bcebos.com/models/transformers/ernie_3.0/msra_ner_pruned_infer_model.zip
unzip msra_ner_pruned_infer_model.zip

# Move the download model to the model repository directory of sequence labeling task.
mv msra_ner_pruned_infer_model/float32.pdmodel models/ernie_tokencls_model/1/model.pdmodel
mv msra_ner_pruned_infer_model/float32.pdiparams models/ernie_tokencls_model/1/model.pdiparams

After download and move, the models directory of the classification tasks is as follows:

models
├── ernie_seqcls                      # Pipeline for classification task
│   ├── 1
│   └── config.pbtxt                  # Combine pre and post processing and model inference
├── ernie_seqcls_model                # Model inference for classification task
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
├── ernie_seqcls_postprocess          # Post-processing of classification task
│   ├── 1
│   │   └── model.py
│   └── config.pbtxt
└── ernie_tokenizer                   # Pre-processing splitting
    ├── 1
    │   └── model.py
    └── config.pbtxt

Pull and Run Images

# x.y.z represent image versions. Please refer to the serving document to replace them with numbers
# GPU Image
docker pull registry.baidubce.com/paddlepaddle/fastdeploy:x.y.z-gpu-cuda11.4-trt8.4-21.10
# CPU Image
docker pull registry.baidubce.com/paddlepaddle/fastdeploy:x.y.z-cpu-only-21.10

# Running
docker run  -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models registry.baidubce.com/paddlepaddle/fastdeploy:x.y.z-cpu-only-21.10 bash

Deployment Models

The serving directory contains the configuration to start the pipeline service and the code to send the prediction request, including

models                    # Model repository needed for serving startup, containing model and service configuration files
seq_cls_rpc_client.py     # Script for sending pipeline prediction requests for news classification task
token_cls_rpc_client.py   # Script for sequence annotation task to send pipeline prediction requests

Attention:Attention: When starting the service, each python backend process of Server requests 64M memory by default, and the docker started by default cannot start more than one python backend node. There are two solutions:

  • 1.Set the shm-size parameter when starting the container, for example, docker run -it --net=host --name fastdeploy_server --shm-size="1g" -v /path/serving/models:/models registry.baidubce.com/paddlepaddle/fastdeploy:x.y.z-gpu-cuda11.4-trt8.4-21.10 bash
  • 2.Set the shm-default-byte-size parameter of python backend when starting the service. Set the default memory of python backend to 10M tritonserver --model-repository=/models --backend-config=python,shm-default-byte-size=10485760

Classification Task

Execute the following command in the container to start the service:

# Enable all models by default
fastdeployserver --model-repository=/models

# You can only enable classification task via parameters
fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_seqcls

The output is:

I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_seqcls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_seqcls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_seqcls               | 1       | READY  |
| ernie_seqcls_model         | 1       | READY  |
| ernie_seqcls_postprocess   | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002

Sequence Labelling Task

Execute the following command in the container to start the sequence labelling service:

fastdeployserver --model-repository=/models --model-control-mode=explicit --load-model=ernie_tokencls --backend-config=python,shm-default-byte-size=10485760

The output is:

I1019 09:41:15.375496 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_tokenizer' version 1
I1019 09:41:15.375987 2823 model_repository_manager.cc:1022] loading: ernie_seqcls:1
I1019 09:41:15.477147 2823 model_repository_manager.cc:1183] successfully loaded 'ernie_seqcls' version 1
I1019 09:41:15.477325 2823 server.cc:522]
...
I0613 08:59:20.577820 10021 server.cc:592]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| ernie_tokencls             | 1       | READY  |
| ernie_tokencls_model       | 1       | READY  |
| ernie_tokencls_postprocess | 1       | READY  |
| ernie_tokenizer            | 1       | READY  |
+----------------------------+---------+--------+
...
I0601 07:15:15.923270 8059 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0601 07:15:15.923604 8059 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0601 07:15:15.964984 8059 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002

Client Requests

Client requests can execute script requests locally and in the container.

Dependencies should be installed to execute the script locally:

pip install grpcio
pip install tritonclient[all]

# If bash cannot recognize the brackets, you can use the following command to install dependencies:
pip install tritonclient\[all\]

Classification Task

Attention: The proxy need turning off when executing client requests. The ip address in the main function (the machine where you start services) should be modified as appropriate.

python seq_cls_grpc_client.py

The output is:

{'label': array([5, 9]), 'confidence': array([0.6425664 , 0.66534853], dtype=float32)}
{'label': array([4]), 'confidence': array([0.53198355], dtype=float32)}
acc: 0.5731

Sequence Labeling Task

Attention: The proxy need turning off when executing client requests. The ip address in the main function (the machine where you start services) should be modified as appropriate.

python token_cls_grpc_client.py

The output is:

input data: 北京的涮肉,重庆的火锅,成都的小吃都是极具特色的美食。
The model detects all entities:
entity: 北京   label: LOC   pos: [0, 1]
entity: 重庆   label: LOC   pos: [6, 7]
entity: 成都   label: LOC   pos: [12, 13]
input data: 原产玛雅故国的玉米,早已成为华夏大地主要粮食作物之一。
The model detects all entities:
entity: 玛雅   label: LOC   pos: [2, 3]
entity: 华夏   label: LOC   pos: [14, 15]

Configuration Modification

The current classification task (ernie_seqcls_model/config.pbtxt) is by default configured to run the OpenVINO engine on CPU; the sequence labelling task is by default configured to run the Paddle engine on GPU. If you want to run on CPU/GPU or other inference engines, you should modify the configuration. please refer to the configuration document.

Use VisualDL for serving deployment visualization

You can use VisualDL for serving deployment visualization , the above model preparation, deployment, configuration modification and client request operations can all be performed based on VisualDL.

The serving deployment of ERNIE 3.0 by VisualDL only needs the following three steps:

1. Load the model repository: ./text/ernie-3.0/serving/models
2. Download the model resource file: click the ernie_seqcls_model model, click the version number 1 to add the pre-training model, and select the text classification model ernie_3.0_ernie_seqcls_model to download. Click the ernie_tokencls_model model, click the version number 1 to add the pre-training model, and select the text classification model ernie_tokencls_model to download.
3. Start the service: Click the "launch server" button and input the launch parameters.