[Model] Add text classification task for ernie-3.0 (#430)

* move text_cls to ernie-3.0
* Add main page of ernie-3.0
* rename infer -> seq_cls_infer
* Fix the links
* Add ernie-3.0 python, cpp readme
* Fix some cpp readme
* Add fastdeploy::FDERROR
* Add python readme for ernie-3.0
* update README.md
* Add empty line
* update readme
* Fix readme
* remove the - from ernie 3.0
* ernie-3.0 -> ernie 3.0
* Use AutoTokenizer to tokenize
* Ernie -> ERNIE
2025-10-17 22:21:48 +08:00 · 2022-11-08 10:54:59 +08:00
parent bdf40e9da5
commit 8fd61e3634
11 changed files with 738 additions and 3 deletions
--- a/examples/text/ernie-3.0/python/README.md
+++ b/examples/text/ernie-3.0/python/README.md
@@ -0,0 +1,71 @@
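+# ERNIE 3.0 Model Python Deployment Example
+
+Before deploying, confirm the following two steps:
+
+- 1. The software and hardware environment meets the requirements; see [FastDeploy environment requirements](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+- 2. The FastDeploy Python whl package is installed; see [FastDeploy Python installation](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+
+This directory provides `seq_cls_infer.py`, a deployment example that quickly runs a text classification task on CPU or GPU.
+
+## Installing Dependencies
+
+The Python Predictor in this project tokenizes text with the AutoTokenizer provided by PaddleNLP and can use fast_tokenizer to accelerate tokenization (see `--use_fast`). Run the following command to install the dependencies.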
+
+```bash
+pip install -r requirements.txt
+```
+
+
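+## Text Classification Task
+
+### Quick Start
+
+The following example shows how to use the FastDeploy library to deploy Python inference for text classification with the ERNIE 3.0 Medium model on the [AFQMC dataset](https://bj.bcebos.com/paddlenlp/datasets/afqmc_public.zip) from the CLUE Benchmark.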
+
+```bash
+
+# Download the deployment example code
+git clone https://github.com/PaddlePaddle/FastDeploy.git
+cd FastDeploy/examples/text/ernie-3.0/python
+
+# Download the ERNIE 3.0 model fine-tuned on the AFQMC dataset
+wget https://bj.bcebos.com/fastdeploy/models/ernie-3.0/ernie-3.0-medium-zh-afqmc.tgz
+tar xvfz ernie-3.0-medium-zh-afqmc.tgz
+
+# CPU inference
+python seq_cls_infer.py --device cpu --model_dir ernie-3.0-medium-zh-afqmc
+
+# GPU inference
+python seq_cls_infer.py --device gpu --model_dir ernie-3.0-medium-zh-afqmc
+
+```
+
+After the run completes, the output looks like this:
+
+```bash
+[INFO] fastdeploy/runtime.cc(469)::Init	Runtime initialized with Backend::ORT in Device::CPU.
+Batch id:0, example id:0, sentence1:花呗收款额度限制, sentence2:收钱码，对花呗支付的金额有限制吗, label:1, similarity:0.5819
+Batch id:1, example id:0, sentence1:花呗支持高铁票支付吗, sentence2:为什么友付宝不支持花呗付款, label:0, similarity:0.9979
+```
+
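+### Parameters
+
+In addition to the command-line arguments used in the example above, `seq_cls_infer.py` supports further options. The table below describes each argument.
+
+| Argument | Description |
+|----------|--------------|
+|--model_dir | Directory of the model to deploy (required) |
+|--batch_size | Maximum batch size; defaults to 1 |
+|--max_length | Maximum sequence length; defaults to 128 |
+|--device | Device to run on, one of ['cpu', 'gpu']; defaults to 'cpu' |
+|--backend | Inference backend, one of ['onnx_runtime', 'paddle', 'openvino', 'tensorrt', 'paddle_tensorrt']; defaults to 'onnx_runtime' |
+|--use_fp16 | Whether to run inference in FP16 mode. Can be enabled with the tensorrt and paddle_tensorrt backends; defaults to False |
+|--use_fast | Whether to use FastTokenizer to accelerate tokenization; defaults to False |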
+
+## Related Documentation
+
+[ERNIE 3.0 model details](https://github.com/PaddlePaddle/PaddleNLP/tree/release/2.4/model_zoo/ernie-3.0)
+
+[ERNIE 3.0 model export](https://github.com/PaddlePaddle/PaddleNLP/tree/release/2.4/model_zoo/ernie-3.0)
+
+[ERNIE 3.0 model C++ deployment](../cpp/README.md)
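
Note on the boolean flags: `--use_fp16` and `--use_fast` are parsed in `seq_cls_infer.py` with `distutils.util.strtobool`. `distutils` was removed from the standard library in Python 3.12, so a small replacement may be needed on newer interpreters. The following is a minimal sketch, assuming a hypothetical helper named `str2bool` (not part of this commit) that accepts the same spellings as `strtobool`; `build_parser` is likewise illustrative and only covers the two boolean flags.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse the true/false spellings accepted by distutils.util.strtobool."""
    normalized = value.strip().lower()
    if normalized in {"y", "yes", "t", "true", "on", "1"}:
        return True
    if normalized in {"n", "no", "f", "false", "off", "0"}:
        return False
    # argparse turns this exception into a regular usage error message.
    raise argparse.ArgumentTypeError(f"invalid boolean value: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the two boolean flags of the script.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_fp16", type=str2bool, default=False)
    parser.add_argument("--use_fast", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--use_fp16", "true", "--use_fast", "0"])
    print(args.use_fp16, args.use_fast)  # True False
```

Unlike `strtobool`, which returns the integers 1 and 0, this helper returns real `bool` values; the `if args.use_fp16:` check in the script behaves the same either way.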
--- a/examples/text/ernie-3.0/python/requirements.txt
+++ b/examples/text/ernie-3.0/python/requirements.txt
@@ -0,0 +1,2 @@
+faster_tokenizer
+paddlenlp
--- a/examples/text/ernie-3.0/python/seq_cls_infer.py
+++ b/examples/text/ernie-3.0/python/seq_cls_infer.py
@@ -0,0 +1,182 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import distutils.util
+
+import numpy as np
+import faster_tokenizer
+from paddlenlp.transformers import AutoTokenizer
+import fastdeploy as fd
+
+
+def parse_arguments():
+    import argparse
+    import ast
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model_dir", required=True, help="The directory of model.")
+    parser.add_argument(
+        "--vocab_path",
+        type=str,
+        default="",
+        help="The path of tokenizer vocab.")
+    parser.add_argument(
+        "--device",
+        type=str,
+        default='cpu',
+        choices=['gpu', 'cpu'],
+        help="Type of inference device, support 'cpu' or 'gpu'.")
+    parser.add_argument(
+        "--backend",
+        type=str,
+        default='onnx_runtime',
+        choices=[
+            'onnx_runtime', 'paddle', 'openvino', 'tensorrt', 'paddle_tensorrt'
+        ],
+        help="The inference runtime backend.")
+    parser.add_argument(
+        "--batch_size", type=int, default=1, help="The batch size of data.")
+    parser.add_argument(
+        "--max_length",
+        type=int,
+        default=128,
+        help="The max length of sequence.")
+    parser.add_argument(
+        "--log_interval",
+        type=int,
+        default=10,
+        help="The interval of logging.")
+    parser.add_argument(
+        "--use_fp16",
+        type=distutils.util.strtobool,
+        default=False,
+        help="Whether to use FP16 mode.")
+    parser.add_argument(
+        "--use_fast",
+        type=distutils.util.strtobool,
+        default=False,
+        help="Whether to use fast_tokenizer to accelerate the tokenization.")
+    return parser.parse_args()
+
+
+def batchfy_text(texts, batch_size):
+    batch_texts = []
+    batch_start = 0
+    while batch_start < len(texts):
+        batch_texts += [
+            texts[batch_start:min(batch_start + batch_size, len(texts))]
+        ]
+        batch_start += batch_size
+    return batch_texts
+
+
+class ErnieForSequenceClassificationPredictor(object):
+    def __init__(self, args):
+        self.tokenizer = AutoTokenizer.from_pretrained(
+            'ernie-3.0-medium-zh', use_faster=args.use_fast)
+        self.runtime = self.create_fd_runtime(args)
+        self.batch_size = args.batch_size
+        self.max_length = args.max_length
+
+    def create_fd_runtime(self, args):
+        option = fd.RuntimeOption()
+        model_path = os.path.join(args.model_dir, "infer.pdmodel")
+        params_path = os.path.join(args.model_dir, "infer.pdiparams")
+        option.set_model_path(model_path, params_path)
+        if args.device == 'cpu':
+            option.use_cpu()
+        else:
+            option.use_gpu()
+        if args.backend == 'paddle':
+            option.use_paddle_backend()
+        elif args.backend == 'onnx_runtime':
+            option.use_ort_backend()
+        elif args.backend == 'openvino':
+            option.use_openvino_backend()
+        else:
+            option.use_trt_backend()
+            if args.backend == 'paddle_tensorrt':
+                option.enable_paddle_to_trt()
+                option.enable_paddle_trt_collect_shape()
+            trt_file = os.path.join(args.model_dir, "infer.trt")
+            option.set_trt_input_shape(
+                'input_ids',
+                min_shape=[1, args.max_length],
+                opt_shape=[args.batch_size, args.max_length],
+                max_shape=[args.batch_size, args.max_length])
+            option.set_trt_input_shape(
+                'token_type_ids',
+                min_shape=[1, args.max_length],
+                opt_shape=[args.batch_size, args.max_length],
+                max_shape=[args.batch_size, args.max_length])
+            if args.use_fp16:
+                option.enable_trt_fp16()
+                trt_file = trt_file + ".fp16"
+            option.set_trt_cache_file(trt_file)
+        return fd.Runtime(option)
+
+    def preprocess(self, texts, texts_pair):
+        data = self.tokenizer(
+            texts,
+            texts_pair,
+            max_length=self.max_length,
+            padding=True,
+            truncation=True)
+        input_ids_name = self.runtime.get_input_info(0).name
+        token_type_ids_name = self.runtime.get_input_info(1).name
+        input_map = {
+            input_ids_name: np.array(
+                data["input_ids"], dtype="int64"),
+            token_type_ids_name: np.array(
+                data["token_type_ids"], dtype="int64")
+        }
+        return input_map
+
+    def infer(self, input_map):
+        results = self.runtime.infer(input_map)
+        return results
+
+    def postprocess(self, infer_data):
+        logits = np.array(infer_data[0])
+        max_value = np.max(logits, axis=1, keepdims=True)
+        exp_data = np.exp(logits - max_value)
+        probs = exp_data / np.sum(exp_data, axis=1, keepdims=True)
+        out_dict = {
+            "label": probs.argmax(axis=-1),
+            "confidence": probs.max(axis=-1)
+        }
+        return out_dict
+
+    def predict(self, texts, texts_pair=None):
+        input_map = self.preprocess(texts, texts_pair)
+        infer_result = self.infer(input_map)
+        output = self.postprocess(infer_result)
+        return output
+
+
+if __name__ == "__main__":
+    args = parse_arguments()
+    predictor = ErnieForSequenceClassificationPredictor(args)
+    texts_ds = ["花呗收款额度限制", "花呗支持高铁票支付吗"]
+    texts_pair_ds = ["收钱码，对花呗支付的金额有限制吗", "为什么友付宝不支持花呗付款"]
+    batch_texts = batchfy_text(texts_ds, args.batch_size)
+    batch_texts_pair = batchfy_text(texts_pair_ds, args.batch_size)
+
+    for bs, (texts,
+             texts_pair) in enumerate(zip(batch_texts, batch_texts_pair)):
+        outputs = predictor.predict(texts, texts_pair)
+        for i, (sentence1, sentence2) in enumerate(zip(texts, texts_pair)):
+            print(
+                f"Batch id:{bs}, example id:{i}, sentence1:{sentence1}, sentence2:{sentence2}, label:{outputs['label'][i]}, similarity:{outputs['confidence'][i]:.4f}"
+            )
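
Note on the printed `similarity` value: `postprocess` applies a softmax to the logits and reports the probability of the predicted label, so `similarity` is the confidence in whichever label was chosen, not the probability of label 1. This is why a pair labelled 0 can still show a value close to 1. The following is a rough pure-Python sketch of the same computation, written without NumPy so it runs standalone; the logits are hypothetical and not real model output.

```python
import math
from typing import Dict, List


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating so large logits cannot overflow.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits: List[List[float]]) -> Dict[str, list]:
    # Mirrors the script's postprocess: argmax label plus that label's probability.
    labels, confidences = [], []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best)
        confidences.append(probs[best])
    return {"label": labels, "confidence": confidences}


if __name__ == "__main__":
    # Hypothetical logits for two sentence pairs.
    result = postprocess([[2.0, 0.5], [0.0, 1000.0]])
    for label, confidence in zip(result["label"], result["confidence"]):
        print(f"label:{label}, similarity:{confidence:.4f}")
```

The second row uses a very large logit on purpose: without the max subtraction, `math.exp(1000.0)` would raise `OverflowError`.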