mirror of
https://github.com/PaddlePaddle/FastDeploy.git
synced 2025-10-06 17:17:14 +08:00

* 第一次提交 * 补充一处漏翻译 * deleted: docs/en/quantize.md * Update one translation * Update en version * Update one translation in code * Standardize one writing * Standardize one writing * Update some en version * Fix a grammer problem * Update en version for api/vision result * Merge branch 'develop' of https://github.com/charl-u/FastDeploy into develop * Checkout the link in README in vision_results/ to the en documents * Modify a title * Add link to serving/docs/ * Finish translation of demo.md * Update english version of serving/docs/ * Update title of readme * Update some links * Modify a title * Update some links * Update en version of java android README * Modify some titles * Modify some titles * Modify some titles * modify article to document * update some english version of documents in examples * Add english version of documents in examples/visions * Sync to current branch * Add english version of documents in examples * Add english version of documents in examples * Add english version of documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples * Update some documents in examples
401 lines
16 KiB
Markdown
401 lines
16 KiB
Markdown
English | [简体中文](README_CN.md)
|
||
|
||
# Universal Information Extraction UIE Python Deployment Example
|
||
|
||
Before deployment, two steps need to be confirmed.
|
||
|
||
- 1. The software and hardware environment meets the requirements. Please refer to [Environment requirements for FastDeploy](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).
|
||
- 2. FastDeploy Python whl pacakage needs installation. Please refer to [FastDeploy Python Installation](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).
|
||
|
||
This directory provides an example that `infer.py` quickly complete CPU deployment conducted by the UIE model with OpenVINO acceleration on CPU/GPU and CPU.
|
||
|
||
## A Quick Start
|
||
```bash
|
||
|
||
# Download deployment sample code
|
||
git clone https://github.com/PaddlePaddle/FastDeploy.git
|
||
cd FastDeploy/examples/text/uie/python
|
||
|
||
# Download the UIE model file and word list. Taking the uie-base model as an example.
|
||
wget https://bj.bcebos.com/fastdeploy/models/uie/uie-base.tgz
|
||
tar -xvfz uie-base.tgz
|
||
|
||
# CPU Inference
|
||
python infer.py --model_dir uie-base --device cpu
|
||
# GPU Inference
|
||
python infer.py --model_dir uie-base --device gpu
|
||
# Use OpenVINO for inference
|
||
python infer.py --model_dir uie-base --device cpu --backend openvino --cpu_num_threads 8
|
||
```
|
||
|
||
The results after running are as follows(only the output of the NER task is captured).
|
||
```bash
|
||
1. Named Entity Recognition Task
|
||
The extraction schema: ['时间', '选手', '赛事名称']
|
||
[{'时间': {'end': 6,
|
||
'probability': 0.9857379794120789,
|
||
'start': 0,
|
||
'text': '2月8日上午'},
|
||
'赛事名称': {'end': 23,
|
||
'probability': 0.8503087162971497,
|
||
'start': 6,
|
||
'text': '北京冬奥会自由式滑雪女子大跳台决赛'},
|
||
'选手': {'end': 31,
|
||
'probability': 0.8981553912162781,
|
||
'start': 28,
|
||
'text': '谷爱凌'}}]
|
||
|
||
The extraction schema: ['肿瘤的大小', '肿瘤的个数', '肝癌级别', '脉管内癌栓分级']
|
||
[{'肝癌级别': {'end': 20,
|
||
'probability': 0.9243271350860596,
|
||
'start': 13,
|
||
'text': 'II-III级'},
|
||
'肿瘤的个数': {'end': 84,
|
||
'probability': 0.7538408041000366,
|
||
'start': 82,
|
||
'text': '1个'},
|
||
'肿瘤的大小': {'end': 100,
|
||
'probability': 0.8341134190559387,
|
||
'start': 87,
|
||
'text': '4.2×4.0×2.8cm'},
|
||
'脉管内癌栓分级': {'end': 70,
|
||
'probability': 0.9083293080329895,
|
||
'start': 67,
|
||
'text': 'M0级'}}]
|
||
......
|
||
```
|
||
|
||
### Description of command line arguments
|
||
|
||
`infer.py` supports more command line parameters than the above example. The following is a description of each command line parameter.
|
||
|
||
| Argument | Description |
|
||
|----------|--------------|
|
||
|--model_dir | The specified directory of model. |
|
||
|--batch_size | The batch size of inputs. |
|
||
|--max_length | The max length of sequence. Default to 128|
|
||
|--device | The device of runtime, choices: ['cpu', 'gpu']. Default to 'cpu' |
|
||
|--backend | The backend of runtime, choices: ['onnx_runtime', 'paddle_inference', 'openvino', 'tensorrt', 'paddle_tensorrt']. Default to 'paddle_inference'. |
|
||
|--use_fp16 | Whether to use fp16 precision to infer. It can be turned on when 'tensorrt' or 'paddle_tensorrt' backend is selected. Default to False.|
|
||
|
||
## The way to use the UIE model in each extraction task
|
||
|
||
In the UIE model, schema represents the structured information to be extracted, so the UIE model can support different information extraction tasks by setting different schemas.
|
||
|
||
### Initialize UIEModel
|
||
|
||
```python
|
||
import fastdeploy
|
||
from fastdeploy.text import UIEModel
|
||
model_dir = "uie-base"
|
||
model_path = os.path.join(model_dir, "inference.pdmodel")
|
||
param_path = os.path.join(model_dir, "inference.pdiparams")
|
||
vocab_path = os.path.join(model_dir, "vocab.txt")
|
||
|
||
runtime_option = fastdeploy.RuntimeOption()
|
||
schema = ["时间", "选手", "赛事名称"]
|
||
|
||
# Initialise UIE model
|
||
uie = UIEModel(
|
||
model_path,
|
||
param_path,
|
||
vocab_path,
|
||
position_prob=0.5,
|
||
max_length=128,
|
||
schema=schema,
|
||
runtime_option=runtime_option)
|
||
```
|
||
|
||
### Entity Extraction
|
||
|
||
The initialization stage sets the schema```["time", "player", "event name"]``` to extract the time, player and event name from the input text.
|
||
|
||
```python
|
||
>>> from pprint import pprint
|
||
>>> results = uie.predict(
|
||
["2月8日上午北京冬奥会自由式滑雪女子大跳台决赛中中国选手谷爱凌以188.25分获得金牌!"], return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'时间': {'end': 6,
|
||
# 'probability': 0.9857379794120789,
|
||
# 'start': 0,
|
||
# 'text': '2月8日上午'},
|
||
# '赛事名称': {'end': 23,
|
||
# 'probability': 0.8503087162971497,
|
||
# 'start': 6,
|
||
# 'text': '北京冬奥会自由式滑雪女子大跳台决赛'},
|
||
# '选手': {'end': 31,
|
||
# 'probability': 0.8981553912162781,
|
||
# 'start': 28,
|
||
# 'text': '谷爱凌'}}]
|
||
|
||
```
|
||
|
||
For example, if the target entity types are "肿瘤的大小", "肿瘤的个数", "肝癌级别" and "脉管内癌栓分级", the following statements can be executed.
|
||
|
||
```python
|
||
>>> schema = ["肿瘤的大小", "肿瘤的个数", "肝癌级别", "脉管内癌栓分级"]
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(
|
||
[
|
||
"(右肝肿瘤)肝细胞性肝癌(II-III级,梁索型和假腺管型),肿瘤包膜不完整,紧邻肝被膜,侵及周围肝组织,"
|
||
"未见脉管内癌栓(MVI分级:M0级)及卫星子灶形成。(肿物1个,大小4.2×4.0×2.8cm)。"
|
||
],
|
||
return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'肝癌级别': {'end': 20,
|
||
# 'probability': 0.9243271350860596,
|
||
# 'start': 13,
|
||
# 'text': 'II-III级'},
|
||
# '肿瘤的个数': {'end': 84,
|
||
# 'probability': 0.7538408041000366,
|
||
# 'start': 82,
|
||
# 'text': '1个'},
|
||
# '肿瘤的大小': {'end': 100,
|
||
# 'probability': 0.8341134190559387,
|
||
# 'start': 87,
|
||
# 'text': '4.2×4.0×2.8cm'},
|
||
# '脉管内癌栓分级': {'end': 70,
|
||
# 'probability': 0.9083293080329895,
|
||
# 'start': 67,
|
||
# 'text': 'M0级'}}]
|
||
```
|
||
|
||
|
||
### Relation Extraction
|
||
|
||
Relation Extraction (RE) refers to identifying entities from text and extracting semantic relationships between them to obtain triadic information, i.e. <subject, predicate, object>.
|
||
|
||
For example, if we take "contest name" as the extracted entity, and the relations are "主办方", "承办方" and "已举办次数", then we can execute the following statements.
|
||
```python
|
||
>>> schema = {"竞赛名称": ["主办方", "承办方", "已举办次数"]}
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(
|
||
[
|
||
"2022语言与智能技术竞赛由中国中文信息学会和中国计算机学会联合主办,百度公司、中国中文信息学会评测工作"
|
||
"委员会和中国计算机学会自然语言处理专委会承办,已连续举办4届,成为全球最热门的中文NLP赛事之一。"
|
||
],
|
||
return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'竞赛名称': {'end': 13,
|
||
# 'probability': 0.7825401425361633,
|
||
# 'relation': {'主办方': [{'end': 22,
|
||
# 'probability': 0.8421716690063477,
|
||
# 'start': 14,
|
||
# 'text': '中国中文信息学会'},
|
||
# {'end': 30,
|
||
# 'probability': 0.7580805420875549,
|
||
# 'start': 23,
|
||
# 'text': '中国计算机学会'}],
|
||
# '已举办次数': [{'end': 82,
|
||
# 'probability': 0.4671304225921631,
|
||
# 'start': 80,
|
||
# 'text': '4届'}],
|
||
# '承办方': [{'end': 39,
|
||
# 'probability': 0.8292709589004517,
|
||
# 'start': 35,
|
||
# 'text': '百度公司'},
|
||
# {'end': 55,
|
||
# 'probability': 0.7000502943992615,
|
||
# 'start': 40,
|
||
# 'text': '中国中文信息学会评测工作委员会'},
|
||
# {'end': 72,
|
||
# 'probability': 0.6193484663963318,
|
||
# 'start': 56,
|
||
# 'text': '中国计算机学会自然语言处理专委会'}]},
|
||
# 'start': 0,
|
||
# 'text': '2022语言与智能技术竞赛'}}]
|
||
```
|
||
|
||
### Event Extraction
|
||
|
||
Event Extraction (EE) refers to extracting predefined Trigger and Argument from natural language texts and combining them into structured event information.
|
||
|
||
For example, if the targets are"地震强度", "时间", "震中位置" and "引源深度" for the event "地震", we can execute the following codes.
|
||
|
||
```python
|
||
>>> schema = {"地震触发词": ["地震强度", "时间", "震中位置", "震源深度"]}
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(
|
||
[
|
||
"中国地震台网正式测定:5月16日06时08分在云南临沧市凤庆县(北纬24.34度,东经99.98度)发生3.5级地震,"
|
||
"震源深度10千米。"
|
||
],
|
||
return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'地震触发词': {'end': 58,
|
||
# 'probability': 0.9977425932884216,
|
||
# 'relation': {'地震强度': [{'end': 56,
|
||
# 'probability': 0.9980800747871399,
|
||
# 'start': 52,
|
||
# 'text': '3.5级'}],
|
||
# '时间': [{'end': 22,
|
||
# 'probability': 0.9853301644325256,
|
||
# 'start': 11,
|
||
# 'text': '5月16日06时08分'}],
|
||
# '震中位置': [{'end': 50,
|
||
# 'probability': 0.7874020934104919,
|
||
# 'start': 23,
|
||
# 'text': '云南临沧市凤庆县(北纬24.34度,东经99.98度)'}],
|
||
# '震源深度': [{'end': 67,
|
||
# 'probability': 0.9937973618507385,
|
||
# 'start': 63,
|
||
# 'text': '10千米'}]},
|
||
# 'start': 56,
|
||
# 'text': '地震'}}]
|
||
```
|
||
|
||
### Opinion Extraction
|
||
|
||
opinion extraction refers to the extraction of evaluation dimensions and opinions contained in the text.
|
||
|
||
For example, if the extraction target is the evaluation dimensions and their corresponding opinions and sentiment tendencies. We can execute the following codes:
|
||
|
||
```python
|
||
>>> schema = {"评价维度": ["观点词", "情感倾向[正向,负向]"]}
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(
|
||
["店面干净,很清静,服务员服务热情,性价比很高,发现收银台有排队"], return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'评价维度': {'end': 20,
|
||
# 'probability': 0.9817039966583252,
|
||
# 'relation': {'情感倾向[正向,负向]': [{'end': 0,
|
||
# 'probability': 0.9966142177581787,
|
||
# 'start': 0,
|
||
# 'text': '正向'}],
|
||
# '观点词': [{'end': 22,
|
||
# 'probability': 0.9573966264724731,
|
||
# 'start': 21,
|
||
# 'text': '高'}]},
|
||
# 'start': 17,
|
||
# 'text': '性价比'}}]
|
||
```
|
||
|
||
### Sentiment Classification
|
||
|
||
Sentence-level sentiment classification, i.e., determining a sentence has a "positive" sentiment or "negative" sentiment. We can execute the following codes:
|
||
|
||
```python
|
||
>>> schema = ["情感倾向[正向,负向]"]
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(["这个产品用起来真的很流畅,我非常喜欢"], return_dict=True)
|
||
>>> pprint(results)
|
||
|
||
# An output example
|
||
# [{'情感倾向[正向,负向]': {'end': 0,
|
||
# 'probability': 0.9990023970603943,
|
||
# 'start': 0,
|
||
# 'text': '正向'}}]
|
||
```
|
||
|
||
### Cross-task Extraction
|
||
|
||
For example, in a legal scenario where both entity extraction and relation extraction need to be performed. We can execute the following codes.
|
||
|
||
```python
|
||
>>> schema = ["法院", {"原告": "委托代理人"}, {"被告": "委托代理人"}]
|
||
>>> uie.set_schema(schema)
|
||
>>> results = uie.predict(
|
||
[
|
||
"北京市海淀区人民法院\n民事判决书\n(199x)建初字第xxx号\n原告:张三。\n委托代理人李四,北京市 A律师"
|
||
"事务所律师。\n被告:B公司,法定代表人王五,开发公司总经理。\n委托代理人赵六,北京市 C律师事务所律师。"
|
||
],
|
||
return_dict=True)
|
||
>>> pprint(results)
|
||
# An output example
|
||
# [{'原告': {'end': 37,
|
||
# 'probability': 0.9949813485145569,
|
||
# 'relation': {'委托代理人': [{'end': 46,
|
||
# 'probability': 0.7956855297088623,
|
||
# 'start': 44,
|
||
# 'text': '李四'}]},
|
||
# 'start': 35,
|
||
# 'text': '张三'},
|
||
# '法院': {'end': 10,
|
||
# 'probability': 0.9221072793006897,
|
||
# 'start': 0,
|
||
# 'text': '北京市海淀区人民法院'},
|
||
# '被告': {'end': 67,
|
||
# 'probability': 0.8437348008155823,
|
||
# 'relation': {'委托代理人': [{'end': 92,
|
||
# 'probability': 0.7267124652862549,
|
||
# 'start': 90,
|
||
# 'text': '赵六'}]},
|
||
# 'start': 64,
|
||
# 'text': 'B公司'}}]
|
||
```
|
||
|
||
## UIEModel Python Interface
|
||
|
||
```python
|
||
fd.text.uie.UIEModel(model_file,
|
||
params_file,
|
||
vocab_file,
|
||
position_prob=0.5,
|
||
max_length=128,
|
||
schema=[],
|
||
runtime_option=None,
|
||
model_format=ModelFormat.PADDLE,
|
||
schema_language=SchemaLanguage.ZH)
|
||
```
|
||
|
||
UIEModel loading and initialization. Among them, `model_file`, `params_file` are Paddle inference documents exported by trained models. Please refer to [Model export](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/uie/README.md#%E6%A8%A1%E5%9E%8B%E9%83%A8%E7%BD%B2). `vocab_file`refers to the vocabulary file. The vocabulary of the UIE model UIE can be downloaded in [UIE configuration file](https://github.com/PaddlePaddle/PaddleNLP/blob/5401f01af85f1c73d8017c6b3476242fce1e6d52/model_zoo/uie/utils.py).
|
||
|
||
**Parameter**
|
||
|
||
> * **model_file**(str): Model file path
|
||
> * **params_file**(str): Parameter file path
|
||
> * **vocab_file**(str): Vocabulary file
|
||
> * **position_prob**(str): Position probability. The model will output positions with probability greater than `position_prob`, default is 0.5
|
||
> * **max_length**(int): Maximized length of input text. Input text subscript exceeding `max_length` will be truncated. Default is 128
|
||
> * **schema**(list|dict): Target information for extraction tasks
|
||
> * **runtime_option**(RuntimeOption): Backend inference configuration, the default is None, i.e., the default configuration
|
||
> * **model_format**(ModelFormat): Model format, and default is Paddle format
|
||
> * **schema_language**(SchemaLanguage): Schema language, and default is ZH(Chinese). Currently supported language:ZH(Chinese),EN(English)
|
||
|
||
### set_schema Function
|
||
|
||
> ```python
|
||
> set_schema(schema)
|
||
> ```
|
||
> Set schema interface of the UIE model.
|
||
>
|
||
> **Parameter**
|
||
> > * **schema**(list|dict): Enter the data to be extracted from the text.
|
||
>
|
||
> **Return**
|
||
> Blank.
|
||
|
||
### predict Function
|
||
|
||
> ```python
|
||
> UIEModel.predict(texts, return_dict=False)
|
||
> ```
|
||
>
|
||
> Model prediction interface where input text list directly output extraction results.
|
||
>
|
||
> **Parameter**
|
||
>
|
||
> > * **texts**(list(str)): Enter the data to be extracted from the text.
|
||
> > * **return_dict**(bool): Whether to output UIE results in the form of dictionary, and default is False。
|
||
> **Return**
|
||
>
|
||
> > Return`dict(str, list(fastdeploy.text.C.UIEResult))`。
|
||
|
||
## Related Documents
|
||
|
||
[Details for UIE model](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/uie/README.md)
|
||
|
||
[How to export a UIE model](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/model_zoo/uie/README.md#%E6%A8%A1%E5%9E%8B%E9%83%A8%E7%BD%B2)
|
||
|
||
[UIE C++ deployment](../cpp/README.md)
|