PaddleSeg Serving Deployment Demo
The PaddleSeg serving deployment demo is built with FastDeploy Serving, a service-oriented deployment framework built on Triton Inference Server and designed for high-concurrency, high-throughput requests. It is a complete, high-performance framework suitable for production use. If you do not need high concurrency or high throughput and only want to quickly test whether the model can be deployed online, please refer to fastdeploy_serving.
Environment
Before deploying the service, confirm the hardware and software requirements of the serving image and the image pull command. For details, please refer to FastDeploy service deployment.
Launch Serving
# Download demo code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/segmentation/paddleseg/serving
# Download the PP_LiteSeg model file
wget https://bj.bcebos.com/paddlehub/fastdeploy/PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer.tgz
tar -xvf PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer.tgz
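# Move the model files to models/runtime/1
mv PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer/model.pdmodel models/runtime/1/
mv PP_LiteSeg_B_STDC2_cityscapes_with_argmax_infer/model.pdiparams models/runtime/1/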
# Pull the FastDeploy image; x.y.z is the FastDeploy version, for example 1.0.2
docker pull paddlepaddle/fastdeploy:x.y.z-gpu-cuda11.4-trt8.4-21.10
# Run the container. It is named fd_serving, and the current directory is mounted at /serving inside the container
nvidia-docker run -it --net=host --name fd_serving -v `pwd`/:/serving paddlepaddle/fastdeploy:x.y.z-gpu-cuda11.4-trt8.4-21.10  bash
# Start the service (if CUDA_VISIBLE_DEVICES is not set, the server can schedule work on all GPUs)
CUDA_VISIBLE_DEVICES=0 fastdeployserver --model-repository=/serving/models --backend-config=python,shm-default-byte-size=10485760
If the service starts successfully, output like the following is printed:
......
I0928 04:51:15.784517 206 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0928 04:51:15.785177 206 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0928 04:51:15.826578 206 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
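Besides reading the log, you can probe the HTTP port to confirm that the service is ready. The following is a minimal sketch, not part of the demo code. It assumes that FastDeploy Serving keeps Triton's standard readiness route `/v2/health/ready` on the HTTP port shown in the log (8000); the function name `is_server_ready` is hypothetical.

```python
import urllib.error
import urllib.request


def is_server_ready(base_url: str = "http://localhost:8000",
                    timeout: float = 2.0) -> bool:
    """Return True if the readiness probe answers with HTTP 200."""
    # Assumption: the server exposes Triton's standard readiness route.
    url = base_url.rstrip("/") + "/v2/health/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of these as "not ready".
        return False


if __name__ == "__main__":
    print("ready" if is_server_ready() else "not ready")
```

Because the container is started with `--net=host`, the probe can be run from the host machine as well as from inside the container.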
Client Requests
Run the following commands on the host machine to send a gRPC request and print the result
# Download the test image
wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png
# Install the client-side dependencies
python3 -m pip install tritonclient\[all\]
# Send requests
python3 paddleseg_grpc_client.py
When the request succeeds, the result is returned in JSON format and printed.
Modify Configs
By default, the model runs with ONNXRuntime on CPU. To run it on GPU or with a different inference engine, modify models/runtime/config.pbtxt as described in the Configs File.
