apps/FastDeploy

Fork 0

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2025-12-24 13:28:13 +08:00

Go to file

zhuzixuan 8a9e7b53af

CE Compile Job / ce_job_pre_check (push) Has been cancelled

Details

Deploy GitHub Pages / deploy (push) Has been cancelled

Details

CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled

Details

CE Compile Job / FD-Clone-Linux (push) Has been cancelled

Details

CE Compile Job / Show Code Archive Output (push) Has been cancelled

Details

CE Compile Job / BUILD_SM8090 (push) Has been cancelled

Details

CE Compile Job / BUILD_SM8689 (push) Has been cancelled

Details

CE Compile Job / CE_UPLOAD (push) Has been cancelled

Details

Publish Job / publish_pre_check (push) Has been cancelled

Details

Publish Job / print_publish_pre_check_outputs (push) Has been cancelled

Details

Publish Job / FD-Clone-Linux (push) Has been cancelled

Details

Publish Job / Show Code Archive Output (push) Has been cancelled

Details

Publish Job / BUILD_SM8090 (push) Has been cancelled

Details

Publish Job / BUILD_SM8689 (push) Has been cancelled

Details

Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled

Details

Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled

Details

Publish Job / Run FD Image Build (push) Has been cancelled

Details

Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled

Details

Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled

Details

Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled

Details

Publish Job / Run Base Tests (push) Has been cancelled

Details

Publish Job / Run Accuracy Tests (push) Has been cancelled

Details

Publish Job / Run Stable Tests (push) Has been cancelled

Details

CI Images Build / FD-Clone-Linux (push) Has been cancelled

Details

CI Images Build / Show Code Archive Output (push) Has been cancelled

Details

CI Images Build / CI Images Build (push) Has been cancelled

Details

CI Images Build / BUILD_SM8090 (push) Has been cancelled

Details

CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled

Details

CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled

Details

CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled

Details

CI Images Build / Run Base Tests (push) Has been cancelled

Details

CI Images Build / Run Accuracy Tests (push) Has been cancelled

Details

CI Images Build / Run Stable Tests (push) Has been cancelled

Details

CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled

Details

[Docs]Supplement the English and Chinese user documentation for Tool calling (#4895 )

* tool calling文档编写，v1.0

* tool calling文档编写，v1.0

* tool calling文档编写，v1.0

* tool calling doc，v1.1

* tool calling doc，v1.1

* tool calling doc，v1.1

* tool calling doc，v1.1

2025-11-08 20:05:14 +08:00

.github

Add instructions for copilot reviewer

2025-11-07 11:19:27 +08:00

benchmarks

[Benchmark] Enhance benchmark output logging (#4682 )

2025-11-06 16:53:31 +08:00

custom_ops

[Metax] support ERNIE-4.5-VL-28B (#4820 )

2025-11-07 04:55:49 -08:00

dockerfiles

[CI] fix docker_build error and add tag-base (#4810 )

2025-11-05 21:57:54 +08:00

docs

[Docs]Supplement the English and Chinese user documentation for Tool calling (#4895 )

2025-11-08 20:05:14 +08:00

examples/splitwise

[Feature] [PD] add simple router and refine splitwise deployment (#4709 )

2025-11-06 14:56:02 +08:00

fastdeploy

[Feature] Enable FastDeploy to support adding the “--api-key” authentication parameter. (#4806 )

2025-11-08 18:24:02 +08:00

scripts

[XPU] fix ep_tp all2all ci (#4876 )

2025-11-07 16:37:23 +08:00

tests

[Feature] Enable FastDeploy to support adding the “--api-key” authentication parameter. (#4806 )

2025-11-08 18:24:02 +08:00

tools

[CI] fix docker_build error of ciuse (#4886 )

2025-11-07 19:44:21 +08:00

.clang-format

c++ code format (#4527 )

2025-10-22 17:59:50 +08:00

.flake8

update flake8 version to support pre-commit in python3.12 (#3000 )

2025-07-24 01:43:31 -07:00

.gitignore

[Optimize]support machete weight only gemm (#3561 )

2025-08-28 09:49:58 +08:00

.gitmodules

add ignore=all for deepgemm (#4118 )

2025-09-15 21:52:00 +08:00

.pre-commit-config.yaml

c++ code format (#4527 )

2025-10-22 17:59:50 +08:00

build.sh

[Intel HPU] Support intel hpu platform (#4161 )

2025-09-24 12:27:50 +08:00

LICENSE

[LLM] First commit the llm deployment code

2025-06-09 19:20:15 +08:00

mkdocs.yml

updata mkdocs.yml (#4804 )

2025-11-04 19:30:26 +08:00

pyproject.toml

Fix target_version (#3159 )

2025-08-28 14:17:54 +08:00

README_CN.md

Update README_CN.md

2025-11-06 17:19:22 +08:00

README_EN.md

Update README_EN.md

2025-11-06 17:19:07 +08:00

README.md

[Doc] Update docs for v2.3.0rc0 (#4828 )

2025-11-05 19:45:53 +08:00

requirements_dcu.txt

[BugFix] fix workers=1 (#4364 )

2025-10-15 17:06:25 +08:00

requirements_iluvatar.txt

[Iluvatar GPU] Adapt VL model (#4313 )

2025-10-17 16:13:38 +08:00

requirements_metaxgpu.txt

[BugFix] fix workers=1 (#4364 )

2025-10-15 17:06:25 +08:00

requirements.txt

[Feature] [PD] add simple router and refine splitwise deployment (#4709 )

2025-11-06 14:56:02 +08:00

setup.py

[FastDeploy Cli] Bench Command eval and throughput (#4239 )

2025-10-10 16:17:44 +08:00

README_EN.md

English | 简体中文

Installation | Quick Start | Supported Models

FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

[2025-09] FastDeploy v2.2 is newly released! It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for baidu/ERNIE-21B-A3B-Thinking!

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility.
🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
⏩ Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Ascend NPU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc.

Requirements

OS: Linux
Python: 3.10 ~ 3.12

Installation

FastDeploy supports inference deployment on NVIDIA GPUs, Kunlunxin XPUs, Iluvatar GPUs, Enflame GCUs, Hygon DCUs and other hardware. For detailed installation instructions:

Note: We are actively working on expanding hardware support. Additional hardware platforms including Ascend NPU are currently under development and testing. Stay tuned for updates!

Get Started

Learn how to use FastDeploy through our documentation:

Supported Models

Learn how to download models, enable using the torch format, and more:

Full Supported Models List

Advanced Usage

Acknowledgement

FastDeploy is licensed under the Apache-2.0 open-source license. During development, portions of vLLM code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

Description

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.

android graphcore intel jetson kunlun object-detection onnx onnxruntime openvino picodet rockchip serving stable-diffusion tensorrt uie yolov5 yolov8

Readme Apache-2.0 410 MiB

Languages

Python 54.3%

C++ 24.1%

Cuda 20.6%

Shell 0.8%

C 0.1%