essos 79f18331b6
Some checks failed
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
[CI]【Hackathon 9th Sprint No.51】NO.51 功能模块 fastdeploy/scheduler/dp_scheduler.py 单测补充 (#5046)
* update test utils

* Add comprehensive unit tests for DP scheduler functionality

- Add test_dp_scheduler.py with full-featured unit tests supporting both normal and standalone modes
- Add test_dp_scheduler_simple.py with lightweight mock-based tests for easy execution
- Add comprehensive README.md documenting test architecture and usage
- Tests cover DPLocalScheduler and DPScheduler classes with focus on:
  - Request lifecycle management and TTL support
  - Response handling and routing
  - Resource-based scheduling and constraint handling
  - Multi-threading and concurrent operations
  - Splitwise role support (prefill vs decode)
  - Error handling and edge cases
  - Thread-safe operations with proper synchronization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Remove tests/multimodal/test_utils.py

This file appears to be duplicate or misplaced, removing it to clean up the test structure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* update

* fix

* rm unused file

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-21 10:52:33 +08:00
2025-11-11 10:28:46 +08:00
2025-11-19 10:20:14 +08:00
2025-10-22 17:59:50 +08:00
2025-11-18 17:56:12 +08:00
2025-08-28 14:17:54 +08:00
2025-11-12 11:03:23 +08:00
2025-11-12 11:03:23 +08:00

English | 简体中文

PaddlePaddle%2FFastDeploy | Trendshift
Installation | Quick Start | Supported Models


FastDeploy : Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

News

[2025-11] FastDeploy v2.3 is newly released! It adds deployment support for two major models, ERNIE-4.5-VL-28B-A3B-Thinking and PaddleOCR-VL-0.9B, across multiple hardware platforms. It further optimizes comprehensive inference performance and brings more deployment features and usability enhancements. For all the upgrade details, refer to the v2.3 Release Note.

[2025-09] FastDeploy v2.2: It now offers compatibility with models in the HuggingFace ecosystem, has further optimized performance, and newly adds support for baidu/ERNIE-21B-A3B-Thinking!

About

FastDeploy is an inference and deployment toolkit for large language models and visual language models based on PaddlePaddle. It delivers production-ready, out-of-the-box deployment solutions with core acceleration technologies:

  • 🚀 Load-Balanced PD Disaggregation: Industrial-grade solution featuring context caching and dynamic instance role switching. Optimizes resource utilization while balancing SLO compliance and throughput.
  • 🔄 Unified KV Cache Transmission: Lightweight high-performance transport library with intelligent NVLink/RDMA selection.
  • 🤝 OpenAI API Server and vLLM Compatible: One-command deployment with vLLM interface compatibility.
  • 🧮 Comprehensive Quantization Format Support: W8A16, W8A8, W4A16, W4A8, W2A16, FP8, and more.
  • Advanced Acceleration Techniques: Speculative decoding, Multi-Token Prediction (MTP) and Chunked Prefill.
  • 🖥️ Multi-Hardware Support: NVIDIA GPU, Kunlunxin XPU, Hygon DCU, Iluvatar GPU, Enflame GCU, MetaX GPU, Intel Gaudi etc.

Requirements

  • OS: Linux
  • Python: 3.10 ~ 3.12

Installation

FastDeploy supports inference deployment on NVIDIA GPUs, Kunlunxin XPUs, Iluvatar GPUs, Enflame GCUs, Hygon DCUs and other hardware. For detailed installation instructions:

Get Started

Learn how to use FastDeploy through our documentation:

Supported Models

Learn how to download models, enable using the torch format, and more:

Advanced Usage

Acknowledgement

FastDeploy is licensed under the Apache-2.0 open-source license. During development, portions of vLLM code were referenced and incorporated to maintain interface compatibility, for which we express our gratitude.

Description
️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end optimization, multi-platform and multi-framework support.
Readme Apache-2.0 410 MiB
Languages
Python 54.3%
C++ 24.1%
Cuda 20.6%
Shell 0.8%
C 0.1%