polish code with new pre-commit rule (#2923)

Zero Rains
2025-07-19 23:19:27 +08:00
committed by GitHub
parent b8676d71a8
commit 25698d56d1
424 changed files with 14307 additions and 13518 deletions

@@ -10,22 +10,22 @@ This project implements an efficient **Speculative Decoding** inference framewor
- **Ngram**
- **MTP (Multi-Token Prediction)**
  - ✅ Supported: TP Sharding
  - ✅ Supported: Shared Prefix
  - ✅ Supported: TP Sharding + PD Separation
  - ⏳ Coming Soon: EP + DP + PD Separation
  - ⏳ Coming Soon: Support Chunk-prefill
  - ⏳ Coming Soon: Multi-layer MTP Layer
---
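For readers unfamiliar with the **Ngram** method listed above, the general idea of n-gram (prompt-lookup) drafting can be sketched as follows: find the most recent earlier occurrence of the last `n` tokens in the context and propose the tokens that followed it as the draft, which the target model then verifies. This is a generic illustration under that assumption, not FastDeploy's implementation; `propose_ngram_draft` and its parameters are hypothetical names.

```python
from typing import List


def propose_ngram_draft(tokens: List[int], n: int, num_draft: int) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in the context.

    Hypothetical helper for illustration only; not a FastDeploy API.
    """
    if n <= 0 or len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan right to left so the most recent earlier match wins.
    # The start index stops short of the suffix's own position.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            # The tokens that followed the earlier match become the draft.
            return tokens[start + n:start + n + num_draft]
    return []


# The context ends in (1, 2), which also appears at the start,
# so the tokens that followed it there are proposed as the draft.
draft = propose_ngram_draft([1, 2, 3, 4, 1, 2], n=2, num_draft=2)
print(draft)
```

When no earlier match exists the function returns an empty list, in which case a decoder would simply fall back to ordinary one-token-at-a-time generation.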
### Coming Soon
- Draft Model
- Eagle
- Hydra
- Medusa
- ...
---
@@ -54,7 +54,7 @@ This project implements an efficient **Speculative Decoding** inference framewor
## 🚀 Using Multi-Token Prediction (MTP)
For detailed theory, refer to:
📄 [DeepSeek-V3 Paper](https://arxiv.org/pdf/2412.19437)
### TP Sharding Mode
@@ -147,4 +147,4 @@ python -m fastdeploy.entrypoints.openai.api_server \
```
    --config ${path_to_FastDeploy}benchmarks/yaml/eb45t-32k-wint4-mtp-h100-tp4.yaml \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
```
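The `--speculative-config` value in the command above is a JSON object passed as a single shell argument. A minimal sketch of building that argument programmatically, assuming only the three keys shown in the command (`method`, `num_speculative_tokens`, `model`); the helper name `build_speculative_config` and the example model path are hypothetical and not part of FastDeploy:

```python
import json
import shlex


def build_speculative_config(method: str, num_speculative_tokens: int, model: str) -> str:
    """Serialize the speculative-decoding options to a JSON string.

    Hypothetical helper; it emits only the keys that appear in the command above.
    """
    return json.dumps(
        {
            "method": method,
            "num_speculative_tokens": num_speculative_tokens,
            "model": model,
        }
    )


# Hypothetical path standing in for ${mtp_model_path}.
config = build_speculative_config("mtp", 1, "/path/to/mtp_model")

# shlex.quote wraps the JSON in single quotes so the shell passes it as one argument.
argument = "--speculative-config " + shlex.quote(config)
print(argument)
```

Generating the JSON with `json.dumps` rather than by hand avoids quoting mistakes when the model path contains characters that need escaping.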