This project implements an efficient **Speculative Decoding** inference framework.
- **Ngram**
- **MTP (Multi-Token Prediction)**
  - ✅ Supported: TP Sharding
  - ✅ Supported: Shared Prefix
  - ✅ Supported: TP Sharding + PD Separation
  - ⏳ Coming Soon: EP + DP + PD Separation
  - ⏳ Coming Soon: Chunked Prefill
  - ⏳ Coming Soon: Multi-layer MTP
---
### Coming Soon

- Draft Model
- Eagle
- Hydra
- Medusa
- ...

---
## 🚀 Using Multi-Token Prediction (MTP)

For detailed theory, refer to:
📄 [DeepSeek-V3 Paper](https://arxiv.org/pdf/2412.19437)

### TP Sharding Mode

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --config ${path_to_FastDeploy}benchmarks/yaml/eb45t-32k-wint4-mtp-h100-tp4.yaml \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1, "model": "${mtp_model_path}"}'
```
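The `--speculative-config` flag above takes a JSON string. A minimal sketch of assembling it programmatically, using only the field names shown in the launch example (the helper function and its validation are illustrative assumptions, not FastDeploy's own schema):

```python
# Build the --speculative-config JSON string used in the launch command.
# Field names mirror the example above; the range check is an assumption
# added for illustration, not FastDeploy's validation logic.
import json

def make_speculative_config(method="mtp", num_speculative_tokens=1,
                            model="/path/to/mtp_model"):
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be >= 1")
    return json.dumps({
        "method": method,
        "num_speculative_tokens": num_speculative_tokens,
        "model": model,
    })

print(make_speculative_config())
```

Generating the string this way avoids shell-quoting mistakes when the model path contains special characters.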