apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2025-12-24 13:28:13 +08:00)
Path: FastDeploy/fastdeploy/spec_decode (commit c499bd9e90ddb34b8da68c9b7d71a580cbb9f208)
Latest commit: 2d1dade5e2 by freeliuzc, 2025-11-21 15:10:13 +08:00
[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage (#5155)
* support static cachekv c8 quantization in mtp mode
* optimize memory allocation
File | Last commit | Date
__init__.py | Sync v2.0 version of code to github repo | 2025-06-29 23:29:37 +00:00
base.py | [Speculative Decoding][MTP]Support attn mask offset (#4641) | 2025-11-03 10:08:01 +08:00
mtp.py | [Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage (#5155) | 2025-11-21 15:10:13 +08:00
ngram.py | [Executor]CUDAGraph support Speculate Decode (#3769) | 2025-10-09 21:18:29 +08:00