bukejiyu
3200a80de3
[v1 loader]support fp8 ( #3593 )
...
* support fp8
* update ci
2025-08-26 02:42:46 -07:00
Yuan Xiaolan
5f56d289a7
fix is_permuted ( #3098 )
...
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-07-31 19:58:05 +08:00
Yuan Xiaolan
3214fb5393
support model loading for w4a8 offline quant ( #3064 )
...
支持W4A8 EP 对离线量化权重的load
2025-07-29 21:54:37 +08:00
lizhenyun01
29c3292f02
support c4 attn && fix cache
2025-07-24 12:00:52 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Yuanle Liu
240bdac2a4
[feat] support fa3 backend for pd disaggregated ( #2695 )
...
Deploy GitHub Pages / deploy (push) Has been cancelled
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* support fa3 backend run in pd disaggregated
* delete use_fast_ffn
2025-07-03 22:33:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00