cmcamdy
9f4977eb74
[xpu] support mtp for xpu(mix) ( #5274 )
...
* [XPU] support kernel for mtp(base)
* [XPU] support kernel for mtp(base)
* format
* format
* format
* fix gather next token
* fix step && add test
* fix
* mv pre/post process
* add adjust batch / gather next token for mtp
* fix code style
* fix mtp kenrel name
* fix mtp kernel test
* mv xpu pre/post process
* mv xpu pre/post process
* [xpu] support mtp
* fix code style
2025-12-01 11:03:14 +08:00
kevin
8aec3acc8c
fix mm type bug ( #5300 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-11-29 20:48:14 +08:00
kevin
048ca60013
fix aksk check bug
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-11-29 09:14:28 +08:00
fmiao2372
2c7683d551
[Intel HPU] change MoE weights and scales from list to tensor and add… ( #5289 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* [Intel HPU] change MoE weights and scales from list to tensor and add q/k rms norm
* update doc
* move HPU_CHUNK_SIZE into envs
2025-11-28 19:17:05 +08:00
Yonghua Li
a535050b11
[FDConfig] remove engine client args, use fd_config instead ( #5217 )
...
* [refactor] remove engine client args, use fd_config instead
* [chore] update
* [fix] fix
* [fix] fix
* [chore] rename config to fd_config
* [fix] fix run_batch
* [ci] add ci case for engine client
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-28 01:20:54 -08:00
周周周
73886204d4
[Others] clean code ( #5235 )
2025-11-28 15:40:49 +08:00
kevin
2d69d91ab8
add aksk check ( #5273 )
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-11-28 14:28:24 +08:00
bukejiyu
1539fd6056
[BugFix]Set default OMP_NUM_THREADS=3 and fix extra GPU memory usage in DeepSeek ( #5219 )
...
* fix bug
* update
* update
* update
* fix copy
* update
2025-11-28 14:22:04 +08:00
Daci
7dc06cac6e
[BugFix] race condition [is_fetching] causing multiple fetch requests ( #5238 )
...
* RouterArgs port str -> int
* fix race condition [is_fetching] causing multiple fetch requests
* bugfix: Delete duplicate input_ids tensor creation
2025-11-28 13:41:36 +08:00
Yuanle Liu
35479b691f
[BugFix] fix tsp o_proj bias add ( #5284 )
...
* fix tsp bias add
* fix
* fix
2025-11-28 13:39:55 +08:00
lizhenyun01
aba4fc657f
[Feature] support flash_mask_attention backend ( #5134 )
...
* [Feature] suppert flash_mask_attention backend
* fix unittest
* clean code
2025-11-28 10:12:16 +08:00
chen
35f85baf09
[BugFix]fix v1 loader lm head fp32 ( #5270 )
2025-11-27 20:12:56 +08:00
cmcamdy
5a67a6d960
[XPU] support kernel for mtp(base) ( #4748 )
...
* [XPU] support kernel for mtp(base)
* [XPU] support kernel for mtp(base)
* format
* format
* format
* fix gather next token
* fix step && add test
* fix
* mv pre/post process
* add adjust batch / gather next token for mtp
* fix code style
* fix mtp kenrel name
* fix mtp kernel test
* mv xpu pre/post process
* mv xpu pre/post process
2025-11-27 15:05:44 +08:00
fl0w2o48
e63d715fc3
[BugFix][Metrics] Fix Prometheus Multiprocess Metrics Issues and Add ZMQ Communication Metrics ( #5185 )
...
* [Feature] add metrics for ZMQ and fix multiprocess metrics
* fix test_metrics.py
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-27 15:05:09 +08:00
Yuanle Liu
ef5aa5c03b
[BugFix] fix cuda-python requirement ( #5261 )
...
* fix cuda-python requirement
* update
* fix
2025-11-27 13:58:18 +08:00
GoldPancake
cfc5b0ccf9
[BugFix] fix mtp logprob bugs in chunk prefill ( #5244 )
...
* fix mtp logprob bugs in chunk prefill
* fix
* fix
2025-11-27 11:31:29 +08:00
SunLei
c424e08dc5
[Speculative Decoding] split draft_tokens into standalone post-processing path ( #5205 )
...
* refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs
* Restore Request.__repr__ implementation
* ci
* add envs
* fix unittest
2025-11-27 11:22:41 +08:00
Yuanle Liu
cb56d46694
[Optimization] Refine row parallel bias and nranks and moe all_reduce ( #5247 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* rename nranks to tp_size and fix bias in v1 loader
* fix
* update
2025-11-26 05:09:09 -08:00
kevin
bf30f45738
[BugFix] fix vl performance bug ( #5181 )
...
* fix vl performance bug
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-26 21:06:52 +08:00
chen
209970836e
[BugFix] BF16 MoE Cutlass Backend Support EP ( #5242 )
2025-11-26 19:16:22 +08:00
freeliuzc
ba915e03e1
[BugFix]Fix attention mask bug in D-Node of PD-split mode ( #5245 )
2025-11-26 17:56:28 +08:00
xiaoxiaohehe001
61fc368066
[Fix] fix eplb noaux ( #5239 )
...
* fix eplb noaux
* fix eplb noaux
2025-11-26 17:50:51 +08:00
kxz2002
bc118c3d2d
fix prompt_token_ids is None in request dict ( #5241 )
2025-11-26 17:10:45 +08:00
chen
00d0ef5134
check ( #5237 )
2025-11-26 17:07:26 +08:00
freeliuzc
214942e1ae
fix kernel output extract ( #5208 )
2025-11-26 16:48:42 +08:00
Yonghua Li
cead6b26fa
[Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics ( #4993 )
...
* [update] update time_to_first_tokens to include queue time, and remove first_token_latency and infer_latency
* [doc] update docs
* [ci] fix test
* [chore] delete redundant code
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
2025-11-26 14:42:17 +08:00
Daci
f25ee3a26f
[Feature] enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1 ( #5140 )
...
* enable guided decoding ENABLE_V1_KVCACHE_SCHEDULER = 1
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-11-26 10:22:35 +08:00
kxz2002
2d787590c4
[Feature] The 45VL supports prompt_token_ids + messages input. ( #5148 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support prompt_token_ids + messages
* fix bug
* refact code structure
* support cache mm items
* refact code structure
* delete test cases
* modify unit test
* add unit test
* add unit test
* fix append
* add check for messages
2025-11-25 23:11:44 +08:00
Yuanle Liu
66e096d509
[FDConfig] disable use_sequence_parallel_moe default ( #5222
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* disable use_sequence_parallel_moe default
* update
2025-11-25 21:49:10 +08:00
kevin
df2be1cf16
[BugFix] fix mm_positions type error ( #5182 )
...
* fix mm_positions type error
* update code
* update code
* update code
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-11-25 19:28:18 +08:00
Yonghua Li
09379183e2
[BugFix] fix work metrics not returned by metrics api ( #4912 )
...
* [BugFix] fix work metrics not returned by metrics api
* [fix] fix conflict
* [fix] fix ci
2025-11-25 19:12:29 +08:00
freeliuzc
5c8c2d47eb
[Speculative Decoding][MTP]Update extract_mtp_weight script and optimize config ( #5183 )
...
* update extract_mtp_model
* modify config usage
2025-11-25 14:09:03 +08:00
chenjian
09b47c7111
[Bug fix] Send first token in D instance ( #5199 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* [Bug fix] Send first token in D instance
* fix
2025-11-24 23:42:20 +08:00
Yuanle Liu
f69e0839f7
dummy import fd ( #5192 )
2025-11-24 20:23:07 +08:00
kevin
8e4e3ff510
[Feature] support eplb in api_server ( #4782 )
...
* support eplb in api_server
* update code
* add eplb test case
* update eplb
* support tp+dp eplb
* update test cese
* update code
* update code
* fix bug
* update copilot review
* update test case name
2025-11-24 20:22:29 +08:00
xiaozude
d5bd64336a
[Metax] support ENABLE_V1_KVCACHE_SCHEDULER ( #5163 )
2025-11-24 19:19:49 +08:00
xiaoxiaohehe001
e150a418d4
support moe offline quant ( #5142 )
2025-11-24 18:59:18 +08:00
Juncai
af03da5127
[BugFix] fix release block ids ( #5184 )
...
* fix release block ids
* up
2025-11-24 16:48:09 +08:00
xiaoxiaohehe001
95f3c8c641
[Fix] Fix eplb bug and support fp8 load weight ( #5178 )
...
* fix eplb part2
* fix eplb part2
* fix eplb part2
2025-11-24 15:31:37 +08:00
kevin
cceaba1c8d
[Feature] remove to_numpy ( #5162 )
...
* remove to_numpy
* update code
* update name
* update code
* update code
* update code
2025-11-21 21:54:26 +08:00
kevin
c068a4f642
[Feature] dyc8 support prefixcache ( #5125 )
...
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* dyc8 support prefixcache
* fix cache_trans test case
* update code
2025-11-21 19:46:26 +08:00
GoldPancake
ab3a2e45ff
fix mtp reschedule ( #5165 )
2025-11-21 19:31:35 +08:00
chenjian
3ea1b44a58
[Optimization] Improve perf for fd response token with internal adapter ( #4992 )
...
* [Optimize] Improve perf for fd response token with internal adapter
* fix
* fix bug
* fix ci
* fix ci
* fix ci
* fix ci
2025-11-21 19:02:03 +08:00
Yuanle Liu
5bcf79d780
[BugFix] fix num of rdma_comm_ports check ( #5168 )
...
* fix num of rdma_comm_ports check
* update
* update
* update
2025-11-21 18:31:14 +08:00
Jiang-Jia-Jun
d2298dcb0c
[Polish] Simplify __repr__ method in Request class ( #5153 )
...
Remove detailed string representation for Request class.
2025-11-21 17:21:06 +08:00
xiaoxiaohehe001
6471dade4a
[Fix] Fix noaux ep test ( #5161 )
...
* support noaux eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
* noaux_eplb
2025-11-21 16:36:41 +08:00
Juncai
f9b0545a7f
[PD Disaggregation] [Refine] Refine splitwise deployment ( #5151 )
...
* Refine splitwise deployment
* up
2025-11-21 15:30:24 +08:00
freeliuzc
2d1dade5e2
[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage ( #5155 )
...
* support static cachekv c8 quantization in mtp mode
* optimize memory allocation
2025-11-21 15:10:13 +08:00
lizhenyun01
3c36283d7d
[ENV] support AK SK ENCPOINT while get the multi_modal's feature ( #5159 )
2025-11-20 23:07:57 -08:00
bukejiyu
34f59d9800
[RL]Fix missing is_distributed attribute ( #5150 )
...
* fix
* update
2025-11-21 14:14:25 +08:00